public inbox for systemtap@sourceware.org
* Systemtap do_filp_open failure on a few linux packages
From: Henrik /KaarPoSoft @ 2013-06-15 21:52 UTC
  To: systemtap

Dear all,

I have experienced a very strange issue related to systemtap.
Any insights or help you might be able to provide to help me debug this
further would be most appreciated.

I am developing a linux distribution called KaarPux:
http://kaarpux.kaarposoft.dk/

Using a few scripts, some 600+ linux packages are built and installed.
Generally, this works like a charm.

In order to automatically collect package dependencies, I have created
a small systemtap script to show files opened for reading:
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/chroot_scripts/kx_open.stp

This script is basically a probe on kernel.function("do_filp_open").return

The script is compiled with
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/chroot_scripts/install_kx_open_stp.sh

The script is executed with the functions in
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/shinc/linux_functions.shinc

So, basically, the build runs as "staprun -o $PIPE -c script_to_build_package",
with the output going into a $PIPE created beforehand.
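
As a rough illustration of that arrangement (a sketch only, not the actual code in linux_functions.shinc): a FIFO is created, a background reader drains it into a log file, and a producer writes into it. The names collect_via_fifo and fake_probe, and the path the stand-in prints, are hypothetical; in the real setup the producer would be the staprun command writing to $PIPE.

```shell
#!/bin/sh
# Sketch: collect a producer's output through a named pipe (FIFO).
# collect_via_fifo and fake_probe are hypothetical names for illustration.

# Usage: collect_via_fifo OUT_FILE PRODUCER [ARGS...]
# The producer is called with the FIFO path appended as its last argument
# and must open that path for writing, otherwise the reader never finishes.
collect_via_fifo() {
    out_file=$1
    shift
    work_dir=$(mktemp -d) || return 1
    pipe="$work_dir/probe.pipe"
    mkfifo "$pipe" || { rm -rf "$work_dir"; return 1; }

    # Start the reader first; it blocks until a writer opens the FIFO.
    cat "$pipe" > "$out_file" &
    reader_pid=$!

    "$@" "$pipe"
    status=$?

    # The reader sees end-of-file once the last writer closes the FIFO.
    wait "$reader_pid"
    rm -rf "$work_dir"
    return "$status"
}

# Stand-in for the real producer: writes one made-up path to the FIFO.
fake_probe() {
    printf '%s\n' "/usr/lib/libexample.so" > "$1"
}

demo_log=$(mktemp)
collect_via_fifo "$demo_log" fake_probe
cat "$demo_log"
rm -f "$demo_log"
```

Starting the reader before the producer matters: opening a FIFO for writing blocks until something has it open for reading.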

If I try to build all the 600+ packages with this probe enabled,
it ALMOST works.

For most of the 600+ packages, building is successful, and the probe
returns what seems to be reasonable results.

However, for a few packages, building fails:
- firefox
- thunderbird
- libreoffice
- ghc-binary
- ghc

I am a bit puzzled.
If I had made some stupid beginner's mistake, I would have expected all,
most, or at least a significant number of package builds to fail.
But only those 5 out of 600+ fail...

I have experienced similar problems for the last 6 to 12 months with
different kernel versions, systemtap versions, qemu versions, and 
KaarPux versions.
So it does not seem to be a glitch with the current version combination.

BTW, I also experienced similar problems with an earlier script:
http://sourceforge.net/p/kaarpux/code/ci/e80f14f67fc7688a4d85661befb2b96a565b206a/tree/master/chroot_scripts/kx_open.stp

I never bothered to debug further, but now I have tried to dig further...

Currently I have:
linux: 3.9.3
systemtap: 2.2.1
firefox: 21.0
thunderbird: 17.0.6
ghc-binary: 7.4.1
ghc: 7.4.1

Host: i7-3970X on P9X79 WS
Virtual Machine: qemu kvm 1.5.0

When building firefox with and without systemtap,
I get 36000+ identical lines in the log (except for some build identifiers),
then with systemtap:

---------- [BEGIN] ----------

Executing: c++ -o plugin-container -Wall -Wpointer-arith 
-Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body 
-Wno-invalid-offsetof -Wcast-align -fno-exceptions -fno-strict-aliasing 
-fno-rtti -ffunction-sections -fdata-sections -fno-exceptions 
-std=gnu++0x -pthread -pipe -DNDEBUG -DTRIMMED -g -Os -freorder-blocks 
-fomit-frame-pointer 
/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/ipc/app/tmpgyKSzm.list 
-lpthread -Wl,-z,noexecstack -Wl,--build-id 
-Wl,-rpath-link,/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/dist/bin 
-Wl,-rpath-link,/opt/kaarpux/firefox-21.0/lib -L../../dist/bin 
-L../../dist/lib -ldl 
-L/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/dist/bin 
-lxpcom -lmozalloc -lxul -L//lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl 
-Wl,--whole-archive ../../dist/lib/libmozglue.a 
../../dist/lib/libmemory.a -Wl,--no-whole-archive -rdynamic -ldl
/home/kaarpux/kaarpux/linux/build/opt/firefox-21.0/mozilla-release/obj-x86_64-unknown-linux-gnu/ipc/app/tmpgyKSzm.list:
     INPUT("MozillaRuntimeMain.o")

/bin/ld: warning: libhunspell-1.3.so.0, needed by 
../../dist/bin/libxul.so, not found (try using -rpath or -rpath-link)
../../dist/bin/libxul.so: undefined reference to `Hunspell::spell(char 
const*, int*, char**)'
../../dist/bin/libxul.so: undefined reference to 
`Hunspell::Hunspell(char const*, char const*, char const*)'
../../dist/bin/libxul.so: undefined reference to 
`Hunspell::suggest(char***, char const*)'
../../dist/bin/libxul.so: undefined reference to 
`Hunspell::get_dic_encoding()'
../../dist/bin/libxul.so: undefined reference to `Hunspell::~Hunspell()'
collect2: error: ld returned 1 exit status

---------- [END] ----------

But libhunspell-1.3.so.0 IS indeed there.
If I retry building WITH systemtap, I get the same result again and again.
Then if I rebuild WITHOUT systemtap, everything is fine.

For thunderbird, the experience is similar.

For ghc-binary I get

---------- [BEGIN] ----------

configure GHC_BINARY

checking for path to top of build tree... 
utils/ghc-pwd/dist-install/build/tmp/ghc-pwd: error while loading shared 
libraries: libgmp.so.3: cannot open shared object file: No such file or 
directory
configure: error: cannot determine current directory
Warning: child process exited with status 1

---------- [END] ----------

I was thinking this might have something to do with symbolic links, as
http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/packages/g/ghc-binary.yaml
creates two symlinks before configure.

However, again:
If I retry building WITH systemtap, I get the same result again and again.
Then if I rebuild WITHOUT systemtap, everything is fine.

(and there must be many packages with double symlinks anyway...)

For ghc I get

---------- [BEGIN] ----------

configure: WARNING: unrecognized options: --disable-dependency-tracking
checking for gfind... no
checking for find... /bin/find
checking for sort... /bin/sort
checking version of ghc... unknown
configure: error: Cannot determine the version of 
/home/kaarpux/kaarpux/linux/build/opt/ghc-binary-7.4.1/bin/ghc.  Is it 
really GHC?
Warning: child process exited with status 1

---------- [END] ----------

And again:
If I retry building WITH systemtap, I get the same result again and again.
Then if I rebuild WITHOUT systemtap, everything is fine.


So, now I am stuck.

I could understand if the output of my probe were not as I expected.
Fine.
But how could a simple probe like this make building a package fail?

Any input, help or comments would be most appreciated.

/Henrik


* Re: Systemtap do_filp_open failure on a few linux packages
From: David Smith @ 2013-06-17 18:16 UTC
  To: Henrik /KaarPoSoft; +Cc: systemtap

On 06/15/2013 04:52 PM, Henrik /KaarPoSoft wrote:
> Dear all,
> 
> I have experienced a very strange issue related to systemtap.
> Any insights or help you might be able to provide to help me debug this
> further would be most appreciated.
> 
> I am developing a linux distribution called KaarPux:
> http://kaarpux.kaarposoft.dk/
> 
> Using a few scripts, some 600+ linux packages are built and installed.
> Generally, this works like a charm.
> 
> In order to automatically collect package dependencies, I have created
> a small systemtap script to show files opened for reading:
> http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/chroot_scripts/kx_open.stp

The script looks reasonable. One small note, you shouldn't need the
'@defined($return)' check since $return should always be defined in
'kernel.function("do_filp_open").return'.

...

> For most of the 600+ packages, building is successful, and the probe
> returns what seems to be reasonable results.
> 
> However, for a few packages, building fails:
> - firefox
> - thunderbird
> - libreoffice
> - ghc-binary
> - ghc

Here's my thought. I don't think it is the probe itself that is causing
the problem, my guess would be that it is our staprun loader (a setuid
executable). When the module gets loaded, we unset several environment
variables, like 'IFS', 'CDPATH', 'ENV', 'BASH_ENV' for security reasons.

I'd guess those failing packages depend on something set in the environment.

If this is the case, to work around this problem you might be able to
modify the script you use to run staprun (linux_functions.shinc) to add
the environment variable back in. Here's that line from
linux_functions.shinc (split up a bit):

====
 staprun /lib/modules/$(uname -r)/systemtap/kx_open.ko \
  -c "./scripts/${PASS}/${PKG}_${STEP}.sh" \
  -o "${PIPE}" > ./log/${PASS}/${PKG}_${STEP}.log 2>&1
====

You could change that to

====
  staprun /lib/modules/$(uname -r)/systemtap/kx_open.ko \
   -c "env 'FOO=BAR' ./scripts/${PASS}/${PKG}_${STEP}.sh" \
   -o "${PIPE}" > ./log/${PASS}/${PKG}_${STEP}.log 2>&1
====

Your other option would be to put the missing environment variable in
scripts/${PASS}/${PKG}_${STEP}.sh.

This might not be the problem, but it is certainly worth investigating.
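
One way to find out which variable goes missing, sketched under assumptions rather than taken from staprun or the KaarPux scripts: dump the environment with `env` once in the calling shell and once at the top of the build script that staprun -c runs, then compare the variable names. The helper name missing_vars and the demo variable KX_DEMO_VAR are hypothetical, and the demo below only simulates the "after" dump by unsetting a variable in a subshell.

```shell
#!/bin/sh
# Sketch: list variables present in one `env` dump but absent from another.
# missing_vars and KX_DEMO_VAR are hypothetical names for illustration.

# Usage: missing_vars BEFORE_FILE AFTER_FILE
# Prints the names found in BEFORE_FILE that do not appear in AFTER_FILE.
# Values that span several lines are not handled specially by this parser.
missing_vars() {
    before=$1
    after=$2
    tmp_b=$(mktemp) && tmp_a=$(mktemp) || return 1
    # Keep only the text before the first '=' on each line.
    cut -d= -f1 "$before" | sort -u > "$tmp_b"
    cut -d= -f1 "$after"  | sort -u > "$tmp_a"
    # comm -23 prints lines that occur only in the first sorted file.
    comm -23 "$tmp_b" "$tmp_a"
    rm -f "$tmp_b" "$tmp_a"
}

# Simulated demo: the second dump is taken with KX_DEMO_VAR unset.
demo_dir=$(mktemp -d)
( KX_DEMO_VAR=1; export KX_DEMO_VAR; env ) > "$demo_dir/before.env"
( unset KX_DEMO_VAR; env ) > "$demo_dir/after.env"
missing_vars "$demo_dir/before.env" "$demo_dir/after.env"
rm -rf "$demo_dir"
```

Any name this prints for a real pair of dumps is a candidate to pass back in with env, as in the modified staprun line above.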

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)


* Re: Systemtap do_filp_open failure on a few linux packages
From: Henrik /KaarPoSoft @ 2013-06-18 20:20 UTC
  To: David Smith; +Cc: systemtap

Hi David,

Thank you very much for going through my input and providing an 
excellent answer.
Your input nailed it for me (although with a twist; see below).

On 06/17/2013 08:16 PM, David Smith wrote:
> On 06/15/2013 04:52 PM, Henrik /KaarPoSoft wrote:
>> Dear all,
>>
>> I have experienced a very strange issue related to systemtap.
>> Any insights or help you might be able to provide to help me debug this
>> further would be most appreciated.
>>
>> I am developing a linux distribution called KaarPux:
>> http://kaarpux.kaarposoft.dk/
>>
>> Using a few scripts, some 600+ linux packages are built and installed.
>> Generally, this works like a charm.
>>
>> In order to automatically collect package dependencies, I have created
>> a small systemtap script to show files opened for reading:
>> http://sourceforge.net/p/kaarpux/code/ci/be342bf5667253421f562b7bc29bab8e0a2560aa/tree/master/chroot_scripts/kx_open.stp
>
> The script looks reasonable. One small note, you shouldn't need the
> '@defined($return)' check since $return should always be defined in
> 'kernel.function("do_filp_open").return'.
>

OK, thanks, will remove it later...
Kept it for now: better safe than sorry.

>
>> For most of the 600+ packages, building is successful, and the probe
>> returns what seems to be reasonable results.
>>
>> However, for a few packages, building fails:
>> - firefox
>> - thunderbird
>> - libreoffice
>> - ghc-binary
>> - ghc
>
> Here's my thought. I don't think it is the probe itself that is causing
> the problem, my guess would be that it is our staprun loader (a setuid
> executable). When the module gets loaded, we unset several environment
> variables, like 'IFS', 'CDPATH', 'ENV', 'BASH_ENV' for security reasons.
>
> I'd guess those failing packages depend on something set in the environment.
>
> If this is the case, to work around this problem you might be able to
> modify the script you use to run staprun (linux_functions.shinc) to add
> the environment variable back in. Here's that line from
> linux_functions.shinc (split up a bit):
>
> ====
>   staprun /lib/modules/$(uname -r)/systemtap/kx_open.ko \
>    -c "./scripts/${PASS}/${PKG}_${STEP}.sh" \
>    -o "${PIPE}" > ./log/${PASS}/${PKG}_${STEP}.log 2>&1
> ====
>
> You could change that to
>
> ====
>    staprun /lib/modules/$(uname -r)/systemtap/kx_open.ko \
>     -c "env 'FOO=BAR' ./scripts/${PASS}/${PKG}_${STEP}.sh" \
>     -o "${PIPE}" > ./log/${PASS}/${PKG}_${STEP}.log 2>&1
> ====
>
> Your other option would be to put the missing environment variable in
> scripts/${PASS}/${PKG}_${STEP}.sh.
>
> This might not be the problem, but it is certainly worth investigating.
>

My problem turned out to be environment variables indeed!
Thank you for pointing me in this direction.

However, the culprit was LD_LIBRARY_PATH, which was being removed from
the environment.

Simply adding env LD_LIBRARY_PATH=... to the staprun command solved the problem:
http://sourceforge.net/p/kaarpux/code/ci/master/tree/master/shinc/linux_functions.shinc
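
A minimal sketch of that kind of change, assuming the value to restore is whatever LD_LIBRARY_PATH holds in the calling shell (the actual value used in linux_functions.shinc is not shown in this thread). The helper name staprun_child_cmd and the example paths are hypothetical; it only builds the string to hand to staprun -c and does not call staprun itself.

```shell
#!/bin/sh
# Sketch: build the command string for `staprun -c` so that the child
# gets LD_LIBRARY_PATH set again via env.
# staprun_child_cmd is a hypothetical helper name.

# Usage: staprun_child_cmd SCRIPT [LIB_PATH]
# LIB_PATH defaults to the caller's LD_LIBRARY_PATH (an assumption).
# A LIB_PATH containing a single quote would break the quoting below.
staprun_child_cmd() {
    script=$1
    lib_path=${2:-${LD_LIBRARY_PATH:-}}
    printf "env 'LD_LIBRARY_PATH=%s' %s\n" "$lib_path" "$script"
}

# Demo with made-up values:
staprun_child_cmd "./scripts/pass/pkg_step.sh" "/opt/example/lib"

# Intended use, following the staprun line quoted earlier in the thread:
#   staprun /lib/modules/$(uname -r)/systemtap/kx_open.ko \
#     -c "$(staprun_child_cmd "./scripts/${PASS}/${PKG}_${STEP}.sh")" \
#     -o "${PIPE}" > ./log/${PASS}/${PKG}_${STEP}.log 2>&1
```

The value is expanded by the calling shell before staprun starts, so it is carried in the command string rather than in the environment.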

So, again:
THANKS A MILLION FOR THE HINT !!!

/Henrik
