public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
@ 2012-03-30 21:15 William Cohen
  2012-04-02 19:30 ` David Smith
  0 siblings, 1 reply; 8+ messages in thread
From: William Cohen @ 2012-03-30 21:15 UTC (permalink / raw)
  To: systemtap; +Cc: David Smith

When running tests I found the following patch causing problems on ARM and IA64 machines

http://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=c0456b6fda16307070e0896d3cca86f523214a5e;hp=ca73974d84f94065dbabce2a2044d17a83bade57

On ARM with a really simple example end up with:

$ ../install/bin/stap  -k ../systemtap/testsuite/systemtap.base/add.stp 
WARNING: "copy_to_user_page" [/tmp/stapgL32Wo/stap_7644.ko] undefined!
Error inserting module '/tmp/stapgL32Wo/stap_7644.ko': Unknown symbol in module
WARNING: /media/greatplains/wcohen/systemtap_write/install/bin/staprun exited with status: 1
Pass 5: run failed.  Try again with another '--vp 00001' option.
Keeping temporary directory "/tmp/stapgL32Wo"


On IA64 same example ends up with:

$ ../install/bin/stap  -k ../systemtap/testsuite/systemtap.base/add.stp 
WARNING: "flush_icache_range" [/tmp/staprfDqn5/stap_29610.ko] undefined!
Error inserting module '/tmp/staprfDqn5/stap_29610.ko': Unknown symbol in module
WARNING: /home/wcohen/systemtap_write/install/bin/staprun exited with status: 1
Pass 5: run failed.  Try again with another '--vp 00001' option.
Keeping temporary directory "/tmp/staprfDqn5"


It looks like the reordering of the includes is causing a problem on these machines.

-Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
  2012-03-30 21:15 Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64 William Cohen
@ 2012-04-02 19:30 ` David Smith
  2012-04-02 20:17   ` William Cohen
  0 siblings, 1 reply; 8+ messages in thread
From: David Smith @ 2012-04-02 19:30 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On 03/30/2012 04:15 PM, William Cohen wrote:

> When running tests I found the following patch causing problems on ARM and IA64 machines
> 
> http://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=c0456b6fda16307070e0896d3cca86f523214a5e;hp=ca73974d84f94065dbabce2a2044d17a83bade57
> 
> On ARM with a really simple example end up with:
> 
> $ ../install/bin/stap  -k ../systemtap/testsuite/systemtap.base/add.stp 
> WARNING: "copy_to_user_page" [/tmp/stapgL32Wo/stap_7644.ko] undefined!
> Error inserting module '/tmp/stapgL32Wo/stap_7644.ko': Unknown symbol in module
> WARNING: /media/greatplains/wcohen/systemtap_write/install/bin/staprun exited with status: 1
> Pass 5: run failed.  Try again with another '--vp 00001' option.
> Keeping temporary directory "/tmp/stapgL32Wo"

>

> 
> On IA64 same example ends up with:
> 
> $ ../install/bin/stap  -k ../systemtap/testsuite/systemtap.base/add.stp 
> WARNING: "flush_icache_range" [/tmp/staprfDqn5/stap_29610.ko] undefined!
> Error inserting module '/tmp/staprfDqn5/stap_29610.ko': Unknown symbol in module
> WARNING: /home/wcohen/systemtap_write/install/bin/staprun exited with status: 1
> Pass 5: run failed.  Try again with another '--vp 00001' option.
> Keeping temporary directory "/tmp/staprfDqn5"
> 
> 
> It looks like the reordering of the includes is causing a problem on these machines.
> 
> -Will


This is fixed for IA64 in commit 414b89b.  It might fix ARM also, but I
don't have access to an ARM machine, so you'll have to test that one.

Long story short, it wasn't a problem with includes, but with some new
code.  New code in sym.c/vma.c was calling __access_process_vm() in
certain situations.  The __access_process_vm() function ends up calling
flush_icache_range() (__access_process_vm() calls copy_to_user_page(),
which calls flush_icache_user_range(), which calls
flush_icache_range()), which isn't exported.

I fixed the problem by ifdef'ing out the __access_process_vm() call when
we've got in-kernel utrace.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
  2012-04-02 19:30 ` David Smith
@ 2012-04-02 20:17   ` William Cohen
  2012-04-02 22:10     ` David Smith
  0 siblings, 1 reply; 8+ messages in thread
From: William Cohen @ 2012-04-02 20:17 UTC (permalink / raw)
  To: David Smith; +Cc: systemtap

On 04/02/2012 03:30 PM, David Smith wrote:
> ../install/bin/stap  -k ../systemtap/testsuite/systemtap.base/add.stp 

Hi David,

Thanks for taking a look at this problem.

The git checkin fixes the problem for ia64.

The arm machine is using a kernel from the linus torvald's git repo. This particular kernel doesn't have utrace support in it, so it doesn't have CONFIG_UTRACE set.  Thus, things still fail in the same way on the ARM machine.

Does this code work with stock x86_64 kernel?  I am wondering why this problem wasn't seen on the x86 machine. Does the x86 not need to do the explicit flushes unlike the ia64 and ARM? 

-Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
  2012-04-02 20:17   ` William Cohen
@ 2012-04-02 22:10     ` David Smith
  2012-04-03 14:18       ` William Cohen
                         ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Smith @ 2012-04-02 22:10 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On 04/02/2012 03:16 PM, William Cohen wrote:

> On 04/02/2012 03:30 PM, David Smith wrote:
>> ../install/bin/stap  -k ../systemtap/testsuite/systemtap.base/add.stp 
> 
> Hi David,
> 
> Thanks for taking a look at this problem.
> 
> The git checkin fixes the problem for ia64.
> 
> The arm machine is using a kernel from the linus torvald's git repo. This particular kernel doesn't

> have utrace support in it, so it doesn't have CONFIG_UTRACE set.
> Thus, things still fail in the same way on the ARM machine.

> 
> Does this code work with stock x86_64 kernel?  I am wondering why this problem wasn't seen on the x86

> machine. Does the x86 not need to do the explicit flushes unlike the
> ia64 and ARM?

Right.  From what I've been looking at on the stock x86_64 kernel,
copy_to_user_page() boils down to a memcpy() call.

Since x86_64 doesn't have an arch-specific cacheflush.h file, it
inherits the following from asm-generic/cacheflush.h:

====
....
#define flush_icache_user_range(vma,pg,adr,len) do { } while (0)
....
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
        do { \
                memcpy(dst, src, len); \
                flush_icache_user_range(vma, page, vaddr, len); \
        } while (0)
====

Arm has a arch-specific copy_to_user_page() (defined in
arch/arm/include/cacheflush.h) which isn't exported.  Sigh.

It looks like on current kernels access_process_vm() is exported, which
means we could use the real function instead of our copy (can you check
and make sure this is exported on arm?).  However, we've added the
__access_process_vm_noflush() variant which isn't present upstream.

I'm not sure there are easy answers here.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
  2012-04-02 22:10     ` David Smith
@ 2012-04-03 14:18       ` William Cohen
  2012-04-03 14:35       ` David Smith
  2012-04-04 20:57       ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
  2 siblings, 0 replies; 8+ messages in thread
From: William Cohen @ 2012-04-03 14:18 UTC (permalink / raw)
  To: David Smith; +Cc: systemtap

On 04/02/2012 06:10 PM, David Smith wrote:
> On 04/02/2012 03:16 PM, William Cohen wrote:
> 
>> On 04/02/2012 03:30 PM, David Smith wrote:
>>> ../install/bin/stap  -k ../systemtap/testsuite/systemtap.base/add.stp 
>>
>> Hi David,
>>
>> Thanks for taking a look at this problem.
>>
>> The git checkin fixes the problem for ia64.
>>
>> The arm machine is using a kernel from the linus torvald's git repo. This particular kernel doesn't
> 
>> have utrace support in it, so it doesn't have CONFIG_UTRACE set.
>> Thus, things still fail in the same way on the ARM machine.
> 
>>
>> Does this code work with stock x86_64 kernel?  I am wondering why this problem wasn't seen on the x86
> 
>> machine. Does the x86 not need to do the explicit flushes unlike the
>> ia64 and ARM?
> 
> Right.  From what I've been looking at on the stock x86_64 kernel,
> copy_to_user_page() boils down to a memcpy() call.
> 
> Since x86_64 doesn't have an arch-specific cacheflush.h file, it
> inherits the following from asm-generic/cacheflush.h:
> 
> ====
> ....
> #define flush_icache_user_range(vma,pg,adr,len) do { } while (0)
> ....
> #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
>         do { \
>                 memcpy(dst, src, len); \
>                 flush_icache_user_range(vma, page, vaddr, len); \
>         } while (0)
> ====
> 
> Arm has a arch-specific copy_to_user_page() (defined in
> arch/arm/include/cacheflush.h) which isn't exported.  Sigh.
> 
> It looks like on current kernels access_process_vm() is exported, which
> means we could use the real function instead of our copy (can you check
> and make sure this is exported on arm?).  However, we've added the
> __access_process_vm_noflush() variant which isn't present upstream.
> 
> I'm not sure there are easy answers here.
> 

I looked through Module.symvers for the currently running kernel on the ARM machine and didn't see access_process_vm listed. Also seems that all the uses of access_process_vm() were for things that are built into the kernel rather than modules.

-Will

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
  2012-04-02 22:10     ` David Smith
  2012-04-03 14:18       ` William Cohen
@ 2012-04-03 14:35       ` David Smith
  2012-04-04 20:57       ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
  2 siblings, 0 replies; 8+ messages in thread
From: David Smith @ 2012-04-03 14:35 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On 04/02/2012 05:10 PM, David Smith wrote:

> It looks like on current kernels access_process_vm() is exported...


Which isn't true. On 3.4.0-0.rc0.git1.2.tip0.uprobes.fc18.x86_64, 'stap
-L' gets confused about which symbols are exported.  I keep getting
excited about the new exports, then my hopes get cruelly dashed.

I'm still not sure how to solve this problem.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64)
  2012-04-02 22:10     ` David Smith
  2012-04-03 14:18       ` William Cohen
  2012-04-03 14:35       ` David Smith
@ 2012-04-04 20:57       ` David Smith
  2012-04-05 15:29         ` David Smith
  2 siblings, 1 reply; 8+ messages in thread
From: David Smith @ 2012-04-04 20:57 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

On 04/02/2012 05:10 PM, David Smith wrote:

> On 04/02/2012 03:16 PM, William Cohen wrote:


(stuff from Will deleted that describes a problem on IA64/ARM where some
new code I added that calls our __access_process_vm() comes up with an
undefined symbol)

Here's some more background and context here.  The kernel has
_acesss_process_vm() but it isn't exported.  So, we have an internal
copy called __access_process_vm(). We also have a variant, called
__access_process_vm_noflush(), that doesn't flush the instruction cache.
 This variant isn't present upstream.

Our __access_process_vm() calls copy_from_user_page() and
copy_to_use_page() to do the actual reading/writing. There is a default
definition of those functions present in
include/asm-generic/cacheflush.h that looks like this:

====

....
#define flush_icache_user_range(vma,pg,adr,len) do { } while (0)
....
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
        do { \
                memcpy(dst, src, len); \
                flush_icache_user_range(vma, page, vaddr, len); \
        } while (0)

#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
        memcpy(dst, src, len)
====

So, copy_to_user_page()/copy_from_user_page() boil down to just memcpy().

Here's how each arch we're interested in handles
copy_to_user_page()/copy_from_user_page() in the current upstream kernel
source.

- x86, x86_64: Just uses the default.
- s390x: Just uses the default.
- ia64: flush_icache_user_range() is a define that calls
flush_icache_range(), which isn't exported
- mips: has unexported copy_to_user_page()/copy_from_user_page() functions
- arm: copy_from_user_page() is just #defined to be memcpy().
copy_to_user_page() is an unexported function

So, we've got problems on ia64, mips, and arm.

Note that I'm a bit worried that __access_process_vm_noflush() isn't
quite right on mips, since it doesn't call its arch specific
copy_from_user_page() (all other arches just use memcpy to read).  But,
the mips arch-specific copy_from_user_page() isn't exported.  We'd
someone who understands mips details to look at the internals of
copy_from_user_page() to know if the differences are important.

So, that's the status of __access_process_vm() on all the arches that
systemtap runs on.

However...


I realized while looking at this (prompted by Will) that the build-id
checking code is only reading user memory, not writing it.  When reading
user memory, __access_process_vm() and __access_process_vm_noflush() are
fairly equivalent.  So, I've switch the build-id checking code to use
__access_process_vm_noflush() (commit f1410a8).

This should hopefully fix ia64, mips, and arm on current kernels.

Will, if this doesn't fix ARM, let me know.

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64)
  2012-04-04 20:57       ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
@ 2012-04-05 15:29         ` David Smith
  0 siblings, 0 replies; 8+ messages in thread
From: David Smith @ 2012-04-05 15:29 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

OK, obviously I needed more caffeine yesterday, because...

On 04/04/2012 03:56 PM, David Smith wrote:

...
> Here's how each arch we're interested in handles
> copy_to_user_page()/copy_from_user_page() in the current upstream kernel
> source.
> 
> - x86, x86_64: Just uses the default.
> - s390x: Just uses the default.
> - ia64: flush_icache_user_range() is a define that calls
> flush_icache_range(), which isn't exported
> - mips: has unexported copy_to_user_page()/copy_from_user_page() functions
> - arm: copy_from_user_page() is just #defined to be memcpy().
> copy_to_user_page() is an unexported function
> 
> So, we've got problems on ia64, mips, and arm.
> 
> Note that I'm a bit worried that __access_process_vm_noflush() isn't
> quite right on mips, since it doesn't call its arch specific
> copy_from_user_page() (all other arches just use memcpy to read).  But,
> the mips arch-specific copy_from_user_page() isn't exported.  We'd
> someone who understands mips details to look at the internals of
> copy_from_user_page() to know if the differences are important.
> 
> So, that's the status of __access_process_vm() on all the arches that
> systemtap runs on.


Systemtap has never run on mips.  I don't know how ppc got confused with
mips in my head.

Here's how ppc handles copy_to_user_page()/copy_from_user_page():
flush_icache_user_range() is an exported function

The good news here is that memcpy() can be used just fine (in
__access_process_vm_noflush()) to read user memory on all the (real)
arches we care about.

Now I'm off to drink a diet coke...

-- 

David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-04-05 15:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-30 21:15 Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64 William Cohen
2012-04-02 19:30 ` David Smith
2012-04-02 20:17   ` William Cohen
2012-04-02 22:10     ` David Smith
2012-04-03 14:18       ` William Cohen
2012-04-03 14:35       ` David Smith
2012-04-04 20:57       ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
2012-04-05 15:29         ` David Smith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).