* Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
@ 2012-03-30 21:15 William Cohen
2012-04-02 19:30 ` David Smith
0 siblings, 1 reply; 8+ messages in thread
From: William Cohen @ 2012-03-30 21:15 UTC (permalink / raw)
To: systemtap; +Cc: David Smith
When running tests I found the following patch causing problems on ARM and IA64 machines
http://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=c0456b6fda16307070e0896d3cca86f523214a5e;hp=ca73974d84f94065dbabce2a2044d17a83bade57
On ARM with a really simple example end up with:
$ ../install/bin/stap -k ../systemtap/testsuite/systemtap.base/add.stp
WARNING: "copy_to_user_page" [/tmp/stapgL32Wo/stap_7644.ko] undefined!
Error inserting module '/tmp/stapgL32Wo/stap_7644.ko': Unknown symbol in module
WARNING: /media/greatplains/wcohen/systemtap_write/install/bin/staprun exited with status: 1
Pass 5: run failed. Try again with another '--vp 00001' option.
Keeping temporary directory "/tmp/stapgL32Wo"
On IA64 same example ends up with:
$ ../install/bin/stap -k ../systemtap/testsuite/systemtap.base/add.stp
WARNING: "flush_icache_range" [/tmp/staprfDqn5/stap_29610.ko] undefined!
Error inserting module '/tmp/staprfDqn5/stap_29610.ko': Unknown symbol in module
WARNING: /home/wcohen/systemtap_write/install/bin/staprun exited with status: 1
Pass 5: run failed. Try again with another '--vp 00001' option.
Keeping temporary directory "/tmp/staprfDqn5"
It looks like the reordering of the includes is causing a problem on these machines.
-Will
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
2012-03-30 21:15 Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64 William Cohen
@ 2012-04-02 19:30 ` David Smith
2012-04-02 20:17 ` William Cohen
0 siblings, 1 reply; 8+ messages in thread
From: David Smith @ 2012-04-02 19:30 UTC (permalink / raw)
To: William Cohen; +Cc: systemtap
On 03/30/2012 04:15 PM, William Cohen wrote:
> When running tests I found the following patch causing problems on ARM and IA64 machines
>
> http://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=c0456b6fda16307070e0896d3cca86f523214a5e;hp=ca73974d84f94065dbabce2a2044d17a83bade57
>
> On ARM with a really simple example end up with:
>
> $ ../install/bin/stap -k ../systemtap/testsuite/systemtap.base/add.stp
> WARNING: "copy_to_user_page" [/tmp/stapgL32Wo/stap_7644.ko] undefined!
> Error inserting module '/tmp/stapgL32Wo/stap_7644.ko': Unknown symbol in module
> WARNING: /media/greatplains/wcohen/systemtap_write/install/bin/staprun exited with status: 1
> Pass 5: run failed. Try again with another '--vp 00001' option.
> Keeping temporary directory "/tmp/stapgL32Wo"
>
>
> On IA64 same example ends up with:
>
> $ ../install/bin/stap -k ../systemtap/testsuite/systemtap.base/add.stp
> WARNING: "flush_icache_range" [/tmp/staprfDqn5/stap_29610.ko] undefined!
> Error inserting module '/tmp/staprfDqn5/stap_29610.ko': Unknown symbol in module
> WARNING: /home/wcohen/systemtap_write/install/bin/staprun exited with status: 1
> Pass 5: run failed. Try again with another '--vp 00001' option.
> Keeping temporary directory "/tmp/staprfDqn5"
>
>
> It looks like the reordering of the includes is causing a problem on these machines.
>
> -Will
This is fixed for IA64 in commit 414b89b. It might fix ARM also, but I
don't have access to an ARM machine, so you'll have to test that one.
Long story short, it wasn't a problem with includes, but with some new
code. New code in sym.c/vma.c was calling __access_process_vm() in
certain situations. The __access_process_vm() function ends up calling
flush_icache_range() (__access_process_vm() calls copy_to_user_page(),
which calls flush_icache_user_range(), which calls
flush_icache_range()), which isn't exported.
I fixed the problem by ifdef'ing out the __access_process_vm() call when
we've got in-kernel utrace.
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
2012-04-02 19:30 ` David Smith
@ 2012-04-02 20:17 ` William Cohen
2012-04-02 22:10 ` David Smith
0 siblings, 1 reply; 8+ messages in thread
From: William Cohen @ 2012-04-02 20:17 UTC (permalink / raw)
To: David Smith; +Cc: systemtap
On 04/02/2012 03:30 PM, David Smith wrote:
> ../install/bin/stap -k ../systemtap/testsuite/systemtap.base/add.stp
Hi David,
Thanks for taking a look at this problem.
The git checkin fixes the problem for ia64.
The arm machine is using a kernel from the linus torvald's git repo. This particular kernel doesn't have utrace support in it, so it doesn't have CONFIG_UTRACE set. Thus, things still fail in the same way on the ARM machine.
Does this code work with stock x86_64 kernel? I am wondering why this problem wasn't seen on the x86 machine. Does the x86 not need to do the explicit flushes unlike the ia64 and ARM?
-Will
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
2012-04-02 20:17 ` William Cohen
@ 2012-04-02 22:10 ` David Smith
2012-04-03 14:18 ` William Cohen
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: David Smith @ 2012-04-02 22:10 UTC (permalink / raw)
To: William Cohen; +Cc: systemtap
On 04/02/2012 03:16 PM, William Cohen wrote:
> On 04/02/2012 03:30 PM, David Smith wrote:
>> ../install/bin/stap -k ../systemtap/testsuite/systemtap.base/add.stp
>
> Hi David,
>
> Thanks for taking a look at this problem.
>
> The git checkin fixes the problem for ia64.
>
> The arm machine is using a kernel from the linus torvald's git repo. This particular kernel doesn't
> have utrace support in it, so it doesn't have CONFIG_UTRACE set.
> Thus, things still fail in the same way on the ARM machine.
>
> Does this code work with stock x86_64 kernel? I am wondering why this problem wasn't seen on the x86
> machine. Does the x86 not need to do the explicit flushes unlike the
> ia64 and ARM?
Right. From what I've been looking at on the stock x86_64 kernel,
copy_to_user_page() boils down to a memcpy() call.
Since x86_64 doesn't have an arch-specific cacheflush.h file, it
inherits the following from asm-generic/cacheflush.h:
====
....
#define flush_icache_user_range(vma,pg,adr,len) do { } while (0)
....
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
memcpy(dst, src, len); \
flush_icache_user_range(vma, page, vaddr, len); \
} while (0)
====
Arm has a arch-specific copy_to_user_page() (defined in
arch/arm/include/cacheflush.h) which isn't exported. Sigh.
It looks like on current kernels access_process_vm() is exported, which
means we could use the real function instead of our copy (can you check
and make sure this is exported on arm?). However, we've added the
__access_process_vm_noflush() variant which isn't present upstream.
I'm not sure there are easy answers here.
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
2012-04-02 22:10 ` David Smith
@ 2012-04-03 14:18 ` William Cohen
2012-04-03 14:35 ` David Smith
2012-04-04 20:57 ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
2 siblings, 0 replies; 8+ messages in thread
From: William Cohen @ 2012-04-03 14:18 UTC (permalink / raw)
To: David Smith; +Cc: systemtap
On 04/02/2012 06:10 PM, David Smith wrote:
> On 04/02/2012 03:16 PM, William Cohen wrote:
>
>> On 04/02/2012 03:30 PM, David Smith wrote:
>>> ../install/bin/stap -k ../systemtap/testsuite/systemtap.base/add.stp
>>
>> Hi David,
>>
>> Thanks for taking a look at this problem.
>>
>> The git checkin fixes the problem for ia64.
>>
>> The arm machine is using a kernel from the linus torvald's git repo. This particular kernel doesn't
>
>> have utrace support in it, so it doesn't have CONFIG_UTRACE set.
>> Thus, things still fail in the same way on the ARM machine.
>
>>
>> Does this code work with stock x86_64 kernel? I am wondering why this problem wasn't seen on the x86
>
>> machine. Does the x86 not need to do the explicit flushes unlike the
>> ia64 and ARM?
>
> Right. From what I've been looking at on the stock x86_64 kernel,
> copy_to_user_page() boils down to a memcpy() call.
>
> Since x86_64 doesn't have an arch-specific cacheflush.h file, it
> inherits the following from asm-generic/cacheflush.h:
>
> ====
> ....
> #define flush_icache_user_range(vma,pg,adr,len) do { } while (0)
> ....
> #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
> do { \
> memcpy(dst, src, len); \
> flush_icache_user_range(vma, page, vaddr, len); \
> } while (0)
> ====
>
> Arm has a arch-specific copy_to_user_page() (defined in
> arch/arm/include/cacheflush.h) which isn't exported. Sigh.
>
> It looks like on current kernels access_process_vm() is exported, which
> means we could use the real function instead of our copy (can you check
> and make sure this is exported on arm?). However, we've added the
> __access_process_vm_noflush() variant which isn't present upstream.
>
> I'm not sure there are easy answers here.
>
I looked through Module.symvers for the currently running kernel on the ARM machine and didn't see access_process_vm listed. Also seems that all the uses of access_process_vm() were for things that are built into the kernel rather than modules.
-Will
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64
2012-04-02 22:10 ` David Smith
2012-04-03 14:18 ` William Cohen
@ 2012-04-03 14:35 ` David Smith
2012-04-04 20:57 ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
2 siblings, 0 replies; 8+ messages in thread
From: David Smith @ 2012-04-03 14:35 UTC (permalink / raw)
To: William Cohen; +Cc: systemtap
On 04/02/2012 05:10 PM, David Smith wrote:
> It looks like on current kernels access_process_vm() is exported...
Which isn't true. On 3.4.0-0.rc0.git1.2.tip0.uprobes.fc18.x86_64, 'stap
-L' gets confused about which symbols are exported. I keep getting
excited about the new exports, then my hopes get cruelly dashed.
I'm still not sure how to solve this problem.
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64)
2012-04-02 22:10 ` David Smith
2012-04-03 14:18 ` William Cohen
2012-04-03 14:35 ` David Smith
@ 2012-04-04 20:57 ` David Smith
2012-04-05 15:29 ` David Smith
2 siblings, 1 reply; 8+ messages in thread
From: David Smith @ 2012-04-04 20:57 UTC (permalink / raw)
To: William Cohen; +Cc: systemtap
On 04/02/2012 05:10 PM, David Smith wrote:
> On 04/02/2012 03:16 PM, William Cohen wrote:
(stuff from Will deleted that describes a problem on IA64/ARM where some
new code I added that calls our __access_process_vm() comes up with an
undefined symbol)
Here's some more background and context here. The kernel has
_acesss_process_vm() but it isn't exported. So, we have an internal
copy called __access_process_vm(). We also have a variant, called
__access_process_vm_noflush(), that doesn't flush the instruction cache.
This variant isn't present upstream.
Our __access_process_vm() calls copy_from_user_page() and
copy_to_use_page() to do the actual reading/writing. There is a default
definition of those functions present in
include/asm-generic/cacheflush.h that looks like this:
====
....
#define flush_icache_user_range(vma,pg,adr,len) do { } while (0)
....
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
memcpy(dst, src, len); \
flush_icache_user_range(vma, page, vaddr, len); \
} while (0)
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy(dst, src, len)
====
So, copy_to_user_page()/copy_from_user_page() boil down to just memcpy().
Here's how each arch we're interested in handles
copy_to_user_page()/copy_from_user_page() in the current upstream kernel
source.
- x86, x86_64: Just uses the default.
- s390x: Just uses the default.
- ia64: flush_icache_user_range() is a define that calls
flush_icache_range(), which isn't exported
- mips: has unexported copy_to_user_page()/copy_from_user_page() functions
- arm: copy_from_user_page() is just #defined to be memcpy().
copy_to_user_page() is an unexported function
So, we've got problems on ia64, mips, and arm.
Note that I'm a bit worried that __access_process_vm_noflush() isn't
quite right on mips, since it doesn't call its arch specific
copy_from_user_page() (all other arches just use memcpy to read). But,
the mips arch-specific copy_from_user_page() isn't exported. We'd
someone who understands mips details to look at the internals of
copy_from_user_page() to know if the differences are important.
So, that's the status of __access_process_vm() on all the arches that
systemtap runs on.
However...
I realized while looking at this (prompted by Will) that the build-id
checking code is only reading user memory, not writing it. When reading
user memory, __access_process_vm() and __access_process_vm_noflush() are
fairly equivalent. So, I've switch the build-id checking code to use
__access_process_vm_noflush() (commit f1410a8).
This should hopefully fix ia64, mips, and arm on current kernels.
Will, if this doesn't fix ARM, let me know.
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64)
2012-04-04 20:57 ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
@ 2012-04-05 15:29 ` David Smith
0 siblings, 0 replies; 8+ messages in thread
From: David Smith @ 2012-04-05 15:29 UTC (permalink / raw)
To: William Cohen; +Cc: systemtap
OK, obviously I needed more caffeine yesterday, because...
On 04/04/2012 03:56 PM, David Smith wrote:
...
> Here's how each arch we're interested in handles
> copy_to_user_page()/copy_from_user_page() in the current upstream kernel
> source.
>
> - x86, x86_64: Just uses the default.
> - s390x: Just uses the default.
> - ia64: flush_icache_user_range() is a define that calls
> flush_icache_range(), which isn't exported
> - mips: has unexported copy_to_user_page()/copy_from_user_page() functions
> - arm: copy_from_user_page() is just #defined to be memcpy().
> copy_to_user_page() is an unexported function
>
> So, we've got problems on ia64, mips, and arm.
>
> Note that I'm a bit worried that __access_process_vm_noflush() isn't
> quite right on mips, since it doesn't call its arch specific
> copy_from_user_page() (all other arches just use memcpy to read). But,
> the mips arch-specific copy_from_user_page() isn't exported. We'd
> someone who understands mips details to look at the internals of
> copy_from_user_page() to know if the differences are important.
>
> So, that's the status of __access_process_vm() on all the arches that
> systemtap runs on.
Systemtap has never run on mips. I don't know how ppc got confused with
mips in my head.
Here's how ppc handles copy_to_user_page()/copy_from_user_page():
flush_icache_user_range() is an exported function
The good news here is that memcpy() can be used just fine (in
__access_process_vm_noflush()) to read user memory on all the (real)
arches we care about.
Now I'm off to drink a diet coke...
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-04-05 15:29 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-30 21:15 Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64 William Cohen
2012-04-02 19:30 ` David Smith
2012-04-02 20:17 ` William Cohen
2012-04-02 22:10 ` David Smith
2012-04-03 14:18 ` William Cohen
2012-04-03 14:35 ` David Smith
2012-04-04 20:57 ` Improve build-id checking when the task we're interested in isn't 'current'. git commit causing problems on ARM and IA64) David Smith
2012-04-05 15:29 ` David Smith
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).