From mboxrd@z Thu Jan  1 00:00:00 1970
In-Reply-To: <20050731220304.GJ3726@bragg.suse.de>
Subject: Re: Hitachi djprobe mechanism
To: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>, Mathieu Desnoyers <compudj@krystal.dyndns.org>,
        Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>,
        Karim Yaghmour <karim@opersys.com>,
        Masami Hiramatsu <masami.hiramatsu@gmail.com>,
        michel.dagenais@polymtl.ca, Roland McGrath <roland@redhat.com>,
        Satoshi Oshima <soshima@redhat.com>, sugita@sdl.hitachi.co.jp,
        systemtap@sources.redhat.com
X-Mailer: Lotus Notes Release 6.5.1IBM February 19, 2004
Message-ID: <OF331D042E.CCADD212-ON41257050.002DC63A-41257050.002F8168@uk.ibm.com>
From: Richard J Moore <richardj_moore@uk.ibm.com>
Date: Mon, 01 Aug 2005 08:44:00 -0000
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
X-SW-Source: 2005-q3/txt/msg00208.txt.bz2


There is another issue to consider when looking into using probes other
than int3:

Intel erratum 54 (Unsynchronized Cross-modifying code) refers to the
practice of modifying code on one processor where another has prefetched
the unmodified version of the code. Intel states that unpredictable general
protection faults may result if a synchronizing instruction (iret, int,
int3, cpuid, etc.) is not executed on the second processor before it
executes the prefetched, out-of-date copy of the instruction.

When we became aware of this I had a long discussion with Intel's
microarchitecture guys. It turns out that the reason for this erratum
(which, incidentally, Intel does not intend to fix) is that the trace
cache (the stream of micro-ops resulting from instruction decoding)
cannot be guaranteed to be valid. Reading between the lines, I assume this
issue arises because of optimizations done in the trace cache, where it is
no longer possible to identify the original instruction boundaries. If the
CPU discovers that the trace cache has been invalidated because of
unsynchronized cross-modification, then instruction execution will be
aborted with a GPF. Further discussion with Intel revealed that replacing
the first opcode byte with an int3 would not be subject to this erratum.
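As an illustration of how the int3 observation could be used (a sketch under our own assumptions, not a procedure from Intel or from the djprobe patches), the replacement can be split so that the only byte ever changed under a live instruction is the first one, and it becomes int3 before anything else changes. `patch_with_int3`, `sync_cpus_fn` and `count_syncs` are hypothetical names; the sync hook stands in for making every other CPU execute a synchronizing instruction, the int3 handler that redirects trapped CPUs is not shown, and the sketch assumes nothing branches into the middle of the patched instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC  /* one-byte x86 breakpoint opcode */

/* Hypothetical hook: make every other CPU execute a synchronizing
 * instruction (e.g. via an IPI whose handler runs cpuid). */
typedef void (*sync_cpus_fn)(void *ctx);

/* Trivial stand-in for the hook that only counts how often it is called. */
static void count_syncs(void *ctx) { ++*(int *)ctx; }

/* Overwrite len bytes at insn with new_bytes. While the instruction is
 * live only its first byte is modified, and it becomes int3 before the
 * rest changes. Assumes an int3 handler (not shown) redirects any CPU
 * that traps, and that nothing branches into the middle of the insn. */
static void patch_with_int3(volatile uint8_t *insn, const uint8_t *new_bytes,
                            size_t len, sync_cpus_fn sync_all, void *ctx)
{
    assert(len >= 1);
    insn[0] = INT3;                 /* 1: arm the breakpoint */
    sync_all(ctx);                  /* no CPU can run a stale prefetched copy now */
    for (size_t i = 1; i < len; i++)
        insn[i] = new_bytes[i];     /* 2: rewrite the tail behind the int3 */
    sync_all(ctx);
    insn[0] = new_bytes[0];         /* 3: publish the new first opcode byte */
    sync_all(ctx);
}
```

For example, calling `patch_with_int3(site, jmp, 5, count_syncs, &n)` on a 5-byte nop leaves the five jmp bytes in place and requests three synchronizations.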

So, is cmpxchg reliable? One has to guarantee more than mere atomicity.


--
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


Andi Kleen <ak@suse.de>, 31/07/2005 23:03

To:      Mathieu Desnoyers <compudj@krystal.dyndns.org>
cc:      Andi Kleen <ak@suse.de>, Karim Yaghmour <karim@opersys.com>,
         Masami Hiramatsu <masami.hiramatsu@gmail.com>,
         Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>,
         Roland McGrath <roland@redhat.com>, Richard J Moore/UK/IBM@IBMGB,
         systemtap@sources.redhat.com, sugita@sdl.hitachi.co.jp,
         Satoshi Oshima <soshima@redhat.com>, michel.dagenais@polymtl.ca
Subject: Re: Hitachi djprobe mechanism

On Sat, Jul 30, 2005 at 12:47:47PM -0400, Mathieu Desnoyers wrote:
> * Andi Kleen (ak@suse.de) wrote:
> > > As I see it, the write in memory is atomic, but not the instruction
> > > fetching. In that case, the reader would see an inconsistent last jmp
> > > address byte.
> >
> > Yes, you're right. cmpxchg only helps when the replaced instruction
> > is >= the new instruction. For smaller instructions only an IPI to
> > stop all CPUs works.
> >
>
> It was not exactly the point of my comment. If we try to overwrite an
> existing instruction, without any marker, two cases may show up:
>
> * the instruction to replace is >= the jmp instruction (5 bytes)
>
> It has been suggested that using cmpxchg8 would solve this problem.
> cmpxchg8 does indeed commit 8 bytes of data to memory atomically, even on
> 32-bit architectures.
>
> My question is related to the instruction we want to replace: how is it
> read by the CPU? If it's 5 bytes in size, it has to be read in two chunks
> by the CPU on a 32-bit arch. Does the CPU lock the memory bus between those
> two reads?

The 32-bit ISA has nothing to do with how the CPU fetches instructions
("32-bit" x86s usually have a much wider memory interface).

In general these things are done on cache lines between 32 and 128 bytes,
depending on the CPU. Of course cache lines can be crossed by instructions,
but the CPU should handle that atomically.

However, there is no guarantee for that in the architecture afaik, so you
cannot really rely on it. If, let's say, the 386 had this behaviour, then it
is probably safe to assume later x86s implement it too for compatibility
(modulo bugs).
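For reference, the cmpxchg8 replacement from the quoted message could look like the sketch below. It merges a 5-byte `jmp rel32` (opcode E9) into the current 8-byte word and swaps it in with the GCC/Clang builtin `__atomic_compare_exchange_n`, which on x86 is typically a locked cmpxchg (cmpxchg8b for 8 bytes on 32-bit x86). The helper names are hypothetical, and the sketch assumes a little-endian host, an 8-byte-aligned patch site and a target within 2 GB of the site. As the surrounding replies point out, this only makes the store atomic; it does nothing about copies of the old bytes that another CPU has already fetched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the 8-byte word that results from overwriting the first 5 bytes of
 * old_word with an x86 near jump (E9 rel32) located at address `at` and
 * targeting `target`. Bytes 5..7 keep their old values.
 * Assumes a little-endian host and |target - (at + 5)| < 2 GiB. */
static uint64_t make_jmp_word(uint64_t old_word, uintptr_t at, uintptr_t target)
{
    uint8_t b[8];
    int32_t rel = (int32_t)(target - (at + 5)); /* relative to the next insn */

    memcpy(b, &old_word, sizeof b);
    b[0] = 0xE9;                                /* jmp rel32 opcode */
    memcpy(&b[1], &rel, sizeof rel);
    memcpy(&old_word, b, sizeof b);
    return old_word;
}

/* Swap the jump in with a single 8-byte compare-and-exchange. Returns 1 if
 * the word was replaced, 0 if it changed underneath us. Atomic for the
 * store only: other CPUs' prefetched copies are not addressed. */
static int patch_jmp_cmpxchg8(uint64_t *slot, uintptr_t target)
{
    /* 8-byte alignment keeps all 8 bytes inside one cache line */
    assert(((uintptr_t)slot & 7) == 0);
    uint64_t expected = __atomic_load_n(slot, __ATOMIC_SEQ_CST);
    uint64_t desired = make_jmp_word(expected, (uintptr_t)slot, target);
    return __atomic_compare_exchange_n(slot, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```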

In practice it's more complicated. The CPU fetches the instruction into its
pipeline some time before actually executing it, then snoops the bus for any
modifications of it, and cancels and re-executes the instruction if needed.

However when you look at CPU errata sheets you will find quite a lot
of bugs in this area, so I would not really rely on frequent patching for
production.

I think just using the IPI is much simpler and easier.
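A user-space model of the IPI approach is sketched below, with threads standing in for CPUs: each "CPU" enters a handler, checks in, and spins until the patching CPU has written the new bytes, then serializes and resumes. `ipi_stop_and_patch`, `ipi_handler` and `MAX_CPUS` are made-up names; in a kernel the handler would run with interrupts disabled and execute a real serializing instruction such as cpuid, which the sketch only approximates with a memory fence. Note that, as the next quoted message points out, parking the CPUs does not by itself help a CPU that was interrupted between two of the instructions being replaced.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CPUS 64

struct rendezvous {
    atomic_int arrived;  /* "CPUs" parked in the spin loop */
    atomic_int done;     /* set by the patching CPU when the write is finished */
};

/* What each interrupted CPU does in its IPI handler (interrupts off). */
static void *ipi_handler(void *arg)
{
    struct rendezvous *r = arg;

    atomic_fetch_add(&r->arrived, 1);
    while (!atomic_load(&r->done))
        ;  /* spinning here, so this CPU is not executing the patch site */
    /* A kernel would run a serializing instruction (e.g. cpuid) here to
     * discard stale prefetched code; a fence is only a portable stand-in. */
    atomic_thread_fence(memory_order_seq_cst);
    return NULL;
}

/* Park ncpus other "CPUs", copy len bytes from src to dst, release them.
 * Returns 0 on success, -1 if a helper thread could not be started. */
static int ipi_stop_and_patch(uint8_t *dst, const uint8_t *src, size_t len,
                              int ncpus)
{
    pthread_t tid[MAX_CPUS];
    struct rendezvous r;
    int started = 0, rc = 0;

    assert(ncpus >= 0 && ncpus <= MAX_CPUS);
    atomic_init(&r.arrived, 0);
    atomic_init(&r.done, 0);

    for (; started < ncpus; started++)
        if (pthread_create(&tid[started], NULL, ipi_handler, &r) != 0) {
            rc = -1;
            break;
        }

    if (rc == 0) {
        while (atomic_load(&r.arrived) < ncpus)
            ;                   /* wait until every other CPU is parked */
        memcpy(dst, src, len);  /* in the model, all others sit in ipi_handler */
    }
    atomic_store(&r.done, 1);   /* release the parked CPUs */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```

The appeal, as the reply says, is simplicity: no disassembler is needed because no other CPU makes progress while the bytes change.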


> * the instruction to replace is < the jmp instruction (4 bytes or less)
>
> If our goal is to overwrite code which has not been surrounded by a
> marker, an IPI wouldn't save us here. The marker is necessary in order to
> disable interrupts and make the IPI meaningful.

You lost me here.


>
>
> > Actually there may be tricks possible to first int3 (or equivalent single
> > byte replacement on other archs) the second instruction,
> > then the first, then wait for an RCU period for all CPUs to quiesce and
> > then write the longer jump. But an IPI is probably easier because it
> > doesn't need a full disassembler for this, and setting probes should not
> > be performance critical.
> >
>
> Well, in fact, there is still a problem. (oh no, not again!) ;)  RCU does
> require the reader to disable preemption; otherwise there is no guarantee
> that readers won't be scheduled out in the middle of the critical section,
> and RCU only guarantees that a non-schedulable reader will have finished by
> the time the RCU period is over.
>
> How do you plan to disable involuntary preemption around the instructions
> you want to overwrite?
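The int3 ordering suggested in the quoted text (int3 the second instruction, then the first, wait for a quiescent period, then write the jump) might be sketched as follows. `replace_with_jump`, `wait_quiescent_fn` and `count_waits` are hypothetical names; `bounds` stands for the instruction-start offsets that a disassembler would have to supply, which is the extra cost mentioned in the quote. The sketch assumes nothing branches into the 5-byte window and does not address the preemption problem raised just above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3      0xCC  /* one-byte x86 breakpoint */
#define JMP_REL32 0xE9  /* opcode of the 5-byte near jump */

/* Hypothetical hook: wait until every CPU has passed a quiescent state. */
typedef void (*wait_quiescent_fn)(void *ctx);

/* Trivial stand-in for the hook that only counts how often it is called. */
static void count_waits(void *ctx) { ++*(int *)ctx; }

/* Replace a run of short instructions with one 5-byte jmp rel32.
 * bounds[] lists the offsets (< 5) where instructions start inside the
 * window, as a disassembler would report them; bounds[0] must be 0. */
static void replace_with_jump(volatile uint8_t *site, const size_t *bounds,
                              size_t nbounds, int32_t rel32,
                              wait_quiescent_fn wait_qs, void *ctx)
{
    assert(nbounds >= 1 && bounds[0] == 0);
    for (size_t i = nbounds; i-- > 0;) {
        assert(bounds[i] < 5);
        site[bounds[i]] = INT3;   /* later instructions first, the first one last */
    }
    wait_qs(ctx);                 /* assumed: no CPU is still between the old insns */
    for (int i = 0; i < 4; i++)   /* bytes 1..4: rel32, least significant first */
        site[1 + i] = (uint8_t)((uint32_t)rel32 >> (8 * i));
    wait_qs(ctx);                 /* let every CPU resynchronize */
    site[0] = JMP_REL32;          /* finally replace the leading int3 */
}
```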


One way would be to just search the task list for any tasks blocked with an
IP inside the patched region. If there are any, wait for another quiescent
period.
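The task-list check above could look roughly like this: after each quiescent period, scan the saved instruction pointers of all tasks that are not running and wait again while any of them points into the patched window. `struct task`, `task_ip_in_region`, `wait_until_region_clear` and `count_periods` are hypothetical stand-ins (a kernel would walk its real task list and read each task's saved registers), and the `max_rounds` cap is an added assumption so the loop cannot wait forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task descriptor: only what the check needs. */
struct task {
    uintptr_t saved_ip;  /* where the task resumes when scheduled again */
    bool blocked;        /* not currently running on any CPU */
};

typedef void (*wait_quiescent_fn)(void *ctx);

/* Trivial stand-in for a quiescent-period wait that only counts calls. */
static void count_periods(void *ctx) { ++*(int *)ctx; }

/* True if any blocked task would resume inside [start, start + len).
 * The unsigned subtraction also rejects ips below start (it wraps). */
static bool task_ip_in_region(const struct task *tasks, size_t ntasks,
                              uintptr_t start, size_t len)
{
    for (size_t i = 0; i < ntasks; i++)
        if (tasks[i].blocked && tasks[i].saved_ip - start < len)
            return true;
    return false;
}

/* Wait one quiescent period, then keep waiting while some task still has
 * its IP in the region. Returns the number of periods waited, or -1 if
 * the region is still occupied after max_rounds periods. */
static int wait_until_region_clear(const struct task *tasks, size_t ntasks,
                                   uintptr_t start, size_t len,
                                   wait_quiescent_fn wait_qs, void *ctx,
                                   int max_rounds)
{
    assert(max_rounds > 0);
    for (int round = 1; round <= max_rounds; round++) {
        wait_qs(ctx);
        if (!task_ip_in_region(tasks, ntasks, start, len))
            return round;
    }
    return -1;
}
```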

-Andi