Message-ID: <48773047.1050906@gmail.com>
Date: Fri, 11 Jul 2008 10:06:00 -0000
From: Anup C Shan
To: systemtap@sources.redhat.com
CC: kghoshnitk@gmail.com, akinobu.mita@gmail.com, k-tanaka@ce.jp.nec.com
Subject: [RFC 1/5] Kernel Fault injection framework using SystemTap

Hi,

We have designed a tapset for fault injection, meant to ease the process of injecting faults into the kernel. As use cases, we have ported the in-kernel fault injection for slab and page_alloc (see Documentation/fault-injection/) to this framework. We have also modified the existing SCSI fault-injection SystemTap script (http://sourceforge.net/projects/scsifaultinjtst/) to use it.

Please find the tapset file and README attached. The use-case scripts are in the follow-up mails.

Comments and suggestions are welcome. Please also suggest the right location for these tapset scripts in the SystemTap source tree.
Thanks,
Kushal & Anup

[Attachment: README.faultinject]

Introduction
------------
This tapset provides a framework to facilitate fault injection for testing the kernel. The framework is used by SystemTap scripts that actually inject the faults: it processes the command-line arguments and controls the fault injection process.

The following generic parameters are used to set up fault injection:

a) failtimes - maximum number of times the process can be failed (0 = no limit)
b) interval - number of successful hits between potential failures
c) probability - probability, in percent, of a potential failure
d) taskfilter - 0 = fail all processes, 1 = fail only the processes selected by pid
e) space - number of successful hits before the first failure
f) verbosity - amount of output generated by the script
g) totaltime - duration of the fault injection session, in milliseconds
h) debug - print debug information for the script
i) pid - process IDs of the processes to inject failures into. This can also be specified using the -x option.

All of these except pid are registered in the tapset using the fij_add_option() function, which also sets their default values and help text. They are stored in the fij_params[] array and can be accessed as fij_params["variable_name"]. The PIDs are collected separately in the fij_pids_to_fail[] array. If you don't specify a parameter on the command line, its default value is used. Using fij_load_param(), your script can assign script-specific default values to the generic parameters.

You can also define mandatory parameters, which are specific to the script and depend on the kernel subsystem under test. These must be specified on the command line when the script is run, e.g. device numbers or inode numbers, which cannot be given sensible default values. Such parameters are registered using the fij_add_necessary_option() function.
Calling fij_add_necessary_option() adds the parameter to the fij_mandatoryparams[] array. If a mandatory parameter is not specified on the command line, an error is reported and the script is aborted. Once supplied, its value can be accessed as fij_params["variable_name"].

The framework controls fault injection through the fij_should_fail() and fij_done_fail() functions. Your script should probe the kernel routine that is subjected to fault injection. The probe handler calls fij_should_fail(), which returns 1 if it is time to inject a failure, or 0 otherwise. Your script can inject the fault in various ways, e.g. by faking an error return (changing the return value) or by modifying data structures. fij_done_fail() must be called immediately after a fault is injected, to notify the tapset. It must not be called if no fault was injected.

fij_logger() - a wrapper for the SystemTap log() function with an added verbosity parameter. The message is displayed only if the global fij_verbosity is greater than or equal to the verbosity level passed to the function. Level-100 messages are debug messages and are also displayed when debug=1.

How to use the tapset
---------------------
A script using this tapset needs:
1) A begin probe that registers user-defined parameters and default values.
2) Probes for fault injection. Call fij_should_fail() before injecting the fault and fij_done_fail() after the fault is injected.

Description of code flow
------------------------
1) begin (priority less than -1000) in the user script [OPTIONAL] - Pre-initialization. Currently not needed.
2) begin(-1000) in the tapset - Initializes counters and registers all generic parameters with global defaults.
3) begin in the user script - Script-specific default values are supplied here. Any script-specific parameters are also registered at this stage.
4) begin(1000) in the tapset - Command-line arguments are parsed and the parameters are assigned their values.
5) begin (priority greater than 1000) in the user script [OPTIONAL] - Can be used to copy parameter values from the fij_params[] array into global variables for easier reference.
6) The script starts executing. A timer fires every 10 milliseconds to check whether the script has run for the stipulated length of time (totaltime).
7) When a function/statement probe is hit, the script must call fij_should_fail() to check whether the conditions for failure have been satisfied.
8) Fail the function using a suitable method (changing return values, setting fake values in variables, etc.).
9) Call fij_done_fail() to inform the tapset that the fault has been injected.
10) The script exits either when it calls exit() or when the timeout is hit. At that point, the statistics of the experiment are printed.

[Attachment: faultinject.stp]

%{
#include
%}

global fij_params		//Array of all parameters (except fij_pids_to_fail)
global fij_paramshelp		//Array of help information for all parameters
global fij_mandatoryparams	//Array of mandatory parameters
global fij_pids_to_fail		//Array of pids subject to fault injection
global fij_failcount		//Number of times failed so far
global fij_probehits		//Number of times the probe has been hit
global fij_intervalcount	//Number of successful probe hits
global fij_aborted		//Boolean value to check whether the fault injection
				//procedure needs to continue or not

//Values copied from fij_params[] in begin(1000)
global fij_failtimes
global fij_verbosity
global fij_debug
global fij_taskfilter
global fij_interval
global fij_probability
global fij_space
global fij_totaltime

function fij_random:long()
%{
	THIS->__retvalue = random32();
%}

function fij_add_process_to_fail(procid:long)
{
	fij_logger(1, sprintf("Adding process %d to the fail list", procid))
	fij_pids_to_fail[procid] = 1
}

/*
 * Add an option to the parameters list
 * This option can be provided on the command line as opt=value
 * (no spaces around '=')
 */
function fij_add_option(opt:string, defval, help:string)
{
	fij_params[opt] = defval
	fij_paramshelp[opt] = help
}

/*
 * Add an option to the necessary parameters list
 * This option MUST be provided on the command line
 */
function fij_add_necessary_option(opt:string, help:string)
{
	fij_mandatoryparams[opt] = 1
	fij_paramshelp[opt] = help
}

function fij_print_help()
{
	fij_logger(0, "Usage : stap script.stp [ option1=value1 [ option2=value2 [ ...]]]")
	fij_logger(0, "Options : ")
	fij_logger(0, "\tpid\r\t\t\t\t : PID of a process to be failed. Use this option repeatedly to add multiple processes to fail")
	foreach (option in fij_params) {
		fij_logger(0, sprintf("\t%s\r\t\t\t\t : %s", option, fij_paramshelp[option]))
	}
	needed_options_counter = 0
	foreach (option in fij_mandatoryparams) {
		if (needed_options_counter == 0) {
			fij_logger(0, "Necessary options : ")
			needed_options_counter++
		}
		fij_logger(0, sprintf("\t%s\r\t\t\t\t : %s", option, fij_paramshelp[option]))
	}
	fij_logger(0, "For help : stap script.stp help")
	fij_aborted = 1
}

function fij_process_argument(arg:string)
{
	if (isinstr(arg, "=") == 1) {
		parameter = tokenize(arg, "=")
		value_in_str = tokenize("", "=")
		value = strtol(value_in_str, 10)
		if (parameter in fij_params) {
			fij_params[parameter] = value
			fij_logger(1, sprintf("Parameter %s is assigned value %d", parameter, fij_params[parameter]))
		} else if (parameter in fij_mandatoryparams) {
			fij_add_option(parameter, value, fij_paramshelp[parameter])
			delete fij_mandatoryparams[parameter]
		} else if (parameter == "pid") {
			fij_add_process_to_fail(value)
		} else
			fij_logger(0, sprintf("WARNING : Argument %s is not found in parameter list. Ignoring..", parameter))
	} else
		fij_logger(0, sprintf("WARNING : Invalid command line argument : %s", arg))
}

function fij_show_params()
{
	fij_logger(1, "Status of parameters :")
	foreach (option in fij_params)
		fij_logger(1, sprintf("Option %s has value %d", option, fij_params[option]))
}

/*
 * Parse command line arguments
 */
function fij_parse_command_line_args()
{
	for (i = 1; i <= argc; i++) {
		if (argv[i] == "help") {
			fij_print_help()
			return 0
		} else
			fij_process_argument(argv[i])
	}
	foreach (parameter in fij_mandatoryparams) {
		fij_logger(0, sprintf("ERROR: Necessary command line parameter %s not specified", parameter))
		fij_aborted = 1
	}
}

/*
 * Load script specific default parameters
 * This function is called by the script using this tapset to set custom
 * default values
 */
function fij_load_param(arg_times:long, arg_interval:long, arg_probability:long,
			arg_taskfilter:long, arg_space:long, arg_verbose:long,
			arg_totaltime:long)
{
	fij_add_option("failtimes", arg_times, "Number of times to fail (0 = no limit)")
	fij_add_option("interval", arg_interval, "Number of successful hits between potential failures (0 to fail every time)")
	fij_add_option("probability", arg_probability, "Probability of failure (1<=probability<=100) (0 to disable)")
	fij_add_option("taskfilter", arg_taskfilter, "0=>Fail all processes, 1=>Fail processes based on pid command line argument or -x option.")
	fij_add_option("space", arg_space, "Number of successful hits before the first failure (0 to disable)")
	fij_add_option("verbosity", arg_verbose, "0=>Success or Failure messages, 1=>Print parameter status, 2=>All probe hits, backtrace and register states")
	fij_add_option("totaltime", arg_totaltime, "Duration of fault injection session in milliseconds (Default : 1000 milliseconds)")
}

/*
 * Modified log function with an additional verbosity parameter
 * The message is printed only if the current fij_verbosity parameter
 * is greater than or equal to the minimum verbosity specified.
 * Minverbosity value of 100 is reserved only for debugging the script;
 * such messages are also printed when debug=1.
 */
function fij_logger(minverbosity:long, msg:string)
{
	if (fij_verbosity >= minverbosity)
		log(msg)
	else if (minverbosity == 100 && fij_debug == 1)
		log(msg)
}

/*
 * Checks whether the specified constraints for failure have been met
 * Returns 1 if process must be failed, else returns 0
 */
function fij_should_fail:long()
{
	fij_probehits++
	fij_logger(2, "Probe hit")

	if (fij_taskfilter != 0) {
		if (!(pid() in fij_pids_to_fail)) {
			fij_logger(100, sprintf("Skipping because wrong process %d - %s probed", pid(), execname()))
			return 0
		} else
			fij_logger(100, sprintf("Continuing with probing process %d - %s %d", pid(), execname(), target()))
	}

	if (fij_failcount == 0) {
		if (fij_space != 0) {
			if (fij_intervalcount < fij_space) {
				fij_logger(100, sprintf("Skipping on space : %d", fij_intervalcount))
				fij_intervalcount++
				return 0
			} else {
				fij_intervalcount = 0
				fij_logger(100, sprintf("Done skipping on space"))
				fij_space = 0
			}
		}
	}

	if (fij_failtimes != 0 && fij_failcount >= fij_failtimes) {
		fij_logger(100, sprintf("Failed %d times already. Skipping..", fij_failcount))
		return 0
	}

	if (fij_interval != 0) {
		if (fij_intervalcount != 0) {
			fij_logger(100, sprintf("Skipping on interval : %d", fij_intervalcount))
			fij_intervalcount++
			fij_intervalcount %= fij_interval
			return 0
		} else
			fij_intervalcount++
	}

	if (fij_probability != 0) {
		//fij_random() % 100 is in [0, 99]; failing only when it is below
		//fij_probability gives a fij_probability% chance of failure
		if (fij_random() % 100 >= fij_probability) {
			fij_logger(100, sprintf("Skipping on probability"))
			return 0
		} else
			fij_logger(100, sprintf("Continuing on probability"))
	}

	return 1
}

/*
 * Post injection cleanup
 * This function MUST be called after the process has been failed
 */
function fij_done_fail()
{
	fij_failcount++
	fij_logger(0, sprintf("Failed process %d - %s", pid(), execname()))
	if (fij_params["verbosity"] >= 2 && fij_verbosity != 100) {
		print_backtrace()
		print_regs()
	}
}

function fij_display_stats()
{
	fij_logger(0, sprintf("Probe was hit %d times.", fij_probehits))
	fij_logger(0, sprintf("Function was failed %d times.", fij_failcount))
}

/*
 * The first begin function
 * Initialises counters and adds generic parameters to the parameters list
 * In case the script requires a begin function to be executed prior to this,
 * a priority of less than -1000 must be passed to begin()
 */
probe begin(-1000)
{
	fij_failcount = 0
	fij_probehits = 0
	fij_intervalcount = 0
	fij_aborted = 0
	fij_load_param(0, 0, 0, 0, 0, 0, 1000)	//Loading default values
	fij_add_option("debug", 0, "Display debug information. Requires verbosity=100")
	fij_mandatoryparams["initialize"] = 1;	//Needed to register mandatory
						//params as an array
	delete fij_mandatoryparams["initialize"]
}

/*
 * The last begin function
 * Does parsing of command line arguments
 * In case the script requires a begin function to be executed after all
 * initialization, a priority greater than 1000 must be passed to begin()
 * E.g. when you need to copy one of the fij_params[] options into a global
 * variable after parsing command line args
 */
probe begin(1000)
{
	if (target() != 0)
		fij_add_process_to_fail(target())
	fij_parse_command_line_args()
	fij_failtimes = fij_params["failtimes"]
	fij_interval = fij_params["interval"]
	fij_probability = fij_params["probability"]
	fij_taskfilter = fij_params["taskfilter"]
	fij_space = fij_params["space"]
	fij_verbosity = fij_params["verbosity"]
	fij_totaltime = fij_params["totaltime"]
	fij_debug = fij_params["debug"]
	if (fij_aborted)
		exit()
	else
		fij_show_params()
}

probe end
{
	if (!fij_aborted)
		fij_display_stats()
}

//Check every 10 ms if the stipulated execution time has expired
probe timer.ms(10)
{
	fij_totaltime -= 10
	if (fij_totaltime <= 0)
		exit()
}