* [RFC] Using function clones for Pointer Bounds Checker @ 2014-01-14 9:16 Ilya Enkovich 2014-01-14 11:46 ` Richard Biener 2014-05-12 22:12 ` Jeff Law 0 siblings, 2 replies; 15+ messages in thread From: Ilya Enkovich @ 2014-01-14 9:16 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Biener, Jeff Law, Zamyatin, Igor Hi, I've been working for some time on the prototype of the Pointer Bounds Checker which uses function clones for instrumentation (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After several experiments with this approach I want to share my results and ask for some feedback to make a decision about the future steps. Firstly I want to remind the reasons for digging in this direction. In the original approach bounds of call arguments and input parameters are associated with arguments via special built-in calls. It creates implicit data flow compiler is not aware about which confuses some optimizations resulting in miss-optimization and breaks bounds data flow. Thus optimizations have to be fixed to get better pointers protection. Clones approach does not use special built-in function calls to associate bounds with call arguments and input parameters. Each function which should be instrumented gets an additional version and only this special version will be instrumented.This new version gets additional bound arguments to express input bounds. When function call is instrumented, it is redirected to instrumented version and all bounds are passed as explicit call arguments. Thus we have explicit pointer bounds flow similar to regular function parameters. It should allow to avoid changes in optimization, avoid miss-optimizations, allow existing IPA optimizations to work with bound args (e.g. propagate constant bounds value and remove checks in called function). I made a prototype implementation of this approach in the following way: - Add new IPA pass before early local passes to produce versions for all functions to be instrumented. - Put instrumentation pass after SSA pass. - Add new pass after IPA passes to remove bodies of functions which have instrumented versions. Function nodes may still be required for calls in not instrumented code. But we do not emit this code and therefore function bodies are not needed. Positive changes are: - IPA optimizations are not confused by bound parameters - bounds are now more like regular arguments; it makes their processing in expand easier - functions with bounds not attached to any pointer are allowed On simple codes this approach worked well but on a bigger tests some issues were revealed. 1. Nodes reachability. Instrumented version is actually always reachable when original function is reachable because it is always emitted instead of the original. Thus I had to fix reachability analysis to achieve it. Another similar problem is check whether node can be removed after inline when inlining instrumented function. Not hard to fix but probably other similar problems exist. 2. Function processing order. Function processing order is determined before early local passes. But during function instrumentation call graph is modified significantly and used topological order becomes outdated. That causes some troubles. E.g. function marked as 'always inline' cannot be inlined because it is not in SSA form yet. Surely inlining problem may be solved by just putting instrumentation after early inline, but similar problem may exist in other passes too. To resolve this problem I tried to split early local passes into three parts. The first one builds SSA, the second one performs instrumentation, the last one does the rest. Each part is performed on all functions before the next one starts. Thus I get all functions in SSA form and all instrumentation performed before starting early optimizations. Unfortunately such passes order leads to invalid SSA because of local_pure_const optimization affecting callers correctness (in case caller SSA was built before optimization revealed 'pure' or 'const' flag). In general I feel that having special function version for instrumentation has a better potential, should lead to less intrusive changes in the compiler and better code quality. But before continue this implementation I would like to get some feedback and probably some advice on how to order passes to get the best result. Currently I incline to have all functions instrumented before any local optimizations and solve pure_const problem by modifying all callers when attribute is computed for some function. Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-01-14 9:16 [RFC] Using function clones for Pointer Bounds Checker Ilya Enkovich @ 2014-01-14 11:46 ` Richard Biener 2014-01-14 12:47 ` Ilya Enkovich 2014-05-12 22:12 ` Jeff Law 1 sibling, 1 reply; 15+ messages in thread From: Richard Biener @ 2014-01-14 11:46 UTC (permalink / raw) To: Ilya Enkovich; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > Hi, > > I've been working for some time on the prototype of the Pointer Bounds > Checker which uses function clones for instrumentation > (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After > several experiments with this approach I want to share my results and > ask for some feedback to make a decision about the future steps. > > Firstly I want to remind the reasons for digging in this direction. In > the original approach bounds of call arguments and input parameters > are associated with arguments via special built-in calls. It creates > implicit data flow compiler is not aware about which confuses some > optimizations resulting in miss-optimization and breaks bounds data > flow. Thus optimizations have to be fixed to get better pointers > protection. > > Clones approach does not use special built-in function calls to > associate bounds with call arguments and input parameters. Each > function which should be instrumented gets an additional version and > only this special version will be instrumented.This new version gets > additional bound arguments to express input bounds. When function call > is instrumented, it is redirected to instrumented version and all > bounds are passed as explicit call arguments. Thus we have explicit > pointer bounds flow similar to regular function parameters. It should > allow to avoid changes in optimization, avoid miss-optimizations, > allow existing IPA optimizations to work with bound args (e.g. > propagate constant bounds value and remove checks in called function). > > I made a prototype implementation of this approach in the following way: > > - Add new IPA pass before early local passes to produce versions for > all functions to be instrumented. > - Put instrumentation pass after SSA pass. > - Add new pass after IPA passes to remove bodies of functions which > have instrumented versions. Function nodes may still be required for > calls in not instrumented code. But we do not emit this code and > therefore function bodies are not needed. > > Positive changes are: > > - IPA optimizations are not confused by bound parameters > - bounds are now more like regular arguments; it makes their > processing in expand easier > - functions with bounds not attached to any pointer are allowed First of all thanks for trying to work in this direction. Comments on the issues you encountered below (also CCed Honza as he should be more familiar with reachability and clone issues). > On simple codes this approach worked well but on a bigger tests some > issues were revealed. > > 1. Nodes reachability. Instrumented version is actually always > reachable when original function is reachable because it is always > emitted instead of the original. Thus I had to fix reachability > analysis to achieve it. Another similar problem is check whether node > can be removed after inline when inlining instrumented function. Not > hard to fix but probably other similar problems exist. I suppose you do not update the callgraph / the call stmts when creating the clones? Btw, is it desirable to inline the uninstrumented function and then instrument the result (thus run cloning and instrumentation after early inlining?)? Especially handling always_inlines before cloning/isntrumentation looks very sensible. > 2. Function processing order. Function processing order is determined > before early local passes. But during function instrumentation call > graph is modified significantly and used topological order becomes > outdated. That causes some troubles. E.g. function marked as 'always > inline' cannot be inlined because it is not in SSA form yet. Surely > inlining problem may be solved by just putting instrumentation after > early inline, but similar problem may exist in other passes too. To > resolve this problem I tried to split early local passes into three > parts. The first one builds SSA, the second one performs > instrumentation, the last one does the rest. Each part is performed on > all functions before the next one starts. Thus I get all functions in > SSA form and all instrumentation performed before starting early > optimizations. Unfortunately such passes order leads to invalid SSA > because of local_pure_const optimization affecting callers correctness > (in case caller SSA was built before optimization revealed 'pure' or > 'const' flag). Generally the processing order of early_local_passes is chosen to get better optimization - changing it shouldn't affect correctness and thus the issues you observe demand fixing anyway. (I've noted in the past that the early_local_passes processing order should more explicitely honor callgraph SCCs, eventually even iterating). Moving SSA build out of the early_local_passes and into a separate lowering stage is possible, the pure-const stuff is handled by keeping pass_fixup_cfg where it is now I think. In theory you can go into SSA form in all_lowering_passes already (but you have to adjust some pieces dealing with late created functions). Note that the idea of having stuff inside a single early-local-passes pipeline is that it increases cache locality and reduces memory usage by reclaiming functions no longer needed after inlining. > In general I feel that having special function version for > instrumentation has a better potential, should lead to less intrusive > changes in the compiler and better code quality. > > But before continue this implementation I would like to get some > feedback and probably some advice on how to order passes to get the > best result. Currently I incline to have all functions instrumented > before any local optimizations and solve pure_const problem by > modifying all callers when attribute is computed for some function. As noted above the pure_const proble is fixed by the fixup_cfg pass. Generally if instrumentation is not strictly required before inlining then doing cloning and instrumentation as a "real" IPA pass would be the way to go. You'd create clones during IPA analysis, fixup clones during IPA propagate (IPA inline should notify you of any decisions it makes), and instantiate and instrument the remaining required clones at IPA transform time. Generally if doing it before early opts is not too intrusive making it a real IPA pass later shouldn't be too difficult (heh ...). Honza may have more comments here, of course. Thanks, Richard. > Thanks, > Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-01-14 11:46 ` Richard Biener @ 2014-01-14 12:47 ` Ilya Enkovich 2014-01-14 14:04 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Ilya Enkovich @ 2014-01-14 12:47 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka 2014/1/14 Richard Biener <richard.guenther@gmail.com>: > On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> Hi, >> >> I've been working for some time on the prototype of the Pointer Bounds >> Checker which uses function clones for instrumentation >> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After >> several experiments with this approach I want to share my results and >> ask for some feedback to make a decision about the future steps. >> >> Firstly I want to remind the reasons for digging in this direction. In >> the original approach bounds of call arguments and input parameters >> are associated with arguments via special built-in calls. It creates >> implicit data flow compiler is not aware about which confuses some >> optimizations resulting in miss-optimization and breaks bounds data >> flow. Thus optimizations have to be fixed to get better pointers >> protection. >> >> Clones approach does not use special built-in function calls to >> associate bounds with call arguments and input parameters. Each >> function which should be instrumented gets an additional version and >> only this special version will be instrumented.This new version gets >> additional bound arguments to express input bounds. When function call >> is instrumented, it is redirected to instrumented version and all >> bounds are passed as explicit call arguments. Thus we have explicit >> pointer bounds flow similar to regular function parameters. It should >> allow to avoid changes in optimization, avoid miss-optimizations, >> allow existing IPA optimizations to work with bound args (e.g. >> propagate constant bounds value and remove checks in called function). >> >> I made a prototype implementation of this approach in the following way: >> >> - Add new IPA pass before early local passes to produce versions for >> all functions to be instrumented. >> - Put instrumentation pass after SSA pass. >> - Add new pass after IPA passes to remove bodies of functions which >> have instrumented versions. Function nodes may still be required for >> calls in not instrumented code. But we do not emit this code and >> therefore function bodies are not needed. >> >> Positive changes are: >> >> - IPA optimizations are not confused by bound parameters >> - bounds are now more like regular arguments; it makes their >> processing in expand easier >> - functions with bounds not attached to any pointer are allowed > > First of all thanks for trying to work in this direction. Comments on the > issues you encountered below (also CCed Honza as he should be more > familiar with reachability and clone issues). > >> On simple codes this approach worked well but on a bigger tests some >> issues were revealed. >> >> 1. Nodes reachability. Instrumented version is actually always >> reachable when original function is reachable because it is always >> emitted instead of the original. Thus I had to fix reachability >> analysis to achieve it. Another similar problem is check whether node >> can be removed after inline when inlining instrumented function. Not >> hard to fix but probably other similar problems exist. > > I suppose you do not update the callgraph / the call stmts when > creating the clones? Btw, is it desirable to inline the uninstrumented > function and then instrument the result (thus run cloning and > instrumentation after early inlining?)? Especially handling always_inlines > before cloning/isntrumentation looks very sensible. Right. Created clones have the same code as the original function and therefore same cgraph edges. I suppose instrumentation after early inlining is OK and may be preferred because inline shouldn't lead to any losses of bounds information. I tried variant when instrumentation works right after early inlining but with cloning still before early local passes. In general it looked OK. > >> 2. Function processing order. Function processing order is determined >> before early local passes. But during function instrumentation call >> graph is modified significantly and used topological order becomes >> outdated. That causes some troubles. E.g. function marked as 'always >> inline' cannot be inlined because it is not in SSA form yet. Surely >> inlining problem may be solved by just putting instrumentation after >> early inline, but similar problem may exist in other passes too. To >> resolve this problem I tried to split early local passes into three >> parts. The first one builds SSA, the second one performs >> instrumentation, the last one does the rest. Each part is performed on >> all functions before the next one starts. Thus I get all functions in >> SSA form and all instrumentation performed before starting early >> optimizations. Unfortunately such passes order leads to invalid SSA >> because of local_pure_const optimization affecting callers correctness >> (in case caller SSA was built before optimization revealed 'pure' or >> 'const' flag). > > Generally the processing order of early_local_passes is chosen > to get better optimization - changing it shouldn't affect correctness > and thus the issues you observe demand fixing anyway. > (I've noted in the past that the early_local_passes processing order > should more explicitely honor callgraph SCCs, eventually even iterating). > > Moving SSA build out of the early_local_passes and into a > separate lowering stage is possible, the pure-const stuff is > handled by keeping pass_fixup_cfg where it is now I think. > In theory you can go into SSA form in all_lowering_passes > already (but you have to adjust some pieces dealing with > late created functions). Currently pass_fixup_cfg is called at the beginning of the early local passes and therefore function processing order mattes much here. If caller is processed before the callee marked as pure then there is no additional pass_fixup_cfg for the caller after pure flag for callee is set. Probably additional IPA pass is required after early_local_passes to fixup all functions? > > Note that the idea of having stuff inside a single early-local-passes > pipeline is that it increases cache locality and reduces memory > usage by reclaiming functions no longer needed after inlining. > >> In general I feel that having special function version for >> instrumentation has a better potential, should lead to less intrusive >> changes in the compiler and better code quality. >> >> But before continue this implementation I would like to get some >> feedback and probably some advice on how to order passes to get the >> best result. Currently I incline to have all functions instrumented >> before any local optimizations and solve pure_const problem by >> modifying all callers when attribute is computed for some function. > > As noted above the pure_const proble is fixed by the fixup_cfg pass. > > Generally if instrumentation is not strictly required before inlining > then doing cloning and instrumentation as a "real" IPA pass would > be the way to go. You'd create clones during IPA analysis, > fixup clones during IPA propagate (IPA inline should notify you > of any decisions it makes), and instantiate and instrument the > remaining required clones at IPA transform time. Instrumentation is not required before inlining but is required before any optimization pass. Optimizations may change pointers flow and therefore break bounds flow, resulting in possible false bound violations. It means instrumentation has to be performed before early_optimizations. Thus I may split early_local_passes into three IPA passes: 1. build SSA + early_inline 2. Make instrumented versions + release bodies of original functions (for functions having instrumented version) 3. Perform early_optimizations. Thanks, Ilya > > Generally if doing it before early opts is not too intrusive making it > a real IPA pass later shouldn't be too difficult (heh ...). > > Honza may have more comments here, of course. > > Thanks, > Richard. > >> Thanks, >> Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-01-14 12:47 ` Ilya Enkovich @ 2014-01-14 14:04 ` Richard Biener 2014-01-14 14:36 ` Ilya Enkovich 0 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2014-01-14 14:04 UTC (permalink / raw) To: Ilya Enkovich; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka On Tue, Jan 14, 2014 at 1:47 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2014/1/14 Richard Biener <richard.guenther@gmail.com>: >> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> Hi, >>> >>> I've been working for some time on the prototype of the Pointer Bounds >>> Checker which uses function clones for instrumentation >>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After >>> several experiments with this approach I want to share my results and >>> ask for some feedback to make a decision about the future steps. >>> >>> Firstly I want to remind the reasons for digging in this direction. In >>> the original approach bounds of call arguments and input parameters >>> are associated with arguments via special built-in calls. It creates >>> implicit data flow compiler is not aware about which confuses some >>> optimizations resulting in miss-optimization and breaks bounds data >>> flow. Thus optimizations have to be fixed to get better pointers >>> protection. >>> >>> Clones approach does not use special built-in function calls to >>> associate bounds with call arguments and input parameters. Each >>> function which should be instrumented gets an additional version and >>> only this special version will be instrumented.This new version gets >>> additional bound arguments to express input bounds. When function call >>> is instrumented, it is redirected to instrumented version and all >>> bounds are passed as explicit call arguments. Thus we have explicit >>> pointer bounds flow similar to regular function parameters. It should >>> allow to avoid changes in optimization, avoid miss-optimizations, >>> allow existing IPA optimizations to work with bound args (e.g. >>> propagate constant bounds value and remove checks in called function). >>> >>> I made a prototype implementation of this approach in the following way: >>> >>> - Add new IPA pass before early local passes to produce versions for >>> all functions to be instrumented. >>> - Put instrumentation pass after SSA pass. >>> - Add new pass after IPA passes to remove bodies of functions which >>> have instrumented versions. Function nodes may still be required for >>> calls in not instrumented code. But we do not emit this code and >>> therefore function bodies are not needed. >>> >>> Positive changes are: >>> >>> - IPA optimizations are not confused by bound parameters >>> - bounds are now more like regular arguments; it makes their >>> processing in expand easier >>> - functions with bounds not attached to any pointer are allowed >> >> First of all thanks for trying to work in this direction. Comments on the >> issues you encountered below (also CCed Honza as he should be more >> familiar with reachability and clone issues). >> >>> On simple codes this approach worked well but on a bigger tests some >>> issues were revealed. >>> >>> 1. Nodes reachability. Instrumented version is actually always >>> reachable when original function is reachable because it is always >>> emitted instead of the original. Thus I had to fix reachability >>> analysis to achieve it. Another similar problem is check whether node >>> can be removed after inline when inlining instrumented function. Not >>> hard to fix but probably other similar problems exist. >> >> I suppose you do not update the callgraph / the call stmts when >> creating the clones? Btw, is it desirable to inline the uninstrumented >> function and then instrument the result (thus run cloning and >> instrumentation after early inlining?)? Especially handling always_inlines >> before cloning/isntrumentation looks very sensible. > > Right. Created clones have the same code as the original function and > therefore same cgraph edges. I suppose instrumentation after early > inlining is OK and may be preferred because inline shouldn't lead to > any losses of bounds information. I tried variant when instrumentation > works right after early inlining but with cloning still before early > local passes. In general it looked OK. > >> >>> 2. Function processing order. Function processing order is determined >>> before early local passes. But during function instrumentation call >>> graph is modified significantly and used topological order becomes >>> outdated. That causes some troubles. E.g. function marked as 'always >>> inline' cannot be inlined because it is not in SSA form yet. Surely >>> inlining problem may be solved by just putting instrumentation after >>> early inline, but similar problem may exist in other passes too. To >>> resolve this problem I tried to split early local passes into three >>> parts. The first one builds SSA, the second one performs >>> instrumentation, the last one does the rest. Each part is performed on >>> all functions before the next one starts. Thus I get all functions in >>> SSA form and all instrumentation performed before starting early >>> optimizations. Unfortunately such passes order leads to invalid SSA >>> because of local_pure_const optimization affecting callers correctness >>> (in case caller SSA was built before optimization revealed 'pure' or >>> 'const' flag). >> >> Generally the processing order of early_local_passes is chosen >> to get better optimization - changing it shouldn't affect correctness >> and thus the issues you observe demand fixing anyway. >> (I've noted in the past that the early_local_passes processing order >> should more explicitely honor callgraph SCCs, eventually even iterating). >> >> Moving SSA build out of the early_local_passes and into a >> separate lowering stage is possible, the pure-const stuff is >> handled by keeping pass_fixup_cfg where it is now I think. >> In theory you can go into SSA form in all_lowering_passes >> already (but you have to adjust some pieces dealing with >> late created functions). > > Currently pass_fixup_cfg is called at the beginning of the early local > passes and therefore function processing order mattes much here. If > caller is processed before the callee marked as pure then there is no > additional pass_fixup_cfg for the caller after pure flag for callee is > set. Probably additional IPA pass is required after early_local_passes > to fixup all functions? You probably need an additional fixup_cfg pass somewhere. >> Note that the idea of having stuff inside a single early-local-passes >> pipeline is that it increases cache locality and reduces memory >> usage by reclaiming functions no longer needed after inlining. >> >>> In general I feel that having special function version for >>> instrumentation has a better potential, should lead to less intrusive >>> changes in the compiler and better code quality. >>> >>> But before continue this implementation I would like to get some >>> feedback and probably some advice on how to order passes to get the >>> best result. Currently I incline to have all functions instrumented >>> before any local optimizations and solve pure_const problem by >>> modifying all callers when attribute is computed for some function. >> >> As noted above the pure_const proble is fixed by the fixup_cfg pass. >> >> Generally if instrumentation is not strictly required before inlining >> then doing cloning and instrumentation as a "real" IPA pass would >> be the way to go. You'd create clones during IPA analysis, >> fixup clones during IPA propagate (IPA inline should notify you >> of any decisions it makes), and instantiate and instrument the >> remaining required clones at IPA transform time. > > Instrumentation is not required before inlining but is required before > any optimization pass. Optimizations may change pointers flow and > therefore break bounds flow, resulting in possible false bound > violations. It means instrumentation has to be performed before > early_optimizations. Ah, ok ... > Thus I may split early_local_passes into three > IPA passes: > 1. build SSA + early_inline > 2. Make instrumented versions + release bodies of original functions > (for functions having instrumented version) > 3. Perform early_optimizations. That defeats the purpose and functioning of early inlining though. early inlining wants the callee to have gone through all early local passes - doing the above breaks this. So I suppose your original ordering is the only sensible thing unless somebody comes up with new great ideas ;) (eventually inlining always-inline functions can be split off and done earlier) Richard. > > Thanks, > Ilya > >> >> Generally if doing it before early opts is not too intrusive making it >> a real IPA pass later shouldn't be too difficult (heh ...). >> >> Honza may have more comments here, of course. >> >> Thanks, >> Richard. >> >>> Thanks, >>> Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-01-14 14:04 ` Richard Biener @ 2014-01-14 14:36 ` Ilya Enkovich 2014-01-30 12:12 ` Ilya Enkovich 0 siblings, 1 reply; 15+ messages in thread From: Ilya Enkovich @ 2014-01-14 14:36 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka 2014/1/14 Richard Biener <richard.guenther@gmail.com>: > On Tue, Jan 14, 2014 at 1:47 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2014/1/14 Richard Biener <richard.guenther@gmail.com>: >>> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> Hi, >>>> >>>> I've been working for some time on the prototype of the Pointer Bounds >>>> Checker which uses function clones for instrumentation >>>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After >>>> several experiments with this approach I want to share my results and >>>> ask for some feedback to make a decision about the future steps. >>>> >>>> Firstly I want to remind the reasons for digging in this direction. In >>>> the original approach bounds of call arguments and input parameters >>>> are associated with arguments via special built-in calls. It creates >>>> implicit data flow compiler is not aware about which confuses some >>>> optimizations resulting in miss-optimization and breaks bounds data >>>> flow. Thus optimizations have to be fixed to get better pointers >>>> protection. >>>> >>>> Clones approach does not use special built-in function calls to >>>> associate bounds with call arguments and input parameters. Each >>>> function which should be instrumented gets an additional version and >>>> only this special version will be instrumented.This new version gets >>>> additional bound arguments to express input bounds. When function call >>>> is instrumented, it is redirected to instrumented version and all >>>> bounds are passed as explicit call arguments. Thus we have explicit >>>> pointer bounds flow similar to regular function parameters. It should >>>> allow to avoid changes in optimization, avoid miss-optimizations, >>>> allow existing IPA optimizations to work with bound args (e.g. >>>> propagate constant bounds value and remove checks in called function). >>>> >>>> I made a prototype implementation of this approach in the following way: >>>> >>>> - Add new IPA pass before early local passes to produce versions for >>>> all functions to be instrumented. >>>> - Put instrumentation pass after SSA pass. >>>> - Add new pass after IPA passes to remove bodies of functions which >>>> have instrumented versions. Function nodes may still be required for >>>> calls in not instrumented code. But we do not emit this code and >>>> therefore function bodies are not needed. >>>> >>>> Positive changes are: >>>> >>>> - IPA optimizations are not confused by bound parameters >>>> - bounds are now more like regular arguments; it makes their >>>> processing in expand easier >>>> - functions with bounds not attached to any pointer are allowed >>> >>> First of all thanks for trying to work in this direction. Comments on the >>> issues you encountered below (also CCed Honza as he should be more >>> familiar with reachability and clone issues). >>> >>>> On simple codes this approach worked well but on a bigger tests some >>>> issues were revealed. >>>> >>>> 1. Nodes reachability. Instrumented version is actually always >>>> reachable when original function is reachable because it is always >>>> emitted instead of the original. Thus I had to fix reachability >>>> analysis to achieve it. Another similar problem is check whether node >>>> can be removed after inline when inlining instrumented function. Not >>>> hard to fix but probably other similar problems exist. >>> >>> I suppose you do not update the callgraph / the call stmts when >>> creating the clones? Btw, is it desirable to inline the uninstrumented >>> function and then instrument the result (thus run cloning and >>> instrumentation after early inlining?)? Especially handling always_inlines >>> before cloning/isntrumentation looks very sensible. >> >> Right. Created clones have the same code as the original function and >> therefore same cgraph edges. I suppose instrumentation after early >> inlining is OK and may be preferred because inline shouldn't lead to >> any losses of bounds information. I tried variant when instrumentation >> works right after early inlining but with cloning still before early >> local passes. In general it looked OK. >> >>> >>>> 2. Function processing order. Function processing order is determined >>>> before early local passes. But during function instrumentation call >>>> graph is modified significantly and used topological order becomes >>>> outdated. That causes some troubles. E.g. function marked as 'always >>>> inline' cannot be inlined because it is not in SSA form yet. Surely >>>> inlining problem may be solved by just putting instrumentation after >>>> early inline, but similar problem may exist in other passes too. To >>>> resolve this problem I tried to split early local passes into three >>>> parts. The first one builds SSA, the second one performs >>>> instrumentation, the last one does the rest. Each part is performed on >>>> all functions before the next one starts. Thus I get all functions in >>>> SSA form and all instrumentation performed before starting early >>>> optimizations. Unfortunately such passes order leads to invalid SSA >>>> because of local_pure_const optimization affecting callers correctness >>>> (in case caller SSA was built before optimization revealed 'pure' or >>>> 'const' flag). >>> >>> Generally the processing order of early_local_passes is chosen >>> to get better optimization - changing it shouldn't affect correctness >>> and thus the issues you observe demand fixing anyway. >>> (I've noted in the past that the early_local_passes processing order >>> should more explicitely honor callgraph SCCs, eventually even iterating). >>> >>> Moving SSA build out of the early_local_passes and into a >>> separate lowering stage is possible, the pure-const stuff is >>> handled by keeping pass_fixup_cfg where it is now I think. >>> In theory you can go into SSA form in all_lowering_passes >>> already (but you have to adjust some pieces dealing with >>> late created functions). >> >> Currently pass_fixup_cfg is called at the beginning of the early local >> passes and therefore function processing order mattes much here. If >> caller is processed before the callee marked as pure then there is no >> additional pass_fixup_cfg for the caller after pure flag for callee is >> set. Probably additional IPA pass is required after early_local_passes >> to fixup all functions? > > You probably need an additional fixup_cfg pass somewhere. > >>> Note that the idea of having stuff inside a single early-local-passes >>> pipeline is that it increases cache locality and reduces memory >>> usage by reclaiming functions no longer needed after inlining. >>> >>>> In general I feel that having special function version for >>>> instrumentation has a better potential, should lead to less intrusive >>>> changes in the compiler and better code quality. >>>> >>>> But before continue this implementation I would like to get some >>>> feedback and probably some advice on how to order passes to get the >>>> best result. Currently I incline to have all functions instrumented >>>> before any local optimizations and solve pure_const problem by >>>> modifying all callers when attribute is computed for some function. >>> >>> As noted above the pure_const proble is fixed by the fixup_cfg pass. >>> >>> Generally if instrumentation is not strictly required before inlining >>> then doing cloning and instrumentation as a "real" IPA pass would >>> be the way to go. You'd create clones during IPA analysis, >>> fixup clones during IPA propagate (IPA inline should notify you >>> of any decisions it makes), and instantiate and instrument the >>> remaining required clones at IPA transform time. >> >> Instrumentation is not required before inlining but is required before >> any optimization pass. Optimizations may change pointers flow and >> therefore break bounds flow, resulting in possible false bound >> violations. It means instrumentation has to be performed before >> early_optimizations. > > Ah, ok ... > >> Thus I may split early_local_passes into three >> IPA passes: >> 1. build SSA + early_inline >> 2. Make instrumented versions + release bodies of original functions >> (for functions having instrumented version) >> 3. Perform early_optimizations. > > That defeats the purpose and functioning of early inlining though. > early inlining wants the callee to have gone through all early local > passes - doing the above breaks this. > > So I suppose your original ordering is the only sensible thing unless > somebody comes up with new great ideas ;) Thanks for your comments! I'll continue my experiments with with my initial early_local_passes splitting. Will just put there original functions bodies release to avoid overhead for their useless optimizations. So, it will be 3 IPA passes: 1. SSA build 2. Make instrumented versions + release bodies of original functions 3. early_inline + early optimizations. Will be back with more results. Thanks, Ilya > > (eventually inlining always-inline functions can be split off and done > earlier) > > Richard. > >> >> Thanks, >> Ilya >> >>> >>> Generally if doing it before early opts is not too intrusive making it >>> a real IPA pass later shouldn't be too difficult (heh ...). >>> >>> Honza may have more comments here, of course. >>> >>> Thanks, >>> Richard. >>> >>>> Thanks, >>>> Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-01-14 14:36 ` Ilya Enkovich @ 2014-01-30 12:12 ` Ilya Enkovich 0 siblings, 0 replies; 15+ messages in thread From: Ilya Enkovich @ 2014-01-30 12:12 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka > > Thanks for your comments! > > I'll continue my experiments with with my initial early_local_passes > splitting. Will just put there original functions bodies release to > avoid overhead for their useless optimizations. So, it will be 3 IPA > passes: > > 1. SSA build > 2. Make instrumented versions + release bodies of original functions > 3. early_inline + early optimizations. > > Will be back with more results. > Hi, I've implemented the chosen approach and performed a set of experiments. Without LTO it mostly worked fine. Some new reachability analysis issues were met and I finally solved them by introducing a new type of IPA reference between the original function and it's instrumented version. Two significant problems were revealed during LTO tests. The first one was with assembler function names. With no LTO I just used the same assembler name for both original and instrumented function versions. With LTO assembler name becomes much more important identifier and it does not seems possible to share the same name between original and instrumented versions. I started to use suffix for instrumented function names and chained it with the original assembler name (used IDENTIFIER_TRANSPARENT_ALIAS flag, unfortunately not in all cases aliases chain is followed, so I had to additionally fix few places to have original names printed for the instrumented function decls). BTW LTO streamer does not preserve transparent aliases chain for identifiers. Is it intentional? The second problem was resolutions file from linker. Linker has no idea about connection between instrumented and not instrumented functions and therefore may declare instrumented functions as local when external calls to the original exist. It caused more reachability analysis issues (instrumented functions were removed) and visibility issue (instrumented functions were not marked as global). Visibility issue was simply solved by looking at original function decl before globalizing decl. In solving reachability problems I found useful to transform original functions to the special kind of thunks having call edge to the instrumented version. It guarantees we do not remove instrumented version when original is called. It also provides optimization opportunity for not instrumented calls (e.g. IPCP goes through thunks and may propagate constant params; will require additional support to map params correctly though). If my choices look reasonable I would continue with code cleanup, merge and preparing a patch series for stage 1. Thanks, Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-01-14 9:16 [RFC] Using function clones for Pointer Bounds Checker Ilya Enkovich 2014-01-14 11:46 ` Richard Biener @ 2014-05-12 22:12 ` Jeff Law 2014-05-13 8:38 ` Ilya Enkovich 1 sibling, 1 reply; 15+ messages in thread From: Jeff Law @ 2014-05-12 22:12 UTC (permalink / raw) To: Ilya Enkovich, gcc-patches; +Cc: Richard Biener, Zamyatin, Igor On 01/14/14 02:15, Ilya Enkovich wrote: > Hi, > > I've been working for some time on the prototype of the Pointer Bounds > Checker which uses function clones for instrumentation > (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After > several experiments with this approach I want to share my results and > ask for some feedback to make a decision about the future steps. Sorry for the delayed reply, when you sent this message I was ignoring anything that wasn't an open issue for GCC 4.9 -- it all gets thrown into a massive queue of things to come back to once stage1 reopens. So I really like the idea of trying to use function clones to simplify some of the details of implementing MPX support. > > Clones approach does not use special built-in function calls to > associate bounds with call arguments and input parameters. Each > function which should be instrumented gets an additional version and > only this special version will be instrumented. I like this. > This new version gets > additional bound arguments to express input bounds. When function call > is instrumented, it is redirected to instrumented version and all > bounds are passed as explicit call arguments. Thus we have explicit > pointer bounds flow similar to regular function parameters. It should > allow to avoid changes in optimization, avoid miss-optimizations, > allow existing IPA optimizations to work with bound args (e.g. > propagate constant bounds value and remove checks in called function). So from a linking standpoint, presumably you have to mangle the instrumented caller/callee in some manner. Right? Or are you dynamically dispatching somehow? > > 1. Nodes reachability. Instrumented version is actually always > reachable when original function is reachable because it is always > emitted instead of the original. Thus I had to fix reachability > analysis to achieve it. Another similar problem is check whether node > can be removed after inline when inlining instrumented function. Not > hard to fix but probably other similar problems exist. Jan is going to be the best person to discuss these issues with you. He's the architect behind most of the IPA bits we've got right now. >> > In general I feel that having special function version for > instrumentation has a better potential, should lead to less intrusive > changes in the compiler and better code quality. Agreed. It seems *much* cleaner. jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-12 22:12 ` Jeff Law @ 2014-05-13 8:38 ` Ilya Enkovich 2014-05-13 19:21 ` Jeff Law 0 siblings, 1 reply; 15+ messages in thread From: Ilya Enkovich @ 2014-05-13 8:38 UTC (permalink / raw) To: Jeff Law; +Cc: gcc-patches, Richard Biener, Zamyatin, Igor 2014-05-13 1:20 GMT+04:00 Jeff Law <law@redhat.com>: > On 01/14/14 02:15, Ilya Enkovich wrote: >> >> Hi, >> >> I've been working for some time on the prototype of the Pointer Bounds >> Checker which uses function clones for instrumentation >> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After >> several experiments with this approach I want to share my results and >> ask for some feedback to make a decision about the future steps. > > Sorry for the delayed reply, when you sent this message I was ignoring > anything that wasn't an open issue for GCC 4.9 -- it all gets thrown into a > massive queue of things to come back to once stage1 reopens. > > So I really like the idea of trying to use function clones to simplify some > of the details of implementing MPX support. > > >> >> Clones approach does not use special built-in function calls to >> associate bounds with call arguments and input parameters. Each >> function which should be instrumented gets an additional version and >> only this special version will be instrumented. > > I like this. > > > > >> This new version gets >> additional bound arguments to express input bounds. When function call >> is instrumented, it is redirected to instrumented version and all >> bounds are passed as explicit call arguments. Thus we have explicit >> pointer bounds flow similar to regular function parameters. It should >> allow to avoid changes in optimization, avoid miss-optimizations, >> allow existing IPA optimizations to work with bound args (e.g. >> propagate constant bounds value and remove checks in called function). > > So from a linking standpoint, presumably you have to mangle the instrumented > caller/callee in some manner. Right? Or are you dynamically dispatching > somehow? Originally the idea was o have instrumented clone to have the same assembler name as the original function. Since instrumented code is fully compatible with not instrumented code, we always emit only one version. Usage of the same assembler name allows instrumented and not instrumented calls to look similar in assembler. It worked fine until I tried it with LTO where assembler name is used as a unique identifier. With linker resolutions files it became even more harder to use such approach. To resolve these issues I started to use new assembler name with postfix, but linked with the original name using IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for clones and originals during compilation, but both clone and original functions have similar name in output assembler. > > > >> >> 1. Nodes reachability. Instrumented version is actually always >> reachable when original function is reachable because it is always >> emitted instead of the original. Thus I had to fix reachability >> analysis to achieve it. Another similar problem is check whether node >> can be removed after inline when inlining instrumented function. Not >> hard to fix but probably other similar problems exist. > > Jan is going to be the best person to discuss these issues with you. He's > the architect behind most of the IPA bits we've got right now. For now I managed to handle this via introducing a new type of thunk for original functions. If we have instrumented function it means we do not emit original one and therefore may remove its body. At this point I transform the original function into a new kind of thunk which has call edge to the instrumented version. It makes reachability analysis to work OK for instrumented functions with no modifications. The only problem here is to avoid function clones removal before instrumentation pass happens. I achieve it by forbidding to remove not instrumented instrumentation clones by adding new case into cgraph_can_remove_if_no_direct_calls_and_refs_p. > > >>> >> In general I feel that having special function version for >> instrumentation has a better potential, should lead to less intrusive >> changes in the compiler and better code quality. > > Agreed. It seems *much* cleaner. > > jeff Thanks for your review. Ilya ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-13 8:38 ` Ilya Enkovich @ 2014-05-13 19:21 ` Jeff Law 2014-05-14 8:18 ` Ilya Enkovich 0 siblings, 1 reply; 15+ messages in thread From: Jeff Law @ 2014-05-13 19:21 UTC (permalink / raw) To: Ilya Enkovich; +Cc: gcc-patches, Richard Biener, Zamyatin, Igor On 05/13/14 02:38, Ilya Enkovich wrote: >>> propagate constant bounds value and remove checks in called function). >> >> So from a linking standpoint, presumably you have to mangle the instrumented >> caller/callee in some manner. Right? Or are you dynamically dispatching >> somehow? > > Originally the idea was o have instrumented clone to have the same > assembler name as the original function. Since instrumented code is > fully compatible with not instrumented code, we always emit only one > version. Usage of the same assembler name allows instrumented and not > instrumented calls to look similar in assembler. It worked fine until > I tried it with LTO where assembler name is used as a unique > identifier. With linker resolutions files it became even more harder > to use such approach. To resolve these issues I started to use new > assembler name with postfix, but linked with the original name using > IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for > clones and originals during compilation, but both clone and original > functions have similar name in output assembler. OK. So if I read that correctly, it implies that the existence of bounds information does not change the signature of the callee. This is obviously important for C++. Sounds like I need to sit down with the branch and see how this works in the new scheme. jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-13 19:21 ` Jeff Law @ 2014-05-14 8:18 ` Ilya Enkovich 2014-05-14 15:09 ` H.J. Lu 0 siblings, 1 reply; 15+ messages in thread From: Ilya Enkovich @ 2014-05-14 8:18 UTC (permalink / raw) To: Jeff Law; +Cc: gcc-patches, Richard Biener, Zamyatin, Igor 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>: > On 05/13/14 02:38, Ilya Enkovich wrote: >>>> >>>> propagate constant bounds value and remove checks in called function). >>> >>> >>> So from a linking standpoint, presumably you have to mangle the >>> instrumented >>> caller/callee in some manner. Right? Or are you dynamically dispatching >>> somehow? >> >> >> Originally the idea was o have instrumented clone to have the same >> assembler name as the original function. Since instrumented code is >> fully compatible with not instrumented code, we always emit only one >> version. Usage of the same assembler name allows instrumented and not >> instrumented calls to look similar in assembler. It worked fine until >> I tried it with LTO where assembler name is used as a unique >> identifier. With linker resolutions files it became even more harder >> to use such approach. To resolve these issues I started to use new >> assembler name with postfix, but linked with the original name using >> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for >> clones and originals during compilation, but both clone and original >> functions have similar name in output assembler. > > OK. So if I read that correctly, it implies that the existence of bounds > information does not change the signature of the callee. This is obviously > important for C++. > > Sounds like I need to sit down with the branch and see how this works in the > new scheme. Both mpx branch and Wiki (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler) page are up-to-date now and may be tried out either in NOP mode or with simulator. Let me know if you have any troubles with using it. Ilya > > jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-14 8:18 ` Ilya Enkovich @ 2014-05-14 15:09 ` H.J. Lu 2014-05-15 9:03 ` Ilya Enkovich 2014-05-15 11:07 ` Ilya Enkovich 0 siblings, 2 replies; 15+ messages in thread From: H.J. Lu @ 2014-05-14 15:09 UTC (permalink / raw) To: Ilya Enkovich; +Cc: Jeff Law, gcc-patches, Richard Biener, Zamyatin, Igor On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>: >> On 05/13/14 02:38, Ilya Enkovich wrote: >>>>> >>>>> propagate constant bounds value and remove checks in called function). >>>> >>>> >>>> So from a linking standpoint, presumably you have to mangle the >>>> instrumented >>>> caller/callee in some manner. Right? Or are you dynamically dispatching >>>> somehow? >>> >>> >>> Originally the idea was o have instrumented clone to have the same >>> assembler name as the original function. Since instrumented code is >>> fully compatible with not instrumented code, we always emit only one >>> version. Usage of the same assembler name allows instrumented and not >>> instrumented calls to look similar in assembler. It worked fine until >>> I tried it with LTO where assembler name is used as a unique >>> identifier. With linker resolutions files it became even more harder >>> to use such approach. To resolve these issues I started to use new >>> assembler name with postfix, but linked with the original name using >>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for >>> clones and originals during compilation, but both clone and original >>> functions have similar name in output assembler. >> >> OK. So if I read that correctly, it implies that the existence of bounds >> information does not change the signature of the callee. This is obviously >> important for C++. >> >> Sounds like I need to sit down with the branch and see how this works in the >> new scheme. > > Both mpx branch and Wiki > (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler) > page are up-to-date now and may be tried out either in NOP mode or > with simulator. Let me know if you have any troubles with using it. > I built it. But "-fcheck-pointer-bounds -mmpx" doesn't generate MPX enabled executable which runs on both MPX-enabled and non MPX-enabled hardwares. I didn't see any MPX run-time library. -- H.J. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-14 15:09 ` H.J. Lu @ 2014-05-15 9:03 ` Ilya Enkovich 2014-05-15 11:07 ` Ilya Enkovich 1 sibling, 0 replies; 15+ messages in thread From: Ilya Enkovich @ 2014-05-15 9:03 UTC (permalink / raw) To: H.J. Lu; +Cc: Jeff Law, gcc-patches, Richard Biener, Zamyatin, Igor 2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>: > On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>: >>> On 05/13/14 02:38, Ilya Enkovich wrote: >>>>>> >>>>>> propagate constant bounds value and remove checks in called function). >>>>> >>>>> >>>>> So from a linking standpoint, presumably you have to mangle the >>>>> instrumented >>>>> caller/callee in some manner. Right? Or are you dynamically dispatching >>>>> somehow? >>>> >>>> >>>> Originally the idea was o have instrumented clone to have the same >>>> assembler name as the original function. Since instrumented code is >>>> fully compatible with not instrumented code, we always emit only one >>>> version. Usage of the same assembler name allows instrumented and not >>>> instrumented calls to look similar in assembler. It worked fine until >>>> I tried it with LTO where assembler name is used as a unique >>>> identifier. With linker resolutions files it became even more harder >>>> to use such approach. To resolve these issues I started to use new >>>> assembler name with postfix, but linked with the original name using >>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for >>>> clones and originals during compilation, but both clone and original >>>> functions have similar name in output assembler. >>> >>> OK. So if I read that correctly, it implies that the existence of bounds >>> information does not change the signature of the callee. This is obviously >>> important for C++. >>> >>> Sounds like I need to sit down with the branch and see how this works in the >>> new scheme. >> >> Both mpx branch and Wiki >> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler) >> page are up-to-date now and may be tried out either in NOP mode or >> with simulator. Let me know if you have any troubles with using it. >> > > I built it. But "-fcheck-pointer-bounds -mmpx" doesn't generate > MPX enabled executable which runs on both MPX-enabled and > non MPX-enabled hardwares. I didn't see any MPX run-time library. MPX run-time library is not a part of GCC now. It goes with Intel SDE. See https://software.intel.com/en-us/articles/using-intel-mpx-with-the-intel-software-development-emulator for usage instructions. Ilya > > -- > H.J. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-14 15:09 ` H.J. Lu 2014-05-15 9:03 ` Ilya Enkovich @ 2014-05-15 11:07 ` Ilya Enkovich 2014-05-15 11:27 ` Richard Biener 1 sibling, 1 reply; 15+ messages in thread From: Ilya Enkovich @ 2014-05-15 11:07 UTC (permalink / raw) To: H.J. Lu; +Cc: Jeff Law, gcc-patches, Richard Biener, Zamyatin, Igor 2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>: > On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>: >>> On 05/13/14 02:38, Ilya Enkovich wrote: >>>>>> >>>>>> propagate constant bounds value and remove checks in called function). >>>>> >>>>> >>>>> So from a linking standpoint, presumably you have to mangle the >>>>> instrumented >>>>> caller/callee in some manner. Right? Or are you dynamically dispatching >>>>> somehow? >>>> >>>> >>>> Originally the idea was o have instrumented clone to have the same >>>> assembler name as the original function. Since instrumented code is >>>> fully compatible with not instrumented code, we always emit only one >>>> version. Usage of the same assembler name allows instrumented and not >>>> instrumented calls to look similar in assembler. It worked fine until >>>> I tried it with LTO where assembler name is used as a unique >>>> identifier. With linker resolutions files it became even more harder >>>> to use such approach. To resolve these issues I started to use new >>>> assembler name with postfix, but linked with the original name using >>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for >>>> clones and originals during compilation, but both clone and original >>>> functions have similar name in output assembler. >>> >>> OK. So if I read that correctly, it implies that the existence of bounds >>> information does not change the signature of the callee. This is obviously >>> important for C++. >>> >>> Sounds like I need to sit down with the branch and see how this works in the >>> new scheme. >> >> Both mpx branch and Wiki >> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler) >> page are up-to-date now and may be tried out either in NOP mode or >> with simulator. Let me know if you have any troubles with using it. >> > > I built it. But "-fcheck-pointer-bounds -mmpx" doesn't generate > MPX enabled executable which runs on both MPX-enabled and > non MPX-enabled hardwares. I didn't see any MPX run-time library. Just checked out the branch and checked generated code. #cat test.c int test (int *p, int i) { return p[i]; } #gcc -fcheck-pointer-bounds -mmpx test.c -S -O2 #cat test.s .file "test.c" .section .text.unlikely,"ax",@progbits .LCOLDB0: .text .LHOTB0: .p2align 4,,15 .globl test .type test, @function test: .LFB1: .cfi_startproc movslq %esi, %rsi leaq (%rdi,%rsi,4), %rax bndcl (%rax), %bnd0 bndcu 3(%rax), %bnd0 movl (%rax), %eax bnd ret .cfi_endproc ... Checks are here. What do you see in your test? Ilya > > -- > H.J. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-15 11:07 ` Ilya Enkovich @ 2014-05-15 11:27 ` Richard Biener 2014-05-15 12:10 ` Ilya Enkovich 0 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2014-05-15 11:27 UTC (permalink / raw) To: Ilya Enkovich; +Cc: H.J. Lu, Jeff Law, gcc-patches, Zamyatin, Igor On Thu, May 15, 2014 at 1:07 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > 2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>: >> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>: >>>> On 05/13/14 02:38, Ilya Enkovich wrote: >>>>>>> >>>>>>> propagate constant bounds value and remove checks in called function). >>>>>> >>>>>> >>>>>> So from a linking standpoint, presumably you have to mangle the >>>>>> instrumented >>>>>> caller/callee in some manner. Right? Or are you dynamically dispatching >>>>>> somehow? >>>>> >>>>> >>>>> Originally the idea was o have instrumented clone to have the same >>>>> assembler name as the original function. Since instrumented code is >>>>> fully compatible with not instrumented code, we always emit only one >>>>> version. Usage of the same assembler name allows instrumented and not >>>>> instrumented calls to look similar in assembler. It worked fine until >>>>> I tried it with LTO where assembler name is used as a unique >>>>> identifier. With linker resolutions files it became even more harder >>>>> to use such approach. To resolve these issues I started to use new >>>>> assembler name with postfix, but linked with the original name using >>>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for >>>>> clones and originals during compilation, but both clone and original >>>>> functions have similar name in output assembler. >>>> >>>> OK. So if I read that correctly, it implies that the existence of bounds >>>> information does not change the signature of the callee. This is obviously >>>> important for C++. >>>> >>>> Sounds like I need to sit down with the branch and see how this works in the >>>> new scheme. >>> >>> Both mpx branch and Wiki >>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler) >>> page are up-to-date now and may be tried out either in NOP mode or >>> with simulator. Let me know if you have any troubles with using it. >>> >> >> I built it. But "-fcheck-pointer-bounds -mmpx" doesn't generate >> MPX enabled executable which runs on both MPX-enabled and >> non MPX-enabled hardwares. I didn't see any MPX run-time library. > > Just checked out the branch and checked generated code. > > #cat test.c > int > test (int *p, int i) > { > return p[i]; > } > #gcc -fcheck-pointer-bounds -mmpx test.c -S -O2 > #cat test.s > .file "test.c" > .section .text.unlikely,"ax",@progbits > .LCOLDB0: > .text > .LHOTB0: > .p2align 4,,15 > .globl test > .type test, @function > test: > .LFB1: > .cfi_startproc > movslq %esi, %rsi > leaq (%rdi,%rsi,4), %rax > bndcl (%rax), %bnd0 > bndcu 3(%rax), %bnd0 > movl (%rax), %eax > bnd ret > .cfi_endproc > ... > > Checks are here. What do you see in your test? Wow, that's quite an overhead compared to the non-instrumented variant movslq %esi, %rsi movl (%rdi,%rsi,4), %eax ret I thought bounds-checking was done with some clever prefixes thus that movslq %esi, %rsi bndmovl (%rdi,%rsi,4), %eax, %bnd0 bnd ret would be possible (well, replace with valid ISA). Richard. > Ilya > >> >> -- >> H.J. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] Using function clones for Pointer Bounds Checker 2014-05-15 11:27 ` Richard Biener @ 2014-05-15 12:10 ` Ilya Enkovich 0 siblings, 0 replies; 15+ messages in thread From: Ilya Enkovich @ 2014-05-15 12:10 UTC (permalink / raw) To: Richard Biener; +Cc: H.J. Lu, Jeff Law, gcc-patches, Zamyatin, Igor 2014-05-15 15:27 GMT+04:00 Richard Biener <richard.guenther@gmail.com>: > On Thu, May 15, 2014 at 1:07 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> 2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>: >>> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >>>> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>: >>>>> On 05/13/14 02:38, Ilya Enkovich wrote: >>>>>>>> >>>>>>>> propagate constant bounds value and remove checks in called function). >>>>>>> >>>>>>> >>>>>>> So from a linking standpoint, presumably you have to mangle the >>>>>>> instrumented >>>>>>> caller/callee in some manner. Right? Or are you dynamically dispatching >>>>>>> somehow? >>>>>> >>>>>> >>>>>> Originally the idea was o have instrumented clone to have the same >>>>>> assembler name as the original function. Since instrumented code is >>>>>> fully compatible with not instrumented code, we always emit only one >>>>>> version. Usage of the same assembler name allows instrumented and not >>>>>> instrumented calls to look similar in assembler. It worked fine until >>>>>> I tried it with LTO where assembler name is used as a unique >>>>>> identifier. With linker resolutions files it became even more harder >>>>>> to use such approach. To resolve these issues I started to use new >>>>>> assembler name with postfix, but linked with the original name using >>>>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for >>>>>> clones and originals during compilation, but both clone and original >>>>>> functions have similar name in output assembler. >>>>> >>>>> OK. So if I read that correctly, it implies that the existence of bounds >>>>> information does not change the signature of the callee. This is obviously >>>>> important for C++. >>>>> >>>>> Sounds like I need to sit down with the branch and see how this works in the >>>>> new scheme. >>>> >>>> Both mpx branch and Wiki >>>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler) >>>> page are up-to-date now and may be tried out either in NOP mode or >>>> with simulator. Let me know if you have any troubles with using it. >>>> >>> >>> I built it. But "-fcheck-pointer-bounds -mmpx" doesn't generate >>> MPX enabled executable which runs on both MPX-enabled and >>> non MPX-enabled hardwares. I didn't see any MPX run-time library. >> >> Just checked out the branch and checked generated code. >> >> #cat test.c >> int >> test (int *p, int i) >> { >> return p[i]; >> } >> #gcc -fcheck-pointer-bounds -mmpx test.c -S -O2 >> #cat test.s >> .file "test.c" >> .section .text.unlikely,"ax",@progbits >> .LCOLDB0: >> .text >> .LHOTB0: >> .p2align 4,,15 >> .globl test >> .type test, @function >> test: >> .LFB1: >> .cfi_startproc >> movslq %esi, %rsi >> leaq (%rdi,%rsi,4), %rax >> bndcl (%rax), %bnd0 >> bndcu 3(%rax), %bnd0 >> movl (%rax), %eax >> bnd ret >> .cfi_endproc >> ... >> >> Checks are here. What do you see in your test? > > Wow, that's quite an overhead compared to the non-instrumented variant > > movslq %esi, %rsi > movl (%rdi,%rsi,4), %eax > ret > Overhead is actually two instructions - checks for lower and upper bounds. lea instruction is probably a miss-optimization. Checks are cheap instructions and do not introduce new dependencies for the load which is the heaviest here. BTW checks are not the main reason for overhead in an instrumented code, it is a bounds tables management (store/load bounds for stored/loaded pointers) which is. Anyway it is too early to speak about overhead until we have hardware to measure it. > I thought bounds-checking was done with some clever prefixes thus > that > > movslq %esi, %rsi > bndmovl (%rdi,%rsi,4), %eax, %bnd0 > bnd ret > > would be possible (well, replace with valid ISA). Doubt it would be possible to encode it keeping backward compatible with existing hardware. Also putting all logic into one instruction does not mean it is executed faster than a sequence of instructions, especially on out-of-order CPUs. Ilya > > Richard. > >> Ilya >> >>> >>> -- >>> H.J. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-05-15 12:10 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-01-14 9:16 [RFC] Using function clones for Pointer Bounds Checker Ilya Enkovich 2014-01-14 11:46 ` Richard Biener 2014-01-14 12:47 ` Ilya Enkovich 2014-01-14 14:04 ` Richard Biener 2014-01-14 14:36 ` Ilya Enkovich 2014-01-30 12:12 ` Ilya Enkovich 2014-05-12 22:12 ` Jeff Law 2014-05-13 8:38 ` Ilya Enkovich 2014-05-13 19:21 ` Jeff Law 2014-05-14 8:18 ` Ilya Enkovich 2014-05-14 15:09 ` H.J. Lu 2014-05-15 9:03 ` Ilya Enkovich 2014-05-15 11:07 ` Ilya Enkovich 2014-05-15 11:27 ` Richard Biener 2014-05-15 12:10 ` Ilya Enkovich
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).