public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC] Using function clones for Pointer Bounds Checker
@ 2014-01-14  9:16 Ilya Enkovich
  2014-01-14 11:46 ` Richard Biener
  2014-05-12 22:12 ` Jeff Law
  0 siblings, 2 replies; 15+ messages in thread
From: Ilya Enkovich @ 2014-01-14  9:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Jeff Law, Zamyatin, Igor

Hi,

I've been working for some time on the prototype of the Pointer Bounds
Checker which uses function clones for instrumentation
(http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
several experiments with this approach I want to share my results and
ask for some feedback to make a decision about the future steps.

Firstly I want to remind the reasons for digging in this direction. In
the original approach bounds of call arguments and input parameters
are associated with arguments via special built-in calls. It creates
implicit data flow compiler is not aware about which confuses some
optimizations resulting in miss-optimization and breaks bounds data
flow. Thus optimizations have to be fixed to get better pointers
protection.

Clones approach does not use special built-in function calls to
associate bounds with call arguments and input parameters. Each
function which should be instrumented gets an additional version and
only this special version will be instrumented.This new version gets
additional bound arguments to express input bounds. When function call
is instrumented, it is redirected to instrumented version and all
bounds are passed as explicit call arguments. Thus we have explicit
pointer bounds flow similar to regular function parameters. It should
allow to avoid changes in optimization, avoid miss-optimizations,
allow existing IPA optimizations to work with bound args (e.g.
propagate constant bounds value and remove checks in called function).

I made a prototype implementation of this approach in the following way:

- Add new IPA pass before early local passes to produce versions for
all functions to be instrumented.
- Put instrumentation pass after SSA pass.
- Add new pass after IPA passes to remove bodies of functions which
have instrumented versions. Function nodes may still be required for
calls in not instrumented code. But we do not emit this code and
therefore function bodies are not needed.

Positive changes are:

- IPA optimizations are not confused by bound parameters
- bounds are now more like regular arguments; it makes their
processing in expand easier
- functions with bounds not attached to any pointer are allowed

On simple codes this approach worked well but on a bigger tests some
issues were revealed.

1. Nodes reachability. Instrumented version is actually always
reachable when original function is reachable because it is always
emitted instead of the original. Thus I had to fix reachability
analysis to achieve it. Another similar problem is check whether node
can be removed after inline when inlining instrumented function. Not
hard to fix but probably other similar problems exist.

2. Function processing order. Function processing order is determined
before early local passes. But during function instrumentation call
graph is modified significantly and used topological order becomes
outdated. That causes some troubles. E.g. function marked as 'always
inline' cannot be inlined because it is not in SSA form yet. Surely
inlining problem may be solved by just putting instrumentation after
early inline, but similar problem may exist in other passes too. To
resolve this problem I tried to split early local passes into three
parts. The first one builds SSA, the second one performs
instrumentation, the last one does the rest. Each part is performed on
all functions before the next one starts. Thus I get all functions in
SSA form and all instrumentation performed before starting early
optimizations. Unfortunately such passes order leads to invalid SSA
because of local_pure_const optimization affecting callers correctness
(in case caller SSA was built before optimization revealed 'pure' or
'const' flag).

In general I feel that having special function version for
instrumentation has a better potential, should lead to less intrusive
changes in the compiler and better code quality.

But before continue this implementation I would like to get some
feedback and probably some advice on how to order passes to get the
best result. Currently I incline to have all functions instrumented
before any local optimizations and solve pure_const problem by
modifying all callers when attribute is computed for some function.

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-01-14  9:16 [RFC] Using function clones for Pointer Bounds Checker Ilya Enkovich
@ 2014-01-14 11:46 ` Richard Biener
  2014-01-14 12:47   ` Ilya Enkovich
  2014-05-12 22:12 ` Jeff Law
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Biener @ 2014-01-14 11:46 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka

On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Hi,
>
> I've been working for some time on the prototype of the Pointer Bounds
> Checker which uses function clones for instrumentation
> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
> several experiments with this approach I want to share my results and
> ask for some feedback to make a decision about the future steps.
>
> Firstly I want to remind the reasons for digging in this direction. In
> the original approach bounds of call arguments and input parameters
> are associated with arguments via special built-in calls. It creates
> implicit data flow compiler is not aware about which confuses some
> optimizations resulting in miss-optimization and breaks bounds data
> flow. Thus optimizations have to be fixed to get better pointers
> protection.
>
> Clones approach does not use special built-in function calls to
> associate bounds with call arguments and input parameters. Each
> function which should be instrumented gets an additional version and
> only this special version will be instrumented.This new version gets
> additional bound arguments to express input bounds. When function call
> is instrumented, it is redirected to instrumented version and all
> bounds are passed as explicit call arguments. Thus we have explicit
> pointer bounds flow similar to regular function parameters. It should
> allow to avoid changes in optimization, avoid miss-optimizations,
> allow existing IPA optimizations to work with bound args (e.g.
> propagate constant bounds value and remove checks in called function).
>
> I made a prototype implementation of this approach in the following way:
>
> - Add new IPA pass before early local passes to produce versions for
> all functions to be instrumented.
> - Put instrumentation pass after SSA pass.
> - Add new pass after IPA passes to remove bodies of functions which
> have instrumented versions. Function nodes may still be required for
> calls in not instrumented code. But we do not emit this code and
> therefore function bodies are not needed.
>
> Positive changes are:
>
> - IPA optimizations are not confused by bound parameters
> - bounds are now more like regular arguments; it makes their
> processing in expand easier
> - functions with bounds not attached to any pointer are allowed

First of all thanks for trying to work in this direction.  Comments on the
issues you encountered below (also CCed Honza as he should be more
familiar with reachability and clone issues).

> On simple codes this approach worked well but on a bigger tests some
> issues were revealed.
>
> 1. Nodes reachability. Instrumented version is actually always
> reachable when original function is reachable because it is always
> emitted instead of the original. Thus I had to fix reachability
> analysis to achieve it. Another similar problem is check whether node
> can be removed after inline when inlining instrumented function. Not
> hard to fix but probably other similar problems exist.

I suppose you do not update the callgraph / the call stmts when
creating the clones?  Btw, is it desirable to inline the uninstrumented
function and then instrument the result (thus run cloning and
instrumentation after early inlining?)?  Especially handling always_inlines
before cloning/isntrumentation looks very sensible.

> 2. Function processing order. Function processing order is determined
> before early local passes. But during function instrumentation call
> graph is modified significantly and used topological order becomes
> outdated. That causes some troubles. E.g. function marked as 'always
> inline' cannot be inlined because it is not in SSA form yet. Surely
> inlining problem may be solved by just putting instrumentation after
> early inline, but similar problem may exist in other passes too. To
> resolve this problem I tried to split early local passes into three
> parts. The first one builds SSA, the second one performs
> instrumentation, the last one does the rest. Each part is performed on
> all functions before the next one starts. Thus I get all functions in
> SSA form and all instrumentation performed before starting early
> optimizations. Unfortunately such passes order leads to invalid SSA
> because of local_pure_const optimization affecting callers correctness
> (in case caller SSA was built before optimization revealed 'pure' or
> 'const' flag).

Generally the processing order of early_local_passes is chosen
to get better optimization - changing it shouldn't affect correctness
and thus the issues you observe demand fixing anyway.
(I've noted in the past that the early_local_passes processing order
should more explicitely honor callgraph SCCs, eventually even iterating).

Moving SSA build out of the early_local_passes and into a
separate lowering stage is possible, the pure-const stuff is
handled by keeping pass_fixup_cfg where it is now I think.
In theory you can go into SSA form in all_lowering_passes
already (but you have to adjust some pieces dealing with
late created functions).

Note that the idea of having stuff inside a single early-local-passes
pipeline is that it increases cache locality and reduces memory
usage by reclaiming functions no longer needed after inlining.

> In general I feel that having special function version for
> instrumentation has a better potential, should lead to less intrusive
> changes in the compiler and better code quality.
>
> But before continue this implementation I would like to get some
> feedback and probably some advice on how to order passes to get the
> best result. Currently I incline to have all functions instrumented
> before any local optimizations and solve pure_const problem by
> modifying all callers when attribute is computed for some function.

As noted above the pure_const proble is fixed by the fixup_cfg pass.

Generally if instrumentation is not strictly required before inlining
then doing cloning and instrumentation as a "real" IPA pass would
be the way to go.  You'd create clones during IPA analysis,
fixup clones during IPA propagate (IPA inline should notify you
of any decisions it makes), and instantiate and instrument the
remaining required clones at IPA transform time.

Generally if doing it before early opts is not too intrusive making it
a real IPA pass later shouldn't be too difficult (heh ...).

Honza may have more comments here, of course.

Thanks,
Richard.

> Thanks,
> Ilya

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-01-14 11:46 ` Richard Biener
@ 2014-01-14 12:47   ` Ilya Enkovich
  2014-01-14 14:04     ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Ilya Enkovich @ 2014-01-14 12:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka

2014/1/14 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> Hi,
>>
>> I've been working for some time on the prototype of the Pointer Bounds
>> Checker which uses function clones for instrumentation
>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
>> several experiments with this approach I want to share my results and
>> ask for some feedback to make a decision about the future steps.
>>
>> Firstly I want to remind the reasons for digging in this direction. In
>> the original approach bounds of call arguments and input parameters
>> are associated with arguments via special built-in calls. It creates
>> implicit data flow compiler is not aware about which confuses some
>> optimizations resulting in miss-optimization and breaks bounds data
>> flow. Thus optimizations have to be fixed to get better pointers
>> protection.
>>
>> Clones approach does not use special built-in function calls to
>> associate bounds with call arguments and input parameters. Each
>> function which should be instrumented gets an additional version and
>> only this special version will be instrumented.This new version gets
>> additional bound arguments to express input bounds. When function call
>> is instrumented, it is redirected to instrumented version and all
>> bounds are passed as explicit call arguments. Thus we have explicit
>> pointer bounds flow similar to regular function parameters. It should
>> allow to avoid changes in optimization, avoid miss-optimizations,
>> allow existing IPA optimizations to work with bound args (e.g.
>> propagate constant bounds value and remove checks in called function).
>>
>> I made a prototype implementation of this approach in the following way:
>>
>> - Add new IPA pass before early local passes to produce versions for
>> all functions to be instrumented.
>> - Put instrumentation pass after SSA pass.
>> - Add new pass after IPA passes to remove bodies of functions which
>> have instrumented versions. Function nodes may still be required for
>> calls in not instrumented code. But we do not emit this code and
>> therefore function bodies are not needed.
>>
>> Positive changes are:
>>
>> - IPA optimizations are not confused by bound parameters
>> - bounds are now more like regular arguments; it makes their
>> processing in expand easier
>> - functions with bounds not attached to any pointer are allowed
>
> First of all thanks for trying to work in this direction.  Comments on the
> issues you encountered below (also CCed Honza as he should be more
> familiar with reachability and clone issues).
>
>> On simple codes this approach worked well but on a bigger tests some
>> issues were revealed.
>>
>> 1. Nodes reachability. Instrumented version is actually always
>> reachable when original function is reachable because it is always
>> emitted instead of the original. Thus I had to fix reachability
>> analysis to achieve it. Another similar problem is check whether node
>> can be removed after inline when inlining instrumented function. Not
>> hard to fix but probably other similar problems exist.
>
> I suppose you do not update the callgraph / the call stmts when
> creating the clones?  Btw, is it desirable to inline the uninstrumented
> function and then instrument the result (thus run cloning and
> instrumentation after early inlining?)?  Especially handling always_inlines
> before cloning/isntrumentation looks very sensible.

Right. Created clones have the same code as the original function and
therefore same cgraph edges. I suppose instrumentation after early
inlining is OK and may be preferred because inline shouldn't lead to
any losses of bounds information. I tried variant when instrumentation
works right after early inlining but with cloning still before early
local passes. In general it looked OK.

>
>> 2. Function processing order. Function processing order is determined
>> before early local passes. But during function instrumentation call
>> graph is modified significantly and used topological order becomes
>> outdated. That causes some troubles. E.g. function marked as 'always
>> inline' cannot be inlined because it is not in SSA form yet. Surely
>> inlining problem may be solved by just putting instrumentation after
>> early inline, but similar problem may exist in other passes too. To
>> resolve this problem I tried to split early local passes into three
>> parts. The first one builds SSA, the second one performs
>> instrumentation, the last one does the rest. Each part is performed on
>> all functions before the next one starts. Thus I get all functions in
>> SSA form and all instrumentation performed before starting early
>> optimizations. Unfortunately such passes order leads to invalid SSA
>> because of local_pure_const optimization affecting callers correctness
>> (in case caller SSA was built before optimization revealed 'pure' or
>> 'const' flag).
>
> Generally the processing order of early_local_passes is chosen
> to get better optimization - changing it shouldn't affect correctness
> and thus the issues you observe demand fixing anyway.
> (I've noted in the past that the early_local_passes processing order
> should more explicitely honor callgraph SCCs, eventually even iterating).
>
> Moving SSA build out of the early_local_passes and into a
> separate lowering stage is possible, the pure-const stuff is
> handled by keeping pass_fixup_cfg where it is now I think.
> In theory you can go into SSA form in all_lowering_passes
> already (but you have to adjust some pieces dealing with
> late created functions).

Currently pass_fixup_cfg is called at the beginning of the early local
passes and therefore function processing order mattes much here. If
caller is processed before the callee marked as pure then there is no
additional pass_fixup_cfg for the caller after pure flag for callee is
set. Probably additional IPA pass is required after early_local_passes
to fixup all functions?

>
> Note that the idea of having stuff inside a single early-local-passes
> pipeline is that it increases cache locality and reduces memory
> usage by reclaiming functions no longer needed after inlining.
>
>> In general I feel that having special function version for
>> instrumentation has a better potential, should lead to less intrusive
>> changes in the compiler and better code quality.
>>
>> But before continue this implementation I would like to get some
>> feedback and probably some advice on how to order passes to get the
>> best result. Currently I incline to have all functions instrumented
>> before any local optimizations and solve pure_const problem by
>> modifying all callers when attribute is computed for some function.
>
> As noted above the pure_const proble is fixed by the fixup_cfg pass.
>
> Generally if instrumentation is not strictly required before inlining
> then doing cloning and instrumentation as a "real" IPA pass would
> be the way to go.  You'd create clones during IPA analysis,
> fixup clones during IPA propagate (IPA inline should notify you
> of any decisions it makes), and instantiate and instrument the
> remaining required clones at IPA transform time.

Instrumentation is not required before inlining but is required before
any optimization pass. Optimizations may change pointers flow and
therefore break bounds flow, resulting in possible false bound
violations. It means instrumentation has to be performed before
early_optimizations. Thus I may split early_local_passes into three
IPA passes:
  1. build SSA + early_inline
  2. Make instrumented versions + release bodies of original functions
(for functions having instrumented version)
  3. Perform early_optimizations.


Thanks,
Ilya

>
> Generally if doing it before early opts is not too intrusive making it
> a real IPA pass later shouldn't be too difficult (heh ...).
>
> Honza may have more comments here, of course.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Ilya

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-01-14 12:47   ` Ilya Enkovich
@ 2014-01-14 14:04     ` Richard Biener
  2014-01-14 14:36       ` Ilya Enkovich
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2014-01-14 14:04 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka

On Tue, Jan 14, 2014 at 1:47 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2014/1/14 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> Hi,
>>>
>>> I've been working for some time on the prototype of the Pointer Bounds
>>> Checker which uses function clones for instrumentation
>>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
>>> several experiments with this approach I want to share my results and
>>> ask for some feedback to make a decision about the future steps.
>>>
>>> Firstly I want to remind the reasons for digging in this direction. In
>>> the original approach bounds of call arguments and input parameters
>>> are associated with arguments via special built-in calls. It creates
>>> implicit data flow compiler is not aware about which confuses some
>>> optimizations resulting in miss-optimization and breaks bounds data
>>> flow. Thus optimizations have to be fixed to get better pointers
>>> protection.
>>>
>>> Clones approach does not use special built-in function calls to
>>> associate bounds with call arguments and input parameters. Each
>>> function which should be instrumented gets an additional version and
>>> only this special version will be instrumented.This new version gets
>>> additional bound arguments to express input bounds. When function call
>>> is instrumented, it is redirected to instrumented version and all
>>> bounds are passed as explicit call arguments. Thus we have explicit
>>> pointer bounds flow similar to regular function parameters. It should
>>> allow to avoid changes in optimization, avoid miss-optimizations,
>>> allow existing IPA optimizations to work with bound args (e.g.
>>> propagate constant bounds value and remove checks in called function).
>>>
>>> I made a prototype implementation of this approach in the following way:
>>>
>>> - Add new IPA pass before early local passes to produce versions for
>>> all functions to be instrumented.
>>> - Put instrumentation pass after SSA pass.
>>> - Add new pass after IPA passes to remove bodies of functions which
>>> have instrumented versions. Function nodes may still be required for
>>> calls in not instrumented code. But we do not emit this code and
>>> therefore function bodies are not needed.
>>>
>>> Positive changes are:
>>>
>>> - IPA optimizations are not confused by bound parameters
>>> - bounds are now more like regular arguments; it makes their
>>> processing in expand easier
>>> - functions with bounds not attached to any pointer are allowed
>>
>> First of all thanks for trying to work in this direction.  Comments on the
>> issues you encountered below (also CCed Honza as he should be more
>> familiar with reachability and clone issues).
>>
>>> On simple codes this approach worked well but on a bigger tests some
>>> issues were revealed.
>>>
>>> 1. Nodes reachability. Instrumented version is actually always
>>> reachable when original function is reachable because it is always
>>> emitted instead of the original. Thus I had to fix reachability
>>> analysis to achieve it. Another similar problem is check whether node
>>> can be removed after inline when inlining instrumented function. Not
>>> hard to fix but probably other similar problems exist.
>>
>> I suppose you do not update the callgraph / the call stmts when
>> creating the clones?  Btw, is it desirable to inline the uninstrumented
>> function and then instrument the result (thus run cloning and
>> instrumentation after early inlining?)?  Especially handling always_inlines
>> before cloning/isntrumentation looks very sensible.
>
> Right. Created clones have the same code as the original function and
> therefore same cgraph edges. I suppose instrumentation after early
> inlining is OK and may be preferred because inline shouldn't lead to
> any losses of bounds information. I tried variant when instrumentation
> works right after early inlining but with cloning still before early
> local passes. In general it looked OK.
>
>>
>>> 2. Function processing order. Function processing order is determined
>>> before early local passes. But during function instrumentation call
>>> graph is modified significantly and used topological order becomes
>>> outdated. That causes some troubles. E.g. function marked as 'always
>>> inline' cannot be inlined because it is not in SSA form yet. Surely
>>> inlining problem may be solved by just putting instrumentation after
>>> early inline, but similar problem may exist in other passes too. To
>>> resolve this problem I tried to split early local passes into three
>>> parts. The first one builds SSA, the second one performs
>>> instrumentation, the last one does the rest. Each part is performed on
>>> all functions before the next one starts. Thus I get all functions in
>>> SSA form and all instrumentation performed before starting early
>>> optimizations. Unfortunately such passes order leads to invalid SSA
>>> because of local_pure_const optimization affecting callers correctness
>>> (in case caller SSA was built before optimization revealed 'pure' or
>>> 'const' flag).
>>
>> Generally the processing order of early_local_passes is chosen
>> to get better optimization - changing it shouldn't affect correctness
>> and thus the issues you observe demand fixing anyway.
>> (I've noted in the past that the early_local_passes processing order
>> should more explicitely honor callgraph SCCs, eventually even iterating).
>>
>> Moving SSA build out of the early_local_passes and into a
>> separate lowering stage is possible, the pure-const stuff is
>> handled by keeping pass_fixup_cfg where it is now I think.
>> In theory you can go into SSA form in all_lowering_passes
>> already (but you have to adjust some pieces dealing with
>> late created functions).
>
> Currently pass_fixup_cfg is called at the beginning of the early local
> passes and therefore function processing order mattes much here. If
> caller is processed before the callee marked as pure then there is no
> additional pass_fixup_cfg for the caller after pure flag for callee is
> set. Probably additional IPA pass is required after early_local_passes
> to fixup all functions?

You probably need an additional fixup_cfg pass somewhere.

>> Note that the idea of having stuff inside a single early-local-passes
>> pipeline is that it increases cache locality and reduces memory
>> usage by reclaiming functions no longer needed after inlining.
>>
>>> In general I feel that having special function version for
>>> instrumentation has a better potential, should lead to less intrusive
>>> changes in the compiler and better code quality.
>>>
>>> But before continue this implementation I would like to get some
>>> feedback and probably some advice on how to order passes to get the
>>> best result. Currently I incline to have all functions instrumented
>>> before any local optimizations and solve pure_const problem by
>>> modifying all callers when attribute is computed for some function.
>>
>> As noted above the pure_const proble is fixed by the fixup_cfg pass.
>>
>> Generally if instrumentation is not strictly required before inlining
>> then doing cloning and instrumentation as a "real" IPA pass would
>> be the way to go.  You'd create clones during IPA analysis,
>> fixup clones during IPA propagate (IPA inline should notify you
>> of any decisions it makes), and instantiate and instrument the
>> remaining required clones at IPA transform time.
>
> Instrumentation is not required before inlining but is required before
> any optimization pass. Optimizations may change pointers flow and
> therefore break bounds flow, resulting in possible false bound
> violations. It means instrumentation has to be performed before
> early_optimizations.

Ah, ok ...

> Thus I may split early_local_passes into three
> IPA passes:
>   1. build SSA + early_inline
>   2. Make instrumented versions + release bodies of original functions
> (for functions having instrumented version)
>   3. Perform early_optimizations.

That defeats the purpose and functioning of early inlining though.
early inlining wants the callee to have gone through all early local
passes - doing the above breaks this.

So I suppose your original ordering is the only sensible thing unless
somebody comes up with new great ideas ;)

(eventually inlining always-inline functions can be split off and done
earlier)

Richard.

>
> Thanks,
> Ilya
>
>>
>> Generally if doing it before early opts is not too intrusive making it
>> a real IPA pass later shouldn't be too difficult (heh ...).
>>
>> Honza may have more comments here, of course.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Ilya

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-01-14 14:04     ` Richard Biener
@ 2014-01-14 14:36       ` Ilya Enkovich
  2014-01-30 12:12         ` Ilya Enkovich
  0 siblings, 1 reply; 15+ messages in thread
From: Ilya Enkovich @ 2014-01-14 14:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka

2014/1/14 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Jan 14, 2014 at 1:47 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2014/1/14 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I've been working for some time on the prototype of the Pointer Bounds
>>>> Checker which uses function clones for instrumentation
>>>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
>>>> several experiments with this approach I want to share my results and
>>>> ask for some feedback to make a decision about the future steps.
>>>>
>>>> Firstly I want to remind the reasons for digging in this direction. In
>>>> the original approach bounds of call arguments and input parameters
>>>> are associated with arguments via special built-in calls. It creates
>>>> implicit data flow compiler is not aware about which confuses some
>>>> optimizations resulting in miss-optimization and breaks bounds data
>>>> flow. Thus optimizations have to be fixed to get better pointers
>>>> protection.
>>>>
>>>> Clones approach does not use special built-in function calls to
>>>> associate bounds with call arguments and input parameters. Each
>>>> function which should be instrumented gets an additional version and
>>>> only this special version will be instrumented.This new version gets
>>>> additional bound arguments to express input bounds. When function call
>>>> is instrumented, it is redirected to instrumented version and all
>>>> bounds are passed as explicit call arguments. Thus we have explicit
>>>> pointer bounds flow similar to regular function parameters. It should
>>>> allow to avoid changes in optimization, avoid miss-optimizations,
>>>> allow existing IPA optimizations to work with bound args (e.g.
>>>> propagate constant bounds value and remove checks in called function).
>>>>
>>>> I made a prototype implementation of this approach in the following way:
>>>>
>>>> - Add new IPA pass before early local passes to produce versions for
>>>> all functions to be instrumented.
>>>> - Put instrumentation pass after SSA pass.
>>>> - Add new pass after IPA passes to remove bodies of functions which
>>>> have instrumented versions. Function nodes may still be required for
>>>> calls in not instrumented code. But we do not emit this code and
>>>> therefore function bodies are not needed.
>>>>
>>>> Positive changes are:
>>>>
>>>> - IPA optimizations are not confused by bound parameters
>>>> - bounds are now more like regular arguments; it makes their
>>>> processing in expand easier
>>>> - functions with bounds not attached to any pointer are allowed
>>>
>>> First of all thanks for trying to work in this direction.  Comments on the
>>> issues you encountered below (also CCed Honza as he should be more
>>> familiar with reachability and clone issues).
>>>
>>>> On simple codes this approach worked well but on a bigger tests some
>>>> issues were revealed.
>>>>
>>>> 1. Nodes reachability. Instrumented version is actually always
>>>> reachable when original function is reachable because it is always
>>>> emitted instead of the original. Thus I had to fix reachability
>>>> analysis to achieve it. Another similar problem is check whether node
>>>> can be removed after inline when inlining instrumented function. Not
>>>> hard to fix but probably other similar problems exist.
>>>
>>> I suppose you do not update the callgraph / the call stmts when
>>> creating the clones?  Btw, is it desirable to inline the uninstrumented
>>> function and then instrument the result (thus run cloning and
>>> instrumentation after early inlining?)?  Especially handling always_inlines
>>> before cloning/isntrumentation looks very sensible.
>>
>> Right. Created clones have the same code as the original function and
>> therefore same cgraph edges. I suppose instrumentation after early
>> inlining is OK and may be preferred because inline shouldn't lead to
>> any losses of bounds information. I tried variant when instrumentation
>> works right after early inlining but with cloning still before early
>> local passes. In general it looked OK.
>>
>>>
>>>> 2. Function processing order. Function processing order is determined
>>>> before early local passes. But during function instrumentation call
>>>> graph is modified significantly and used topological order becomes
>>>> outdated. That causes some troubles. E.g. function marked as 'always
>>>> inline' cannot be inlined because it is not in SSA form yet. Surely
>>>> inlining problem may be solved by just putting instrumentation after
>>>> early inline, but similar problem may exist in other passes too. To
>>>> resolve this problem I tried to split early local passes into three
>>>> parts. The first one builds SSA, the second one performs
>>>> instrumentation, the last one does the rest. Each part is performed on
>>>> all functions before the next one starts. Thus I get all functions in
>>>> SSA form and all instrumentation performed before starting early
>>>> optimizations. Unfortunately such passes order leads to invalid SSA
>>>> because of local_pure_const optimization affecting callers correctness
>>>> (in case caller SSA was built before optimization revealed 'pure' or
>>>> 'const' flag).
>>>
>>> Generally the processing order of early_local_passes is chosen
>>> to get better optimization - changing it shouldn't affect correctness
>>> and thus the issues you observe demand fixing anyway.
>>> (I've noted in the past that the early_local_passes processing order
>>> should more explicitely honor callgraph SCCs, eventually even iterating).
>>>
>>> Moving SSA build out of the early_local_passes and into a
>>> separate lowering stage is possible, the pure-const stuff is
>>> handled by keeping pass_fixup_cfg where it is now I think.
>>> In theory you can go into SSA form in all_lowering_passes
>>> already (but you have to adjust some pieces dealing with
>>> late created functions).
>>
>> Currently pass_fixup_cfg is called at the beginning of the early local
>> passes and therefore function processing order mattes much here. If
>> caller is processed before the callee marked as pure then there is no
>> additional pass_fixup_cfg for the caller after pure flag for callee is
>> set. Probably additional IPA pass is required after early_local_passes
>> to fixup all functions?
>
> You probably need an additional fixup_cfg pass somewhere.
>
>>> Note that the idea of having stuff inside a single early-local-passes
>>> pipeline is that it increases cache locality and reduces memory
>>> usage by reclaiming functions no longer needed after inlining.
>>>
>>>> In general I feel that having special function version for
>>>> instrumentation has a better potential, should lead to less intrusive
>>>> changes in the compiler and better code quality.
>>>>
>>>> But before continue this implementation I would like to get some
>>>> feedback and probably some advice on how to order passes to get the
>>>> best result. Currently I incline to have all functions instrumented
>>>> before any local optimizations and solve pure_const problem by
>>>> modifying all callers when attribute is computed for some function.
>>>
>>> As noted above the pure_const proble is fixed by the fixup_cfg pass.
>>>
>>> Generally if instrumentation is not strictly required before inlining
>>> then doing cloning and instrumentation as a "real" IPA pass would
>>> be the way to go.  You'd create clones during IPA analysis,
>>> fixup clones during IPA propagate (IPA inline should notify you
>>> of any decisions it makes), and instantiate and instrument the
>>> remaining required clones at IPA transform time.
>>
>> Instrumentation is not required before inlining but is required before
>> any optimization pass. Optimizations may change pointers flow and
>> therefore break bounds flow, resulting in possible false bound
>> violations. It means instrumentation has to be performed before
>> early_optimizations.
>
> Ah, ok ...
>
>> Thus I may split early_local_passes into three
>> IPA passes:
>>   1. build SSA + early_inline
>>   2. Make instrumented versions + release bodies of original functions
>> (for functions having instrumented version)
>>   3. Perform early_optimizations.
>
> That defeats the purpose and functioning of early inlining though.
> early inlining wants the callee to have gone through all early local
> passes - doing the above breaks this.
>
> So I suppose your original ordering is the only sensible thing unless
> somebody comes up with new great ideas ;)

Thanks for your comments!

I'll continue my experiments with with my initial early_local_passes
splitting. Will just put there original functions bodies release to
avoid overhead for their useless optimizations. So, it will be 3 IPA
passes:

1. SSA build
2. Make instrumented versions + release bodies of original functions
3. early_inline + early optimizations.

Will be back with more results.

Thanks,
Ilya

>
> (eventually inlining always-inline functions can be split off and done
> earlier)
>
> Richard.
>
>>
>> Thanks,
>> Ilya
>>
>>>
>>> Generally if doing it before early opts is not too intrusive making it
>>> a real IPA pass later shouldn't be too difficult (heh ...).
>>>
>>> Honza may have more comments here, of course.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Ilya

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-01-14 14:36       ` Ilya Enkovich
@ 2014-01-30 12:12         ` Ilya Enkovich
  0 siblings, 0 replies; 15+ messages in thread
From: Ilya Enkovich @ 2014-01-30 12:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jeff Law, Zamyatin, Igor, Jan Hubicka

>
> Thanks for your comments!
>
> I'll continue my experiments with with my initial early_local_passes
> splitting. Will just put there original functions bodies release to
> avoid overhead for their useless optimizations. So, it will be 3 IPA
> passes:
>
> 1. SSA build
> 2. Make instrumented versions + release bodies of original functions
> 3. early_inline + early optimizations.
>
> Will be back with more results.
>

Hi,

I've implemented the chosen approach and performed a set of
experiments. Without LTO it mostly worked fine. Some new reachability
analysis  issues were met and I finally solved them by introducing a
new type of IPA reference between the original function and it's
instrumented version.

Two significant problems were revealed during LTO tests. The first one
was with assembler function names. With no LTO I just used the same
assembler name for both original and instrumented function versions.
With LTO assembler name becomes much more important identifier and it
does not seems possible to share the same name between original and
instrumented versions. I started to use suffix for instrumented
function names and chained it with the original assembler name (used
IDENTIFIER_TRANSPARENT_ALIAS flag, unfortunately not in all cases
aliases chain is followed, so I had to additionally fix few places to
have original names printed for the instrumented function decls).

BTW LTO streamer does not preserve transparent aliases chain for
identifiers. Is it intentional?

The second problem was resolutions file from linker. Linker has no
idea about connection between instrumented and not instrumented
functions and therefore may declare instrumented functions as local
when external calls to the original exist. It caused more reachability
analysis issues (instrumented functions were removed) and visibility
issue (instrumented functions were not marked as global). Visibility
issue was simply solved by looking at original function decl before
globalizing decl. In solving reachability problems I found useful to
transform original functions to the special kind of thunks having call
edge to the instrumented version. It guarantees we do not remove
instrumented version when original is called. It also provides
optimization opportunity for not instrumented calls (e.g. IPCP goes
through thunks and may propagate constant params; will require
additional support to map params correctly though).

If my choices look reasonable I would continue with code cleanup,
merge and preparing a patch series for stage 1.

Thanks,
Ilya

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-01-14  9:16 [RFC] Using function clones for Pointer Bounds Checker Ilya Enkovich
  2014-01-14 11:46 ` Richard Biener
@ 2014-05-12 22:12 ` Jeff Law
  2014-05-13  8:38   ` Ilya Enkovich
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Law @ 2014-05-12 22:12 UTC (permalink / raw)
  To: Ilya Enkovich, gcc-patches; +Cc: Richard Biener, Zamyatin, Igor

On 01/14/14 02:15, Ilya Enkovich wrote:
> Hi,
>
> I've been working for some time on the prototype of the Pointer Bounds
> Checker which uses function clones for instrumentation
> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
> several experiments with this approach I want to share my results and
> ask for some feedback to make a decision about the future steps.
Sorry for the delayed reply, when you sent this message I was ignoring 
anything that wasn't an open issue for GCC 4.9 -- it all gets thrown 
into a massive queue of things to come back to once stage1 reopens.

So I really like the idea of trying to use function clones to simplify 
some of the details of implementing MPX support.

>
> Clones approach does not use special built-in function calls to
> associate bounds with call arguments and input parameters. Each
> function which should be instrumented gets an additional version and
> only this special version will be instrumented.
I like this.



> This new version gets
> additional bound arguments to express input bounds. When function call
> is instrumented, it is redirected to instrumented version and all
> bounds are passed as explicit call arguments. Thus we have explicit
> pointer bounds flow similar to regular function parameters. It should
> allow to avoid changes in optimization, avoid miss-optimizations,
> allow existing IPA optimizations to work with bound args (e.g.
> propagate constant bounds value and remove checks in called function).
So from a linking standpoint, presumably you have to mangle the 
instrumented caller/callee in some manner.  Right?  Or are you 
dynamically dispatching somehow?


>
> 1. Nodes reachability. Instrumented version is actually always
> reachable when original function is reachable because it is always
> emitted instead of the original. Thus I had to fix reachability
> analysis to achieve it. Another similar problem is check whether node
> can be removed after inline when inlining instrumented function. Not
> hard to fix but probably other similar problems exist.
Jan is going to be the best person to discuss these issues with you. 
He's the architect behind most of the IPA bits we've got right now.

>>
> In general I feel that having special function version for
> instrumentation has a better potential, should lead to less intrusive
> changes in the compiler and better code quality.
Agreed.  It seems *much* cleaner.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-12 22:12 ` Jeff Law
@ 2014-05-13  8:38   ` Ilya Enkovich
  2014-05-13 19:21     ` Jeff Law
  0 siblings, 1 reply; 15+ messages in thread
From: Ilya Enkovich @ 2014-05-13  8:38 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Richard Biener, Zamyatin, Igor

2014-05-13 1:20 GMT+04:00 Jeff Law <law@redhat.com>:
> On 01/14/14 02:15, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> I've been working for some time on the prototype of the Pointer Bounds
>> Checker which uses function clones for instrumentation
>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After
>> several experiments with this approach I want to share my results and
>> ask for some feedback to make a decision about the future steps.
>
> Sorry for the delayed reply, when you sent this message I was ignoring
> anything that wasn't an open issue for GCC 4.9 -- it all gets thrown into a
> massive queue of things to come back to once stage1 reopens.
>
> So I really like the idea of trying to use function clones to simplify some
> of the details of implementing MPX support.
>
>
>>
>> Clones approach does not use special built-in function calls to
>> associate bounds with call arguments and input parameters. Each
>> function which should be instrumented gets an additional version and
>> only this special version will be instrumented.
>
> I like this.
>
>
>
>
>> This new version gets
>> additional bound arguments to express input bounds. When function call
>> is instrumented, it is redirected to instrumented version and all
>> bounds are passed as explicit call arguments. Thus we have explicit
>> pointer bounds flow similar to regular function parameters. It should
>> allow to avoid changes in optimization, avoid miss-optimizations,
>> allow existing IPA optimizations to work with bound args (e.g.
>> propagate constant bounds value and remove checks in called function).
>
> So from a linking standpoint, presumably you have to mangle the instrumented
> caller/callee in some manner.  Right?  Or are you dynamically dispatching
> somehow?

Originally the idea was o have instrumented clone to have the same
assembler name as the original function. Since instrumented code is
fully compatible with not instrumented code, we always emit only one
version. Usage of the same assembler name allows instrumented and not
instrumented calls to look similar in assembler. It worked fine until
I tried it with LTO where assembler name is used as a unique
identifier. With linker resolutions files it became even more harder
to use such approach. To resolve these issues I started to use new
assembler name with postfix, but linked with the original name using
IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
clones and originals during compilation, but both clone and original
functions have similar name in output assembler.

>
>
>
>>
>> 1. Nodes reachability. Instrumented version is actually always
>> reachable when original function is reachable because it is always
>> emitted instead of the original. Thus I had to fix reachability
>> analysis to achieve it. Another similar problem is check whether node
>> can be removed after inline when inlining instrumented function. Not
>> hard to fix but probably other similar problems exist.
>
> Jan is going to be the best person to discuss these issues with you. He's
> the architect behind most of the IPA bits we've got right now.

For now I managed to handle this via introducing a new type of thunk
for original functions. If we have instrumented function it means we
do not emit original one and therefore may remove its body. At this
point I transform the original function into a new kind of thunk which
has call edge to the instrumented version. It makes reachability
analysis to work OK for instrumented functions with no modifications.
The only problem here is to avoid function clones removal before
instrumentation pass happens. I achieve it by forbidding to remove not
instrumented instrumentation clones by adding new case into
cgraph_can_remove_if_no_direct_calls_and_refs_p.

>
>
>>>
>> In general I feel that having special function version for
>> instrumentation has a better potential, should lead to less intrusive
>> changes in the compiler and better code quality.
>
> Agreed.  It seems *much* cleaner.
>
> jeff

Thanks for your review.
Ilya

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-13  8:38   ` Ilya Enkovich
@ 2014-05-13 19:21     ` Jeff Law
  2014-05-14  8:18       ` Ilya Enkovich
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Law @ 2014-05-13 19:21 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: gcc-patches, Richard Biener, Zamyatin, Igor

On 05/13/14 02:38, Ilya Enkovich wrote:
>>> propagate constant bounds value and remove checks in called function).
>>
>> So from a linking standpoint, presumably you have to mangle the instrumented
>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>> somehow?
>
> Originally the idea was o have instrumented clone to have the same
> assembler name as the original function. Since instrumented code is
> fully compatible with not instrumented code, we always emit only one
> version. Usage of the same assembler name allows instrumented and not
> instrumented calls to look similar in assembler. It worked fine until
> I tried it with LTO where assembler name is used as a unique
> identifier. With linker resolutions files it became even more harder
> to use such approach. To resolve these issues I started to use new
> assembler name with postfix, but linked with the original name using
> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
> clones and originals during compilation, but both clone and original
> functions have similar name in output assembler.
OK.  So if I read that correctly, it implies that the existence of 
bounds information does not change the signature of the callee.   This 
is obviously important for C++.

Sounds like I need to sit down with the branch and see how this works in 
the new scheme.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-13 19:21     ` Jeff Law
@ 2014-05-14  8:18       ` Ilya Enkovich
  2014-05-14 15:09         ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Ilya Enkovich @ 2014-05-14  8:18 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Richard Biener, Zamyatin, Igor

2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>:
> On 05/13/14 02:38, Ilya Enkovich wrote:
>>>>
>>>> propagate constant bounds value and remove checks in called function).
>>>
>>>
>>> So from a linking standpoint, presumably you have to mangle the
>>> instrumented
>>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>>> somehow?
>>
>>
>> Originally the idea was o have instrumented clone to have the same
>> assembler name as the original function. Since instrumented code is
>> fully compatible with not instrumented code, we always emit only one
>> version. Usage of the same assembler name allows instrumented and not
>> instrumented calls to look similar in assembler. It worked fine until
>> I tried it with LTO where assembler name is used as a unique
>> identifier. With linker resolutions files it became even more harder
>> to use such approach. To resolve these issues I started to use new
>> assembler name with postfix, but linked with the original name using
>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
>> clones and originals during compilation, but both clone and original
>> functions have similar name in output assembler.
>
> OK.  So if I read that correctly, it implies that the existence of bounds
> information does not change the signature of the callee.   This is obviously
> important for C++.
>
> Sounds like I need to sit down with the branch and see how this works in the
> new scheme.

Both mpx branch and Wiki
(http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
page are up-to-date now and may be tried out either in NOP mode or
with simulator. Let me know if you have any troubles with using it.

Ilya

>
> jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-14  8:18       ` Ilya Enkovich
@ 2014-05-14 15:09         ` H.J. Lu
  2014-05-15  9:03           ` Ilya Enkovich
  2014-05-15 11:07           ` Ilya Enkovich
  0 siblings, 2 replies; 15+ messages in thread
From: H.J. Lu @ 2014-05-14 15:09 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: Jeff Law, gcc-patches, Richard Biener, Zamyatin, Igor

On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>:
>> On 05/13/14 02:38, Ilya Enkovich wrote:
>>>>>
>>>>> propagate constant bounds value and remove checks in called function).
>>>>
>>>>
>>>> So from a linking standpoint, presumably you have to mangle the
>>>> instrumented
>>>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>>>> somehow?
>>>
>>>
>>> Originally the idea was o have instrumented clone to have the same
>>> assembler name as the original function. Since instrumented code is
>>> fully compatible with not instrumented code, we always emit only one
>>> version. Usage of the same assembler name allows instrumented and not
>>> instrumented calls to look similar in assembler. It worked fine until
>>> I tried it with LTO where assembler name is used as a unique
>>> identifier. With linker resolutions files it became even more harder
>>> to use such approach. To resolve these issues I started to use new
>>> assembler name with postfix, but linked with the original name using
>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
>>> clones and originals during compilation, but both clone and original
>>> functions have similar name in output assembler.
>>
>> OK.  So if I read that correctly, it implies that the existence of bounds
>> information does not change the signature of the callee.   This is obviously
>> important for C++.
>>
>> Sounds like I need to sit down with the branch and see how this works in the
>> new scheme.
>
> Both mpx branch and Wiki
> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
> page are up-to-date now and may be tried out either in NOP mode or
> with simulator. Let me know if you have any troubles with using it.
>

I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
MPX enabled executable which runs on both MPX-enabled and
non MPX-enabled hardwares. I didn't see any MPX run-time library.

-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-14 15:09         ` H.J. Lu
@ 2014-05-15  9:03           ` Ilya Enkovich
  2014-05-15 11:07           ` Ilya Enkovich
  1 sibling, 0 replies; 15+ messages in thread
From: Ilya Enkovich @ 2014-05-15  9:03 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jeff Law, gcc-patches, Richard Biener, Zamyatin, Igor

2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>:
> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>:
>>> On 05/13/14 02:38, Ilya Enkovich wrote:
>>>>>>
>>>>>> propagate constant bounds value and remove checks in called function).
>>>>>
>>>>>
>>>>> So from a linking standpoint, presumably you have to mangle the
>>>>> instrumented
>>>>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>>>>> somehow?
>>>>
>>>>
>>>> Originally the idea was o have instrumented clone to have the same
>>>> assembler name as the original function. Since instrumented code is
>>>> fully compatible with not instrumented code, we always emit only one
>>>> version. Usage of the same assembler name allows instrumented and not
>>>> instrumented calls to look similar in assembler. It worked fine until
>>>> I tried it with LTO where assembler name is used as a unique
>>>> identifier. With linker resolutions files it became even more harder
>>>> to use such approach. To resolve these issues I started to use new
>>>> assembler name with postfix, but linked with the original name using
>>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
>>>> clones and originals during compilation, but both clone and original
>>>> functions have similar name in output assembler.
>>>
>>> OK.  So if I read that correctly, it implies that the existence of bounds
>>> information does not change the signature of the callee.   This is obviously
>>> important for C++.
>>>
>>> Sounds like I need to sit down with the branch and see how this works in the
>>> new scheme.
>>
>> Both mpx branch and Wiki
>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
>> page are up-to-date now and may be tried out either in NOP mode or
>> with simulator. Let me know if you have any troubles with using it.
>>
>
> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
> MPX enabled executable which runs on both MPX-enabled and
> non MPX-enabled hardwares. I didn't see any MPX run-time library.

MPX run-time library is not a part of GCC now. It goes with Intel SDE.
See https://software.intel.com/en-us/articles/using-intel-mpx-with-the-intel-software-development-emulator
for usage instructions.

Ilya
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-14 15:09         ` H.J. Lu
  2014-05-15  9:03           ` Ilya Enkovich
@ 2014-05-15 11:07           ` Ilya Enkovich
  2014-05-15 11:27             ` Richard Biener
  1 sibling, 1 reply; 15+ messages in thread
From: Ilya Enkovich @ 2014-05-15 11:07 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jeff Law, gcc-patches, Richard Biener, Zamyatin, Igor

2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>:
> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>:
>>> On 05/13/14 02:38, Ilya Enkovich wrote:
>>>>>>
>>>>>> propagate constant bounds value and remove checks in called function).
>>>>>
>>>>>
>>>>> So from a linking standpoint, presumably you have to mangle the
>>>>> instrumented
>>>>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>>>>> somehow?
>>>>
>>>>
>>>> Originally the idea was o have instrumented clone to have the same
>>>> assembler name as the original function. Since instrumented code is
>>>> fully compatible with not instrumented code, we always emit only one
>>>> version. Usage of the same assembler name allows instrumented and not
>>>> instrumented calls to look similar in assembler. It worked fine until
>>>> I tried it with LTO where assembler name is used as a unique
>>>> identifier. With linker resolutions files it became even more harder
>>>> to use such approach. To resolve these issues I started to use new
>>>> assembler name with postfix, but linked with the original name using
>>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
>>>> clones and originals during compilation, but both clone and original
>>>> functions have similar name in output assembler.
>>>
>>> OK.  So if I read that correctly, it implies that the existence of bounds
>>> information does not change the signature of the callee.   This is obviously
>>> important for C++.
>>>
>>> Sounds like I need to sit down with the branch and see how this works in the
>>> new scheme.
>>
>> Both mpx branch and Wiki
>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
>> page are up-to-date now and may be tried out either in NOP mode or
>> with simulator. Let me know if you have any troubles with using it.
>>
>
> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
> MPX enabled executable which runs on both MPX-enabled and
> non MPX-enabled hardwares. I didn't see any MPX run-time library.

Just checked out the branch and checked generated code.

#cat test.c
int
test (int *p, int i)
{
  return p[i];
}
#gcc -fcheck-pointer-bounds -mmpx test.c -S -O2
#cat test.s
        .file   "test.c"
        .section        .text.unlikely,"ax",@progbits
.LCOLDB0:
        .text
.LHOTB0:
        .p2align 4,,15
        .globl  test
        .type   test, @function
test:
.LFB1:
        .cfi_startproc
        movslq  %esi, %rsi
        leaq    (%rdi,%rsi,4), %rax
        bndcl   (%rax), %bnd0
        bndcu   3(%rax), %bnd0
        movl    (%rax), %eax
        bnd ret
        .cfi_endproc
...

Checks are here. What do you see in your test?

Ilya

>
> --
> H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-15 11:07           ` Ilya Enkovich
@ 2014-05-15 11:27             ` Richard Biener
  2014-05-15 12:10               ` Ilya Enkovich
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2014-05-15 11:27 UTC (permalink / raw)
  To: Ilya Enkovich; +Cc: H.J. Lu, Jeff Law, gcc-patches, Zamyatin, Igor

On Thu, May 15, 2014 at 1:07 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>:
>> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>:
>>>> On 05/13/14 02:38, Ilya Enkovich wrote:
>>>>>>>
>>>>>>> propagate constant bounds value and remove checks in called function).
>>>>>>
>>>>>>
>>>>>> So from a linking standpoint, presumably you have to mangle the
>>>>>> instrumented
>>>>>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>>>>>> somehow?
>>>>>
>>>>>
>>>>> Originally the idea was o have instrumented clone to have the same
>>>>> assembler name as the original function. Since instrumented code is
>>>>> fully compatible with not instrumented code, we always emit only one
>>>>> version. Usage of the same assembler name allows instrumented and not
>>>>> instrumented calls to look similar in assembler. It worked fine until
>>>>> I tried it with LTO where assembler name is used as a unique
>>>>> identifier. With linker resolutions files it became even more harder
>>>>> to use such approach. To resolve these issues I started to use new
>>>>> assembler name with postfix, but linked with the original name using
>>>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
>>>>> clones and originals during compilation, but both clone and original
>>>>> functions have similar name in output assembler.
>>>>
>>>> OK.  So if I read that correctly, it implies that the existence of bounds
>>>> information does not change the signature of the callee.   This is obviously
>>>> important for C++.
>>>>
>>>> Sounds like I need to sit down with the branch and see how this works in the
>>>> new scheme.
>>>
>>> Both mpx branch and Wiki
>>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
>>> page are up-to-date now and may be tried out either in NOP mode or
>>> with simulator. Let me know if you have any troubles with using it.
>>>
>>
>> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
>> MPX enabled executable which runs on both MPX-enabled and
>> non MPX-enabled hardwares. I didn't see any MPX run-time library.
>
> Just checked out the branch and checked generated code.
>
> #cat test.c
> int
> test (int *p, int i)
> {
>   return p[i];
> }
> #gcc -fcheck-pointer-bounds -mmpx test.c -S -O2
> #cat test.s
>         .file   "test.c"
>         .section        .text.unlikely,"ax",@progbits
> .LCOLDB0:
>         .text
> .LHOTB0:
>         .p2align 4,,15
>         .globl  test
>         .type   test, @function
> test:
> .LFB1:
>         .cfi_startproc
>         movslq  %esi, %rsi
>         leaq    (%rdi,%rsi,4), %rax
>         bndcl   (%rax), %bnd0
>         bndcu   3(%rax), %bnd0
>         movl    (%rax), %eax
>         bnd ret
>         .cfi_endproc
> ...
>
> Checks are here. What do you see in your test?

Wow, that's quite an overhead compared to the non-instrumented variant

        movslq  %esi, %rsi
        movl    (%rdi,%rsi,4), %eax
        ret

I thought bounds-checking was done with some clever prefixes thus
that

        movslq  %esi, %rsi
        bndmovl    (%rdi,%rsi,4), %eax, %bnd0
        bnd ret

would be possible (well, replace with valid ISA).

Richard.

> Ilya
>
>>
>> --
>> H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Using function clones for Pointer Bounds Checker
  2014-05-15 11:27             ` Richard Biener
@ 2014-05-15 12:10               ` Ilya Enkovich
  0 siblings, 0 replies; 15+ messages in thread
From: Ilya Enkovich @ 2014-05-15 12:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: H.J. Lu, Jeff Law, gcc-patches, Zamyatin, Igor

2014-05-15 15:27 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, May 15, 2014 at 1:07 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>> 2014-05-14 19:09 GMT+04:00 H.J. Lu <hjl.tools@gmail.com>:
>>> On Wed, May 14, 2014 at 1:18 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2014-05-13 23:21 GMT+04:00 Jeff Law <law@redhat.com>:
>>>>> On 05/13/14 02:38, Ilya Enkovich wrote:
>>>>>>>>
>>>>>>>> propagate constant bounds value and remove checks in called function).
>>>>>>>
>>>>>>>
>>>>>>> So from a linking standpoint, presumably you have to mangle the
>>>>>>> instrumented
>>>>>>> caller/callee in some manner.  Right?  Or are you dynamically dispatching
>>>>>>> somehow?
>>>>>>
>>>>>>
>>>>>> Originally the idea was o have instrumented clone to have the same
>>>>>> assembler name as the original function. Since instrumented code is
>>>>>> fully compatible with not instrumented code, we always emit only one
>>>>>> version. Usage of the same assembler name allows instrumented and not
>>>>>> instrumented calls to look similar in assembler. It worked fine until
>>>>>> I tried it with LTO where assembler name is used as a unique
>>>>>> identifier. With linker resolutions files it became even more harder
>>>>>> to use such approach. To resolve these issues I started to use new
>>>>>> assembler name with postfix, but linked with the original name using
>>>>>> IDENTIFIER_TRANSPARENT_ALIAS. It gives different assembler names for
>>>>>> clones and originals during compilation, but both clone and original
>>>>>> functions have similar name in output assembler.
>>>>>
>>>>> OK.  So if I read that correctly, it implies that the existence of bounds
>>>>> information does not change the signature of the callee.   This is obviously
>>>>> important for C++.
>>>>>
>>>>> Sounds like I need to sit down with the branch and see how this works in the
>>>>> new scheme.
>>>>
>>>> Both mpx branch and Wiki
>>>> (http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler)
>>>> page are up-to-date now and may be tried out either in NOP mode or
>>>> with simulator. Let me know if you have any troubles with using it.
>>>>
>>>
>>> I built it.  But "-fcheck-pointer-bounds -mmpx" doesn't generate
>>> MPX enabled executable which runs on both MPX-enabled and
>>> non MPX-enabled hardwares. I didn't see any MPX run-time library.
>>
>> Just checked out the branch and checked generated code.
>>
>> #cat test.c
>> int
>> test (int *p, int i)
>> {
>>   return p[i];
>> }
>> #gcc -fcheck-pointer-bounds -mmpx test.c -S -O2
>> #cat test.s
>>         .file   "test.c"
>>         .section        .text.unlikely,"ax",@progbits
>> .LCOLDB0:
>>         .text
>> .LHOTB0:
>>         .p2align 4,,15
>>         .globl  test
>>         .type   test, @function
>> test:
>> .LFB1:
>>         .cfi_startproc
>>         movslq  %esi, %rsi
>>         leaq    (%rdi,%rsi,4), %rax
>>         bndcl   (%rax), %bnd0
>>         bndcu   3(%rax), %bnd0
>>         movl    (%rax), %eax
>>         bnd ret
>>         .cfi_endproc
>> ...
>>
>> Checks are here. What do you see in your test?
>
> Wow, that's quite an overhead compared to the non-instrumented variant
>
>         movslq  %esi, %rsi
>         movl    (%rdi,%rsi,4), %eax
>         ret
>

Overhead is actually two instructions - checks for lower and upper
bounds. lea instruction is probably a miss-optimization. Checks are
cheap instructions and do not introduce new dependencies for the load
which is the heaviest here. BTW checks are not the main reason for
overhead in an instrumented code, it is a bounds tables management
(store/load bounds for stored/loaded pointers) which is. Anyway it is
too early to speak about overhead until we have hardware to measure
it.

> I thought bounds-checking was done with some clever prefixes thus
> that
>
>         movslq  %esi, %rsi
>         bndmovl    (%rdi,%rsi,4), %eax, %bnd0
>         bnd ret
>
> would be possible (well, replace with valid ISA).

Doubt it would be possible to encode it keeping backward compatible
with existing hardware. Also putting all logic into one instruction
does not mean it is executed faster than a sequence of instructions,
especially on out-of-order CPUs.

Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> --
>>> H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-05-15 12:10 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-14  9:16 [RFC] Using function clones for Pointer Bounds Checker Ilya Enkovich
2014-01-14 11:46 ` Richard Biener
2014-01-14 12:47   ` Ilya Enkovich
2014-01-14 14:04     ` Richard Biener
2014-01-14 14:36       ` Ilya Enkovich
2014-01-30 12:12         ` Ilya Enkovich
2014-05-12 22:12 ` Jeff Law
2014-05-13  8:38   ` Ilya Enkovich
2014-05-13 19:21     ` Jeff Law
2014-05-14  8:18       ` Ilya Enkovich
2014-05-14 15:09         ` H.J. Lu
2014-05-15  9:03           ` Ilya Enkovich
2014-05-15 11:07           ` Ilya Enkovich
2014-05-15 11:27             ` Richard Biener
2014-05-15 12:10               ` Ilya Enkovich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).