Programming model for tagged addresses

public inbox for libc-alpha@sourceware.org

* Programming model for tagged addresses
@ 2021-05-07  8:24 Florian Weimer
  2021-05-07 10:38 ` Szabolcs Nagy
  2021-05-07 11:48 ` H.J. Lu
  0 siblings, 2 replies; 4+ messages in thread
From: Florian Weimer @ 2021-05-07  8:24 UTC (permalink / raw)
  To: libc-alpha

This is related to this bug:

  memmove doesn't work with tagged address
  <https://sourceware.org/bugzilla/show_bug.cgi?id=27828>

The bug is about detecting memory region overlap in the presence of
tagged addresses.  This problem exists also with address tagging
emulation using alias mappings.

If tags are fixed at allocation, I do not think these comparisons are a
problem.  The argument goes like this: Backwards vs forwards copy only
matters in case of overlap.  All pointers within the same top-level
object have the same tag, so the existing comparisons are fine.
Overlapping memmove between different top-level objects cannot happen
because top-level objects do not overlap.  So you have to copy multiple
objects to get an overlap, but that copies data between the objects as
well, which is necessarily undefined.
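
The argument above can be sketched in C.  This is only an illustration
under assumptions that are not in the bug report: the tag is taken to
occupy the top 8 bits of a 64-bit pointer (a TBI-like layout), pointers
are modelled as plain integers, and with_tag and needs_backward_copy are
hypothetical helpers, not glibc functions.  The direction test is the
common single unsigned comparison, not necessarily the exact code that
glibc uses on every target.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the tag lives in bits 63:56 of a 64-bit pointer.  */
#define TAG_SHIFT 56

/* Hypothetical helper: replace the tag of ADDR with TAG.  */
static inline uint64_t
with_tag (uint64_t addr, uint8_t tag)
{
  return (addr & ((UINT64_C (1) << TAG_SHIFT) - 1))
         | ((uint64_t) tag << TAG_SHIFT);
}

/* A forward copy overwrites unread source bytes only if DST lies
   inside [SRC, SRC + N).  The subtraction is unsigned on purpose,
   so DST < SRC wraps around to a huge value and yields 0.  */
static inline int
needs_backward_copy (uint64_t dst, uint64_t src, size_t n)
{
  return dst - src < (uint64_t) n;
}
```

If both pointers carry the same tag, the tag cancels in dst - src and
the result matches the untagged case.  If the tags differ, the
difference is off by a multiple of 2**56, and an overlapping copy is
classified as non-overlapping.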

Things change when applications are expected to flip tag bits as they
see fit, including for pointers to subobjects.  This leads to the question
whether it's valid to pass such tag-altered pointers to glibc functions
and system calls.  Many objects have significant addresses (mutex and
other synchronization objects, stdio streams), so the answer to that
isn't immediately obvious.

The next question is whether tag bits coming from glibc and the kernel
are always zero initially.  For example, for malloc, we currently use two
bits in the heap to classify chunks (main arena, non-main arena, mmap).
These bits do not change after allocation, so it is tempting to put them
into the pointer itself.  But this means that some of the tag bits are
lost for application use.
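
To make the trade-off concrete, here is a sketch of one possible
layout.  Everything in it is hypothetical: the 8-bit tag in bits 63:56,
the choice of the top two bits for the chunk class, and all names.  It
is not how glibc malloc stores these bits today, and pointers are
modelled as integers.

```c
#include <stdint.h>

/* Hypothetical split of an 8-bit tag in bits 63:56:
   bits 63:62 hold the chunk class (reserved for the implementation),
   bits 61:56 remain available to the application.  */
enum chunk_class
{
  CHUNK_MAIN_ARENA = 0,
  CHUNK_NON_MAIN_ARENA = 1,
  CHUNK_MMAPPED = 2
};

#define CLASS_SHIFT   62
#define CLASS_MASK    (UINT64_C (0x3) << CLASS_SHIFT)
#define APP_TAG_SHIFT 56
#define APP_TAG_MASK  (UINT64_C (0x3f) << APP_TAG_SHIFT)

static inline uint64_t
set_chunk_class (uint64_t p, enum chunk_class c)
{
  return (p & ~CLASS_MASK) | ((uint64_t) c << CLASS_SHIFT);
}

static inline enum chunk_class
get_chunk_class (uint64_t p)
{
  return (enum chunk_class) ((p & CLASS_MASK) >> CLASS_SHIFT);
}

/* Application tags wider than 6 bits are truncated, so they can
   never disturb the reserved class bits.  */
static inline uint64_t
set_app_tag (uint64_t p, unsigned int tag)
{
  return (p & ~APP_TAG_MASK)
         | (((uint64_t) tag << APP_TAG_SHIFT) & APP_TAG_MASK);
}

static inline unsigned int
get_app_tag (uint64_t p)
{
  return (unsigned int) ((p & APP_TAG_MASK) >> APP_TAG_SHIFT);
}
```

With this split the application sees 64 tag values instead of 256,
which is the loss described above.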

Thanks,
Florian

* Re: Programming model for tagged addresses
  2021-05-07  8:24 Programming model for tagged addresses Florian Weimer
@ 2021-05-07 10:38 ` Szabolcs Nagy
  2021-05-07 14:24   ` H.J. Lu
  2021-05-07 11:48 ` H.J. Lu
  1 sibling, 1 reply; 4+ messages in thread
From: Szabolcs Nagy @ 2021-05-07 10:38 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 05/07/2021 10:24, Florian Weimer via Libc-alpha wrote:
> This is related to this bug:
> 
>   memmove doesn't work with tagged address
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=27828>
> 
> The bug is about detecting memory region overlap in the presence of
> tagged addresses.  This problem exists also with address tagging
> emulation using alias mappings.
> 
> If tags are fixed at allocation, I do not think these comparisons are a
> problem.  The argument goes like this: Backwards vs forwards copy only
> matters in case of overlap.  All pointers within the same top-level
> object have the same tag, so the existing comparisons are fine.
> Overlapping memmove between different top-level objects cannot happen
> because top-level objects do not overlap.  So you have to copy multiple
> objects to get an overlap, but that copies data between the objects as
> well, which is necessarily undefined.
> 
> Things change when applications are expected to flip tag bits as they
> see fit, including for pointers to subobjects.  This leads to the question
> whether it's valid to pass such tag-altered pointers to glibc functions
> and system calls.  Many objects have significant addresses (mutex and
> other synchronization objects, stdio streams), so the answer to that
> isn't immediately obvious.

thanks for bringing this up.

on aarch64 we also need to work out a heap tagging abi,
which necessarily relies on an address tagging abi.

we were already asked how suballocators can use tagging
i.e. fine grained memory tagging within a big malloced
chunk, and our answer so far was that this is not allowed.
(our original concerns:
- libc internals assume one tag per malloc allocation,
  e.g. free can scan the entire range to check the tags.
- user code may use the malloc returned allocation as
  a whole as well as the suballocated objects separately
  and those two layers can't be mixed.
- we don't want to guarantee that tagging works on all
  malloc returned allocations, e.g. it makes sense to
  optimize large allocations to not use tagging, only
  guard pages. without PROT_MTE, munmap can be faster.
- if user code wants to tag, it should use separate mmap.
  which implies munmap/madvise/.. are special: they need
  to cope with mixed tags. exact abi is TODO)

more generally the heap tagging abi so far relies on the
tags never changing during the lifetime of an object:
there is only one valid user pointer to an object and it
never changes.

for plain address tagging this may be too restrictive:
user code wants to tag pointers of existing objects,
when there may be pointers escaped with different tags.
this breaks c language semantics: pointer compares no
longer work (multiple different pointers may access the
same object and they compare unequal).
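
A small sketch of the comparison problem, assuming the tag occupies
bits 63:56 and modelling pointers as integers so that nothing is
dereferenced.  strip_tag and same_address are hypothetical names, not
an existing interface.

```c
#include <stdint.h>

/* Assumption: address bits are 55:0, tag bits are 63:56.  */
#define ADDR_MASK ((UINT64_C (1) << 56) - 1)

/* Drop the tag so that only the address bits remain.  */
static inline uint64_t
strip_tag (uint64_t p)
{
  return p & ADDR_MASK;
}

/* Two differently tagged pointers to the same object differ as
   raw values; they only match once the tags are removed.  */
static inline int
same_address (uint64_t a, uint64_t b)
{
  return strip_tag (a) == strip_tag (b);
}
```

Any code that uses == or < on the raw pointer values, or uses the
address as a hash key, would have to go through such a helper, which is
what a c language subset for tagged pointers would need to spell out.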

i think we need to either
- design a c language subset for tagged pointers and then
  ensure the libc follows that subset and supports user
  code that does so too,
- or only allow limited use of pointer tagging, with
  requirements like one pointer tag escaped per object.

> 
> The next question is whether tag bits coming from glibc and the kernel
> are always zero initially.  For example, for malloc, we currently use two
> bits in the heap to classify chunks (main arena, non-main arena, mmap).
> These bits do not change after allocation, so it is tempting to put them
> into the pointer itself.  But this means that some of the tag bits are
> lost for application use.

i think reserving tag bits/values for implementation use
is a reasonable abi choice. so far we did not do that for
heap tagging because of the limited tag space and no
pressing need.

* Re: Programming model for tagged addresses
  2021-05-07  8:24 Programming model for tagged addresses Florian Weimer
  2021-05-07 10:38 ` Szabolcs Nagy
@ 2021-05-07 11:48 ` H.J. Lu
  1 sibling, 0 replies; 4+ messages in thread
From: H.J. Lu @ 2021-05-07 11:48 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

On Fri, May 7, 2021 at 2:33 AM Florian Weimer via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> This is related to this bug:
>
>   memmove doesn't work with tagged address
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=27828>
>
> The bug is about detecting memory region overlap in the presence of
> tagged addresses.  This problem exists also with address tagging
> emulation using alias mappings.
>
> If tags are fixed at allocation, I do not think these comparisons are a
> problem.  The argument goes like this: Backwards vs forwards copy only
> matters in case of overlap.  All pointers within the same top-level
> object have the same tag, so the existing comparisons are fine.
> Overlapping memmove between different top-level objects cannot happen
> because top-level objects do not overlap.  So you have to copy multiple
> objects to get an overlap, but that copies data between the objects as
> well, which is necessarily undefined.
>
> Things change when applications are expected to flip tag bits as they
> see fit, including for pointers to subobjects.  This leads to the question
> whether it's valid to pass such tag-altered pointers to glibc functions
> and system calls.  Many objects have significant addresses (mutex and
> other synchronization objects, stdio streams), so the answer to that
> isn't immediately obvious.

It should be valid.  Otherwise, we would need neither TBI nor LAM.  Glibc
just needs to be aware of the valid address bits used for address
translation and handle them properly.  BTW, the kernel can handle tagged
addresses today.

> The next question is whether tag bits coming from glibc and the kernel
> are always zero initially.  For example, for malloc, we currently use two
> bits in the heap to classify chunks (main arena, non-main arena, mmap).
> These bits do not change after allocation, so it is tempting to put them
> into the pointer itself.  But this means that some of the tag bits are
> lost for application use.

Applications may put tags in the tag bits of pointers returned by malloc
or mmap.  Glibc should clear the tag when operating on such pointers,
if needed.
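
As an illustration of clearing the tag, the following sketch builds a
mask from a configurable tag field and removes it.  The names
tag_field_mask and clear_tag are hypothetical and are not taken from
the proposed <sys/tagged-address.h>; pointers are modelled as integers,
and only user-space addresses (bit 63 clear once untagged) are
considered.

```c
#include <stdint.h>

/* Mask with bits HI..LO (inclusive) set.
   Requires 63 >= HI >= LO.  */
static inline uint64_t
tag_field_mask (unsigned int hi, unsigned int lo)
{
  unsigned int width = hi - lo + 1;
  uint64_t ones = width == 64 ? ~UINT64_C (0)
                              : (UINT64_C (1) << width) - 1;
  return ones << lo;
}

/* Remove the tag field from P, leaving the address bits.  */
static inline uint64_t
clear_tag (uint64_t p, uint64_t tag_mask)
{
  return p & ~tag_mask;
}
```

For a top-byte tag the mask would be tag_field_mask (63, 56).  A layout
with fewer tag bits, for example bits 62:57, would use
tag_field_mask (62, 57); the exact field depends on the tagging mode
that is enabled, so a real implementation would have to query it rather
than hard-code it.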

-- 
H.J.

* Re: Programming model for tagged addresses
  2021-05-07 10:38 ` Szabolcs Nagy
@ 2021-05-07 14:24   ` H.J. Lu
  0 siblings, 0 replies; 4+ messages in thread
From: H.J. Lu @ 2021-05-07 14:24 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Florian Weimer, GNU C Library

Our LAM work is blocked by the API issue.  That is why I proposed
<sys/tagged-address.h>:

https://sourceware.org/pipermail/libc-alpha/2021-April/125249.html

I'd like to see a solution in glibc 2.35.

Thanks.

-- 
H.J.
