[RFC] AArch64 Memory tagging support

public inbox for gdb-patches@sourceware.org

* [RFC] AArch64 Memory tagging support
@ 2019-08-21 10:39 Alan Hayward
  2019-08-21 16:34 ` John Baldwin
  0 siblings, 1 reply; 9+ messages in thread
From: Alan Hayward @ 2019-08-21 10:39 UTC (permalink / raw)
  To: gdb-patches@sourceware.org; +Cc: nd

This is a rough design for implementing ARMv8.5 MTE support in GDB,
detailing the UI changes and sketching out the internals.
The Linux interfaces (ptrace, coredumps etc) are currently still under
discussion, and so it will be quite a while before the GDB code is
implemented, but I wanted to get a design out early to ensure that the GDB
requirements from the Linux interfaces are known.

Any comments are welcome. At this stage I’m more concerned about the overall
strategy being workable.

Background

The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE), which allows
4-bit tags to be assigned to each 16 bytes of memory. Each such tag is
referred to as an Allocated Tag (AT) in the text below. ATs are stored
separately from the main memory. When accessing a memory location, 4 bits of
the address are reserved for use as a tag. This is referred to as a Logical
Tag (LT) in the text below. If the LT does not match the AT in a memory read
or write, then the access will trap.
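
As a rough illustration of the LT/AT check described above, here is a minimal
Python sketch. It assumes the logical tag sits in address bits 59:56 (the
AArch64 MTE layout, which this mail does not spell out), and the
allocated_tags map and all function names are hypothetical, not part of any
GDB or Linux interface.

```python
# Illustrative constants. The bit position is an assumption taken from the
# AArch64 MTE layout (logical tag in address bits 59:56).
TAG_SHIFT = 56
TAG_BITS = 4
TAG_MASK = (1 << TAG_BITS) - 1
GRANULE = 16  # bytes of memory covered by one allocated tag


def logical_tag(pointer: int) -> int:
    """Return the 4-bit logical tag (LT) carried in the pointer."""
    return (pointer >> TAG_SHIFT) & TAG_MASK


def strip_tag(pointer: int) -> int:
    """Return the pointer with its logical tag bits cleared."""
    return pointer & ~(TAG_MASK << TAG_SHIFT)


def access_traps(pointer: int, allocated_tags: dict) -> bool:
    """True if the LT in the pointer differs from the AT of its granule.

    allocated_tags is a hypothetical map from granule index to AT;
    granules missing from the map are treated as having tag 0.
    """
    granule_index = strip_tag(pointer) // GRANULE
    return logical_tag(pointer) != allocated_tags.get(granule_index, 0)
```

For example, a pointer built as (0x3 << 56) | 0x7c0 carries LT 0x3 and refers
to granule 0x7c0 // 16, so the access only traps if that granule's AT is not
0x3.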

For more details see the MTE links here:
https://developer.arm.com/architectures/cpu-architecture/a-profile#mte

For a very high-level overview see:
https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/

GDB UI: Memory Access

In the general use case, when using GDB to examine memory, GDB should print
out when a memory tag failure happens. However, the operation it was doing (for
example, reading/writing memory) should still succeed. A GDB user would not
expect a signal to be passed upwards to the subject program.

For example, x is an int* variable in the subject application and it contains
an address with an incorrect LT:

(gdb) print x             /* x contains an incorrect LT. */
$1 = 0x1234007c0
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$2 = 67
(gdb) set *x = 72
<incorrect memory tag 0x12 for address 0x1234007c0>
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$2 = 72

When printing areas of memory (for example with the command x) this warning
should only be printed once per dump.

(gdb) x/20xw y
0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
<incorrect memory tag 0x12 for address 0x1234007c0>
0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067
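
The "warn once per dump" behaviour could be sketched roughly as follows. This
is only an illustration in Python: allocated_tag_of is a hypothetical callback
that returns the AT for an address, and the row layout simply mirrors the
example output above.

```python
def dump_words(rows, logical, allocated_tag_of):
    """Format rows of a memory dump, warning only at the first tag mismatch.

    rows: list of (address, [words]) tuples, one per output line.
    logical: the logical tag carried by the pointer being dumped.
    allocated_tag_of: hypothetical callback returning the AT for an address.
    """
    lines = []
    warned = False
    for address, words in rows:
        # Emit the warning just before the first mismatching row, then
        # stay silent for the rest of the dump.
        if not warned and allocated_tag_of(address) != logical:
            lines.append(
                f"<incorrect memory tag {logical:#x} for address {address:#x}>"
            )
            warned = True
        lines.append(f"{address:#x}: " + " ".join(f"{w:#010x}" for w in words))
    return lines
```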

However, there will be instances where the GDB user wants to either suppress
any tag warning entirely or pass any errors upwards to the subject program as
a signal. GDB already has similar functionality available for signals using
the command handle. An AArch64-only command "memtag" should be added for this.

(gdb) memtag handle
Memory tag failures will be printed
Memory tag failures will not raise a signal
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$1 = 67
(gdb) memtag handle noprint
Memory tag failures will not be printed
Memory tag failures will not raise a signal
(gdb) print *x
$2 = 67
(gdb) memtag handle raise
Memory tag failures will not be printed
Memory tag failures will raise a signal
(gdb) print *x
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Suggested arguments to "memtag handle" are "print", "noprint", "raise",
"noraise". This will only change the behaviour for memory tag failures
generated by the user inside GDB (i.e. this does not affect inferior behaviour).
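
To make the intended semantics concrete, the four arguments can be modelled as
two independent flags. The sketch below is a Python illustration only; the
class and attribute names are hypothetical, and the defaults (print, noraise)
are taken from the first "memtag handle" output above.

```python
from dataclasses import dataclass


@dataclass
class MemtagHandle:
    """Two independent settings, defaulting to print and noraise."""
    print_failures: bool = True
    raise_signal: bool = False

    def apply(self, keyword: str) -> None:
        """Apply one of the suggested "memtag handle" arguments."""
        settings = {
            "print": ("print_failures", True),
            "noprint": ("print_failures", False),
            "raise": ("raise_signal", True),
            "noraise": ("raise_signal", False),
        }
        if keyword not in settings:
            raise ValueError(f"unknown memtag handle argument: {keyword}")
        name, value = settings[keyword]
        setattr(self, name, value)

    def describe(self) -> list:
        """Return the two status lines shown by "memtag handle"."""
        printed = "will be printed" if self.print_failures else "will not be printed"
        raised = "will raise a signal" if self.raise_signal else "will not raise a signal"
        return [f"Memory tag failures {printed}", f"Memory tag failures {raised}"]
```

Because the flags are independent, "noprint" followed by "raise" leaves
printing disabled, which matches the transcript above.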

GDB UI: Examining Tags

The memtag command can also be used to read and write the memory tags (ATs)
for a given memory location. We also want to be able to read and write the
logical tag (LT) held in a given address.

(gdb) print x                               /* x contains an incorrect tag. */
$1 = 0x1234007c0
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$1 = 67
(gdb) memtag showlogicaltag x        /* Extract the 4bit LT from the passed in pointer */
$2 = 0x12
(gdb) memtag showtag x        /* Show the AT for the memory address. Never returns errors if address contains the wrong LT.   */
$3 = 0x13
(gdb) memtag checktag x        /* Same as showtag, but also errors using the rules in "memtag handle".  */
<incorrect memory tag 0x12 for address 0x1234007c0>
$4 = 0x13
(gdb) memtag writetag x 0x12        /* Write the tag for the passed in memory address  */
(gdb) memtag checktag x
$5 = 0x12
(gdb) memtag writelogicaltag x 0x14        /* Update the tag in the pointer */
(gdb) print x                               /* x contains an incorrect tag. */
$1 = 0x1434007c0
(gdb) memtag checktag x
<incorrect memory tag 0x14 for address 0x1234007c0>

Linux Ptrace

Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
methods and /proc/<pid>/mem.

New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will be added
to read/write data tags. Peek will allow a range of tags to be read in a
single call.

Memory accesses inside GDB

It should be enough for AArch64 to override target_xfer_partial.
If the process is using memory tags, and the address contains a LT, then
call PEEKDATATAG for the memory range being accessed and check if the access
would succeed. If it doesn't then print just the first failure to the screen.
If it does succeed then call the overridden function to access the memory.
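
A rough sketch of that flow, in Python for brevity. Everything here is
hypothetical: read_tags stands in for the proposed PEEKDATATAG request (whose
interface is not yet settled), read_memory stands in for the overridden
transfer function, and the address is assumed to be already stripped of its
LT. Following the UI section above, the read still goes ahead after a
mismatch has been reported.

```python
GRANULE = 16  # bytes covered by one allocated tag


def checked_read(address, length, logical, read_tags, read_memory, report):
    """Check tags for [address, address + length), then perform the read.

    read_tags(start, count) -> list of ATs, one per granule.
    read_memory(address, length) -> bytes.
    report(message) -> None, called at most once (first failure only).
    Assumes length > 0 and an untagged address.
    """
    first = address // GRANULE
    last = (address + length - 1) // GRANULE
    tags = read_tags(first * GRANULE, last - first + 1)
    for i, tag in enumerate(tags):
        if tag != logical:
            bad = (first + i) * GRANULE
            report(f"<incorrect memory tag {logical:#x} for address {bad:#x}>")
            break  # only the first failure is printed
    return read_memory(address, length)
```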

Core Dumps

There will be extra sections inside a core dump containing the memory tags.
The core low version of target_xfer_partial needs overriding. 
Similar to the xfer_partial override in the previous section, add
functionality to check tags, and report failures. Check the tags by
accessing the MTE segments in the corefile.  Memory is stored in the core
dump untagged, so addresses will need stripping before accessing.

Alan.


* Re: [RFC] AArch64 Memory tagging support
  2019-08-21 10:39 [RFC] AArch64 Memory tagging support Alan Hayward
@ 2019-08-21 16:34 ` John Baldwin
  2019-08-22 10:31   ` Alan Hayward
  2020-04-13 14:57   ` Luis Machado
  0 siblings, 2 replies; 9+ messages in thread
From: John Baldwin @ 2019-08-21 16:34 UTC (permalink / raw)
  To: Alan Hayward, gdb-patches; +Cc: nd

On 8/21/19 3:39 AM, Alan Hayward wrote:
> This is a rough design for implementing ARMv8.5 MTE support in GDB,
> detailing the UI changes and sketching out the internals.
> The Linux interfaces (ptrace, coredumps etc) are currently still under
> discussion, and so it will be quite a while before the GDB code is
> implemented, but I wanted to get a design out early to ensure that the GDB
> requirements from the Linux interfaces are known.
> 
> Any comments are welcome. At this stage I'm more concerned about the overall
> strategy being workable.

I have several thoughts on this as I have a somewhat similar need for dealing
with memory tags, though slightly differently.  In my case, I work on a research
project called CHERI that assigns 1-bit tags to every 16 bytes (or in some cases
32 bytes) of memory as well as to certain registers.  I haven't yet really dealt
with tags in memory in my GDB patches to date, but will need to.  Also, in the
case of CHERI, we turn C and C++ pointers into 129-bit values (128 bits plus the
1-bit tag) where the extra 64 bits hold attributes like bounds and permissions of
the pointer, and the 1-bit tag determines validity.  You can read more about it
at www.chericpu.com if you are curious.  We currently provide models of it on MIPS
and are bringing it up on RISC-V (simulations and FPGA).

To the extent that we can have a somewhat generic tagging interface for GDB that
might cover SPARC ADI as well, that might be nice.

> Background
> 
> The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE) which allows
> 4bit tags to be assigned to each memory 16bytes of memory. Each allocation
> is referred to as Allocated Tag (AT) in the text below. ATs are stored
> separately to the main memory. When accessing a memory location, 4bits of
> the address are reserved for use as a tag. This is referred to as a Logical
> Tag (LT) in the text below. If the LT does not match the AT in a memory read
> or write, then the access will trap.
> 
> For more details see the MTE links here:
> https://developer.arm.com/architectures/cpu-architecture/a-profile#mte
> 
> For a very high-level overview see:
> https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/
> 
> 
> GDB UI: Memory Access
> 
> In the general use case, when using GDB to examine memory, GDB should print
> out when a memory tag failure happens. However, the operation it was doing (for
> example, reading/writing memory) should still succeed. A GDB user would not
> expect a signal to be passed upwards to the subject program.
> 
> For example, x is an int* variable in the subject application and it contains
> an address with an incorrect LT:
> 
> (gdb) print x             /* x contains an incorrect LT. */
> $1 = 0x1234007c0
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $2 = 67
> (gdb) set *x = 72
> <incorrect memory tag 0x12 for address 0x1234007c0>
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $2 = 72

I would like to have something similar eventually where attempts to access an
out-of-bounds pointer would fail, but perhaps with some kind of override flag
(like p/r for disabling pretty-printers) to permit examining out-of-bounds
memory contents.  I think having the same type of override to "dump the memory
anyway, even if the tag is wrong" might be useful for users, though I agree
the default behavior should be to warn about invalid use.

> When printing areas of memory (for example with the command x) this warning
> should only be printed once per dump.
> 
> (gdb) x/20xw y
> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
> <incorrect memory tag 0x12 for address 0x1234007c0>
> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
> 0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
> 0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067

One other thing that might be nice to have is some kind of view of memory that
dumps tags and bytes in parallel, so something like:

(gdb) x/20xwt y
0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000 [0x13]
0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000 [0x12]

etc.

> However, there will be instances where the GDB user wants to either suppress
> any tag warning entirely or pass any errors upwards to the subject program as
> a signal. GDB already has similar functionality available for signals using
> the command handle. An Aarch64 only command "memtag" should be added for this.
> 
> (gdb) memtag handle
> Memory tag failures will be printed
> Memory tag failures will not raise a signal
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $1 = 67
> (gdb) memtag handle noprint
> Memory tag failures will not be printed
> Memory tag failures will not raise a signal
> (gdb) print *x
> $2 = 67
> (gdb) memtag handle raise
> Memory tag failures will not be printed
> Memory tag failures will raise a signal
> (gdb) print *x
> Program terminated with signal SIGSEGV, Segmentation fault.
> The program no longer exists.
> 
> Suggested arguments to "memtag handle" are "print", "noprint", "raise",
> "noraise". This will only change the behaviour for memory tag failures
> generated by the user inside GDB (ie this not affect inferior behaviour)

Given that these features are somewhat MTE-specific, I would perhaps suggest
using 'mte' instead of 'memtag' for the name.

> GDB UI: Examining Tags
> 
> The memtags command can also be used to read and write memory tags for a given
> memory location. Also, we want to be able to read and write tags from a given
> address.
> 
> (gdb) print x                               /* x contains an incorrect tag. */
> $1 = 0x1234007c0
> (gdb) print *x
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $1 = 67
> (gdb) memtag showlogicaltag x        /* Extract the 4bit LT from the passed in pointer */
> $2 = 0x12
> (gdb) memtag showtag x        /* Show the AT for the memory address. Never returns errors if address contains the wrong LT.   */
> $3 = 0x13
> (gdb) memtag checktag x        /* Same as showtag, but also errors using the rules in "memtag handle".  */
> <incorrect memory tag 0x12 for address 0x1234007c0>
> $4 = 0x13
> (gdb) memtag writetag x 0x12        /* Write the tag for the passed in memory address  */
> (gdb) memtag checktag x
> $5 = 0x12
> (gdb) memtag writelogicaltag x 0x14        /* Update the tag in the pointer */
> (gdb) print x                               /* x contains an incorrect tag. */
> $1 = 0x1434007c0
> (gdb) memtag checktag x
> <incorrect memory tag 0x14 for address 0x1234007c0>

I would perhaps also use 'mte' here.  'memtag showtag' might be generic to
memory tags in general, but the others are likely MTE-specific.

> Linux Ptrace
> 
> Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
> methods and /proc/<pid>/mem.
> 
> New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will be added
> to read/write data tags. Peek will allow a range of tags to be read in a
> single call.

On FreeBSD (we use a variant of FreeBSD for CHERI research) I had a somewhat
similar plan which was to add a new "address space" for PT_IO that returned
packed tag bits.

> Memory accesses inside GDB
> 
> It should be enough for AArch64 to override target_xfer_partial.
> If the process is using memory tags, and the address contains a LT, then
> call PEEKDATATAG for the memory range being accessed and check if the access
> would succeed. If it doesn't then print just the first failure to the screen.
> If it does succeed then call the overridden function to access the memory.
> 
> Core Dumps
> 
> There will be extra sections inside a core dump containing the memory tags.
> The core low version of target_xfer_partial needs overriding. 
> Similar to the xfer_partial override in the previous section, add
> functionality to check tags, and report failures. Check the tags by
> accessing the MTE segments in the corefile.  Memory is stored in the core
> dump untagged, so addresses will need stripping before accessing.

I am curious how you were planning to describe tags in cores.  I don't have
concrete thoughts yet but the approach I had been leaning towards was
having something similar to PT_LOAD, but perhaps PT_TAGS or the like whose
header would include "tag size" and "tag stride" and the contents of the
segment would be packed tag bits from a starting VA in the header.  This
would permit storing both 1-bit and 4-bit tags and would also in theory
support some other memory tagging schemes I'm aware of from some other
research.
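
To illustrate the idea, looking up a tag in such a packed segment might work
roughly as below. This is a Python sketch under stated assumptions: the
PT_TAGS layout is not defined anywhere yet, so the parameter names, the
least-significant-bit-first packing, and the restriction to tag sizes that
divide 8 are all hypothetical.

```python
def packed_tag(segment: bytes, start_va: int, tag_size: int,
               tag_stride: int, address: int) -> int:
    """Return the tag covering address from a packed tag segment.

    tag_size is in bits and is assumed to divide 8 (e.g. 1 or 4), so a tag
    never straddles a byte. tag_stride is the number of bytes of memory
    covered by one tag. Tags are assumed to be packed starting from the
    least significant bit of each byte.
    """
    if address < start_va:
        raise ValueError("address precedes the start of the tag segment")
    index = (address - start_va) // tag_stride
    byte_index, shift = divmod(index * tag_size, 8)
    if byte_index >= len(segment):
        raise ValueError("address is beyond the end of the tag segment")
    return (segment[byte_index] >> shift) & ((1 << tag_size) - 1)
```

With tag_size=4 and tag_stride=16 this covers the MTE case; with tag_size=1
the same lookup covers 1-bit tags at either a 16-byte or 32-byte stride.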

One thing that I would like that you don't currently have a need for (though
perhaps the memory display mode I suggested above might need) is a way to
pass around a word of memory and its tag together, perhaps as a single
'struct value'.  In my case I would like to have the tag associated with
either a register or memory present when printing pointers.  (I have a new
gdbarch method in my patches that prints pointer attributes and right now
it ignores the tags, but it would be nice to annotate untagged pointers,
which in CHERI's case are not dereferenceable.)

-- 
John Baldwin


* Re: [RFC] AArch64 Memory tagging support
  2019-08-21 16:34 ` John Baldwin
@ 2019-08-22 10:31   ` Alan Hayward
  2020-04-13 14:57   ` Luis Machado
  1 sibling, 0 replies; 9+ messages in thread
From: Alan Hayward @ 2019-08-22 10:31 UTC (permalink / raw)
  To: John Baldwin; +Cc: gdb-patches, nd



> On 21 Aug 2019, at 17:33, John Baldwin <jhb@FreeBSD.org> wrote:
> 
> On 8/21/19 3:39 AM, Alan Hayward wrote:
>> This is a rough design for implementing ARMv8.5 MTE support in GDB,
>> detailing the UI changes and sketching out the internals.
>> The Linux interfaces (ptrace, coredumps etc) are currently still under
>> discussion, and so it will be quite a while before the GDB code is
>> implemented, but I wanted to get a design out early to ensure that the GDB
>> requirements from the Linux interfaces are known.
>> 
>> Any comments are welcome. At this stage I’m more concerned about the overall
>> strategy being workable.
> 
> I have several thoughts on this as I have a somewhat similar need for dealing
> with memory tags, though slightly differently.  In my case, I work on a research
> project called CHERI that assigns 1 bit tags to every 16-bytes (or in some cases
> 32-bytes) of memory as well as to certain registers.  I haven't yet really dealt
> with tags in memory in my GDB patches to date, but will need to.  Also, in the
> case of CHERI, we turn C and C++ pointers into 129-bit (128-bits plus the 1 bit
> tag) where the extra 64 bits hold attributes like bounds and permissions of the
> pointer, and the 1-bit tag determines validity.  You can ready more about it
> at www.chericpu.com if you are curious.  We currently provide models of it on MIPS
> and are bringing it up on RISC-V (simulations and FPGA).
> 
> To the extent that we can have somewhat generic tagging interface for GDB that
> might cover sparc ADI as well, that might be nice.

Thanks for responding.

Agreed, where possible we should keep things common and reusable.

One disclaimer: we are quite far away from adding GDB support, as all of this
depends on Linux support being available (there hasn’t been any discussion of
the ptrace and core changes on the Linux lists yet). If you plan on adding any
of this to GDB for CHERI in the interim, that’s fine with me. :)


> 
>> Background
>> 
>> The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE) which allows
>> 4bit tags to be assigned to each memory 16bytes of memory. Each allocation
>> is referred to as Allocated Tag (AT) in the text below. ATs are stored
>> separately to the main memory. When accessing a memory location, 4bits of
>> the address are reserved for use as a tag. This is referred to as a Logical
>> Tag (LT) in the text below. If the LT does not match the AT in a memory read
>> or write, then the access will trap.
>> 
>> For more details see the MTE links here:
>> https://developer.arm.com/architectures/cpu-architecture/a-profile#mte
>> 
>> For a very high-level overview see:
>> https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/
>> 
>> 
>> GDB UI: Memory Access
>> 
>> In the general use case, when using GDB to examine memory, GDB should print
>> out when a memory tag failure happens. However, the operation it was doing (for
>> example, reading/writing memory) should still succeed. A GDB user would not
>> expect a signal to be passed upwards to the subject program.
>> 
>> For example, x is an int* variable in the subject application and it contains
>> an address with an incorrect LT:
>> 
>> (gdb) print x             /* x contains an incorrect LT. */
>> $1 = 0x1234007c0
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $2 = 67
>> (gdb) set *x = 72
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $2 = 72
> 
> I would like to have something similar eventually where attempts to access an
> out-of-bounds pointer would fail, but perhaps with some kind of override flag
> (like p/r for disabling pretty-printers) to permit examining out-of-bounds
> memory contents.  I think having the same type of override to "dump the memory
> anyway, even if the tag is wrong" might be useful for users, though I agree
> the default behavior should be to warn about invalid use.

Ok, sounds like making this mechanism common is the way to go.

> 
>> When printing areas of memory (for example with the command x) this warning
>> should only be printed once per dump.
>> 
>> (gdb) x/20xw y
>> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
>> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
>> 0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
>> 0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067
> 
> One other thing that might be nice to have is some kind of view of memory that
> dumps tags and bytes in parallel, so something like:
> 
> (gdb) x/20xwt y
> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000 [0x13]
> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000 [0x12]
> 
> etc.

This would have to ensure a tag spanned exactly a single line or divided exactly
across multiple lines.  If so, then yes, it’d be useful.
Always printing the tags if the line lengths are right might be the way to go
(rather than enabling via a flag).

> 
>> However, there will be instances where the GDB user wants to either suppress
>> any tag warning entirely or pass any errors upwards to the subject program as
>> a signal. GDB already has similar functionality available for signals using
>> the command handle. An Aarch64 only command "memtag" should be added for this.
>> 
>> (gdb) memtag handle
>> Memory tag failures will be printed
>> Memory tag failures will not raise a signal
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $1 = 67
>> (gdb) memtag handle noprint
>> Memory tag failures will not be printed
>> Memory tag failures will not raise a signal
>> (gdb) print *x
>> $2 = 67
>> (gdb) memtag handle raise
>> Memory tag failures will not be printed
>> Memory tag failures will raise a signal
>> (gdb) print *x
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> The program no longer exists.
>> 
>> Suggested arguments to "memtag handle" are "print", "noprint", "raise",
>> "noraise". This will only change the behaviour for memory tag failures
>> generated by the user inside GDB (ie this not affect inferior behaviour)
> 
> Given that these features are somewhat MTE-specific, I would perhaps suggest
> using 'mte' instead of 'memtag' for the name.

I’m happy with that.

> 
>> GDB UI: Examining Tags
>> 
>> The memtags command can also be used to read and write memory tags for a given
>> memory location. Also, we want to be able to read and write tags from a given
>> address.
>> 
>> (gdb) print x                               /* x contains an incorrect tag. */
>> $1 = 0x1234007c0
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $1 = 67
>> (gdb) memtag showlogicaltag x        /* Extract the 4bit LT from the passed in pointer */
>> $2 = 0x12
>> (gdb) memtag showtag x        /* Show the AT for the memory address. Never returns errors if address contains the wrong LT.   */
>> $3 = 0x13
>> (gdb) memtag checktag x        /* Same as showtag, but also errors using the rules in "memtag handle".  */
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $4 = 0x13
>> (gdb) memtag writetag x 0x12        /* Write the tag for the passed in memory address  */
>> (gdb) memtag checktag x
>> $5 = 0x12
>> (gdb) memtag writelogicaltag x 0x14        /* Update the tag in the pointer */
>> (gdb) print x                               /* x contains an incorrect tag. */
>> $1 = 0x1434007c0
>> (gdb) memtag checktag x
>> <incorrect memory tag 0x14 for address 0x1234007c0>
> 
> I would perhaps also use 'mte' here.  'memtag showtag' might be generic to
> memory tags in general, but the others are likely MTE-specific.

Ok.

> 
>> Linux Ptrace
>> 
>> Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
>> methods and /proc/<pid>/mem.
>> 
>> New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will be added
>> to read/write data tags. Peek will allow a range of tags to be read in a
>> single call.
> 
> On FreeBSD (we use a variant of FreeBSD for CHERI research) I had a somewhat
> similar plan which was to add a new "address space" for PT_IO that returned
> packed tag bits.
> 
>> Memory accesses inside GDB
>> 
>> It should be enough for AArch64 to override target_xfer_partial.
>> If the process is using memory tags, and the address contains a LT, then
>> call PEEKDATATAG for the memory range being accessed and check if the access
>> would succeed. If it doesn't then print just the first failure to the screen.
>> If it does succeed then call the overridden function to access the memory.
>> 
>> Core Dumps
>> 
>> There will be extra sections inside a core dump containing the memory tags.
>> The core low version of target_xfer_partial needs overriding. 
>> Similar to the xfer_partial override in the previous section, add
>> functionality to check tags, and report failures. Check the tags by
>> accessing the MTE segments in the corefile.  Memory is stored in the core
>> dump untagged, so addresses will need stripping before accessing.
> 
> I am curious how you were planning to describe tags in cores.  I don't have
> concrete thoughts yet but the approach I had been leaning towards was
> having something similar to PT_LOAD, but perhaps PT_TAGS or the like whose
> header would include "tag size" and "tag stride" and the contents of the
> segment would be packed tag bits from a starting VA in the header.  This
> would permit storing both 1-bit and 4-bit tags and would also in theory
> support some other memory tagging schemes I'm aware of from some other
> research.
> 

That’s roughly what we were thinking too, although we're a while away from
proposing anything more concrete. I’ll make sure you're kept in the loop so that
we can get something that works for both our cases.


> One thing that I would like that you don't currently have a need for (though
> perhaps the memory display mode I suggested above might need) is a way to
> pass around a word of memory and it's tag together, perhaps as a single
> 'struct value'.  In my case I would like to have the tag associated with
> either a register or memory present when printing pointers.  (I have a new
> gdbarch method in my patches that prints pointer attributes and right now
> it ignores the tags, but it would be nice to annotate untagged pointers
> which in CHERI's case are not dereferencable.)
> 
> -- 
> John Baldwin



* Re: [RFC] AArch64 Memory tagging support
  2019-08-21 16:34 ` John Baldwin
  2019-08-22 10:31   ` Alan Hayward
@ 2020-04-13 14:57   ` Luis Machado
  2020-06-05 12:55     ` Luis Machado
  1 sibling, 1 reply; 9+ messages in thread
From: Luis Machado @ 2020-04-13 14:57 UTC (permalink / raw)
  To: John Baldwin, Alan Hayward, gdb-patches; +Cc: nd

Hi,

We have updated the design based on input.

On 8/21/19 1:33 PM, John Baldwin wrote:
> On 8/21/19 3:39 AM, Alan Hayward wrote:
>> This is a rough design for implementing ARMv8.5 MTE support in GDB,
>> detailing the UI changes and sketching out the internals.
>> The Linux interfaces (ptrace, coredumps etc) are currently still under
>> discussion, and so it will be quite a while before the GDB code is
>> implemented, but I wanted to get a design out early to ensure that the GDB
>> requirements from the Linux interfaces are known.
>>
>> Any comments are welcome. At this stage I’m more concerned about the overall
>> strategy being workable.
> 
> I have several thoughts on this as I have a somewhat similar need for dealing
> with memory tags, though slightly differently.  In my case, I work on a research
> project called CHERI that assigns 1 bit tags to every 16-bytes (or in some cases
> 32-bytes) of memory as well as to certain registers.  I haven't yet really dealt
> with tags in memory in my GDB patches to date, but will need to.  Also, in the
> case of CHERI, we turn C and C++ pointers into 129-bit (128-bits plus the 1 bit
> tag) where the extra 64 bits hold attributes like bounds and permissions of the
> pointer, and the 1-bit tag determines validity.  You can ready more about it
> at www.chericpu.com if you are curious.  We currently provide models of it on MIPS
> and are bringing it up on RISC-V (simulations and FPGA).

It sounds like CHERI could benefit from this memory tagging work for 
that particular tag bit. It would be stored in-pointer and would be 
validated the same way as MTE. We could add a customizable tag stride to 
account for variations in the memory ranges covered by tags.

MTE tags 16 bytes at a time. If CHERI needs to use 32 bytes, then it 
should be able to set a gdbarch variable. Would that work for you?
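
As a small illustration of what a configurable stride would mean in practice,
the granules touched by an access could be enumerated like this. It is a
Python sketch only; stride stands in for the hypothetical per-architecture
setting and is not an existing gdbarch member.

```python
def granule_starts(address: int, length: int, stride: int) -> list:
    """Start addresses of every tag granule touched by [address, address + length)."""
    first = address - (address % stride)
    end = address + length
    return list(range(first, end, stride))
```

For the same access at 0x18 of length 0x20, a 16-byte stride yields three
granules (0x10, 0x20, 0x30) while a 32-byte stride yields two (0x0, 0x20).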

The other bits, however, would need another mechanism for verifying 
validity. Bounds and permissions in this case.

Alternatively, we could treat the whole additional 65 bits as a big 
arch-specific memory tag, which the architecture should decode when a 
memory access is attempted, and let GDB know if such access is valid or not.

> 
> To the extent that we can have somewhat generic tagging interface for GDB that
> might cover sparc ADI as well, that might be nice.
> 

It sounds like the design we currently have will be suitable for ADI as
well, though I didn't look too deeply into how SPARC handles the tags.

>> Background
>>
>> The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE) which allows
>> 4bit tags to be assigned to each memory 16bytes of memory. Each allocation
>> is referred to as Allocated Tag (AT) in the text below. ATs are stored
>> separately to the main memory. When accessing a memory location, 4bits of
>> the address are reserved for use as a tag. This is referred to as a Logical
>> Tag (LT) in the text below. If the LT does not match the AT in a memory read
>> or write, then the access will trap.
>>
>> For more details see the MTE links here:
>> https://developer.arm.com/architectures/cpu-architecture/a-profile#mte
>>
>> For a very high-level overview see:
>> https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/
>>
>>
>> GDB UI: Memory Access
>>
>> In the general use case, when using GDB to examine memory, GDB should print
>> out when a memory tag failure happens. However, the operation it was doing (for
>> example, reading/writing memory) should still succeed. A GDB user would not
>> expect a signal to be passed upwards to the subject program.
>>
>> For example, x is an int* variable in the subject application and it contains
>> an address with an incorrect LT:
>>
>> (gdb) print x             /* x contains an incorrect LT. */
>> $1 = 0x1234007c0
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $2 = 67
>> (gdb) set *x = 72
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $2 = 72
> 
> I would like to have something similar eventually where attempts to access an
> out-of-bounds pointer would fail, but perhaps with some kind of override flag
> (like p/r for disabling pretty-printers) to permit examining out-of-bounds
> memory contents.  I think having the same type of override to "dump the memory
> anyway, even if the tag is wrong" might be useful for users, though I agree
> the default behavior should be to warn about invalid use.

I think we could achieve this by using an on/off flag to allow/disallow 
validation of accesses to memory addresses.

As for validating bounds, it depends on whether we want to consider the 
bounds as part of a bigger "tag". This would simplify the code a bit.

> 
>> When printing areas of memory (for example with the command x) this warning
>> should only be printed once per dump.
>>
>> (gdb) x/20xw y
>> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
>> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
>> 0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
>> 0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067
> 
> One other thing that might be nice to have is some kind of view of memory that
> dumps tags and bytes in parallel, so something like:
> 
> (gdb) x/20xwt y
> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000 [0x13]
> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000 [0x12]
> 
> etc.
> 

I think this is doable. In the worst case we could print memory contents 
starting from the first aligned address in the range that contains the 
provided address. So, given a memory address memaddr, we would print 
data starting from memaddr rounded down to the tag granule, that is 
(memaddr & ~(tag_alignment - 1)) for a power-of-two tag_alignment.

If that's not desirable, we could print some 'x' bytes leading up to the 
desired address. This way things would be printed nicely, like so:

(gdb) x/20xwt y
0x1234007a0: 0xxxxxxxxx 0xxxxxxxxx 0xxx0a6425 0x00000000 [0x13]
0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
0x1234007c0: 0x00000040 0x00000003 0x0000xxxx 0xxxxxxxxx [0x12]
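
To make the second option concrete, here is a minimal sketch (in Python, purely for illustration) of how the rows and the 'x' placeholders could be computed. The names masked_rows and render_row are hypothetical and not part of GDB; the 16-byte granule is the MTE value, and the sketch prints individual bytes rather than words to keep it short.

```python
GRANULE = 16  # bytes covered by one tag; 16 is the MTE granule size


def masked_rows(start, length, granule=GRANULE):
    """Yield (row_address, mask) for each granule touching [start, start+length).

    mask[i] is True when byte row_address + i was requested by the user.
    """
    end = start + length
    row = start & ~(granule - 1)  # round down to the start of the granule
    while row < end:
        mask = [start <= row + i < end for i in range(granule)]
        yield row, mask
        row += granule


def render_row(row_addr, data, mask):
    """Format one row as hex bytes, printing 'xx' for bytes outside the request."""
    cells = [f"{b:02x}" if wanted else "xx" for b, wanted in zip(data, mask)]
    return f"{row_addr:#x}: " + " ".join(cells)
```

Each row then maps to exactly one tag, so the tag can be appended at the end of the line without ambiguity.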

>> However, there will be instances where the GDB user wants to either suppress
>> any tag warning entirely or pass any errors upwards to the subject program as
>> a signal. GDB already has similar functionality available for signals using
>> the command handle. An Aarch64 only command "memtag” should be added for this.
>>
>> (gdb) memtag handle
>> Memory tag failures will be printed
>> Memory tag failures will not raise a signal
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $1 = 67
>> (gdb) memtag handle noprint
>> Memory tag failures will not be printed
>> Memory tag failures will not raise a signal
>> (gdb) print *x
>> $2 = 67
>> (gdb) memtag handle raise
>> Memory tag failures will not be printed
>> Memory tag failures will raise a signal
>> (gdb) print *x
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> The program no longer exists.
>>
>> Suggested arguments to "memtag handle" are "print", "noprint", "raise”,
>> "noraise”. This will only change the behaviour for memory tag failures
>> generated by the user inside GDB (ie this not affect inferior behaviour)
> 
> Given that these features are somewhat MTE-specific, I would perhaps suggest
> using 'mte' instead of 'memtag' for the name.
>   

We've switched to MTE now, though this command could be generic as well. 
Then other architectures implementing memory tagging wouldn't have to 
add their own commands to GDB. Thoughts?

With regard to handling the behavior of memory tagging in GDB, I think 
a switch to enable/disable validation would be a better alternative, 
like so:

set mte validation on/off (default on)

The "handle" command is more of a runtime command that tries to deal 
with received signals and what to do about them. Even if we decide to 
raise tag violations via "mte handle raise", they will still be filtered 
by "handle", and the SIGSEGV will likely generate a visible user stop.

Also, attempting to print memory won't trigger a SIGSEGV, since GDB 
services this request through either ptrace or /proc/<pid>/mem. The 
inferior does not run, so no signal is delivered for that.

We should also watch out for the many memory accesses GDB does during 
unwinding and debug info reading. We don't want to keep seeing a lot of 
warnings about memory tag failures in that case.

I guess we'll need to exercise this and see how GDB behaves. Then either 
fix the code or temporarily disable memory tag validation when GDB 
attempts to read memory for those purposes.
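
As a rough illustration of the "temporarily disable" option, the sketch below uses a scoped toggle that always restores the previous setting. It is written in Python only to keep it short; the flag, tag_validation_suppressed and read_memory are all hypothetical stand-ins, not existing GDB code.

```python
from contextlib import contextmanager

# Hypothetical global mirroring a "set mte validation on/off" style setting.
_validation_enabled = True


@contextmanager
def tag_validation_suppressed():
    """Temporarily turn validation off, restoring the previous state on exit."""
    global _validation_enabled
    saved = _validation_enabled
    _validation_enabled = False
    try:
        yield
    finally:
        _validation_enabled = saved


def read_memory(addr, length, tag_ok, warnings):
    """Stand-in for a memory read; records a warning only when validation is on."""
    if _validation_enabled and not tag_ok:
        warnings.append(f"<incorrect memory tag for address {addr:#x}>")
    return bytes(length)  # placeholder contents
```

Internal readers such as the unwinder would wrap their accesses in the scoped toggle, while user-facing commands would leave validation on.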

>> GDB UI: Examining Tags
>>
>> The memtags command can also be used to read and write memory tags for a given
>> memory location. Also, we want to be able to read and write tags from a given
>> address.
>>
>> (gdb) print x                               /* x contains an incorrect tag. */
>> $1 = 0x1234007c0
>> (gdb) print *x
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $1 = 67
>> (gdb) memtag showlogicaltag x        /* Extract the 4bit LT from the passed in pointer */
>> $2 = 0x12
>> (gdb) memtag showtag x        /* Show the AT for the memory address. Never returns errors if address contains the wrong LT.   */
>> $3 = 0x13
>> (gdb) memtag checktag x        /* Same as showtag, but also errors using the rules in "memtag handle".  */
>> <incorrect memory tag 0x12 for address 0x1234007c0>
>> $4 = 0x13
>> (gdb) memtag writetag x 0x12        /* Write the tag for the passed in memory address  */
>> (gdb) memtag checktag x
>> $5 = 0x12
>> (gdb) memtag writelogicaltag x 0x14        /* Update the tag in the pointer */
>> (gdb) print x                               /* x contains an incorrect tag. */
>> $1 = 0x1434007c0
>> (gdb) memtag checktag x
>> <incorrect memory tag 0x14 for address 0x1234007c0>
> 
> I would perhaps also use 'mte' here.  'memtag showtag' might be generic to
> memory tags in general, but the others are likely MTE-specific.
> 

We've switched to MTE now, but my point about making a generic command 
for all architectures remains.

>> Linux Ptrace
>>
>> Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
>> methods and /proc/<pid>/mem.
>>
>> New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will be added
>> to read/write data tags. Peek will allow a range of tags to be read in a
>> single call.
> 
> On FreeBSD (we use a variant of FreeBSD for CHERI research) I had a somewhat
> similar plan which was to add a new "address space" for PT_IO that returned
> packed tag bits.
> 

Doing some research, it seems GDB relies more heavily on /proc/<pid>/mem 
to read/write memory contents, so ptrace wouldn't be used that often for 
this purpose.

I'm considering an interface like /proc/<pid>/memtags, which I'm 
discussing with the kernel folks.

The ptrace interfaces would still be there in case /proc/<pid>/mem is 
not available though.

>> Memory accesses inside GDB
>>
>> It should be enough for AArch64 to override target_xfer_partial.
>> If the process is using memory tags, and the address contains a LT, then
>> call PEEKDATATAG for the memory range being accessed and check if the access
>> would succeed. If it doesn't then print just the first failure to the screen.
>> If it does succeed then call the overridden function to access the memory.
>>
>> Core Dumps
>>
>> There will be extra sections inside a core dump containing the memory tags.
>> The core low version of target_xfer_partial needs overriding.
>> Similar to the xfer_partial override in the previous section, add
>> functionality to check tags, and report failures. Check the tags by
>> accessing the MTE segments in the corefile.  Memory is stored in the core
>> dump untagged, so addresses will need stripping before accessing.
> 
> I am curious how you were planning to describe tags in cores.  I don't have
> concrete thoughts yet but the approach I had been leaning towards was
> having something similar to PT_LOAD, but perhaps PT_TAGS or the like whose
> header would include "tag size" and "tag stride" and the contents of the
> segment would be packed tag bits from a starting VA in the header.  This
> would permit storing both 1-bit and 4-bit tags and would also in theory
> support some other memory tagging schemes I'm aware of from some other
> research.

This is still WIP, as the kernel patches are being worked on. I'll 
provide an update whenever we have a draft design.
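
Until there is a real format, the packed-tag idea quoted above can at least be sketched. The Python below is only an illustration under stated assumptions: tags are packed low-order bits first within each byte, the tag width divides 8, and the segment header supplies a starting VA and a stride. None of this reflects an agreed core file layout, and both function names are made up.

```python
def unpack_tags(packed, tag_bits, count):
    """Decode `count` tags of `tag_bits` bits each from `packed` bytes.

    Assumes tags are packed low-order bits first within each byte and that
    tag_bits divides 8 (e.g. 1 for CHERI-style tags, 4 for MTE).
    """
    if 8 % tag_bits != 0:
        raise ValueError("tag_bits must divide 8")
    per_byte = 8 // tag_bits
    mask = (1 << tag_bits) - 1
    tags = []
    for i in range(count):
        byte = packed[i // per_byte]
        shift = (i % per_byte) * tag_bits
        tags.append((byte >> shift) & mask)
    return tags


def tag_for_address(addr, start_va, stride, tags):
    """Look up the tag covering `addr`, given the segment's starting VA and stride."""
    index = (addr - start_va) // stride
    return tags[index]
```

With "tag size" and "tag stride" as parameters, the same decoder would serve both 1-bit and 4-bit schemes.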

> 
> One thing that I would like that you don't currently have a need for (though
> perhaps the memory display mode I suggested above might need) is a way to
> pass around a word of memory and it's tag together, perhaps as a single
> 'struct value'.  In my case I would like to have the tag associated with
> either a register or memory present when printing pointers.  (I have a new
> gdbarch method in my patches that prints pointer attributes and right now
> it ignores the tags, but it would be nice to annotate untagged pointers
> which in CHERI's case are not dereferencable.)
> 

I think this would be a much bigger change to how GDB passes around data 
and memory addresses. But it would certainly be nice to have that.

I'm picturing memory addresses would have to be replaced by a structure 
as well, holding the address, permissions, bounds and tags. Contents 
from memory would also carry around such data.

Right now I'm not sure if this is feasible though.


* Re: [RFC] AArch64 Memory tagging support
  2020-04-13 14:57   ` Luis Machado
@ 2020-06-05 12:55     ` Luis Machado
  2020-06-16 16:34       ` Luis Machado
  0 siblings, 1 reply; 9+ messages in thread
From: Luis Machado @ 2020-06-05 12:55 UTC (permalink / raw)
  To: John Baldwin, Alan Hayward, gdb-patches
  Cc: nd, Omair Javaid, David Spickett, Diana Picus

Hi,

Just a heads-up that I plan to submit a series to add AArch64 MTE 
support (and, more generally, memory tagging support) in roughly a 
couple of weeks' time.

The kernel interfaces are still under review 
(https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=288601), 
but mostly settled for AArch64.

Core file support has not yet been designed/defined.

On 4/13/20 11:57 AM, Luis Machado wrote:
> Hi,
> 
> We have updated the design based on input.
> 
> On 8/21/19 1:33 PM, John Baldwin wrote:
>> On 8/21/19 3:39 AM, Alan Hayward wrote:
>>> This is a rough design for implementing ARMv8.5 MTE support in GDB,
>>> detailing the UI changes and sketching out the internals.
>>> The Linux interfaces (ptrace, coredumps etc) are currently still under
>>> discussion, and so it will be quite a while before the GDB code is
>>> implemented, but I wanted to get a design out early to ensure that 
>>> the GDB
>>> requirements from the Linux interfaces are known.
>>>
>>> Any comments are welcome. At this stage I’m more concerned about the 
>>> overall
>>> strategy being workable.
>>
>> I have several thoughts on this as I have a somewhat similar need for 
>> dealing
>> with memory tags, though slightly differently.  In my case, I work on 
>> a research
>> project called CHERI that assigns 1 bit tags to every 16-bytes (or in 
>> some cases
>> 32-bytes) of memory as well as to certain registers.  I haven't yet 
>> really dealt
>> with tags in memory in my GDB patches to date, but will need to.  
>> Also, in the
>> case of CHERI, we turn C and C++ pointers into 129-bit (128-bits plus 
>> the 1 bit
>> tag) where the extra 64 bits hold attributes like bounds and 
>> permissions of the
>> pointer, and the 1-bit tag determines validity.  You can ready more 
>> about it
>> at www.chericpu.com if you are curious.  We currently provide models 
>> of it on MIPS
>> and are bringing it up on RISC-V (simulations and FPGA).
> 
> It sounds like CHERI could benefit from this memory tagging work for 
> that particular tag bit. It would be stored in-pointer and would be 
> validated the same way as MTE. We could add a customizable tag stride to 
> account for variations in the memory ranges covered by tags.

I have factored this in, so we have a gdbarch method to set the granule 
size (or stride length), keeping this configurable per-architecture.
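
To show what a configurable granule buys us, here is a small Python sketch of the arithmetic. TagConfig and its fields are hypothetical names for illustration and do not correspond to the actual gdbarch method; the 16-byte/4-bit and 32-byte/1-bit values come from the MTE and CHERI descriptions earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagConfig:
    """Hypothetical per-architecture tagging parameters."""
    granule_size: int   # bytes covered by one tag (16 for MTE)
    tag_bits: int       # width of one tag (4 for MTE, 1 for CHERI)

    def granule_count(self, addr, length):
        """Number of granules overlapped by [addr, addr + length)."""
        if length <= 0:
            return 0
        first = addr // self.granule_size
        last = (addr + length - 1) // self.granule_size
        return last - first + 1


MTE = TagConfig(granule_size=16, tag_bits=4)
CHERI_32 = TagConfig(granule_size=32, tag_bits=1)
```

The same 16-byte access at 0x1008 spans two MTE granules but only one 32-byte granule, which is why the generic code should not hardcode the stride.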

> 
> MTE tags 16 bytes at a time. If CHERI needs to use 32 bytes, then it 
> should be able to set a gdbarch variable. Would that work for you?
> 
> The other bits, however, would need another mechanism for verifying 
> validity. Bounds and permissions in this case.
> 
> Alternatively, we could treat the whole additional 65 bits as a big 
> arch-specific memory tag, which the architecture should decode when a 
> memory access is attempted, and let GDB know if such access is valid or 
> not.
> 

With some more generic bits in place, we could make this work nicely for 
validating bounds, permissions and flags as if they were a tag of some kind.

>>
>> To the extent that we can have somewhat generic tagging interface for 
>> GDB that
>> might cover sparc ADI as well, that might be nice.
>>

I was surprised to find the ADI implementation entirely contained in the 
sparc64-specific layers of GDB, with no generic infrastructure. So, 
essentially, there is no way to reuse that. But I think the ADI 
implementation can be ported to the more generic interface I plan to put 
in place, if sparc developers want to.

> 
> It sounds like the design we currently have will be suitable for ADI as 
> well, though i didn't look too deep into how Sparc handles the tags.
> 
>>> Background
>>>
>>> The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE) which 
>>> allows
>>> 4bit tags to be assigned to each memory 16bytes of memory. Each 
>>> allocation
>>> is referred to as Allocated Tag (AT) in the text below. ATs are stored
>>> separately to the main memory. When accessing a memory location, 
>>> 4bits of
>>> the address are reserved for use as a tag. This is referred to as a 
>>> Logical
>>> Tag (LT) in the text below. If the LT does not match the AT in a 
>>> memory read
>>> or write, then the access will trap.
>>>
>>> For more details see the MTE links here:
>>> https://developer.arm.com/architectures/cpu-architecture/a-profile#mte
>>>
>>> For a very high-level overview see:
>>> https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/
>>>
>>>
>>> GDB UI: Memory Access
>>>
>>> In the general use case, when using GDB to examine memory, GDB should 
>>> print
>>> out when a memory tag failure happens. However, the operation it was 
>>> doing (for
>>> example, reading/writing memory) should still succeed. A GDB user 
>>> would not
>>> expect a signal to be passed upwards to the subject program.
>>>
>>> For example, x is an int* variable in the subject application and it 
>>> contains
>>> an address with an incorrect LT:
>>>
>>> (gdb) print x             /* x contains an incorrect LT. */
>>> $1 = 0x1234007c0
>>> (gdb) print *x
>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>> $2 = 67
>>> (gdb) set *x = 72
>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>> (gdb) print *x
>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>> $2 = 72
>>
>> I would like to have something similar eventually where attempts to 
>> access an
>> out-of-bounds pointer would fail, but perhaps with some kind of 
>> override flag
>> (like p/r for disabling pretty-printers) to permit examining 
>> out-of-bounds
>> memory contents.  I think having the same type of override to "dump 
>> the memory
>> anyway, even if the tag is wrong" might be useful for users, though I 
>> agree
>> the default behavior should be to warn about invalid use.
> 
> I think we could achieve this by using an on/off flag to allow/disallow 
> validation of accesses to memory addresses.
> 
> As for validating bounds, it depends on whether we want to consider the 
> bounds as part of a bigger "tag". This would simplify the code a bit.
> 

Basically I'm considering augmenting the "print" command and the "x" 
command.

The initial implementation of print's tag support will attempt to 
validate expressions that evaluate to pointers. We could expand the 
implementation to try and do more clever things, like trying to 
determine when a particular expression accesses pointers.

Validating *every* memory access GDB does is prohibitive, especially 
since it already trims the top byte from the addresses passed in to the 
xfer* functions. So the tags would be eliminated.
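
The point about the top byte can be shown with a short Python sketch. MTE keeps the logical tag in address bits 59:56, inside the top byte, so once that byte is stripped there is nothing left to compare. The helper names and the allocation_tag_of callback are hypothetical; a real implementation would obtain the allocation tag from the target.

```python
TAG_SHIFT = 56               # MTE keeps the logical tag in address bits 59:56
TAG_MASK = 0xF
TOP_BYTE_MASK = 0xFF << 56   # bits 63:56, ignored for address translation


def logical_tag(pointer):
    """Extract the 4-bit logical tag from a tagged pointer."""
    return (pointer >> TAG_SHIFT) & TAG_MASK


def strip_top_byte(pointer):
    """Return the address with the top byte cleared."""
    return pointer & ~TOP_BYTE_MASK


def check_pointer(pointer, allocation_tag_of):
    """Return (untagged_address, matches).

    allocation_tag_of is a caller-supplied lookup from untagged address to tag.
    """
    addr = strip_top_byte(pointer)
    return addr, logical_tag(pointer) == allocation_tag_of(addr)
```

This is why the check has to happen at the expression level, where the original pointer value is still available, rather than inside the xfer layer.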

>>
>>> When printing areas of memory (for example with the command x) this 
>>> warning
>>> should only be printed once per dump.
>>>
>>> (gdb) x/20xw y
>>> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
>>> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
>>> 0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
>>> 0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067
>>
>> One other thing that might be nice to have is some kind of view of 
>> memory that
>> dumps tags and bytes in parallel, so something like:
>>
>> (gdb) x/20xwt y
>> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000 [0x13]
>> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
>> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000 [0x12]
>>
>> etc.
>>

I've incorporated this idea in the implementation. The "x" command will 
have a new modifier (right now it is /m) to display tags. The current 
implementation displays each tag on an isolated line instead of 
displaying it in the last column, like so:

[0x13]
0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
[0x0]
0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
[0x12]
0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000

Due to the way GDB does the dump, it is tricky to align things to the 
memory tag granule while still printing only the bytes the user 
requested. This could be improved though.
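
For reference, the output shape above can be reproduced with a few lines of Python. This is a simplified sketch, not the GDB code: dump_with_tags and the tag_of callback are hypothetical, words are assumed to be 4-byte little-endian, and the start address is assumed to be granule-aligned, which sidesteps exactly the alignment problem mentioned above.

```python
def dump_with_tags(start, data, tag_of, granule=16):
    """Return output lines: a '[tag]' line for each granule, then its hex words.

    Assumes `start` is granule-aligned; tag_of maps an address to its tag.
    """
    lines = []
    for offset in range(0, len(data), granule):
        addr = start + offset
        lines.append(f"[{tag_of(addr):#x}]")
        chunk = data[offset:offset + granule]
        words = [int.from_bytes(chunk[i:i + 4], "little")
                 for i in range(0, len(chunk), 4)]
        lines.append(f"{addr:#x}: " + " ".join(f"0x{w:08x}" for w in words))
    return lines
```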

> 
> I think this is doable. In the worst case we could print memory contents 
> starting from the first aligned address in the range that contains the 
> provided address. So, given a memory address memaddr, we would print 
> data starting from (memaddr & tag_alignment).
> 
> If that's not desirable, we could print some 'x' bytes leading up to the 
> desired address. This way things would be printed nicely, like so:
> 
> (gdb) x/20xwt y
> 0x1234007a0: 0xxxxxxxxx 0xxxxxxxxx 0xxx0a6425 0x00000000 [0x13]
> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
> 0x1234007c0: 0x00000040 0x00000003 0x0000xxxx 0xxxxxxxxx [0x12]
> 
>>> However, there will be instances where the GDB user wants to either 
>>> suppress
>>> any tag warning entirely or pass any errors upwards to the subject 
>>> program as
>>> a signal. GDB already has similar functionality available for signals 
>>> using
>>> the command handle. An Aarch64 only command "memtag” should be added 
>>> for this.
>>>
>>> (gdb) memtag handle
>>> Memory tag failures will be printed
>>> Memory tag failures will not raise a signal
>>> (gdb) print *x
>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>> $1 = 67
>>> (gdb) memtag handle noprint
>>> Memory tag failures will not be printed
>>> Memory tag failures will not raise a signal
>>> (gdb) print *x
>>> $2 = 67
>>> (gdb) memtag handle raise
>>> Memory tag failures will not be printed
>>> Memory tag failures will raise a signal
>>> (gdb) print *x
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> The program no longer exists.
>>>
>>> Suggested arguments to "memtag handle" are "print", "noprint", "raise”,
>>> "noraise”. This will only change the behaviour for memory tag failures
>>> generated by the user inside GDB (ie this not affect inferior behaviour)

I've made the global toggle "set memory-tagging on/off" and the memory 
tagging commands "mtag <sub option> <value>".

If a particular architecture doesn't support memory tagging, GDB will 
refuse to use any of the commands and will not attempt to validate tags.

>>
>> Given that these features are somewhat MTE-specific, I would perhaps 
>> suggest
>> using 'mte' instead of 'memtag' for the name.
> 
> We've switched to MTE now, though this command could be generic as well. 
> Then other architectures implementing memory tagging wouldn't have to 
> add their own commands to GDB. Thoughts?
> 
> With regards to handling the behavior of memory tagging in GDB, i think 
> a switch to enable/disable validation would be a better alternative, 
> like so:
> 
> set mte validation on/off (default on)
> 
> The "handle" command is more of a runtime command that tries to deal 
> with received signals and what to do about them. Even if we decide to 
> raise tag violations via "mte handle raise", they will still be filtered 
> by "handle", and the SIGSEGV will likely generate a visible user stop.
> 
> Also, attempting to print memory won't trigger a SIGSEGV since this 
> request goes through GDB through either ptrace or /proc/<pid>/mem. There 
> is no inferior movement and thus no signals being delivered for that.
> 
> We should also watch out for the many memory accesses GDB does during 
> unwinding and debug info reading. We don't want to keep seeing a lot of 
> warnings about memory tag failures in that case.
> 
> I guess we'll need to exercise this and see how GDB behaves. Then either 
> fix the code or temporarily disable memory tag validation when GDB 
> attempts to read memory for those purposes.
> 
>>> GDB UI: Examining Tags
>>>
>>> The memtags command can also be used to read and write memory tags 
>>> for a given
>>> memory location. Also, we want to be able to read and write tags from 
>>> a given
>>> address.
>>>
>>> (gdb) print x                               /* x contains an 
>>> incorrect tag. */
>>> $1 = 0x1234007c0
>>> (gdb) print *x
>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>> $1 = 67
>>> (gdb) memtag showlogicaltag x        /* Extract the 4bit LT from the 
>>> passed in pointer */
>>> $2 = 0x12
>>> (gdb) memtag showtag x        /* Show the AT for the memory address. 
>>> Never returns errors if address contains the wrong LT.   */
>>> $3 = 0x13
>>> (gdb) memtag checktag x        /* Same as showtag, but also errors 
>>> using the rules in "memtag handle".  */
>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>> $4 = 0x13
>>> (gdb) memtag writetag x 0x12        /* Write the tag for the passed 
>>> in memory address  */
>>> (gdb) memtag checktag x
>>> $5 = 0x12
>>> (gdb) memtag writelogicaltag x 0x14        /* Update the tag in the 
>>> pointer */
>>> (gdb) print x                               /* x contains an 
>>> incorrect tag. */
>>> $1 = 0x1434007c0
>>> (gdb) memtag checktag x
>>> <incorrect memory tag 0x14 for address 0x1234007c0>
>>
>> I would perhaps also use 'mte' here.  'memtag showtag' might be 
>> generic to
>> memory tags in general, but the others are likely MTE-specific.
>>
> 
> We've switched to MTE now, but my point about making a generic command 
> for all architectures remains.
> 
>>> Linux Ptrace
>>>
>>> Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
>>> methods and /proc/<pid>/mem.
>>>
>>> New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will be 
>>> added
>>> to read/write data tags. Peek will allow a range of tags to be read in a
>>> single call.
>>
>> On FreeBSD (we use a variant of FreeBSD for CHERI research) I had a 
>> somewhat
>> similar plan which was to add a new "address space" for PT_IO that 
>> returned
>> packed tag bits.
>>
> 
> Doing some research, it seems GDB relies more heavily on /proc/<pid>/mem 
> to read/write memory contents, so ptrace wouldn't be used that often for 
> this purpose.
> 
> I'm considering an interface like /proc/<pid>/memtags, which i'm 
> discussing with the kernel folks.
> 
> The ptrace interfaces would still be there in case /proc/<pid>/mem is 
> not available though.
> 
>>> Memory accesses inside GDB
>>>
>>> It should be enough for AArch64 to override target_xfer_partial.
>>> If the process is using memory tags, and the address contains a LT, then
>>> call PEEKDATATAG for the memory range being accessed and check if the 
>>> access
>>> would succeed. If it doesn't then print just the first failure to the 
>>> screen.
>>> If it does succeed then call the overridden function to access the 
>>> memory.
>>>
>>> Core Dumps
>>>
>>> There will be extra sections inside a core dump containing the memory 
>>> tags.
>>> The core low version of target_xfer_partial needs overriding.
>>> Similar to the xfer_partial override in the previous section, add
>>> functionality to check tags, and report failures. Check the tags by
>>> accessing the MTE segments in the corefile.  Memory is stored in the 
>>> core
>>> dump untagged, so addresses will need stripping before accessing.
>>
>> I am curious how you were planning to describe tags in cores.  I don't 
>> have
>> concrete thoughts yet but the approach I had been leaning towards was
>> having something similar to PT_LOAD, but perhaps PT_TAGS or the like 
>> whose
>> header would include "tag size" and "tag stride" and the contents of the
>> segment would be packed tag bits from a starting VA in the header.  This
>> would permit storing both 1-bit and 4-bit tags and would also in theory
>> support some other memory tagging schemes I'm aware of from some other
>> research.
> 
> This is still WIP, as the kernel patches are being worked on. I'll 
> provide an update whenever we have a draft design.
> 
>>
>> One thing that I would like that you don't currently have a need for 
>> (though
>> perhaps the memory display mode I suggested above might need) is a way to
>> pass around a word of memory and it's tag together, perhaps as a single
>> 'struct value'.  In my case I would like to have the tag associated with
>> either a register or memory present when printing pointers.  (I have a 
>> new
>> gdbarch method in my patches that prints pointer attributes and right now
>> it ignores the tags, but it would be nice to annotate untagged pointers
>> which in CHERI's case are not dereferencable.)
>>
> 
> I think this would be a much bigger change to how GDB passes around data 
> and memory addresses. But it would certainly be nice to have that.
> 
> I'm picturing memory addresses would have to be replaced by a structure 
> as well, holding the address, permissions, bounds and tags. Contents 
> from memory would also carry around such data.
> 
> Right now I'm not sure if this is feasible though.

For the memory tag gdbarch methods, I've made them accept a 
struct value *, which should be good enough to pass pointers/addresses of 
various sizes alongside other data. So, hopefully, you'd be able to pass 
down a CHERI capability with its tag?


* Re: [RFC] AArch64 Memory tagging support
  2020-06-05 12:55     ` Luis Machado
@ 2020-06-16 16:34       ` Luis Machado
  0 siblings, 0 replies; 9+ messages in thread
From: Luis Machado @ 2020-06-16 16:34 UTC (permalink / raw)
  To: John Baldwin, Alan Hayward, gdb-patches
  Cc: nd, Omair Javaid, David Spickett, Diana Picus

Hi,

I've now made the series available in a separate user branch. I'll let 
it sit there for some feedback before I proceed to submit the final 
version to the mailing list.

https://sourceware.org/git/?p=binutils-gdb.git;a=shortlog;h=refs/heads/users/luisgpm/aarch64-mte-v1

Luis

On 6/5/20 9:55 AM, Luis Machado wrote:
> Hi,
> 
> Just a heads-up that I plan to submit a series to add AArch64 MTE 
> support (and, more generally, memory tagging support) in roughly a 
> couple weeks time.
> 
> The kernel interfaces are still under review 
> (https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=288601), 
> but mostly settled for AArch64.
> 
> Core file support has not yet been designed/defined.
> 
> On 4/13/20 11:57 AM, Luis Machado wrote:
>> Hi,
>>
>> We have updated the design based on input.
>>
>> On 8/21/19 1:33 PM, John Baldwin wrote:
>>> On 8/21/19 3:39 AM, Alan Hayward wrote:
>>>> This is a rough design for implementing ARMv8.5 MTE support in GDB,
>>>> detailing the UI changes and sketching out the internals.
>>>> The Linux interfaces (ptrace, coredumps etc) are currently still under
>>>> discussion, and so it will be quite a while before the GDB code is
>>>> implemented, but I wanted to get a design out early to ensure that 
>>>> the GDB
>>>> requirements from the Linux interfaces are known.
>>>>
>>>> Any comments are welcome. At this stage I’m more concerned about the 
>>>> overall
>>>> strategy being workable.
>>>
>>> I have several thoughts on this as I have a somewhat similar need for 
>>> dealing
>>> with memory tags, though slightly differently.  In my case, I work on 
>>> a research
>>> project called CHERI that assigns 1 bit tags to every 16-bytes (or in 
>>> some cases
>>> 32-bytes) of memory as well as to certain registers.  I haven't yet 
>>> really dealt
>>> with tags in memory in my GDB patches to date, but will need to. 
>>> Also, in the
>>> case of CHERI, we turn C and C++ pointers into 129-bit (128-bits plus 
>>> the 1 bit
>>> tag) where the extra 64 bits hold attributes like bounds and 
>>> permissions of the
>>> pointer, and the 1-bit tag determines validity.  You can ready more 
>>> about it
>>> at www.chericpu.com if you are curious.  We currently provide models 
>>> of it on MIPS
>>> and are bringing it up on RISC-V (simulations and FPGA).
>>
>> It sounds like CHERI could benefit from this memory tagging work for 
>> that particular tag bit. It would be stored in-pointer and would be 
>> validated the same way as MTE. We could add a customizable tag stride 
>> to account for variations in the memory ranges covered by tags.
> 
> I have factored this in, so we have a gdbarch method to set the granule 
> size (or stride length), keeping this configurable per-architecture.
> 
>>
>> MTE tags 16 bytes at a time. If CHERI needs to use 32 bytes, then it 
>> should be able to set a gdbarch variable. Would that work for you?
>>
>> The other bits, however, would need another mechanism for verifying 
>> validity. Bounds and permissions in this case.
>>
>> Alternatively, we could treat the whole additional 65 bits as a big 
>> arch-specific memory tag, which the architecture should decode when a 
>> memory access is attempted, and let GDB know if such access is valid 
>> or not.
>>
> 
> With some more generic bits in place, we could make this work nicely for 
> validating bounds, permissions and flags as if they were a tag of some 
> kind.
> 
>>>
>>> To the extent that we can have somewhat generic tagging interface for 
>>> GDB that
>>> might cover sparc ADI as well, that might be nice.
>>>
> 
> I was surprised to find the ADI implementation entirely contained in the 
> sparc64-specific layers of GDB, with no generic infrastructure. So, 
> essentially, there is no way to reuse that. But I think the ADI 
> implementation can be ported to the more generic interface I plan to put 
> in place, if sparc developers want to.
> 
>>
>> It sounds like the design we currently have will be suitable for ADI 
>> as well, though i didn't look too deep into how Sparc handles the tags.
>>
>>>> Background
>>>>
>>>> The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE) which 
>>>> allows
>>>> 4bit tags to be assigned to each memory 16bytes of memory. Each 
>>>> allocation
>>>> is referred to as Allocated Tag (AT) in the text below. ATs are stored
>>>> separately to the main memory. When accessing a memory location, 
>>>> 4bits of
>>>> the address are reserved for use as a tag. This is referred to as a 
>>>> Logical
>>>> Tag (LT) in the text below. If the LT does not match the AT in a 
>>>> memory read
>>>> or write, then the access will trap.
>>>>
>>>> For more details see the MTE links here:
>>>> https://developer.arm.com/architectures/cpu-architecture/a-profile#mte
>>>>
>>>> For a very high-level overview see:
>>>> https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/
>>>>
>>>>
>>>> GDB UI: Memory Access
>>>>
>>>> In the general use case, when using GDB to examine memory, GDB 
>>>> should print
>>>> out when a memory tag failure happens. However, the operation it was 
>>>> doing (for
>>>> example, reading/writing memory) should still succeed. A GDB user 
>>>> would not
>>>> expect a signal to be passed upwards to the subject program.
>>>>
>>>> For example, x is an int* variable in the subject application and it 
>>>> contains
>>>> an address with an incorrect LT:
>>>>
>>>> (gdb) print x             /* x contains an incorrect LT. */
>>>> $1 = 0x1234007c0
>>>> (gdb) print *x
>>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>>> $2 = 67
>>>> (gdb) set *x = 72
>>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>>> (gdb) print *x
>>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>>> $3 = 72
>>>
>>> I would like to have something similar eventually where attempts to 
>>> access an
>>> out-of-bounds pointer would fail, but perhaps with some kind of 
>>> override flag
>>> (like p/r for disabling pretty-printers) to permit examining 
>>> out-of-bounds
>>> memory contents.  I think having the same type of override to "dump 
>>> the memory
>>> anyway, even if the tag is wrong" might be useful for users, though I 
>>> agree
>>> the default behavior should be to warn about invalid use.
>>
>> I think we could achieve this by using an on/off flag to 
>> allow/disallow validation of accesses to memory addresses.
>>
>> As for validating bounds, it depends on whether we want to consider 
>> the bounds as part of a bigger "tag". This would simplify the code a bit.
>>
> 
> Basically I'm considering augmenting the "print" command and the "x" 
> command.
> 
> The initial implementation of print's tag support will attempt to 
> validate expressions that evaluate to pointers. We could expand the 
> implementation to try and do more clever things, like trying to 
> determine when a particular expression accesses pointers.
> 
> Validating *every* memory access GDB does is prohibitive, especially 
> since it already trims the top byte from the addresses passed in to the 
> xfer* functions, which eliminates the tags.
> 
>>>
>>>> When printing areas of memory (for example with the command x) this 
>>>> warning
>>>> should only be printed once per dump.
>>>>
>>>> (gdb) x/20xw y
>>>> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
>>>> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
>>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>>> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
>>>> 0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
>>>> 0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067
>>>
>>> One other thing that might be nice to have is some kind of view of 
>>> memory that
>>> dumps tags and bytes in parallel, so something like:
>>>
>>> (gdb) x/20xwt y
>>> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000 [0x13]
>>> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
>>> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000 [0x12]
>>>
>>> etc.
>>>
> 
> I've incorporated this idea in the implementation. The "x" command will 
> have a new modifier (right now it is /m) to display tags. The current 
> implementation displays tags on an isolated line instead of displaying 
> them in the last column, like so:
> 
> [0x13]
> 0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
> [0x0]
> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
> [0x12]
> 0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
> 
> Due to the way GDB does the dump, it is tricky to align things to the 
> memory tag granule while still printing only the bytes the user 
> requested. This could be improved though.
> 
>>
>> I think this is doable. In the worst case we could print memory 
>> contents starting from the first aligned address in the range that 
>> contains the provided address. So, given a memory address memaddr, we 
>> would print data starting from (memaddr & tag_alignment).
>>
>> If that's not desirable, we could print some 'x' bytes leading up to 
>> the desired address. This way things would be printed nicely, like so:
>>
>> (gdb) x/20xwt y
>> 0x1234007a0: 0xxxxxxxxx 0xxxxxxxxx 0xxx0a6425 0x00000000 [0x13]
>> 0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000 [0x0]
>> 0x1234007c0: 0x00000040 0x00000003 0x0000xxxx 0xxxxxxxxx [0x12]
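
To make the alignment idea above concrete, here is a minimal sketch in
Python (not GDB code). It assumes the 16-byte MTE granule described in the
Background section and reads "tag_alignment" as a mask that clears the low
address bits. The names granule_base and dump_rows are hypothetical, and
the "xx" placeholder rendering is only illustrative.

```python
GRANULE_SIZE = 16  # bytes covered by one MTE allocation tag


def granule_base(memaddr):
    """Round memaddr down to the start of the granule that contains it."""
    return memaddr & ~(GRANULE_SIZE - 1)


def dump_rows(memaddr, length, read_byte):
    """Yield (row_address, text) for granule-aligned rows covering
    [memaddr, memaddr + length).  Bytes outside the requested range are
    rendered as 'xx' so that each row lines up with exactly one tag."""
    end = memaddr + length
    row = granule_base(memaddr)
    while row < end:
        cells = []
        for addr in range(row, row + GRANULE_SIZE):
            if memaddr <= addr < end:
                cells.append("%02x" % read_byte(addr))
            else:
                cells.append("xx")
        yield row, " ".join(cells)
        row += GRANULE_SIZE


# Hypothetical memory: every byte holds the low 8 bits of its own address.
for row_addr, text in dump_rows(0x7a6, 4, lambda addr: addr & 0xFF):
    print("0x%x: %s" % (row_addr, text))
```

Because every row starts on a granule boundary, a tag column (or a tag
line) can be attached to each row without splitting a granule across rows.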
>>
>>>> However, there will be instances where the GDB user wants to either
>>>> suppress any tag warning entirely or pass any errors upwards to the
>>>> subject program as a signal. GDB already has similar functionality
>>>> available for signals using the command handle. An AArch64-only
>>>> command "memtag" should be added for this.
>>>>
>>>> (gdb) memtag handle
>>>> Memory tag failures will be printed
>>>> Memory tag failures will not raise a signal
>>>> (gdb) print *x
>>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>>> $1 = 67
>>>> (gdb) memtag handle noprint
>>>> Memory tag failures will not be printed
>>>> Memory tag failures will not raise a signal
>>>> (gdb) print *x
>>>> $2 = 67
>>>> (gdb) memtag handle raise
>>>> Memory tag failures will not be printed
>>>> Memory tag failures will raise a signal
>>>> (gdb) print *x
>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>> The program no longer exists.
>>>>
>>>> Suggested arguments to "memtag handle" are "print", "noprint", "raise",
>>>> "noraise". This will only change the behaviour for memory tag failures
>>>> generated by the user inside GDB (i.e. this does not affect inferior
>>>> behaviour).
> 
> I've made the global toggle "set memory-tagging on/off" and the memory 
> tagging commands "mtag <sub option> <value>".
> 
> If a particular architecture doesn't support memory tagging, GDB will 
> refuse to use any of the commands and will not attempt to validate tags.
> 
>>>
>>> Given that these features are somewhat MTE-specific, I would perhaps 
>>> suggest
>>> using 'mte' instead of 'memtag' for the name.
>>
>> We've switched to MTE now, though this command could be generic as 
>> well. Then other architectures implementing memory tagging wouldn't 
>> have to add their own commands to GDB. Thoughts?
>>
>> With regard to handling the behavior of memory tagging in GDB, I 
>> think a switch to enable/disable validation would be a better 
>> alternative, like so:
>>
>> set mte validation on/off (default on)
>>
>> The "handle" command is more of a runtime command that tries to deal 
>> with received signals and what to do about them. Even if we decide to 
>> raise tag violations via "mte handle raise", they will still be 
>> filtered by "handle", and the SIGSEGV will likely generate a visible 
>> user stop.
>>
>> Also, attempting to print memory won't trigger a SIGSEGV since GDB 
>> services this request through either ptrace or /proc/<pid>/mem. 
>> There is no inferior movement and thus no signals being delivered for 
>> that.
>>
>> We should also watch out for the many memory accesses GDB does during 
>> unwinding and debug info reading. We don't want to keep seeing a lot 
>> of warnings about memory tag failures in that case.
>>
>> I guess we'll need to exercise this and see how GDB behaves. Then 
>> either fix the code or temporarily disable memory tag validation when 
>> GDB attempts to read memory for those purposes.
>>
>>>> GDB UI: Examining Tags
>>>>
>>>> The memtag command can also be used to read and write the memory
>>>> tags (ATs) for a given memory location. We also want to be able to
>>>> read and write the tag (LT) held in a given address.
>>>>
>>>> (gdb) print x                               /* x contains an 
>>>> incorrect tag. */
>>>> $1 = 0x1234007c0
>>>> (gdb) print *x
>>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>>> $1 = 67
>>>> (gdb) memtag showlogicaltag x        /* Extract the 4bit LT from the 
>>>> passed in pointer */
>>>> $2 = 0x12
>>>> (gdb) memtag showtag x        /* Show the AT for the memory address. 
>>>> Never returns errors if address contains the wrong LT.   */
>>>> $3 = 0x13
>>>> (gdb) memtag checktag x        /* Same as showtag, but also errors 
>>>> using the rules in "memtag handle".  */
>>>> <incorrect memory tag 0x12 for address 0x1234007c0>
>>>> $4 = 0x13
>>>> (gdb) memtag writetag x 0x12        /* Write the tag for the passed 
>>>> in memory address  */
>>>> (gdb) memtag checktag x
>>>> $5 = 0x12
>>>> (gdb) memtag writelogicaltag x 0x14        /* Update the tag in the 
>>>> pointer */
>>>> (gdb) print x                               /* x contains an 
>>>> incorrect tag. */
>>>> $1 = 0x1434007c0
>>>> (gdb) memtag checktag x
>>>> <incorrect memory tag 0x14 for address 0x1234007c0>
>>>
>>> I would perhaps also use 'mte' here.  'memtag showtag' might be 
>>> generic to
>>> memory tags in general, but the others are likely MTE-specific.
>>>
>>
>> We've switched to MTE now, but my point about making a generic command 
>> for all architectures remains.
>>
>>>> Linux Ptrace
>>>>
>>>> Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
>>>> methods and /proc/<pid>/mem.
>>>>
>>>> New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will 
>>>> be added
>>>> to read/write data tags. Peek will allow a range of tags to be read 
>>>> in a
>>>> single call.
>>>
>>> On FreeBSD (we use a variant of FreeBSD for CHERI research) I had a 
>>> somewhat
>>> similar plan which was to add a new "address space" for PT_IO that 
>>> returned
>>> packed tag bits.
>>>
>>
>> Doing some research, it seems GDB relies more heavily on 
>> /proc/<pid>/mem to read/write memory contents, so ptrace wouldn't be 
>> used that often for this purpose.
>>
>> I'm considering an interface like /proc/<pid>/memtags, which I'm 
>> discussing with the kernel folks.
>>
>> The ptrace interfaces would still be there in case /proc/<pid>/mem is 
>> not available though.
>>
>>>> Memory accesses inside GDB
>>>>
>>>> It should be enough for AArch64 to override target_xfer_partial.
>>>> If the process is using memory tags, and the address contains a LT, 
>>>> then
>>>> call PEEKDATATAG for the memory range being accessed and check if 
>>>> the access
>>>> would succeed. If it doesn't then print just the first failure to 
>>>> the screen.
>>>> If it does succeed then call the overridden function to access the 
>>>> memory.
>>>>
>>>> Core Dumps
>>>>
>>>> There will be extra sections inside a core dump containing the 
>>>> memory tags.
>>>> The core low version of target_xfer_partial needs overriding.
>>>> Similar to the xfer_partial override in the previous section, add
>>>> functionality to check tags, and report failures. Check the tags by
>>>> accessing the MTE segments in the corefile.  Memory is stored in the 
>>>> core
>>>> dump untagged, so addresses will need stripping before accessing.
>>>
>>> I am curious how you were planning to describe tags in cores.  I 
>>> don't have
>>> concrete thoughts yet but the approach I had been leaning towards was
>>> having something similar to PT_LOAD, but perhaps PT_TAGS or the like 
>>> whose
>>> header would include "tag size" and "tag stride" and the contents of the
>>> segment would be packed tag bits from a starting VA in the header.  This
>>> would permit storing both 1-bit and 4-bit tags and would also in theory
>>> support some other memory tagging schemes I'm aware of from some other
>>> research.
>>
>> This is still WIP, as the kernel patches are being worked on. I'll 
>> provide an update whenever we have a draft design.
>>
>>>
>>> One thing that I would like that you don't currently have a need for 
>>> (though
>>> perhaps the memory display mode I suggested above might need) is a 
>>> way to
>>> pass around a word of memory and its tag together, perhaps as a single
>>> 'struct value'.  In my case I would like to have the tag associated with
>>> either a register or memory present when printing pointers.  (I have 
>>> a new
>>> gdbarch method in my patches that prints pointer attributes and right 
>>> now
>>> it ignores the tags, but it would be nice to annotate untagged pointers
>>> which in CHERI's case are not dereferenceable.)
>>>
>>
>> I think this would be a much bigger change to how GDB passes around 
>> data and memory addresses. But it would certainly be nice to have that.
>>
>> I'm picturing memory addresses would have to be replaced by a 
>> structure as well, holding the address, permissions, bounds and tags. 
>> Contents from memory would also carry around such data.
>>
>> Right now I'm not sure if this is feasible though.
> 
> For the memory tag gdbarch methods, I've made them accept a 
> struct value *, which should be good enough to pass pointers/addresses of 
> various sizes alongside other data. So, hopefully, you'd be able to pass 
> down a CHERI capability with its tag?

* Re: [RFC] AArch64 Memory tagging support
  2019-10-11 18:17 ` Tom Tromey
@ 2019-10-14 13:12   ` Alan Hayward
  0 siblings, 0 replies; 9+ messages in thread
From: Alan Hayward @ 2019-10-14 13:12 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches, nd



> On 11 Oct 2019, at 19:17, Tom Tromey <tom@tromey.com> wrote:
> 
>>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:
> 
> Alan> This is a rough design for implementing ARMv8.5 MTE support in GDB,
> Alan> detailing the UI changes and sketching out the internals.
> 
> 
> Alan> Memory accesses inside GDB
> 
> Alan> It should be enough for AArch64 to override target_xfer_partial.
> Alan> If the process is using memory tags, and the address contains a LT, then
> Alan> call PEEKDATATAG for the memory range being accessed and check if the access
> Alan> would succeed. If it doesn't then print just the first failure to the screen.
> Alan> If it does succeed then call the overridden function to access the memory.
> 
> Another thing to consider is that you will probably want remote protocol
> support for this.
> 
> Tom

Thanks,
I’ll make sure that gets added onto the design.

Alan.

* Re: [RFC] AArch64 Memory tagging support
  2019-08-21 10:39 Alan Hayward
@ 2019-10-11 18:17 ` Tom Tromey
  2019-10-14 13:12   ` Alan Hayward
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Tromey @ 2019-10-11 18:17 UTC (permalink / raw)
  To: Alan Hayward; +Cc: gdb-patches, nd

>>>>> "Alan" == Alan Hayward <Alan.Hayward@arm.com> writes:

Alan> This is a rough design for implementing ARMv8.5 MTE support in GDB,
Alan> detailing the UI changes and sketching out the internals.

Alan> Memory accesses inside GDB

Alan> It should be enough for AArch64 to override target_xfer_partial.
Alan> If the process is using memory tags, and the address contains a LT, then
Alan> call PEEKDATATAG for the memory range being accessed and check if the access
Alan> would succeed. If it doesn't then print just the first failure to the screen.
Alan> If it does succeed then call the overridden function to access the memory.

Another thing to consider is that you will probably want remote protocol
support for this.

Tom

* [RFC] AArch64 Memory tagging support
@ 2019-08-21 10:39 Alan Hayward
  2019-10-11 18:17 ` Tom Tromey
  0 siblings, 1 reply; 9+ messages in thread
From: Alan Hayward @ 2019-08-21 10:39 UTC (permalink / raw)
  To: gdb-patches@sourceware.org; +Cc: nd

This is a rough design for implementing ARMv8.5 MTE support in GDB,
detailing the UI changes and sketching out the internals.
The Linux interfaces (ptrace, coredumps etc) are currently still under
discussion, and so it will be quite a while before the GDB code is
implemented, but I wanted to get a design out early to ensure that the GDB
requirements from the Linux interfaces are known.

Any comments are welcome. At this stage I’m more concerned about the overall
strategy being workable.

Background

The ARMv8.5 ISA introduces the Memory Tagging Extension (MTE) which allows
4-bit tags to be assigned to each 16 bytes of memory. Each such tag
is referred to as an Allocated Tag (AT) in the text below. ATs are stored
separately to the main memory. When accessing a memory location, 4 bits of
the address are reserved for use as a tag. This is referred to as a Logical
Tag (LT) in the text below. If the LT does not match the AT in a memory read
or write, then the access will trap.
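
As a rough illustration of the LT/AT comparison described above, here is a
minimal sketch in Python. It relies on the architectural fact that MTE
keeps the logical tag in address bits 59:56 and uses one tag per 16-byte
granule. The function names and the dictionary standing in for tag storage
are hypothetical, and the tag values are real 4-bit values rather than the
illustrative ones used in the GDB sessions below.

```python
TAG_SHIFT = 56      # MTE keeps the logical tag in address bits 59:56
TAG_MASK = 0xF      # tags are 4 bits wide
GRANULE_SIZE = 16   # one allocation tag per 16 bytes of memory


def logical_tag(pointer):
    """Extract the 4-bit LT carried in the pointer."""
    return (pointer >> TAG_SHIFT) & TAG_MASK


def untagged(pointer):
    """Clear the LT bits, leaving the plain memory address."""
    return pointer & ~(TAG_MASK << TAG_SHIFT)


def access_traps(pointer, allocation_tags):
    """Return True if the LT in pointer differs from the AT of its granule.

    allocation_tags is a hypothetical stand-in for the separate tag
    storage: a dict mapping granule index to AT."""
    granule = untagged(pointer) // GRANULE_SIZE
    return logical_tag(pointer) != allocation_tags[granule]


tags = {0x1234007c0 // GRANULE_SIZE: 0x3}
good = (0x3 << TAG_SHIFT) | 0x1234007c0   # LT matches the AT
bad = (0x2 << TAG_SHIFT) | 0x1234007c4    # same granule, wrong LT
print(access_traps(good, tags), access_traps(bad, tags))
```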

For more details see the MTE links here:
https://developer.arm.com/architectures/cpu-architecture/a-profile#mte

For a very high-level overview see:
https://threatpost.com/google-arm-android-bugs-memory-tagging/146950/

GDB UI: Memory Access

In the general use case, when using GDB to examine memory, GDB should print
out when a memory tag failure happens. However, the operation it was doing (for
example, reading/writing memory) should still succeed. A GDB user would not
expect a signal to be passed upwards to the subject program.

For example, x is an int* variable in the subject application and it contains
an address with an incorrect LT:

(gdb) print x             /* x contains an incorrect LT. */
$1 = 0x1234007c0
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$2 = 67
(gdb) set *x = 72
<incorrect memory tag 0x12 for address 0x1234007c0>
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$3 = 72

When printing areas of memory (for example with the command x) this warning
should only be printed once per dump.

(gdb) x/20xw y
0x1234007a0: 0x00000061 0x00000000 0x000a6425 0x00000000
0x1234007b0: 0x00000062 0x00000000 0x00000000 0x00000000
<incorrect memory tag 0x12 for address 0x1234007c0>
0x1234007c0: 0x00000040 0x00000003 0x00000405 0x00000000
0x1234007d0: 0x00000000 0x00000000 0xffffffff 0x00000009
0x1234007e0: 0x00033000 0x00000700 0x00000000 0x00000067

However, there will be instances where the GDB user wants to either suppress
any tag warning entirely or pass any errors upwards to the subject program as
a signal. GDB already has similar functionality available for signals using
the command handle. An AArch64-only command "memtag" should be added for this.

(gdb) memtag handle
Memory tag failures will be printed
Memory tag failures will not raise a signal
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$1 = 67
(gdb) memtag handle noprint
Memory tag failures will not be printed
Memory tag failures will not raise a signal
(gdb) print *x
$2 = 67
(gdb) memtag handle raise
Memory tag failures will not be printed
Memory tag failures will raise a signal
(gdb) print *x
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Suggested arguments to "memtag handle" are "print", "noprint", "raise",
"noraise". This will only change the behaviour for memory tag failures
generated by the user inside GDB (i.e. this does not affect inferior behaviour).
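
The four suggested keywords amount to two independent switches. A minimal
sketch of that state in Python, assuming the defaults shown in the session
above (print, noraise); the class and method names are hypothetical and
this is not how GDB would necessarily implement it:

```python
class MemtagHandle:
    """Hypothetical model of the proposed "memtag handle" settings."""

    def __init__(self):
        self.print_failures = True   # default: failures are printed
        self.raise_signal = False    # default: no signal is raised

    def apply(self, keyword):
        """Update one switch from a "memtag handle" argument."""
        if keyword == "print":
            self.print_failures = True
        elif keyword == "noprint":
            self.print_failures = False
        elif keyword == "raise":
            self.raise_signal = True
        elif keyword == "noraise":
            self.raise_signal = False
        else:
            raise ValueError("unknown memtag handle argument: " + keyword)

    def describe(self):
        """Return the two status lines shown by "memtag handle"."""
        return [
            "Memory tag failures will %sbe printed"
            % ("" if self.print_failures else "not "),
            "Memory tag failures will %sraise a signal"
            % ("" if self.raise_signal else "not "),
        ]


handle = MemtagHandle()
handle.apply("noprint")
handle.apply("raise")
print("\n".join(handle.describe()))
```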

GDB UI: Examining Tags

The memtag command can also be used to read and write the memory tags (ATs)
for a given memory location. We also want to be able to read and write the
tag (LT) held in a given address.

(gdb) print x                               /* x contains an incorrect tag. */
$1 = 0x1234007c0
(gdb) print *x
<incorrect memory tag 0x12 for address 0x1234007c0>
$2 = 67
(gdb) memtag showlogicaltag x        /* Extract the 4-bit LT from the passed-in pointer */
$3 = 0x12
(gdb) memtag showtag x        /* Show the AT for the memory address. Never returns errors if the address contains the wrong LT. */
$4 = 0x13
(gdb) memtag checktag x        /* Same as showtag, but also errors using the rules in "memtag handle". */
<incorrect memory tag 0x12 for address 0x1234007c0>
$5 = 0x13
(gdb) memtag writetag x 0x12        /* Write the tag for the passed-in memory address */
(gdb) memtag checktag x
$6 = 0x12
(gdb) memtag writelogicaltag x 0x14        /* Update the tag in the pointer */
(gdb) print x                               /* x contains an incorrect tag. */
$7 = 0x1434007c0
(gdb) memtag checktag x
<incorrect memory tag 0x14 for address 0x1234007c0>
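
For "writelogicaltag", the pointer update is a mask-and-or on the tag bits.
A minimal sketch in Python, assuming the architectural tag position (bits
59:56); the function name is hypothetical. Note that the tag values in the
session above are illustrative: a real 4-bit tag is in the range 0x0-0xf,
so this sketch rejects a value such as 0x14.

```python
TAG_SHIFT = 56   # MTE keeps the logical tag in address bits 59:56
TAG_MASK = 0xF   # tags are 4 bits wide


def write_logical_tag(pointer, tag):
    """Return pointer with its LT replaced by tag; other bits are kept."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("logical tag must fit in 4 bits")
    cleared = pointer & ~(TAG_MASK << TAG_SHIFT)
    return cleared | (tag << TAG_SHIFT)


x = write_logical_tag(0x1234007c0, 0x4)
print(hex(x))
x = write_logical_tag(x, 0x2)   # the old tag is cleared before the new one is set
print(hex(x))
```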

Linux Ptrace

Linux will ignore tags when reading/writing memory via PEEK/POKE ptrace
methods and /proc/<pid>/mem.

New ptrace commands PTRACE_PEEKDATATAG and PTRACE_POKEDATATAG will be added
to read/write data tags. Peek will allow a range of tags to be read in a
single call.

Memory accesses inside GDB

It should be enough for AArch64 to override target_xfer_partial.
If the process is using memory tags, and the address contains an LT, then
call PEEKDATATAG for the memory range being accessed and check if the access
would succeed. If it doesn't then print just the first failure to the screen.
If it does succeed then call the overridden function to access the memory.
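
The check described above could look roughly like the following Python
sketch. The read_tags callback is a hypothetical stand-in for a
PEEKDATATAG-style call that returns the ATs for a run of granules (the
real interface was still under discussion at the time of writing), and
first_tag_failure is an illustrative name, not a GDB function.

```python
TAG_SHIFT = 56      # MTE keeps the logical tag in address bits 59:56
TAG_MASK = 0xF      # tags are 4 bits wide
GRANULE_SIZE = 16   # one allocation tag per 16 bytes of memory


def first_tag_failure(pointer, length, read_tags):
    """Return (address, lt, at) for the first granule in the accessed
    range whose AT differs from the pointer's LT, or None if all match.

    read_tags(start, count) must return count ATs, starting at the
    granule-aligned address start."""
    if length <= 0:
        return None
    lt = (pointer >> TAG_SHIFT) & TAG_MASK
    start = pointer & ~(TAG_MASK << TAG_SHIFT)
    first = start & ~(GRANULE_SIZE - 1)
    last = (start + length - 1) & ~(GRANULE_SIZE - 1)
    count = (last - first) // GRANULE_SIZE + 1
    for index, at in enumerate(read_tags(first, count)):
        if at != lt:
            return first + index * GRANULE_SIZE, lt, at
    return None


# Hypothetical tag storage for three consecutive granules.
tag_store = {0x1234007a0: 0x3, 0x1234007b0: 0x3, 0x1234007c0: 0x2}


def read_tags(start, count):
    return [tag_store[start + i * GRANULE_SIZE] for i in range(count)]


ptr = (0x3 << TAG_SHIFT) | 0x1234007a8
failure = first_tag_failure(ptr, 0x20, read_tags)
if failure is not None:
    print("<incorrect memory tag 0x%x for address 0x%x>" % (failure[1], failure[0]))
```

Only the first mismatch is reported, which matches the "once per dump"
behaviour wanted for the x command.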

Core Dumps

There will be extra sections inside a core dump containing the memory tags.
The corelow version of target_xfer_partial needs overriding.
Similar to the xfer_partial override in the previous section, add
functionality to check tags, and report failures. Check the tags by
accessing the MTE segments in the corefile.  Memory is stored in the core
dump untagged, so addresses will need stripping before accessing.
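
Since the core file format for tags is not defined yet, the following
Python sketch only illustrates the lookup under a made-up layout: a
segment with a start address and 4-bit tags packed two per byte, the
lower-addressed granule in the low nibble. TagSegment and
core_allocation_tag are hypothetical names, and the packing is purely an
assumption.

```python
TAG_SHIFT = 56      # top byte of the address starts at bit 56
GRANULE_SIZE = 16   # one allocation tag per 16 bytes of memory


class TagSegment:
    """Hypothetical core-file tag segment (layout is an assumption)."""

    def __init__(self, start, packed):
        self.start = start     # untagged address of the first granule
        self.packed = packed   # bytes object, two 4-bit tags per byte

    def covers(self, addr):
        granules = len(self.packed) * 2
        return self.start <= addr < self.start + granules * GRANULE_SIZE

    def tag_at(self, addr):
        index = (addr - self.start) // GRANULE_SIZE
        byte = self.packed[index // 2]
        return (byte >> 4) & 0xF if index % 2 else byte & 0xF


def core_allocation_tag(pointer, segments):
    """Strip the top byte, then look the AT up in the tag segments.
    Returns None when no segment covers the address."""
    addr = pointer & ((1 << TAG_SHIFT) - 1)
    for segment in segments:
        if segment.covers(addr):
            return segment.tag_at(addr)
    return None


segments = [TagSegment(0x1234007a0, bytes([0x33, 0x02]))]
pointer = (0x3 << TAG_SHIFT) | 0x1234007c4
print(core_allocation_tag(pointer, segments))
```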

Alan.

end of thread, other threads:[~2020-06-16 16:34 UTC | newest]

Thread overview: 9+ messages
2019-08-21 10:39 [RFC] AArch64 Memory tagging support Alan Hayward
2019-08-21 16:34 ` John Baldwin
2019-08-22 10:31   ` Alan Hayward
2020-04-13 14:57   ` Luis Machado
2020-06-05 12:55     ` Luis Machado
2020-06-16 16:34       ` Luis Machado
2019-08-21 10:39 Alan Hayward
2019-10-11 18:17 ` Tom Tromey
2019-10-14 13:12   ` Alan Hayward