Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

From: Matthew Malcomson <Matthew.Malcomson@arm.com>
To: "Martin Liška" <mliska@suse.cz>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: "dodji@redhat.com" <dodji@redhat.com>, nd <nd@arm.com>,
	"kcc@google.com"	<kcc@google.com>,
	"jakub@redhat.com" <jakub@redhat.com>,
	"dvyukov@google.com"	<dvyukov@google.com>
Subject: Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC
Date: Mon, 09 Sep 2019 15:55:00 -0000
Message-ID: <8fc78139-481e-6dbc-0996-2cae58627c25@arm.com>
In-Reply-To: <936e0222-0b05-b4de-7a68-9b91e79a6f76@suse.cz>

On 09/09/19 11:47, Martin Liška wrote:
> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
>> Hello,
>>
>> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
>> address sanitizer (HWASAN) in GCC.  The document describing HWASAN can be found
>> here http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.
> 
> Hello.
> 
> I'm happy that you are working on the functionality for GCC and I can provide
> my knowledge that I have with ASAN. I briefly read the patch series and I have
> multiple questions (and observations):
> 
> 1) Is the ambition of the patchset to be a software emulation of MTE that can
>     work on targets that do not support MTE? Is it what clang
>     calls hwasan-abi=interceptor?

The ambition is to provide a software emulation of MTE for AArch64 
targets that don't support MTE.
I also hope to have the framework set up so that enabling for other 
architectures is relatively easy and can be done by those interested.

As I understand it, `hwasan-abi=interceptor` vs `platform` is about 
adding such MTE emulation for "application code" or "platform code (e.g. 
kernel)" respectively.

> 
> 2) Do you have a real aarch64 hardware that has MTE support? Would it be possible
>     for the future to give such a machine to GCC Compile Farm for testing purpose?

No, our team doesn't have real MTE hardware.  I have been testing on an
AArch64 machine that has TBI; other work in the team that requires MTE
support is being tested on the Arm "Fast Models" emulator.

> 
> 3) I like the idea of sharing of internal functions like ASAN_CHECK/HWASAN_CHECK.
>     We should benefit from that in the future.
> 
> 4) Am I correct that due to escape of "tagged" pointers, one needs to have an entire
> DSO (dynamic shared object) built with hwasan enabled? Otherwise, a dereference of
> a tagged pointer will lead to a segfault (except TBI feature on aarch64)?

Yes, one needs to take pains to avoid the escape of tagged pointers on 
architectures other than AArch64.

I don't believe that compiling the entire DSO with HWASAN enabled is 
enough, since pointers can be passed across DSO boundaries.
I haven't yet looked into how to handle this.

There's an even more fundamental problem of accesses within the 
instrumented binary -- I haven't yet figured out how to remove the tag 
before accesses on architectures without the AArch64 TBI feature.

> 
> 5) Is there documentation or a definition of what shadow memory for memory tagging looks like?
> Is it similar to ASAN, where one can get to tag with:
> u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
> 

Yes, it's similar.

From the libhwasan code, the function to fetch a pointer to the shadow
memory byte corresponding to a memory address is MemToShadow.

constexpr uptr kShadowScale = 4;
inline uptr MemToShadow(uptr untagged_addr) {
   return (untagged_addr >> kShadowScale) +
          __hwasan_shadow_memory_dynamic_address;
}

https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
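
To make the comparison with ASAN concrete, here is a rough sketch of how
a check could be built on top of that mapping.  It is not the libhwasan
implementation: the names PointerTag, Untag and AccessIsValid are made
up for illustration, the tag is assumed to live in the top byte of a
64-bit pointer (as with AArch64 TBI), and the shadow region is modelled
as a plain array indexed by granule rather than real shadow memory.

```cpp
#include <cstdint>

using u64 = std::uint64_t;
using u8 = std::uint8_t;

constexpr unsigned kShadowScale = 4;  // one shadow byte per 16-byte granule
constexpr unsigned kTagShift = 56;    // assumption: tag is the top byte
constexpr u64 kAddressMask = (u64{1} << kTagShift) - 1;

// Tag carried in the top byte of a pointer value (hypothetical helper).
constexpr u8 PointerTag(u64 tagged) {
  return static_cast<u8>(tagged >> kTagShift);
}

// Address with the tag bits cleared (hypothetical helper).
constexpr u64 Untag(u64 tagged) { return tagged & kAddressMask; }

// Same arithmetic as MemToShadow above, with the shadow base passed in
// explicitly instead of read from a runtime global.
constexpr u64 MemToShadow(u64 untagged_addr, u64 shadow_base) {
  return (untagged_addr >> kShadowScale) + shadow_base;
}

// Simulated check: `shadow` stands in for shadow memory with base zero,
// so the granule index is used directly as the array index.
inline bool AccessIsValid(u64 tagged, const u8* shadow) {
  return shadow[MemToShadow(Untag(tagged), 0)] == PointerTag(tagged);
}

static_assert(PointerTag(0x2a00000000001234ULL) == 0x2a, "tag is top byte");
static_assert(Untag(0x2a00000000001234ULL) == 0x1234, "tag is cleared");
static_assert(MemToShadow(0x1230, 0x1000) == 0x1123, "shift then offset");
```

The point of the sketch is only that the shadow lookup uses the untagged
address, while the comparison uses the tag taken from the pointer.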

> 6) Note that things like memtag_tag_size, memtag_granule_size define an ABI of libsanitizer
> 

Yes, the sizes of these values define an ABI.

Those particular hooks are added as a demonstration of how something
like MTE would be implemented on top of this framework (where the
backend would specify the tag and granule size to match their target's
architecture).

HWASAN itself would use the hard-coded tag and granule size that matches 
what libsanitizer uses.
https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36

I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in 
asan.h, and when using the sanitizer library the macro 
`HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
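
As an illustration of why these sizes are ABI, here is a small sketch of
the arithmetic they imply.  The constant names below are hypothetical
stand-ins (they are not the macros from asan.h), and the values are
assumptions: a 16-byte granule follows from kShadowScale = 4 in the
libhwasan header linked above, and an 8-bit tag assumes the whole top
byte of the pointer is used.

```cpp
#include <cstddef>

constexpr std::size_t kTagBits = 8;      // assumption: tag fills the top byte
constexpr std::size_t kShadowScale = 4;  // from hwasan_mapping.h
constexpr std::size_t kGranuleSize = std::size_t{1} << kShadowScale;

// Size of an object once padded so that it owns whole granules.  Two
// objects sharing a granule would have to share a colour.
constexpr std::size_t RoundUpToGranule(std::size_t size) {
  return (size + kGranuleSize - 1) & ~(kGranuleSize - 1);
}

// Number of shadow bytes that need colouring for an object of `size`.
constexpr std::size_t GranuleCount(std::size_t size) {
  return RoundUpToGranule(size) >> kShadowScale;
}

// Number of distinct tag values available with kTagBits bits.
constexpr std::size_t kTagValues = std::size_t{1} << kTagBits;

static_assert(kGranuleSize == 16, "granule is 1 << kShadowScale");
static_assert(RoundUpToGranule(1) == 16, "small objects take a full granule");
static_assert(RoundUpToGranule(16) == 16, "exact multiples are unchanged");
static_assert(RoundUpToGranule(17) == 32, "spill into the next granule");
static_assert(GranuleCount(40) == 3, "40 bytes span three granules");
static_assert(kTagValues == 256, "8-bit tags give 256 values");
```

If the compiler and the runtime library disagreed on either value, the
instrumented code would colour or check the wrong shadow bytes, which is
why the values have to match what libsanitizer uses.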

>>
>> The current patch series is far from complete, but I'm posting the current state
>> to provide something to discuss at the Cauldron next week.
>>
>> In its current state, this sanitizer only works on AArch64 with a custom kernel
>> to allow tagged pointers in system calls.  This is discussed in the below link
>> https://source.android.com/devices/tech/debug/hwasan -- the custom kernel allows
>> tagged pointers in syscalls.
> 
> Can you please be more specific? Is MTE in the upstream Linux kernel? If so,
> starting from which version?

I find I can only make complicated statements remotely clear in bullet 
points ;-)

What I was trying to say was:
- HWASAN from this patch series requires AArch64 TBI.
   (I have not handled architectures without TBI)
- The upstream kernel does not accept tagged pointers in syscalls.
   (programs that use TBI must currently clear tags before passing
    pointers to the kernel)
- This patch series doesn't include any way to avoid passing tagged
   pointers to syscalls.
- Hence in order to test the sanitizer I'm using a kernel that has been
   patched to accept tagged pointers in many syscalls.
- The link to the android.com site is just another source describing the
   same requirement.
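
For reference, "clearing tags before passing pointers to the kernel"
amounts to masking off the top byte.  The sketch below is not part of
this patch series; ClearTag and UntagPointer are hypothetical names, and
it assumes 64-bit pointers with the tag in bits 56-63.

```cpp
#include <cstdint>

constexpr unsigned kTagShift = 56;  // assumption: tag is the top byte
constexpr std::uint64_t kTagMask = std::uint64_t{0xff} << kTagShift;

// Clear the tag bits of an address held as an integer.
constexpr std::uint64_t ClearTag(std::uint64_t address) {
  return address & ~kTagMask;
}

// Pointer flavour of the same operation, e.g. for a buffer argument
// that is about to be handed to a syscall wrapper.
template <typename T>
T* UntagPointer(T* p) {
  const auto bits =
      static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
  return reinterpret_cast<T*>(static_cast<std::uintptr_t>(ClearTag(bits)));
}

static_assert(ClearTag(0xab00007fffff1000ULL) == 0x0000007fffff1000ULL,
              "top byte is cleared, address bits are kept");
static_assert(ClearTag(0x0000007fffff1000ULL) == 0x0000007fffff1000ULL,
              "untagged addresses are unchanged");
```

The same masking is what would be needed before every memory access on
an architecture without TBI, which is the part I have not handled.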

Support for the relaxed ABI (accepting tagged pointers in various
syscalls in the kernel) is being discussed on the kernel mailing list;
the latest patchset I know of is here:
https://lkml.org/lkml/2019/7/25/725

I wasn't trying to say anything about MTE in that paragraph, but kernel
support for MTE is not in the upstream Linux kernel and is currently
being worked on.

> 
>> I have also not yet put tests into the DejaGNU framework, but instead have a
>> simple test file from which the tests will eventually come.  That test file is
>> attached to this email despite not being in the patch series.
>>
>> Something close to this patch series bootstraps and passes most regression
>> tests when ~--with-build-config=bootstrap-hwasan~ is used.  The regressions it
>> doesn't pass are all the other sanitizer tests and all linker plugin tests.
>> The linker plugin tests fail due to a configuration problem where the library
>> path is not correctly set.
>> (I say "something close to this patch series" because I recently made a change
>> that breaks bootstrap but I believe is the best approach once I've fixed it,
>> hence for an RFC I'm leaving it in).
>>
>> HWASAN works by storing a tag in the top bits of every pointer and a colour in
>> a shadow memory region corresponding to every area of memory.  On every memory
>> access through a pointer the tag in the pointer is checked against the colour in
>> shadow memory corresponding to the memory the pointer is accessing.  If the tag
>> and colour do not match then a fault is signalled.
>>
>> The instrumentation required for this sanitizer has a large overlap with the
>> instrumentation required for implementing MTE (which has similar functionality
>> but checks are automatically done in the hardware and instructions for colouring
>> shadow memory and for managing tags are provided by the architecture).
>> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
>>
>> We hope to use the HWASAN framework to implement MTE tagging on the stack, and
>> hence I have a "dummy" patch demonstrating the approach envisaged for this.
> 
> What's the situation with heap allocated memory and global variables?

For the heap, whatever library function allocates memory should return a 
tagged pointer and colour the shadow memory accordingly.  This pointer 
can then be treated exactly the same as all other pointers in 
instrumented code.
On freeing of memory the shadow memory is uncoloured in order to detect 
use-after-free.

For HWASAN this means malloc and friends need to be intercepted, and 
this is done by the runtime library.
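
A toy model of that heap behaviour, to show why a stale pointer is
caught after a free.  This is a sketch and not how libhwasan is written:
ToyHeap is a hypothetical bump allocator over a 256-byte arena,
"addresses" are arena offsets with a tag in the top byte, tags are
handed out sequentially rather than chosen by the runtime, and
"uncoloured" is modelled as colour zero.

```cpp
#include <cstddef>
#include <cstdint>

constexpr unsigned kShadowScale = 4;  // 16-byte granules
constexpr unsigned kTagShift = 56;    // assumption: tag is the top byte
constexpr std::size_t kGranule = std::size_t{1} << kShadowScale;
constexpr std::size_t kArenaSize = 256;  // toy heap size in bytes
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << kTagShift) - 1;

struct ToyHeap {
  std::uint8_t shadow[kArenaSize / kGranule] = {};  // one colour per granule
  std::size_t next_offset = 0;                      // bump-allocator cursor
  std::uint8_t next_tag = 1;                        // zero means uncoloured

  static std::size_t RoundUp(std::size_t size) {
    return (size + kGranule - 1) & ~(kGranule - 1);
  }

  // Returns a tagged offset and colours the matching shadow bytes.
  // Returns 0 if the request cannot be satisfied.
  std::uint64_t Allocate(std::size_t size) {
    if (size == 0 || size > kArenaSize) return 0;
    const std::size_t rounded = RoundUp(size);
    if (rounded > kArenaSize - next_offset) return 0;
    const std::size_t offset = next_offset;
    const std::uint8_t tag = next_tag++;
    next_offset += rounded;
    for (std::size_t g = offset >> kShadowScale;
         g < (offset + rounded) >> kShadowScale; ++g)
      shadow[g] = tag;
    return (std::uint64_t{tag} << kTagShift) | offset;
  }

  // Uncolours the shadow so pointers still carrying the old tag mismatch.
  void Free(std::uint64_t tagged, std::size_t size) {
    const std::size_t offset = static_cast<std::size_t>(tagged & kAddressMask);
    if (size == 0 || size > kArenaSize) return;
    const std::size_t rounded = RoundUp(size);
    if (offset > kArenaSize - rounded) return;
    for (std::size_t g = offset >> kShadowScale;
         g < (offset + rounded) >> kShadowScale; ++g)
      shadow[g] = 0;
  }

  // True when the pointer's tag equals the colour of the granule it hits.
  bool CheckAccess(std::uint64_t tagged) const {
    const std::uint64_t offset = tagged & kAddressMask;
    if (offset >= kArenaSize) return false;
    return shadow[offset >> kShadowScale] ==
           static_cast<std::uint8_t>(tagged >> kTagShift);
  }
};
```

With this model an access through the returned pointer passes while the
allocation is live, an access past the end of its granules fails, and
any access through the same pointer after Free fails because the pointer
still has its nonzero tag while the shadow has gone back to zero.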

For MTE there will need to be some updates in the system libraries.
A discussion on the way this will be done in glibc has been started here:
https://www.sourceware.org/ml/libc-alpha/2019-09/msg00114.html

Global variables are untagged.

For MTE we are planning on having these untagged.
This is in order to allow uninstrumented object files to be statically 
linked into MTE aware object files.
Since global object accesses are directly generated into the code, there 
would be no way to tag global objects and still use the code from that 
static object.

Since global objects will not be coloured for MTE, I am not planning on
colouring them for HWASAN.  Colouring them would take a reasonable
amount of work, including a new mechanism for associating objects with
tags.

Having all global variables untagged means that nothing need be done:
all pointers to global variables will have a tag of zero and the shadow
memory will correspondingly be left coloured as zero.

> 
>>
>> Though there is still much to implement here, the general approach should be
>> clear.  Any feedback is welcomed, but I have three main points that I'm
>> particularly hoping for external opinions.
>>
>> 1) The current approach stores a tag on the RTL representing a given variable,
>>     in order to implement HWASAN for x86_64 the tag needs to be removed before
>>     every memory access but not on things like function calls.
>>     Is there any obvious way to handle removing the tag in these places?
>>     Maybe something with legitimize_address?
> 
> Not being a target expert, but I bet you'll need to store the tag with a RTL
> representation of a stack variable.
> 
> Thanks,
> Martin
> 
>> 2) The first draft presented here introduces a new RTL expression called
>>     ADDTAG.  I now believe that a hook would be neater here but haven't yet
>>     looked into it.  Do people agree?
>>     (addtag is introduced in the patch titled "Put tags into each stack variable
>>     pointer", but the reason it's introduced is so the backend can define how
>>     this gets implemented with a ~define_expand~ and that's only needed for the
>>     MTE handling as introduced in "Add in MTE stubs")
>> 3) This patch series has not yet had much thought go towards it around command
>>     line arguments.  I personally quite like the idea of having
>>     ~-fsanitize=hwaddress~ turn on "checking memory tags against shadow memory
>>     colour", and MTE being just a hardware acceleration of this ability.
>>     I suspect this idea wouldn't be liked by all and would like to hear some
>>     opinions.
>>
>> Thanks,
>> Matthew
>>
>

Thread overview: 35+ messages
2019-09-06 14:46 Matthew Malcomson
2019-09-06 14:46 ` [RFC][PATCH 7/X][libsanitizer] Add option to bootstrap using HWASAN Matthew Malcomson
2019-09-06 14:46 ` [RFC][PATCH 8/X][libsanitizer] Ensure HWASAN required alignment for stack variables Matthew Malcomson
2019-09-06 14:46 ` [RFC][PATCH 5/X][libsanitizer] Introduce longjmp/setjmp interceptors to libhwasan Matthew Malcomson
2019-09-09 10:02   ` Martin Liška
2019-09-09 10:29     ` Matthew Malcomson
2019-09-09 10:49       ` Martin Liška
2019-09-06 14:46 ` [RFC][PATCH 3/X][libsanitizer] Allow compilation for HWASAN_WITH_INTERCEPTORS=OFF Matthew Malcomson
2019-09-09  9:27   ` Martin Liška
2019-09-06 14:46 ` [RFC][PATCH 2/X][libsanitizer] Tie the hwasan library into our build system Matthew Malcomson
2019-09-06 14:46 ` [RFC][PATCH 4/X][libsanitizer] Pass size and pointer info to error reporting functions Matthew Malcomson
2019-09-09  9:27   ` Martin Liška
2019-09-06 14:46 ` [RFC][PATCH 14/X][libsanitizer] Introduce HWASAN block-scope poisoning Matthew Malcomson
2019-09-06 14:46 ` [RFC][PATCH 1/X][libsanitizer] Introduce libsanitizer to GCC tree Matthew Malcomson
2019-09-09  9:26   ` Martin Liška
2019-09-06 14:47 ` [RFC][PATCH 10/X][libsanitizer] Colour the shadow stack for each stack variable Matthew Malcomson
2019-09-06 14:47 ` [RFC][PATCH 13/X][libsanitizer] Instrument known builtin function calls Matthew Malcomson
2019-09-06 14:47 ` [RFC][PATCH 11/X][libsanitizer] Uncolour stack frame on function exit Matthew Malcomson
2019-09-06 14:47 ` [RFC][PATCH 15/X][libsanitizer] Add in MTE stubs Matthew Malcomson
2019-09-06 14:47 ` [RFC][PATCH 16/X][libsanitizer] Build libhwasan with interceptors Matthew Malcomson
2019-09-06 14:47 ` [RFC][PATCH 12/X][libsanitizer] Check pointer tags match address tags Matthew Malcomson
2019-09-06 14:47 ` [RFC][PATCH 9/X][libsanitizer] Put tags into each stack variable pointer Matthew Malcomson
2019-09-06 14:47 ` [RFC][PATCH 6/X][libsanitizer] Add -fsanitize=hwaddress flags Matthew Malcomson
2019-09-09 10:06   ` Martin Liška
2019-09-09 10:18     ` Matthew Malcomson
2019-09-09 10:20       ` Martin Liška
2019-09-09 10:47 ` [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC Martin Liška
2019-09-09 15:55   ` Matthew Malcomson [this message]
2019-09-10  1:06     ` Kostya Serebryany via gcc-patches
2019-09-11 11:53     ` Martin Liška
2019-09-11 16:37       ` Matthew Malcomson
2019-09-11 18:34         ` Evgenii Stepanov via gcc-patches
2019-09-23  8:02 ` Martin Liška
2019-10-23 11:02   ` Matthew Malcomson
2019-10-24 10:11     ` Martin Liška
