From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-508841-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 116438 invoked by alias); 11 Sep 2019 11:53:10 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 116430 invoked by uid 89); 11 Sep 2019 11:53:09 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-5.5 required=5.0 tests=AWL,BAYES_00,KAM_SHORT,SPF_PASS autolearn=ham version=3.3.1 spammy=suffering, opinions, H*f:sk:936e022, uninstrumented
X-HELO: mx1.suse.de
Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 11 Sep 2019 11:53:07 +0000
Received: from relay2.suse.de (unknown [195.135.220.254])	by mx1.suse.de (Postfix) with ESMTP id DB421B693;	Wed, 11 Sep 2019 11:53:04 +0000 (UTC)
Subject: Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC
To: Matthew Malcomson <Matthew.Malcomson@arm.com>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: "dodji@redhat.com" <dodji@redhat.com>, nd <nd@arm.com>, "kcc@google.com" <kcc@google.com>, "jakub@redhat.com" <jakub@redhat.com>, "dvyukov@google.com" <dvyukov@google.com>
References: <156778058239.16148.17480879484406897649.scripted-patch-series@arm.com> <936e0222-0b05-b4de-7a68-9b91e79a6f76@suse.cz> <8fc78139-481e-6dbc-0996-2cae58627c25@arm.com>
From: =?UTF-8?Q?Martin_Li=c5=a1ka?= <mliska@suse.cz>
Message-ID: <111f6243-834f-9095-274e-f003cf329509@suse.cz>
Date: Wed, 11 Sep 2019 11:53:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <8fc78139-481e-6dbc-0996-2cae58627c25@arm.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
X-IsSubscribed: yes
X-SW-Source: 2019-09/txt/msg00732.txt.bz2

On 9/9/19 5:54 PM, Matthew Malcomson wrote:
> On 09/09/19 11:47, Martin Liška wrote:
>> On 9/6/19 4:46 PM, Matthew Malcomson wrote:
>>> Hello,
>>>
>>> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
>>> address sanitizer (HWASAN) to GCC.  The document describing HWASAN can be found
>>> here: http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
>>
>> Hello.
>>
>> I'm happy that you are working on this functionality for GCC, and I can share
>> the knowledge I have of ASAN. I briefly read the patch series and I have
>> multiple questions (and observations):
>>
>> 1) Is the ambition of the patchset to be a software emulation of MTE that can
>>     work on targets that do not support MTE? Is it what clang
>>     calls hwasan-abi=interceptor?
> 
> The ambition is to provide a software emulation of MTE for AArch64 
> targets that don't support MTE.

Hello.

It would also be great to provide the emulation on targets that do not provide TBI
(like x86_64).

> I also hope to have the framework set up so that enabling for other 
> architectures is relatively easy and can be done by those interested.
> 
> As I understand it, `hwasan-abi=interceptor` vs `platform` is about 
> adding such MTE emulation for "application code" or "platform code (e.g. 
> kernel)" respectively.

Hm, are you sure? Clang also has -fsanitize=kernel-hwaddress, which should be
to -fsanitize=hwaddress what -fsanitize=kernel-address is to -fsanitize=address.

> 
>>
>> 2) Do you have real AArch64 hardware that has MTE support? Would it be possible
>>     in the future to give such a machine to the GCC Compile Farm for testing purposes?
> 
> No, our team doesn't have real MTE hardware.  I have been testing on an 
> AArch64 machine that has TBI; other work in the team that requires MTE 
> support is being tested on the Arm "Fast Models" emulator.
> 
>>
>> 3) I like the idea of sharing of internal functions like ASAN_CHECK/HWASAN_CHECK.
>>     We should benefit from that in the future.
>>
>> 4) Am I correct that due to escape of "tagged" pointers, one needs to have an entire
>> DSO (dynamic shared object) built with hwasan enabled? Otherwise, a dereference of
>> a tagged pointer will lead to a segfault (except with the TBI feature on AArch64)?
> 
> 
> Yes, one needs to take pains to avoid the escape of tagged pointers on 
> architectures other than AArch64.

Which is the very same pain that MPX suffered from before it was dropped
from GCC :)

> 
> I don't believe that compiling the entire DSO with HWASAN enabled is 
> enough, since pointers can be passed across DSO boundaries.
> I haven't yet looked into how to handle this.
> 
> There's an even more fundamental problem of accesses within the 
> instrumented binary -- I haven't yet figured out how to remove the tag 
> before accesses on architectures without the AArch64 TBI feature.

Which should be platforms like x86_64, right?

> 
> 
>>
>> 5) Is there documentation (or a definition) of what shadow memory for memory tagging looks like?
>> Is it similar to ASAN, where one can get to tag with:
>> u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
>>
> 
> Yes, it's similar.
> 
>  From the libhwasan code, the function to fetch a pointer to the shadow 
> memory byte corresponding to a memory address is MemToShadow.
> 
> constexpr uptr kShadowScale = 4;
> inline uptr MemToShadow(uptr untagged_addr) {
>    return (untagged_addr >> kShadowScale) +
>           __hwasan_shadow_memory_dynamic_address;
> }
> 
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
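
To make the mapping concrete, here is a minimal sketch of how a tag and a
shadow address could be derived from a tagged pointer. It assumes 64-bit
addresses with an 8-bit tag in bits 56..63 (the byte that AArch64 TBI ignores)
and reuses kShadowScale = 4 from the snippet quoted above. PointerTag,
UntagAddr and MemToShadowAt are names made up for illustration, and the shadow
base is passed in as a parameter rather than read from
__hwasan_shadow_memory_dynamic_address.

```cpp
#include <cstdint>

using uptr = std::uint64_t;

// Assumed layout, for illustration only: tag in the top byte of the address.
constexpr unsigned kTagShift = 56;
constexpr uptr kTagMask = uptr{0xff} << kTagShift;

// One shadow byte per 2^4 = 16 bytes of memory, as in the quoted snippet.
constexpr unsigned kShadowScale = 4;

// Tag carried in the top byte of a tagged address.
constexpr std::uint8_t PointerTag(uptr tagged_addr) {
  return static_cast<std::uint8_t>(tagged_addr >> kTagShift);
}

// The same address with the tag bits cleared.
constexpr uptr UntagAddr(uptr tagged_addr) {
  return tagged_addr & ~kTagMask;
}

// Same arithmetic as MemToShadow, but with the shadow base given explicitly
// (hypothetical parameter standing in for the runtime's dynamic base).
constexpr uptr MemToShadowAt(uptr untagged_addr, uptr shadow_base) {
  return (untagged_addr >> kShadowScale) + shadow_base;
}
```

With these definitions, addresses 0x1230 and 0x123f fall in the same 16-byte
granule, so both map to the same shadow byte.
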
> 
> 
>> 6) Note that things like memtag_tag_size and memtag_granule_size define an ABI of libsanitizer.
>>
> 
> Yes, the sizes of these values define an ABI.
> 
> Those particular hooks are added as a demonstration of how something 
> like MTE would be implemented on top of this framework (where the 
> backend would specify the tag and granule size to match their target's 
> architecture).
> 
> HWASAN itself would use the hard-coded tag and granule size that matches 
> what libsanitizer uses.
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36
> 
> I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in 
> asan.h, and when using the sanitizer library the macro 
> `HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
> 
> 
>>>
>>> The current patch series is far from complete, but I'm posting the current state
>>> to provide something to discuss at the Cauldron next week.
>>>
>>> In its current state, this sanitizer only works on AArch64 with a custom kernel
>>> that allows tagged pointers in system calls.  This requirement is discussed at
>>> https://source.android.com/devices/tech/debug/hwasan
>>
>> Can you please be more specific? Is MTE support in the upstream Linux kernel? If so,
>> starting from which version?
> 
> I find I can only make complicated statements remotely clear in bullet 
> points ;-)
> 
> What I was trying to say was:
> - HWASAN from this patch series requires AArch64 TBI.
>    (I have not handled architectures without TBI)
> - The upstream kernel does not accept tagged pointers in syscalls.
>    (programs that use TBI must currently clear tags before passing
>     pointers to the kernel)

I know that in the case of ASAN, libasan provides wrappers (interceptors) for various glibc
functions, many of which are system calls. Similar wrappers are probably used in HWASAN,
so that is where one can create the memory pointer tags.

> - This patch series doesn't include any way to avoid passing tagged
>    pointers to syscalls.

I bet LLVM has the same problem, so I would expect it to be handled in the interceptors.

> - Hence in order to test the sanitizer I'm using a kernel that has been
>    patched to accept tagged pointers in many syscalls.
> - The link to the android.com site is just another source describing the
>    same requirement.
> 
> 
> The support for the relaxed ABI (of accepting tagged pointers in various 
> syscalls in the kernel) is being discussed on the kernel mailing list, 
> the latest patchset I know of is here:
> https://lkml.org/lkml/2019/7/25/725

Thanks for the pointer.
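
On the requirement above that programs using TBI clear tags before passing
pointers to the kernel, a wrapper could strip the tag right before the system
call. The following is only a sketch under assumptions: a 64-bit target with
the tag in bits 56..63, and StripTag, UntagPointer and WriteUntagged are
hypothetical names, not the actual libhwasan interceptors.

```cpp
#include <cstddef>
#include <cstdint>
#include <unistd.h>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Assumed layout, for illustration only: tag in the top byte of the pointer.
constexpr std::uint64_t kTagMask = std::uint64_t{0xff} << 56;

// Clear the tag bits of an address held as an integer.
constexpr std::uint64_t StripTag(std::uint64_t addr) {
  return addr & ~kTagMask;
}

// Clear the tag bits of a pointer of any object type.
template <typename T>
T* UntagPointer(T* p) {
  return reinterpret_cast<T*>(StripTag(reinterpret_cast<std::uintptr_t>(p)));
}

// Hypothetical wrapper: the kernel only ever sees the untagged buffer address.
inline ssize_t WriteUntagged(int fd, const void* buf, std::size_t count) {
  return ::write(fd, UntagPointer(buf), count);
}
```

An untagged pointer passes through unchanged, so a wrapper of this kind would
be harmless for pointers coming from uninstrumented code.
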

> 
> I wasn't trying to say anything about MTE in that paragraph, but kernel 
> support for MTE is not yet in the upstream Linux kernel and is currently being 
> worked on.
> 
>>
>>> I have also not yet put tests into the DejaGNU framework, but instead have a
>>> simple test file from which the tests will eventually come.  That test file is
>>> attached to this email despite not being in the patch series.
>>>
>>> Something close to this patch series bootstraps and passes most regression
>>> tests when ~--with-build-config=bootstrap-hwasan~ is used.  The regressions it
>>> doesn't pass are all the other sanitizer tests and all linker plugin tests.
>>> The linker plugin tests fail due to a configuration problem where the library
>>> path is not correctly set.
>>> (I say "something close to this patch series" because I recently made a change
>>> that breaks bootstrap but I believe is the best approach once I've fixed it,
>>> hence for an RFC I'm leaving it in).
>>>
>>> HWASAN works by storing a tag in the top bits of every pointer and a colour in
>>> a shadow memory region corresponding to every area of memory.  On every memory
>>> access through a pointer the tag in the pointer is checked against the colour in
>>> shadow memory corresponding to the memory the pointer is accessing.  If the tag
>>> and colour do not match then a fault is signalled.
>>>
>>> The instrumentation required for this sanitizer has a large overlap with the
>>> instrumentation required for implementing MTE (which has similar functionality
>>> but checks are automatically done in the hardware and instructions for colouring
>>> shadow memory and for managing tags are provided by the architecture).
>>> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
>>>
>>> We hope to use the HWASAN framework to implement MTE tagging on the stack, and
>>> hence I have a "dummy" patch demonstrating the approach envisaged for this.
>>
>> What's the situation with heap allocated memory and global variables?
> 
> For the heap, whatever library function allocates memory should return a 
> tagged pointer and colour the shadow memory accordingly.  This pointer 
> can then be treated exactly the same as all other pointers in 
> instrumented code.
> On freeing of memory the shadow memory is uncoloured in order to detect 
> use-after-free.
> 
> For HWASAN this means malloc and friends need to be intercepted, and 
> this is done by the runtime library.
> 
> For MTE there will need to be some updates in the system libraries.
> A discussion on the way this will be done in glibc has been started here:
> https://www.sourceware.org/ml/libc-alpha/2019-09/msg00114.html

I see.

Martin

> 
> 
> 
> Global variables are untagged.
> 
> For MTE we are planning on having these untagged.
> This is in order to allow uninstrumented object files to be statically 
> linked into MTE aware object files.
> Since global object accesses are directly generated into the code, there 
> would be no way to tag global objects and still use the code from that 
> static object.
> 
> 
> Since global objects will not be coloured for MTE, I am not planning on 
> colouring them for HWASAN.  There would be a reasonable amount of work, 
> including a new mechanism for associating objects with tags.
> 
> Having all global variables untagged means that nothing need be done, 
> all pointers to global variables will have a tag of zero and the shadow 
> memory will correspondingly be left coloured as zero.
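
The heap and global-variable behaviour described above can be modelled with a
toy shadow table. This is a sketch under assumptions, not the libhwasan
allocator: addresses are plain offsets into a 256-byte arena, the granule is 16
bytes, the tag sits in bits 56..63, and "uncolouring" on free is modelled as
resetting the shadow to zero. All names in the toy namespace are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

namespace toy {

constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kTagMask = std::uint64_t{0xff} << kTagShift;
constexpr std::size_t kGranule = 16;
constexpr std::size_t kArenaSize = 256;

// One colour per granule; zero-initialised, i.e. everything starts uncoloured.
std::array<std::uint8_t, kArenaSize / kGranule> shadow{};

// Build a tagged "pointer" from an arena offset and a tag.
std::uint64_t Tagged(std::uint64_t offset, std::uint8_t tag) {
  return offset | (std::uint64_t{tag} << kTagShift);
}

// Set the colour of every granule overlapping [offset, offset + size).
void Colour(std::uint64_t offset, std::size_t size, std::uint8_t colour) {
  const std::size_t first = offset / kGranule;
  const std::size_t last = (offset + size + kGranule - 1) / kGranule;
  for (std::size_t g = first; g < last; ++g) shadow[g] = colour;
}

// "malloc": colour the shadow and hand back a pointer carrying the same tag.
std::uint64_t Allocate(std::uint64_t offset, std::size_t size,
                       std::uint8_t tag) {
  Colour(offset, size, tag);
  return Tagged(offset, tag);
}

// "free": uncolour the shadow so stale tagged pointers no longer match.
void Release(std::uint64_t tagged, std::size_t size) {
  Colour(tagged & ~kTagMask, size, 0);
}

// The check done on every access: pointer tag must equal the shadow colour.
bool AccessOk(std::uint64_t tagged) {
  const auto tag = static_cast<std::uint8_t>(tagged >> kTagShift);
  const std::uint64_t offset = tagged & ~kTagMask;
  return shadow[offset / kGranule] == tag;
}

// Accesses succeed while the block is live and fail once it has been freed.
bool DemoUseAfterFreeDetected() {
  const std::uint64_t p = Allocate(32, 48, 0x5c);
  const bool ok_while_live = AccessOk(p) && AccessOk(p + 47);
  Release(p, 48);
  const bool ok_after_free = AccessOk(p);
  return ok_while_live && !ok_after_free;
}

}  // namespace toy
```

In this model an untagged pointer to uncoloured memory (tag zero, colour zero)
always passes the check, which matches the point that nothing needs to be done
for global variables.
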
> 
>>
>>>
>>> Though there is still much to implement here, the general approach should be
>>> clear.  Any feedback is welcomed, but there are three main points on which I'm
>>> particularly hoping for external opinions.
>>>
>>> 1) The current approach stores a tag on the RTL representing a given variable,
>>>     in order to implement HWASAN for x86_64 the tag needs to be removed before
>>>     every memory access but not on things like function calls.
>>>     Is there any obvious way to handle removing the tag in these places?
>>>     Maybe something with legitimize_address?
>>
>> Not being a target expert, but I bet you'll need to store the tag with the RTL
>> representation of a stack variable.
>>
>> Thanks,
>> Martin
>>
>>> 2) The first draft presented here introduces a new RTL expression called
>>>     ADDTAG.  I now believe that a hook would be neater here but haven't yet
>>>     looked into it.  Do people agree?
>>>     (addtag is introduced in the patch titled "Put tags into each stack variable
>>>     pointer", but the reason it's introduced is so the backend can define how
>>>     this gets implemented with a ~define_expand~ and that's only needed for the
>>>     MTE handling as introduced in "Add in MTE stubs")
>>> 3) Not much thought has yet gone into the command line arguments for this
>>>     patch series.  I personally quite like the idea of having
>>>     ~-fsanitize=hwaddress~ turn on "checking memory tags against shadow memory
>>>     colour", and MTE being just a hardware acceleration of this ability.
>>>     I suspect this idea wouldn't be liked by all and would like to hear some
>>>     opinions.
>>>
>>> Thanks,
>>> Matthew
>>>
>>
>