Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC


From: "Kostya Serebryany via gcc-patches" <gcc-patches@gcc.gnu.org>
To: Matthew Malcomson <Matthew.Malcomson@arm.com>,
	Peter Collingbourne <pcc@google.com>,
		Evgeniy Stepanov <eugenis@google.com>
Cc: "Martin Liška" <mliska@suse.cz>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	"dodji@redhat.com" <dodji@redhat.com>, nd <nd@arm.com>,
	"jakub@redhat.com" <jakub@redhat.com>,
	"dvyukov@google.com" <dvyukov@google.com>
Subject: Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC
Date: Tue, 10 Sep 2019 01:06:00 -0000
Message-ID: <CAN=P9phAe+GDvEt3gP_6r=MRexgzdMODHjfRtApJ3QX=5TNtFA@mail.gmail.com>
In-Reply-To: <8fc78139-481e-6dbc-0996-2cae58627c25@arm.com>

+Peter Collingbourne +Evgeniy Stepanov (the main developers of HWASAN
in LLVM, FYI)
Please note that Peter has recently implemented support for globals in
LLVM's HWASAN.

--kcc

On Mon, Sep 9, 2019 at 8:55 AM Matthew Malcomson
<Matthew.Malcomson@arm.com> wrote:
>
> On 09/09/19 11:47, Martin Liška wrote:
> > On 9/6/19 4:46 PM, Matthew Malcomson wrote:
> >> Hello,
> >>
> >> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
> >> address sanitizer (HWASAN) to GCC.  The document describing HWASAN can be found
> >> here http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.
> >
> > Hello.
> >
> > I'm happy that you are working on this functionality for GCC, and I can share
> > the knowledge I have of ASAN. I briefly read the patch series and I have
> > several questions (and observations):
> >
> > 1) Is the ambition of the patchset to be a software emulation of MTE that can
> >     work on targets that do not support MTE? Is it what clang
> >     calls hwasan-abi=interceptor?
>
> The ambition is to provide a software emulation of MTE for AArch64
> targets that don't support MTE.
> I also hope to have the framework set up so that enabling for other
> architectures is relatively easy and can be done by those interested.
>
> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
> adding such MTE emulation for "application code" or "platform code (e.g.
> kernel)" respectively.
>
> >
> > 2) Do you have real aarch64 hardware that has MTE support? Would it be possible
> >     in the future to give such a machine to the GCC Compile Farm for testing purposes?
>
> No, our team doesn't have real MTE hardware.  I have been testing on an
> AArch64 machine that has TBI; other work in the team that requires MTE
> support is being tested on the Arm "Fast Models" emulator.
>
> >
> > 3) I like the idea of sharing of internal functions like ASAN_CHECK/HWASAN_CHECK.
> >     We should benefit from that in the future.
> >
> > 4) Am I correct that due to the escape of "tagged" pointers, one needs to have an entire
> > DSO (dynamic shared object) built with hwasan enabled? Otherwise, a dereference of
> > a tagged pointer will lead to a segfault (except with the TBI feature on aarch64)?
>
>
> Yes, one needs to take pains to avoid the escape of tagged pointers on
> architectures other than AArch64.
>
> I don't believe that compiling the entire DSO with HWASAN enabled is
> enough, since pointers can be passed across DSO boundaries.
> I haven't yet looked into how to handle this.
>
> There's an even more fundamental problem of accesses within the
> instrumented binary -- I haven't yet figured out how to remove the tag
> before accesses on architectures without the AArch64 TBI feature.
>
>
> >
> > 5) Is there documentation or a definition of what the shadow memory for memory tagging looks like?
> > Is it similar to ASAN, where one can get the tag with:
> > u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
> >
>
> Yes, it's similar.
>
> From the libhwasan code, the function to fetch a pointer to the shadow
> memory byte corresponding to a memory address is MemToShadow.
>
> constexpr uptr kShadowScale = 4;
> inline uptr MemToShadow(uptr untagged_addr) {
>    return (untagged_addr >> kShadowScale) +
>           __hwasan_shadow_memory_dynamic_address;
> }
>
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
>
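A minimal sketch of how the mapping quoted above could be combined with a tag check. It assumes the tag sits in the top byte of a 64-bit pointer (as with AArch64 TBI) and replaces the real shadow region with a small vector over a simulated address range starting at 0. `ToyShadow`, `PointerTag`, `Untag`, `WithTag`, `Colour` and `AccessOk` are hypothetical names for illustration and are not part of libhwasan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using uptr = std::uint64_t;
using u8 = std::uint8_t;

constexpr unsigned kShadowScale = 4;  // one shadow byte per 16-byte granule
constexpr unsigned kTagShift = 56;    // assumption: tag lives in the top byte
constexpr uptr kTagMask = uptr{0xff} << kTagShift;

inline u8 PointerTag(uptr p) { return static_cast<u8>(p >> kTagShift); }
inline uptr Untag(uptr p) { return p & ~kTagMask; }
inline uptr WithTag(uptr p, u8 tag) {
  return Untag(p) | (uptr{tag} << kTagShift);
}

// Toy stand-in for the shadow region: index = untagged address >> kShadowScale.
struct ToyShadow {
  std::vector<u8> bytes;

  explicit ToyShadow(std::size_t mem_size)
      : bytes((mem_size >> kShadowScale) + 1, 0) {}

  // Colour every granule overlapping [untagged_addr, untagged_addr + size).
  void Colour(uptr untagged_addr, std::size_t size, u8 tag) {
    assert(size > 0);
    const uptr first = untagged_addr >> kShadowScale;
    const uptr last = (untagged_addr + size - 1) >> kShadowScale;
    for (uptr g = first; g <= last; ++g) bytes.at(g) = tag;
  }

  // An access is allowed when the pointer tag equals the shadow colour.
  bool AccessOk(uptr tagged_addr) const {
    return PointerTag(tagged_addr) ==
           bytes.at(Untag(tagged_addr) >> kShadowScale);
  }
};
```

With a 32-byte object coloured at address 0x40, a pointer carrying the same tag passes the check inside the object and fails one granule past its end, where the shadow is still zero.
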
>
> > 6) Note that things like memtag_tag_size and memtag_granule_size define an ABI of libsanitizer
> >
>
> Yes, the sizes of these values define an ABI.
>
> Those particular hooks are added as a demonstration of how something
> like MTE would be implemented on top of this framework (where the
> backend would specify the tag and granule size to match their target's
> architecture).
>
> HWASAN itself would use the hard-coded tag and granule size that matches
> what libsanitizer uses.
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36
>
> I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in
> asan.h, and when using the sanitizer library the macro
> `HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
>
>
> >>
> >> The current patch series is far from complete, but I'm posting the current state
> >> to provide something to discuss at the Cauldron next week.
> >>
> >> In its current state, this sanitizer only works on AArch64 with a custom kernel
> >> to allow tagged pointers in system calls.  This is discussed in the below link
> >> https://source.android.com/devices/tech/debug/hwasan -- the custom kernel allows
> >> tagged pointers in syscalls.
> >
> > Can you please be more specific? Is MTE in the upstream Linux kernel? If so,
> > starting from which version?
>
> I find I can only make complicated statements remotely clear in bullet
> points ;-)
>
> What I was trying to say was:
> - HWASAN from this patch series requires AArch64 TBI.
>    (I have not handled architectures without TBI)
> - The upstream kernel does not accept tagged pointers in syscalls.
>    (programs that use TBI must currently clear tags before passing
>     pointers to the kernel)
> - This patch series doesn't include any way to avoid passing tagged
>    pointers to syscalls.
> - Hence in order to test the sanitizer I'm using a kernel that has been
>    patched to accept tagged pointers in many syscalls.
> - The link to the android.com site is just another source describing the
>    same requirement.
>
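A hedged sketch of the second bullet above: a program using TBI clears the tag before handing a pointer to the kernel. It assumes a 64-bit target with the tag in the top byte. `StripTag` and `WriteUntagged` are hypothetical helpers, not functions from the patch series or from libhwasan; only `write(2)` is a real interface here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes 64-bit pointers");

// Assumption: the tag occupies bits 56..63, the byte ignored under AArch64 TBI.
constexpr std::uintptr_t kTagMask = std::uintptr_t{0xff} << 56;

// Return the same pointer with the tag byte cleared.
template <typename T>
T* StripTag(T* p) {
  return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(p) & ~kTagMask);
}

// Hypothetical wrapper: the kernel only ever sees the untagged address.
inline ssize_t WriteUntagged(int fd, const void* buf, std::size_t count) {
  assert(buf != nullptr || count == 0);
  return ::write(fd, StripTag(buf), count);
}
```

A wrapper like this would be needed around every pointer-taking syscall, which is why the series is instead tested on a kernel patched to accept tagged pointers.
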
>
> The support for the relaxed ABI (of accepting tagged pointers in various
> syscalls in the kernel) is being discussed on the kernel mailing list;
> the latest patchset I know of is here:
> https://lkml.org/lkml/2019/7/25/725
>
> I wasn't trying to say anything about MTE in that paragraph, but kernel
> support for MTE is not in the upstream Linux kernel and is currently being
> worked on.
>
> >
> >> I have also not yet put tests into the DejaGNU framework, but instead have a
> >> simple test file from which the tests will eventually come.  That test file is
> >> attached to this email despite not being in the patch series.
> >>
> >> Something close to this patch series bootstraps and passes most regression
> >> tests when ~--with-build-config=bootstrap-hwasan~ is used.  The regressions it
> >> doesn't pass are all the other sanitizer tests and all linker plugin tests.
> >> The linker plugin tests fail due to a configuration problem where the library
> >> path is not correctly set.
> >> (I say "something close to this patch series" because I recently made a change
> >> that breaks bootstrap but I believe is the best approach once I've fixed it,
> >> hence for an RFC I'm leaving it in).
> >>
> >> HWASAN works by storing a tag in the top bits of every pointer and a colour in
> >> a shadow memory region corresponding to every area of memory.  On every memory
> >> access through a pointer the tag in the pointer is checked against the colour in
> >> shadow memory corresponding to the memory the pointer is accessing.  If the tag
> >> and colour do not match then a fault is signalled.
> >>
> >> The instrumentation required for this sanitizer has a large overlap with the
> >> instrumentation required for implementing MTE (which has similar functionality
> >> but checks are automatically done in the hardware and instructions for colouring
> >> shadow memory and for managing tags are provided by the architecture).
> >> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
> >>
> >> We hope to use the HWASAN framework to implement MTE tagging on the stack, and
> >> hence I have a "dummy" patch demonstrating the approach envisaged for this.
> >
> > What's the situation with heap allocated memory and global variables?
>
> For the heap, whatever library function allocates memory should return a
> tagged pointer and colour the shadow memory accordingly.  This pointer
> can then be treated exactly the same as all other pointers in
> instrumented code.
> On freeing of memory the shadow memory is uncoloured in order to detect
> use-after-free.
>
> For HWASAN this means malloc and friends need to be intercepted, and
> this is done by the runtime library.
>
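The heap behaviour described above can be sketched with a toy bump allocator over a simulated arena: allocation returns a tagged address and colours the shadow, and freeing resets the shadow to zero so that a stale pointer no longer matches. This is an illustration under stated assumptions (tag in the top byte, 16-byte granules, tags cycling through 1..255, arena size a multiple of 16) and is not how libhwasan's interceptors are implemented. `ToyTaggingHeap` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using uptr = std::uint64_t;
using u8 = std::uint8_t;

constexpr unsigned kShadowScale = 4;                  // 16-byte granules
constexpr uptr kGranule = uptr{1} << kShadowScale;
constexpr unsigned kTagShift = 56;                    // assumption: top-byte tag
constexpr uptr kTagMask = uptr{0xff} << kTagShift;

class ToyTaggingHeap {
 public:
  // arena_size is assumed to be a multiple of the granule size.
  explicit ToyTaggingHeap(std::size_t arena_size)
      : shadow_(arena_size >> kShadowScale, 0), arena_size_(arena_size) {}

  // Returns a tagged address inside the simulated arena, or 0 on failure.
  uptr Allocate(std::size_t size) {
    const uptr rounded = RoundUp(size);
    if (size == 0 || rounded > arena_size_ - next_) return 0;
    const uptr addr = next_;
    next_ += rounded;
    const u8 tag = NextTag();
    SetShadow(addr, rounded, tag);
    return addr | (uptr{tag} << kTagShift);
  }

  // Uncolour the shadow so later accesses through the old pointer mismatch.
  void Free(uptr tagged, std::size_t size) {
    SetShadow(tagged & ~kTagMask, RoundUp(size), 0);
  }

  bool AccessOk(uptr tagged) const {
    const uptr addr = tagged & ~kTagMask;
    return addr < arena_size_ &&
           static_cast<u8>(tagged >> kTagShift) == shadow_[addr >> kShadowScale];
  }

 private:
  static uptr RoundUp(std::size_t size) {
    return (size + kGranule - 1) & ~(kGranule - 1);
  }
  // Tags cycle through 1..255; 0 is reserved for uncoloured memory.
  u8 NextTag() {
    tag_ = static_cast<u8>(tag_ == 255 ? 1 : tag_ + 1);
    return tag_;
  }
  void SetShadow(uptr addr, uptr size, u8 tag) {
    for (uptr g = addr >> kShadowScale; g < (addr + size) >> kShadowScale; ++g)
      shadow_[g] = tag;
  }

  std::vector<u8> shadow_;
  uptr arena_size_;
  uptr next_ = 0;
  u8 tag_ = 0;
};
```

Because tag 0 is never handed out, a pointer returned by `Allocate` stops matching as soon as `Free` resets its granules, which is the use-after-free detection described above.
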
> For MTE there will need to be some updates in the system libraries.
> A discussion on the way this will be done in glibc has been started here:
> https://www.sourceware.org/ml/libc-alpha/2019-09/msg00114.html
>
>
>
> Global variables are untagged.
>
> For MTE we are planning on having these untagged.
> This is in order to allow uninstrumented object files to be statically
> linked into MTE aware object files.
> Since global object accesses are directly generated into the code, there
> would be no way to tag global objects and still use the code from that
> static object.
>
>
> Since global objects will not be coloured for MTE, I am not planning on
> colouring them for HWASAN.  There would be a reasonable amount of work,
> including a new mechanism for associating objects with tags.
>
> Having all global variables untagged means that nothing need be done,
> all pointers to global variables will have a tag of zero and the shadow
> memory will correspondingly be left coloured as zero.
>
> >
> >>
> >> Though there is still much to implement here, the general approach should be
> >> clear.  Any feedback is welcome, but there are three main points on which I'm
> >> particularly hoping for external opinions.
> >>
> >> 1) The current approach stores a tag on the RTL representing a given variable.
> >>     In order to implement HWASAN for x86_64, the tag needs to be removed before
> >>     every memory access but not on things like function calls.
> >>     Is there any obvious way to handle removing the tag in these places?
> >>     Maybe something with legitimize_address?
> >
> > Not being a target expert, but I bet you'll need to store the tag with an RTL
> > representation of a stack variable.
> >
> > Thanks,
> > Martin
> >
> >> 2) The first draft presented here introduces a new RTL expression called
> >>     ADDTAG.  I now believe that a hook would be neater here but haven't yet
> >>     looked into it.  Do people agree?
> >>     (addtag is introduced in the patch titled "Put tags into each stack variable
> >>     pointer", but the reason it's introduced is so the backend can define how
> >>     this gets implemented with a ~define_expand~ and that's only needed for the
> >>     MTE handling as introduced in "Add in MTE stubs")
> >> 3) I have not yet given much thought to the command line arguments for this
> >>     patch series.  I personally quite like the idea of having
> >>     ~-fsanitize=hwaddress~ turn on "checking memory tags against shadow memory
> >>     colour", and MTE being just a hardware acceleration of this ability.
> >>     I suspect this idea wouldn't be liked by all and would like to hear some
> >>     opinions.
> >>
> >> Thanks,
> >> Matthew
> >>
> >
>

