Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
References: <156778058239.16148.17480879484406897649.scripted-patch-series@arm.com> <936e0222-0b05-b4de-7a68-9b91e79a6f76@suse.cz> <8fc78139-481e-6dbc-0996-2cae58627c25@arm.com>
In-Reply-To: <8fc78139-481e-6dbc-0996-2cae58627c25@arm.com>
From: "Kostya Serebryany via gcc-patches"
Reply-To: Kostya Serebryany
Date: Tue, 10 Sep 2019 01:06:00 -0000
Subject: Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC
To: Matthew Malcomson, Peter
Collingbourne, Evgeniy Stepanov
Cc: Martin Liška, "gcc-patches@gcc.gnu.org", "dodji@redhat.com", nd, "jakub@redhat.com", "dvyukov@google.com"

+Peter Collingbourne
+Evgeniy Stepanov
(the main developers of HWASAN in LLVM, FYI)

Please note that Peter has recently implemented support for globals in LLVM's HWASAN.

--kcc

On Mon, Sep 9, 2019 at 8:55 AM Matthew Malcomson wrote:
>
> On 09/09/19 11:47, Martin Liška wrote:
> > On 9/6/19 4:46 PM, Matthew Malcomson wrote:
> >> Hello,
> >>
> >> This patch series is a WORK-IN-PROGRESS towards porting the LLVM hardware
> >> address sanitizer (HWASAN) in GCC. The document describing HWASAN can be found
> >> here http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html.
> >
> > Hello.
> >
> > I'm happy that you are working on the functionality for GCC and I can provide
> > my knowledge that I have with ASAN. I briefly read the patch series and I have
> > multiple questions (and observations):
> >
> > 1) Is the ambition of the patchset to be a software emulation of MTE that can
> >    work on targets that do not support MTE? Is it something what clang
> >    names hwasan-abi=interceptor?
>
> The ambition is to provide a software emulation of MTE for AArch64
> targets that don't support MTE.
> I also hope to have the framework set up so that enabling for other
> architectures is relatively easy and can be done by those interested.
>
> As I understand it, `hwasan-abi=interceptor` vs `platform` is about
> adding such MTE emulation for "application code" or "platform code (e.g.
> kernel)" respectively.
> >
> > 2) Do you have a real aarch64 hardware that has MTE support? Would it be possible
> >    for the future to give such a machine to GCC Compile Farm for testing purpose?
>
> No our team doesn't have real MTE hardware, I have been testing on an
> AArch64 machine that has TBI, other work in the team that requires MTE
> support is being tested on the Arm "Fast Models" emulator.
> >
> > 3) I like the idea of sharing of internal functions like ASAN_CHECK/HWASAN_CHECK.
> >    We should benefit from that in the future.
> >
> > 4) Am I correct that due to escape of "tagged" pointers, one needs to have an entire
> >    DSO (dynamic shared object) built with hwasan enabled? Otherwise, a dereference of
> >    a tagged pointer will lead to a segfault (except TBI feature on aarch64)?
>
>
> Yes, one needs to take pains to avoid the escape of tagged pointers on
> architectures other than AArch64.
>
> I don't believe that compiling the entire DSO with HWASAN enabled is
> enough, since pointers can be passed across DSO boundaries.
> I haven't yet looked into how to handle this.
>
> There's an even more fundamental problem of accesses within the
> instrumented binary -- I haven't yet figured out how to remove the tag
> before accesses on architectures without the AArch64 TBI feature.
>
> >
> > 5) Is there a documentation/definition of how shadow memory for memory tagging looks like?
> >    Is it similar to ASAN, where one can get to tag with:
> >    u8 memory_tag = *((PTR >> TG) + SHADOW_OFFSET) & 0xf?
> >
>
> Yes, it's similar.
>
> From the libhwasan code, the function to fetch a pointer to the shadow
> memory byte corresponding to a memory address is MemToShadow.
>
> constexpr uptr kShadowScale = 4;
> inline uptr MemToShadow(uptr untagged_addr) {
>   return (untagged_addr >> kShadowScale) +
>          __hwasan_shadow_memory_dynamic_address;
> }
>
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L42
> >
> > 6) Note that thing like memtag_tag_size, memtag_granule_size define an ABI of libsanitizer
> >
>
> Yes, the size of these values define an ABI.
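
For illustration only (this sketch is not from the thread): the arithmetic of the quoted MemToShadow can be tried out in isolation. The block below assumes a 64-bit address space and uses a made-up shadow base, `kExampleShadowBase`, in place of `__hwasan_shadow_memory_dynamic_address`, which the real runtime chooses at startup. `kGranuleSize` and `ShadowBytesFor` are hypothetical helper names, not libhwasan identifiers.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uint64_t;  // assumption: 64-bit address space

// From the quoted hwasan_mapping.h snippet: one shadow byte per 2^4 bytes.
constexpr uptr kShadowScale = 4;
constexpr uptr kGranuleSize = uptr{1} << kShadowScale;  // 16 bytes

// Hypothetical base for this example; the real runtime picks
// __hwasan_shadow_memory_dynamic_address at startup.
constexpr uptr kExampleShadowBase = 0x100000;

// Same arithmetic as the quoted MemToShadow, with the base passed in.
inline uptr MemToShadow(uptr untagged_addr, uptr shadow_base) {
  // The caller must already have stripped the tag from the top byte.
  assert((untagged_addr >> 56) == 0);
  return (untagged_addr >> kShadowScale) + shadow_base;
}

// Number of shadow bytes needed to describe `size` bytes that start at a
// granule-aligned address (rounds up to whole granules).
constexpr uptr ShadowBytesFor(uptr size) {
  return (size + kGranuleSize - 1) >> kShadowScale;
}
```

With a shadow scale of 4, every address inside the same 16-byte granule maps to the same shadow byte, so the granule size fixes the layout that both instrumented code and the runtime library must agree on.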
>
> Those particular hooks are added as a demonstration for how something
> like MTE would be implemented on top of this framework (where the
> backend would specify the tag and granule size to match their target's
> architecture).
>
> HWASAN itself would use the hard-coded tag and granule size that matches
> what libsanitizer uses.
> https://github.com/llvm-mirror/compiler-rt/blob/99ce9876124e910475c627829bf14326b8073a9d/lib/hwasan/hwasan_mapping.h#L36
>
> I define these as `HWASAN_TAG_SIZE` and `HWASAN_TAG_GRANULE_SIZE` in
> asan.h, and when using the sanitizer library the macro
> `HARDWARE_MEMORY_TAGGING` would be false so their values would be constant.
> >
> >>
> >> The current patch series is far from complete, but I'm posting the current state
> >> to provide something to discuss at the Cauldron next week.
> >>
> >> In its current state, this sanitizer only works on AArch64 with a custom kernel
> >> to allow tagged pointers in system calls. This is discussed in the below link
> >> https://source.android.com/devices/tech/debug/hwasan -- the custom kernel allows
> >> tagged pointers in syscalls.
> >
> > Can you please be more specific? Is the MTE in upstream linux kernel? If so,
> > starting from which version?
>
> I find I can only make complicated statements remotely clear in bullet
> points ;-)
>
> What I was trying to say was:
> - HWASAN from this patch series requires AArch64 TBI.
>   (I have not handled architectures without TBI)
> - The upstream kernel does not accept tagged pointers in syscalls.
>   (programs that use TBI must currently clear tags before passing
>   pointers to the kernel)
> - This patch series doesn't include any way to avoid passing tagged
>   pointers to syscalls.
> - Hence in order to test the sanitizer I'm using a kernel that has been
>   patched to accept tagged pointers in many syscalls.
> - The link to the android.com site is just another source describing the
>   same requirement.
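
For illustration only (not from the thread): the bullet list above says that programs using TBI must clear tags before handing pointers to the kernel. At the bit level that could look like the sketch below. It assumes 64-bit addresses with an 8-bit tag in bits 56..63, the byte that AArch64 TBI ignores. The helper names `SetTag`, `GetTag` and `ClearTag` are hypothetical and do not come from libhwasan or the patch series.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-bit addresses with an 8-bit tag in bits 56..63, i.e. the
// byte that AArch64 TBI ignores on loads and stores.
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kTagMask = std::uint64_t{0xff} << kTagShift;

// Replace whatever is in the top byte of `addr` with `tag`.
constexpr std::uint64_t SetTag(std::uint64_t addr, std::uint8_t tag) {
  return (addr & ~kTagMask) | (std::uint64_t{tag} << kTagShift);
}

// Read the tag back out of the top byte.
constexpr std::uint8_t GetTag(std::uint64_t addr) {
  return static_cast<std::uint8_t>(addr >> kTagShift);
}

// Zero the top byte, giving the address an unpatched kernel would accept.
constexpr std::uint64_t ClearTag(std::uint64_t addr) {
  return addr & ~kTagMask;
}
```

A wrapper around a system call would apply something like `ClearTag` to each pointer argument. The patch series as posted has no such step, which is why a patched kernel is needed for testing.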
>
>
> The support for the relaxed ABI (of accepting tagged pointers in various
> syscalls in the kernel) is being discussed on the kernel mailing list,
> the latest patchset I know of is here:
> https://lkml.org/lkml/2019/7/25/725
>
> I wasn't trying to say anything about MTE in that paragraph, but kernel
> support for MTE is not in upstream linux kernel and is currently being
> worked on.
>
> >
> >> I have also not yet put tests into the DejaGNU framework, but instead have a
> >> simple test file from which the tests will eventually come. That test file is
> >> attached to this email despite not being in the patch series.
> >>
> >> Something close to this patch series bootstraps and passes most regression
> >> tests when ~--with-build-config=bootstrap-hwasan~ is used. The regressions it
> >> doesn't pass are all the other sanitizer tests and all linker plugin tests.
> >> The linker plugin tests fail due to a configuration problem where the library
> >> path is not correctly set.
> >> (I say "something close to this patch series" because I recently made a change
> >> that breaks bootstrap but I believe is the best approach once I've fixed it,
> >> hence for an RFC I'm leaving it in).
> >>
> >> HWASAN works by storing a tag in the top bits of every pointer and a colour in
> >> a shadow memory region corresponding to every area of memory. On every memory
> >> access through a pointer the tag in the pointer is checked against the colour in
> >> shadow memory corresponding to the memory the pointer is accessing. If the tag
> >> and colour do not match then a fault is signalled.
> >>
> >> The instrumentation required for this sanitizer has a large overlap with the
> >> instrumentation required for implementing MTE (which has similar functionality
> >> but checks are automatically done in the hardware and instructions for colouring
> >> shadow memory and for managing tags are provided by the architecture).
> >> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
> >>
> >> We hope to use the HWASAN framework to implement MTE tagging on the stack, and
> >> hence I have a "dummy" patch demonstrating the approach envisaged for this.
> >
> > What's the situation with heap allocated memory and global variables?
>
> For the heap, whatever library function allocates memory should return a
> tagged pointer and colour the shadow memory accordingly. This pointer
> can then be treated exactly the same as all other pointers in
> instrumented code.
> On freeing of memory the shadow memory is uncoloured in order to detect
> use-after-free.
>
> For HWASAN this means malloc and friends need to be intercepted, and
> this is done by the runtime library.
>
> For MTE there will need to be some updates in the system libraries.
> A discussion on the way this will be done in glibc has been started here:
> https://www.sourceware.org/ml/libc-alpha/2019-09/msg00114.html
>
>
>
> Global variables are untagged.
>
> For MTE we are planning on having these untagged.
> This is in order to allow uninstrumented object files to be statically
> linked into MTE aware object files.
> Since global object accesses are directly generated into the code, there
> would be no way to tag global objects and still use the code from that
> static object.
>
>
> Since global objects will not be coloured for MTE, I am not planning on
> colouring them for HWASAN. There would be a reasonable amount of work,
> including a new mechanism for associating objects with tags.
>
> Having all global variables untagged means that nothing need be done,
> all pointers to global variables will have a tag of zero and the shadow
> memory will correspondingly be left coloured as zero.
>
> >
> >>
> >> Though there is still much to implement here, the general approach should be
> >> clear.
> >> Any feedback is welcomed, but I have three main points that I'm
> >> particularly hoping for external opinions on.
> >>
> >> 1) The current approach stores a tag on the RTL representing a given variable,
> >>    in order to implement HWASAN for x86_64 the tag needs to be removed before
> >>    every memory access but not on things like function calls.
> >>    Is there any obvious way to handle removing the tag in these places?
> >>    Maybe something with legitimize_address?
> >
> > Not being a target expert, but I bet you'll need to store the tag with a RTL
> > representation of a stack variable.
> >
> > Thanks,
> > Martin
> >
> >> 2) The first draft presented here introduces a new RTL expression called
> >>    ADDTAG. I now believe that a hook would be neater here but haven't yet
> >>    looked into it. Do people agree?
> >>    (addtag is introduced in the patch titled "Put tags into each stack variable
> >>    pointer", but the reason it's introduced is so the backend can define how
> >>    this gets implemented with a ~define_expand~ and that's only needed for the
> >>    MTE handling as introduced in "Add in MTE stubs")
> >> 3) This patch series has not yet had much thought go towards it around command
> >>    line arguments. I personally quite like the idea of having
> >>    ~-fsanitize=hwaddress~ turn on "checking memory tags against shadow memory
> >>    colour", and MTE being just a hardware acceleration of this ability.
> >>    I suspect this idea wouldn't be liked by all and would like to hear some
> >>    opinions.
> >>
> >> Thanks,
> >> Matthew
> >>
> >
>
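
For illustration only (not from the thread): the tag-versus-colour check described in the quoted cover letter, and the colour/uncolour behaviour described for the heap, can be modelled with a toy shadow array. This is a sketch under assumptions, not the libhwasan implementation. Addresses are offsets into a small simulated region, the tag is assumed to sit in the top byte, the granule is 16 bytes as in the quoted `kShadowScale`, and the class `ToyShadow` with its methods `Colour` and `AccessOk` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr unsigned kShadowScale = 4;  // 16-byte granules, as in the quoted snippet
constexpr unsigned kTagShift = 56;    // assumption: tag lives in the top byte

// Toy model: addresses are offsets into a small simulated region, and the
// shadow holds one colour byte per granule (0 means uncoloured).
class ToyShadow {
 public:
  explicit ToyShadow(std::uint64_t bytes)
      : colours_((bytes + 15) >> kShadowScale, 0) {}

  // Colours every granule overlapping [addr, addr + size) and returns the
  // address with `tag` placed in its top byte.
  std::uint64_t Colour(std::uint64_t addr, std::uint64_t size,
                       std::uint8_t tag) {
    assert(size > 0);
    const std::uint64_t last = (addr + size - 1) >> kShadowScale;
    for (std::uint64_t g = addr >> kShadowScale; g <= last; ++g)
      colours_.at(g) = tag;
    return addr | (std::uint64_t{tag} << kTagShift);
  }

  // The comparison instrumented code would make before each access:
  // pointer tag against the colour of the granule being accessed.
  bool AccessOk(std::uint64_t tagged_addr) const {
    const auto tag = static_cast<std::uint8_t>(tagged_addr >> kTagShift);
    const std::uint64_t addr =
        tagged_addr & ~(std::uint64_t{0xff} << kTagShift);
    return colours_.at(addr >> kShadowScale) == tag;
  }

 private:
  std::vector<std::uint8_t> colours_;
};
```

In this model, colouring 16 bytes at offset 16 with tag 3 yields a pointer that passes the check, while the same pointer advanced by 16 bytes fails because the neighbouring granule is still coloured 0. Recolouring the region to 0, as on free, makes the stale tagged pointer fail as well. An untagged pointer to uncoloured memory passes, which matches the zero-tag treatment of globals described in the thread.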