From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 8085 invoked by alias); 5 Sep 2018 14:11:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 8072 invoked by uid 89); 5 Sep 2018 14:11:50 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,KAM_SHORT,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 05 Sep 2018 14:11:48 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6D7647A9; Wed, 5 Sep 2018 07:11:46 -0700 (PDT) Received: from e120077-lin.cambridge.arm.com (e120077-lin.cambridge.arm.com [10.2.206.194]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5DC473F575; Wed, 5 Sep 2018 07:11:45 -0700 (PDT) Subject: Re: [PATCH] Frame pointer for arm with THUMB2 mode To: Denis Khalikov , Wilco Dijkstra , GCC Patches Cc: "v.barinov@samsung.com" , nd References: <20180828104451eucas1p2f1de5376f7a84b71487d21f41b5fdbb5~PBnOkIagS0753307533eucas1p2n@eucas1p2.samsung.com> <20180905135616eucas1p2afa4bcdd31cbe719c778f87f0863e7d5~RhYpGQ6xx3240232402eucas1p2d@eucas1p2.samsung.com> From: "Richard Earnshaw (lists)" Openpgp: preference=signencrypt Message-ID: Date: Wed, 05 Sep 2018 14:11:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20180905135616eucas1p2afa4bcdd31cbe719c778f87f0863e7d5~RhYpGQ6xx3240232402eucas1p2d@eucas1p2.samsung.com> Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 8bit X-SW-Source: 2018-09/txt/msg00312.txt.bz2 On 05/09/18 14:55, Denis Khalikov wrote: > Hi Wilco, > thanks for the answer. > >> Adding support for a frame chain would require an ABI change. It would > have to >> work across GCC, LLVM, Arm, Thumb-1 and Thumb-2 - not a trivial amount of >> effort. > > Clang already works that way. > Please look at this commit: > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMCallingConv.td?r1=269459&r2=269458&pathrev=269459 > > > This is an example of code which clang generates > with -mthumb -fno-omit-frame-pointer -O2. > >  @ %bb.0: >          push    {r4, r5, r6, r7, lr} >          add     r7, sp, #12 >          push.w  {r8, r9, r10} >          sub     sp, #56 >          mov     r8, r0 >          movw    r0, :lower16:_ZTVN6sensor14sensor_handlerE >          movt    r0, :upper16:_ZTVN6sensor14sensor_handlerE >          mov     r9, r8 >          adds    r0, #8 >          str     r0, [r9], #4 > > The only difference is clang sets frame pointer to the r7 on the stack > instead gcc sets to lr, > but it already handles by the Sanitizers. Then Clang is broken. You can't have different frame pointers in Arm and Thumb code. I object to another hack going in for another ill-specified frame pointer variant until such time as the ABI is updated to sort this out properly. The frame handling code is just too critical to overall performance and the prologue and epilogue code itself is also quite fragile in this respect. Adding more mess on top of that is going to make sorting this out even more tricky: and once a patch like this goes on it's not easy to remove it again. So until the ABI sanctions a proper inter-function frame chain record, GCC will only support local use of the frame pointer and no chaining. R. > >> >> So this sounds like the first thing to do is reducing the size of the > stack traces. >> The default is 30 which is far larger than useful. Using 1 for example > should >> always be fast (since you can use __builtin_return_address(0)) and > still get >> the function that called malloc. Also if the unwinder happens to be > too slow, >> it should be optimized and caching added etc. >> > > If we change the size of the traces to 2, it could be something like this: > > 0xb42a50a0 is located 0 bytes inside of 88-byte region > [0xb42a50a0,0xb42a50f8) > freed by thread T0 here: >     #0 0xb6a35cc7 in __interceptor_free (/usr/lib/libasan.so+0x127cc7) >     #1 0xb5fa64e3 in ipc::message::unref() > (/lib/libsensord-shared.so+0xe4e3) > > previously allocated by thread T0 here: >     #0 0xb6a36157 in malloc (/usr/lib/libasan.so+0x128157) >     #1 0xb2f8852d in accel_device::get_data(unsigned int, > sensor_data_t**, int*) (/usr/lib/sensor/libsensor-hal-tm1.so+0x1052d) > > Instead this: > > 0xb42250a0 is located 0 bytes inside of 88-byte region > [0xb42250a0,0xb42250f8) > freed by thread T0 here: >     #0 0xb6989cff in __interceptor_free (/usr/lib/libasan.so+0x127cff) >     #1 0xb5fa64e3 in ipc::message::unref() > (/lib/libsensord-shared.so+0xe4e3) >     #2 0xb6efbf47 in sensor::sensor_handler::notify(char const*, > sensor_data_t*, int) (/usr/bin/sensord+0x1ef47) >     #3 0xb6efad43 in sensor::sensor_event_handler::handle(int, unsigned > int) (/usr/bin/sensord+0x1dd43) >     #4 0xb5fa3bbb  (/lib/libsensord-shared.so+0xbbbb) >     #5 0xb62d9a15 in g_main_context_dispatch > (/lib/libglib-2.0.so.0+0x91a15) >     #6 0xb62da2d9  (/lib/libglib-2.0.so.0+0x922d9) >     #7 0xb62da9a9 in g_main_loop_run (/lib/libglib-2.0.so.0+0x929a9) >     #8 0xb5fa4e1b in ipc::event_loop::run(int) > (/lib/libsensord-shared.so+0xce1b) >     #9 0xb6eec9a5 in main (/usr/bin/sensord+0xf9a5) >     #10 0xb5cb663b in __libc_start_main (/lib/libc.so.6+0x1663b) > > previously allocated by thread T0 here: >     #0 0xb698a18f in malloc (/usr/lib/libasan.so+0x12818f) >     #1 0xb2f8852d in accel_device::get_data(unsigned int, > sensor_data_t**, int*) (/usr/lib/sensor/libsensor-hal-tm1.so+0x1052d) >     #2 0xb6ef848f in > sensor::physical_sensor_handler::get_data(sensor_data_t**, int*) > (/usr/bin/sensord+0x1b48f) >     #3 0xb6efaa51 in sensor::sensor_event_handler::handle(int, unsigned > int) (/usr/bin/sensord+0x1da51) >     #4 0xb5fa3bbb  (/lib/libsensord-shared.so+0xbbbb) >     #5 0xb62d9a15 in g_main_context_dispatch > (/lib/libglib-2.0.so.0+0x91a15) >     #6 0xb62da2d9  (/lib/libglib-2.0.so.0+0x922d9) >     #7 0xb62da9a9 in g_main_loop_run (/lib/libglib-2.0.so.0+0x929a9) >     #8 0xb5fa4e1b in ipc::event_loop::run(int) > (/lib/libsensord-shared.so+0xce1b) >     #9 0xb6eec9a5 in main (/usr/bin/sensord+0xf9a5) >     #10 0xb5cb663b in __libc_start_main (/lib/libc.so.6+0x1663b) > > > At the first example we lost the full context, from where the > control/data flow comes from. > >> >> The issue is that the frame pointer and frame chain always add a large >> overhead even when you do not use any sanitizers. This is especially bad >> for the proposed patch - you lose much of the benefit of using Thumb-2... >> > > The stack layout like this enables only with compile time flag > (-mthumb-fp and works only together with -mthumb and > -fno-omit-frame-pointer). It does not affect other codegen. > > Thanks. > > On 09/05/2018 03:11 PM, Wilco Dijkstra wrote: >> Hi Denis, >> >>> We are working on applying Address/LeakSanitizer for the full Tizen OS >>> distribution. It's about ~1000 packages, ASan/LSan runtime is installed >>> to ld.so.preload. As we know ASan/LSan has interceptors for >>> allocators/deallocators such as (malloc/realloc/calloc/free) and so on. >>> On every allocation from user space program, ASan calls >>> GET_STACK_TRACE_MALLOC; >>> which unwinds the stack frame, and by default uses frame based stack >>> unwinder. So, it requires to build with "-fno-omit-frame-pointer", >>> switching it to default unwinder really hits the performance in our >>> case. >> >> So this sounds like the first thing to do is reducing the size of the >> stack traces. >> The default is 30 which is far larger than useful. Using 1 for example >> should >> always be fast (since you can use __builtin_return_address(0)) and >> still get >> the function that called malloc. Also if the unwinder happens to be >> too slow, >> it should be optimized and caching added etc. >> >>>> Doing real unwinding is also far more accurate than frame pointer based >>>> unwinding (the latter doesn't handle leaf functions correctly, >>> entry/exit in >>>> non-leaf functions and shrinkwrapped functions - and this breaks >>> callgraph >>>> profiling). >>> >>> I agree, but in our case, all interceptors for allocators are >>> leaf functions, so the frame based stack unwinder works well for us. >> >> Yes a frame chain would work for this case. But it's not currently >> supported. >> >>> By default we build packages with ("-marm" "-fno-omit-frame-pointer"), >>> because need frame based stack unwinder for every allocation, as I said >>> before. As we know GCC sets fp to lr on the stack with >>> ("-fno-omit-frame-pointer" and "-marm") and I don't really know why. >>> But the binary size is bigger than for thumb, so, we cannot use default >>> thumb frame pointer and want to reduce binary size for the full >>> sanitized image. >> >> The issue is that the frame pointer and frame chain always add a large >> overhead even when you do not use any sanitizers. This is especially bad >> for the proposed patch - you lose much of the benefit of using Thumb-2... >> >> Using normal unwinding means your code runs at full speed and still >> can be >> used by the sanitizer. >> >>> In other case clang works the same way, as I offered at the patch. >>> It has the same issue, but it was fixed at the end of 2017 >>> https://bugs.llvm.org/show_bug.cgi?id=18505 (The topics starts from >>> discussion about APCS, but it is not the main point.) >>> >>> Also, unresolved issue related to this >>> https://github.com/google/sanitizers/issues/640 >> >> Adding support for a frame chain would require an ABI change. It would >> have to >> work across GCC, LLVM, Arm, Thumb-1 and Thumb-2 - not a trivial amount of >> effort. >> >> Wilco >> >>