From: Roland McGrath
To: libc-ports@sourceware.org
Subject: [PATCH roland/arm-memcpy-fix] ARM: Fix memcpy computed-jump calculations for ARM_ALWAYS_BX case.
Message-Id: <20131122024219.836FC7469E@topped-with-meat.com>
Date: Fri, 22 Nov 2013 13:38:00 -0000

I flubbed the first version (and its testing!) of this, so it worked in
the synthetic situation on GNU/Linux as I was testing, but did not work in
the actual Native Client situation that motivated the change.

For this fix I tested on arm-linux-gnueabi with arm-features.h hacked to
define ARM_BX_ALIGN_LOG2 to 4, define ARM_ALWAYS_BX and
ARM_NO_INDEX_REGISTER, and define 'bx' as a macro for 'nop;bx' to simulate
the Native Client build where it's defined as a macro that expands to two
instructions.  The last hack (the 'bx' macro) is what was missing in my
testing of the original version; testing the trunk code with that hack
demonstrated the bug.

OK for trunk?


Thanks,
Roland


ports/ChangeLog.arm
2013-11-21  Roland McGrath

	* sysdeps/arm/armv7/multiarch/memcpy_impl.S
	[ARM_ALWAYS_BX] (dispatch_helper): Fix PC computation to properly
	account for instructions after the reference to PC given that 'bx'
	might actually be expanded to multiple instructions.
	* sysdeps/arm/arm-features.h (ARM_BX_NINSNS): Macro removed.

--- a/ports/sysdeps/arm/arm-features.h
+++ b/ports/sysdeps/arm/arm-features.h
@@ -53,14 +53,6 @@
 # define ARM_BX_ALIGN_LOG2 2
 #endif
 
-/* The number of instructions that 'bx' expands to.  A more-specific
-   arm-features.h that defines 'bx' as a macro should define this to the
-   number instructions it expands to.  This is used only in a context
-   where the 'bx' expansion won't cross an ARM_BX_ALIGN_LOG2 boundary.  */
-#ifndef ARM_BX_NINSNS
-# define ARM_BX_NINSNS 1
-#endif
-
 /* An OS-specific arm-features.h file may define ARM_NO_INDEX_REGISTER to
    indicate that the two-register addressing modes must never be used.  */
 
--- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
+++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
@@ -127,25 +127,26 @@
 	.purgem dispatch_step
 	.endm
 #else
-# if ARM_BX_ALIGN_LOG2 < 4
+# if ARM_BX_ALIGN_LOG2 < 3
 # error case not handled
 # endif
 	.macro dispatch_helper steps, log2_bytes_per_step
-	.p2align ARM_BX_ALIGN_LOG2
 	/* TMP1 gets (max_bytes - bytes_to_copy), where max_bytes is
 	   (STEPS << LOG2_BYTES_PER_STEP).
-	   So this is (steps_to_skip << LOG2_BYTES_PER_STEP).  */
-	rsb	tmp1, tmp1, #(\steps << \log2_bytes_per_step)
-	/* Pad so that the add;bx pair immediately precedes an alignment
-	   boundary.  Hence, TMP1=0 will run all the steps.  */
-	.rept (1 << (ARM_BX_ALIGN_LOG2 - 2)) - (2 + ARM_BX_NINSNS)
-	nop
-	.endr
+	   So this is (steps_to_skip << LOG2_BYTES_PER_STEP).
+	   Then it needs further adjustment to compensate for the
+	   distance between the PC value taken below (0f + PC_OFS)
+	   and the first step's instructions (1f).  */
+	rsb	tmp1, tmp1, #((\steps << \log2_bytes_per_step) \
+			      + ((1f - PC_OFS - 0f) \
+				 >> (ARM_BX_ALIGN_LOG2 - \log2_bytes_per_step)))
 	/* Shifting down LOG2_BYTES_PER_STEP gives us the number of
 	   steps to skip, then shifting up ARM_BX_ALIGN_LOG2 gives us
 	   the (byte) distance to add to the PC.  */
-	add	tmp1, pc, tmp1, lsl #(ARM_BX_ALIGN_LOG2 - \log2_bytes_per_step)
+0:	add	tmp1, pc, tmp1, lsl #(ARM_BX_ALIGN_LOG2 - \log2_bytes_per_step)
 	bx	tmp1
+	.p2align ARM_BX_ALIGN_LOG2
+1:
 	.endm
 
 	.macro dispatch_7_dword
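
The PC arithmetic in the patched dispatch_helper can be sanity-checked
outside the assembler.  What follows is a minimal Python sketch, not part
of the patch, that models the rsb/add pair and checks that the computed
target lands on the start of the intended step whether 'bx' expands to one
instruction or two.  It rests on several assumptions: ARM (not Thumb)
state, so PC_OFS is taken to be 8; fixed 4-byte instructions; each step
occupying exactly one (1 << ARM_BX_ALIGN_LOG2)-byte slot after label 1;
seven steps of eight bytes each, as the name dispatch_7_dword suggests;
and made-up addresses.  The function names are hypothetical.

```python
PC_OFS = 8       # assumption: ARM state, PC reads as current insn + 8
INSN_BYTES = 4   # assumption: fixed-size 4-byte ARM instructions


def align_up(addr, log2):
    """Round addr up to a multiple of 1 << log2, like .p2align."""
    mask = (1 << log2) - 1
    return (addr + mask) & ~mask


def first_step_addr(addr0, bx_insns, align_log2):
    """Address of label 1: after the add, the bx expansion, and .p2align."""
    return align_up(addr0 + INSN_BYTES * (1 + bx_insns), align_log2)


def dispatch_target(addr0, bx_insns, align_log2, steps,
                    log2_bytes_per_step, bytes_to_copy):
    """Model of the patched rsb/add pair; returns the address bx jumps to.

    addr0 is the (hypothetical) address of label 0, the add instruction.
    """
    shift = align_log2 - log2_bytes_per_step
    addr1 = first_step_addr(addr0, bx_insns, align_log2)
    # Immediate operand of the rsb, as written in the patch.
    imm = (steps << log2_bytes_per_step) + ((addr1 - PC_OFS - addr0) >> shift)
    tmp1 = imm - bytes_to_copy                 # rsb tmp1, tmp1, #imm
    return (addr0 + PC_OFS) + (tmp1 << shift)  # add tmp1, pc, tmp1, lsl #shift


if __name__ == "__main__":
    ALIGN_LOG2, STEPS, LOG2_BYTES = 4, 7, 3    # assumed configuration
    for addr0 in range(0x1000, 0x1040, INSN_BYTES):
        for bx_insns in (1, 2):                # plain bx, or 'nop;bx'
            base = first_step_addr(addr0, bx_insns, ALIGN_LOG2)
            for skip in range(STEPS + 1):
                nbytes = (STEPS - skip) << LOG2_BYTES
                target = dispatch_target(addr0, bx_insns, ALIGN_LOG2,
                                         STEPS, LOG2_BYTES, nbytes)
                assert target == base + (skip << ALIGN_LOG2)
    print("all computed jumps land on step boundaries")
```

Because the adjustment term is derived from the 0 and 1 labels rather than
from a fixed instruction count, the model gives the same result for either
'bx' expansion; the bx_insns=2 case corresponds to the 'nop;bx' test hack
described above.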