From: Richard Earnshaw
To: Keith Packard, newlib@sourceware.org
Subject: Re: [PATCH] arm/strlen-thumb2-Os.S: Correct assembly syntax for ldrb instruction
Date: Fri, 15 May 2020 16:44:49 +0100
In-Reply-To: <87mu69usxj.fsf@keithp.com>

On 15/05/2020 16:19, Keith Packard wrote:
> Richard Earnshaw writes:
>
>> IIRC the .w was deliberate to keep the alignment right for some
>> subsequent instructions.
>
> I'm not sure what you mean by 'alignment' here -- would the assembler
> actually insert no-ops to ensure that the instruction were aligned
> somehow? I can't find any mention of this additional meaning of the '.w'
> qualifier and find that unlikely as it would increase code size?
>
> It sounds like gas and llvm have a different interpretation of the '.w'
> qualifier for this instruction. The ARMv7-M Architecture Reference
> Manual (ARM DDI 0403E.d (ID070218)) says the '.W' means:
>
>     .W    Meaning wide, specifies that the assembler must select a 32-bit
>           encoding for the instruction. If this is not possible, an
>           assembler error is produced.
>
> LDRB offers three encodings, called T1, T2 and T3 in the specs. T1 and
> T2 have equivalent flexibility: loading indirect from a register with an
> optional offset (with T1 being 16 bit and T2 being 32 bit). Selecting
> the 32-bit T2 form extends the offset from 5 to 12 bits, so I imagine
> that an environment that couldn't replace instructions at link time
> might use the T2 form to make room for larger relocation values in case
> the offset weren't known at assembly time?
>
> However, only the 32-bit T3 encoding offers the post-indexed mode
> required by the code here, and for that, there is no need to clarify
> which to select using the .N or .W qualifiers. And, when reading the
> description of the three encodings, only T2 includes the '.W' qualifier
> in its syntax, although T1 doesn't include '.N', which kinda indicates
> that the qualifiers should always be accepted, even if they aren't
> necessary.
>
> Hrm. For this instruction, perhaps the intent in the specification is to
> use the .W to force a T2 encoding *over a T3 encoding*. The T3 encoding
> also supports register-indexed mode, but with only an 8-bit offset, so
> using .W might indicate that even if the offset *could* fit in the 8-bit
> T3 form, that the assembler should instead select the T2 encoding. I
> dunno.
>
>> LLVM's assembler needs fixing if it doesn't accept '.w'.
>
> Yes, it seems like that would be a good idea; having the '.w' in this
> case is harmless as only the T3 encoding can possibly work. There is
> already a discussion about this in the llvm world; I don't know if or
> when that will result in a fix being applied.
>
> I dug through the gas source and couldn't find any place where the '.w'
> qualifier would affect the output in this case though, so removing it
> should be harmless for gas.
>
> I'm just responding to a bug report from a user trying to use
> clang to build the library, and for that, fixing the source code to work
> with both gas and clang in ways that don't appear to affect the output
> of gas at all seemed like a reasonable option to me.
>

The assembler syntax for ldrb in the copy of the Arm ARM that I have to
hand has:

  LDRB<c><q> <Rt>, [<Rn> {, #+/-<imm>}]
      Offset: index==TRUE, wback==FALSE
  LDRB<c><q> <Rt>, [<Rn>, #+/-<imm>]!
      Pre-indexed: index==TRUE, wback==TRUE
  LDRB<c><q> <Rt>, [<Rn>], #+/-<imm>
      Post-indexed: index==FALSE, wback==TRUE

And <q> is the .N .W qualifier. So clearly .W should be acceptable, even
if it is the only permitted form for this case.

R.
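
The encoding choice discussed in this thread can be sketched in a few lines of Python. This is only a rough model, not how gas or LLVM actually implement it: the function name select_ldrb_encoding and its parameters are hypothetical, the immediate ranges (5-bit for T1, 12-bit for T2, 8-bit for T3) follow the figures quoted above, and the restrictions on PC and SP as operands are ignored. It treats .N and .W purely as a filter on encoding width, as the manual's definition of .W describes, which is why a post-indexed load with .W should still assemble.

```python
def select_ldrb_encoding(rt, rn, offset, mode="offset", qualifier=None):
    """Pick a Thumb-2 LDRB (immediate) encoding name: 'T1', 'T2' or 'T3'.

    Hypothetical helper for illustration only.
    rt, rn    -- register numbers (0-15); PC/SP special cases are ignored
    offset    -- signed immediate offset in bytes
    mode      -- 'offset', 'pre' (pre-indexed) or 'post' (post-indexed)
    qualifier -- None, 'n' (16-bit only) or 'w' (32-bit only)
    """
    if mode not in ("offset", "pre", "post"):
        raise ValueError("unknown addressing mode: %r" % (mode,))
    if qualifier not in (None, "n", "w"):
        raise ValueError("unknown qualifier: %r" % (qualifier,))

    candidates = []  # (encoding name, width in bits), narrowest first
    if mode == "offset":
        # T1: 16-bit, low registers only, 5-bit unsigned offset.
        if rt < 8 and rn < 8 and 0 <= offset <= 31:
            candidates.append(("T1", 16))
        # T2: 32-bit, 12-bit unsigned offset.
        if 0 <= offset <= 4095:
            candidates.append(("T2", 32))
        # T3: 32-bit, 8-bit offset; the only form for negative offsets.
        if -255 <= offset < 0:
            candidates.append(("T3", 32))
    elif -255 <= offset <= 255:
        # Pre- and post-indexed addressing exist only in T3.
        candidates.append(("T3", 32))

    # The qualifier only narrows the set of acceptable widths.
    if qualifier == "n":
        candidates = [c for c in candidates if c[1] == 16]
    elif qualifier == "w":
        candidates = [c for c in candidates if c[1] == 32]

    if not candidates:
        raise ValueError("no encoding satisfies the operands and qualifier")
    return candidates[0][0]


if __name__ == "__main__":
    # Post-indexed load: T3 with or without .w.
    print(select_ldrb_encoding(2, 3, 1, mode="post"))
    print(select_ldrb_encoding(2, 3, 1, mode="post", qualifier="w"))
    # Plain offset load: .w forces the 32-bit T2 form over the 16-bit T1.
    print(select_ldrb_encoding(0, 1, 4))
    print(select_ldrb_encoding(0, 1, 4, qualifier="w"))
```

Under this model, asking for .n on the post-indexed form raises an error, since no 16-bit encoding offers write-back, while .w merely restates the only available choice.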