From: Orlando Arias
To: newlib@sourceware.org
Cc: Hans-Bernhard Bröker
Subject: Re: Help porting newlib to a new CPU architecture (sorta)
Date: Wed, 7 Jul 2021 16:23:46 -0400
Message-ID: <986af02e-7c71-cea9-40f4-03395aa46722@gmail.com>
In-Reply-To: <661253a7-00b8-02ac-4956-051f7be70231@t-online.de>

Greetings,

On 7/7/21 2:43 PM, Hans-Bernhard Bröker wrote:
> On 06.07.2021 at 22:46, Orlando Arias wrote:
>
>> Consider the AVR architecture, where program and data spaces have
>> distinct address spaces. We have a pointer to a string literal that
>> resides in program memory.
>
> You're already mixing stuff up again.  The memory C string literals are
> in is, by definition, _not_ program memory.  It's read-only data memory.
> That distinction is crucial.
>
> Small-ish embedded CPUs do not usually implement the strict Harvard
> architecture principle, precisely because that does not support constant
> data.  A strict Harvard would have all data, including the
> const-qualified parts, in RAM, and initialize it all by running a very
> boring piece of program that just writes it all using immediate
> operands.  const data would thus consume normal RAM, without any
> write-protection by the hardware at all.

At the risk of further derailing the initial conversation, I feel like
there is some misunderstanding here about the AVR architecture. Address 0
in program memory contains the code that executes as part of the reset
vector. Address 0 in data memory is a mirror of r0. There are two
physically different address spaces in that architecture. This is very
explicitly stated in the datasheet for any megaAVR or tinyAVR
microcontroller. The C compiler (gcc) treats (void*)0 as address 0 in
data memory.
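
To make the two-address-space point concrete, here is a toy host-side
model in C. It is only a sketch: the array sizes, the byte values, and
the names flash, sram, space_t and load_byte are all made up for
illustration and do not come from any AVR header or datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes and contents, chosen only for illustration. */
enum { FLASH_SIZE = 16, SRAM_SIZE = 16 };

/* Program memory: address 0 is where the reset vector code starts. */
static const uint8_t flash[FLASH_SIZE] = { 0xAA, 0xBB };

/* Data memory: per the description above, address 0 mirrors r0. */
static uint8_t sram[SRAM_SIZE] = { 0x11, 0x22 };

typedef enum { SPACE_PROGRAM, SPACE_DATA } space_t;

/* An address only means something together with the space it is in. */
uint8_t load_byte(space_t space, uint16_t addr)
{
    if (space == SPACE_PROGRAM) {
        assert(addr < FLASH_SIZE);
        return flash[addr];   /* the kind of access lpm performs */
    }
    assert(addr < SRAM_SIZE);
    return sram[addr];        /* the kind of access ld/lds performs */
}
```

In this model load_byte(SPACE_PROGRAM, 0) and load_byte(SPACE_DATA, 0)
return different bytes for the same numeric address, which is exactly
the information a plain C data pointer does not carry on AVR.
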

To initialize the .data section, the C runtime has to copy data from one
address space to a different address space. This is where the lpm
instruction comes into play: it loads a byte from program memory into a
register, which lets you move data across the two physical address
spaces. There is [unfortunately, we can debate] no remapping/mirroring
that takes place.

Now, in AVR, as you mention, C string literals are expected to be in data
memory, so they need to be copied over. Because of limited SRAM, however,
the compiler provides an extension to the C language to keep string
literals in the program memory address space. String literals stored this
way are not copied over to SRAM by the runtime. Declaring the literal as:

    const char* m = "hello, world!\n";

is not enough to keep it in program memory. You have to utilize the
PROGMEM macro:

    const char m[] PROGMEM = "hello, world!\n";

[note the array: with a pointer declaration, only the pointer itself
would end up in program memory], which actually expands to
__attribute__((__progmem__)) or some such. To access the data, you need
to utilize very specific macros/functions, since the load has to be done
with the lpm instruction.

It may be confusing looking at a flat dump of the binary, since gcc still
treats the end result as a "flat single address space", but in reality
that is not how the hardware operates. There are two physically distinct
address spaces, and addresses between them share nothing in common.

This is in contrast to something like a Cortex-M based core, where
address 0 contains the initial value for the main stack pointer. The C
runtime still has to initialize .data using information from a read only
memory [usually flash]. However, this read only memory shares the same
address space as RAM. Yes, the Cortex-M core has multiple AMBA bus ports
[AHB-Lite on the M3/M4, AXI on the M7] to connect into a bus matrix, but
the memory system is still unified. Both program memory
[flash/ROM/FeRAM...] and data memory [SRAM...] are in the same address
space.
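
The contrast between the AVR and Cortex-M cases can be sketched in C.
The following is a rough example written so that it builds both with
avr-gcc and with a host compiler. On AVR it uses PROGMEM and
pgm_read_byte() from avr-libc's <avr/pgmspace.h>; everywhere else those
two names are replaced by local stand-ins that do a plain load, which is
the unified address space case. The names greeting and copy_from_flash
are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>   /* real PROGMEM and pgm_read_byte() */
#else
/* Stand-ins for a unified address space: a plain load is enough. */
#define PROGMEM
#define pgm_read_byte(p) (*(const unsigned char *)(p))
#endif

/* The array itself (not a pointer to it) is placed in program memory. */
static const char greeting[] PROGMEM = "hello, world!\n";

/* Copy a NUL-terminated string out of program memory into an SRAM
   buffer of n bytes, truncating if needed. Returns the number of
   characters copied, not counting the terminator. */
size_t copy_from_flash(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && n > 0);
    while (i + 1 < n) {
        char c = (char)pgm_read_byte(src + i);  /* lpm on AVR */
        if (c == '\0')
            break;
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

Calling copy_from_flash(buf, greeting, sizeof buf) gives you an ordinary
SRAM string that standard library functions can work on. avr-libc also
ships _P variants such as strcpy_P() that do this kind of copy for you.
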

On a Cortex-M core, you can declare something like:

    const char* m = "hello, world!\n";

and the compiler is smart enough to keep that data in the read only
portions of memory [namely flash/ROM/FeRAM...]. It will not be copied
over to SRAM by the C runtime. Accesses and references are performed
directly [using the ldr* family of instructions]. In fact, the C compiler
will embed large integer literals in program code, and load them directly
from read only memory into registers. This is because there is a limit as
to how large an integer literal can be encoded in a mov instruction.
This is also how things like jump tables are implemented by gcc on Arm
architectures [both A and M profiles, can not say for R profiles since I
have not used them, but I imagine it is the same].

> Micro controller designers have pulled different kinds of tricks to get
> around the need to have constants directly in ROM, ranging from the
> simple loop-hole instruction that does read from program memory anyway
> (like the 8051's MOVC), to various kinds of mirroring schemes that just
> map ROM into data space, essentially breaking the Harvard architecture
> rather fundamentally.

I have seen the mirroring scheme at work before. The STM32F4
microcontrollers [Cortex-M4F cores], for example, map internal flash at
both address 0 and address 0x08000000. SRAM begins at 0x20000000, as the
Armv7-M architecture mandates, followed by a bit banding region for SRAM.
From the perspective of the Cortex core, this is all in a single, unified
address space. Yes, flash and SRAM are different memory types, with
different characteristics and power, clock, and access requirements, but
they all lie in the same address space. This is unlike AVR, where no such
schemes are available.

This also has the side effect that you can not really do code injection
in AVR.
You can copy as much shellcode as you want to SRAM, but you will not be
able to execute it, unless code running from the bootloader section of
program memory copies it into program memory and then proceeds to execute
it [and this requires a rather convoluted process]. In Arm Cortex-M
cores, however, you can have code execute from SRAM as if it were
executing from a read only memory [MPU permissions notwithstanding].

OpenOCD does this a lot, actually, when dealing with Arm-based
microcontrollers. In order to load a program into flash, it injects code
into SRAM which configures the [memory mapped] flash controller for the
core you are working with to allow for writes, then proceeds to have the
flash controller store the program. Because of how the address map in an
Armv7-M [and Armv8-M for that matter] core is structured, the end result
is the program code available starting at address 0 [with the initial
vector table at that location].

> But that's ultimately a problem for the implementer of the C compiler
> and run-time library to address, if they decide to try doing that on
> such small architectures.
>
>> The problem with this code is that we are treating a as a pointer in
>> data memory. Declaring a to be PROGMEM does not help. We actually need
>> to rewrite the code to force the compiler to use the proper instruction:
> That's what you get for throwing Standard C compatibility out the window
> by declaring that string constant using a compiler extension like PROGMEM.
>
> Generally the compiler would be required by the Standard to implement
> "generic pointers" that can reach _all_ kinds of data defined without
> use of non-standard means.  If it doesn't do that, it is by definition
> not a C compiler.  Which can be fine, e.g. if the architecture just
> cannot have a correct C implementation otherwise, or only a horribly
> inefficient one.
>
> But porting a generic standard C library like newlib or glibc onto a
> platform that needs non-standard compiler extensions just to emulate
> strcmp() may quickly turn into a lost cause.
>

Except that you need to do this, because it is how the architecture
works. If you do not care about conserving SRAM in AVR, you can declare
your literals as const. The compiler will do its thing and assume
constness for optimization purposes, but the runtime will happily copy
them over to SRAM at startup and you can use your standard C library
functions. Now, if you want to be more conscious about your SRAM usage,
you need to use the non-standard means I mentioned. The fact that there
are two physically distinct address spaces requires that.

Cheers,
Orlando.