From mboxrd@z Thu Jan  1 00:00:00 1970
To: gcc@gcc.gnu.org
From: David Brown <david@westcontrol.com>
Subject: Re: wishlist: support for shorter pointers
Date: Thu, 6 Jul 2023 14:53:43 +0200
Message-ID: <u86dgn$13qa$1@ciao.gmane.io>
References: <439affd4-11fe-de80-94c8-6fc64cbf76ec@ztk-rp.eu>
 <940e9ae5-8649-5a28-e29f-06f0b2982892@ztk-rp.eu>
 <eeddf4aa-9fd7-c843-eeef-56e4eb0ca107@westcontrol.com>
 <a8e9b05c-0a2f-ea40-34ff-7230042b3f4c@ztk-rp.eu>
 <6c881d3fc76d112d52ec668d05b68394ae792f30.camel@gwdg.de>
 <bbc95dec-e9f6-42a4-9f9e-b6e8753b9f37@ztk-rp.eu>
 <f9df7e5c11b7075aa196efaf10a00c157e43f4c2.camel@gwdg.de>
 <1eeef918-80d0-12a3-e7e9-5a75b25fb769@ztk-rp.eu>
 <8825a11f-e462-8d97-3cdf-a5015250f3c1@westcontrol.com>
 <45292545-b4e1-a2c8-38d0-a7773f309ca5@ztk-rp.eu>
 <25afc1cb-3a62-135d-3206-2d9eb6216944@westcontrol.com>
 <701a38b8-9e90-36ce-9357-8b648f04a4d8@ztk-rp.eu>
 <f679f4e5-8ac4-3e1b-7e24-1ae398d053fe@westcontrol.com>
 <540fa64b-0263-ba43-2c2a-2973ab376826@ztk-rp.eu>
 <f299e78f-0606-ec10-f8d4-518fb4e22b0c@westcontrol.com>
 <4932e81f-18b5-81de-180b-181008b168f5@ztk-rp.eu>
 <b9bde7f2-acf3-7a59-b4cd-cbddcb2a22dd@westcontrol.com>
 <2c42ea21-ad17-fe93-c228-1730a984360d@ztk-rp.eu>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.0
Content-Language: en-GB
In-Reply-To: <2c42ea21-ad17-fe93-c228-1730a984360d@ztk-rp.eu>
List-Id: <gcc.gcc.gnu.org>

On 06/07/2023 09:00, Rafał Pietrak via Gcc wrote:
> Hi,
> 
> On 5.07.2023 at 19:39, David Brown wrote:
> [------------------]
>>> I'm not sure what this means? At compile time, you only have 
>>> literals, so what's missing?
>>
>> The compiler knows a lot more than just literal values at compile time 
>> - lots of things are "compile-time constants" without being literals 
>> that can be used in string literals.  That includes the value of 
>> static "const" variables, and the results of calculations or "pure" 
>> function 
> 
> const --> created by a literal.

Technically in C, the only "literals" are string literals and compound 
literals.  Something like 1234 is an integer constant, not a literal. 
But I don't want to get too deep into such standardese - especially not 
for C++!

Even in C, there are lots of things that are known at compile time 
without being literals (or explicit constants).  In many situations you 
can use "constant expressions", which include basic arithmetic on 
constants, enumeration constants, etc.  The restrictions on what can be 
used in different circumstances are not always obvious (at file scope, 
if you have "static const int N = 10;", then GCC accepts "static const 
int M = N + 1;" but rejects "int xs[N];").
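A minimal sketch of the C rules here, assuming C11 or later (the names are made up for illustration): an enumeration constant is an integer constant expression, so it can size a file-scope array and appear in a static assertion, while a "static const int" object cannot size a file-scope array in C.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* An enumeration constant is an integer constant expression,
   so it can size a file-scope array. */
enum { BUF_LEN = 10 };
static char buf[BUF_LEN];

/* A const-qualified object is not a constant expression in C:
   "static char buf2[N];" here would be rejected as a variably
   modified type at file scope (C++ accepts it). */
static const int N = 10;

/* Integer constant expressions can also be checked at compile time. */
static_assert(sizeof buf == BUF_LEN, "buffer size mismatch");
static_assert(BUF_LEN + 1 == 11, "arithmetic on constants is fine");

size_t buf_len(void) { return sizeof buf; }
int n_value(void) { return N; }
```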

C++ has a much wider concept of constant expressions at compile 
time - many more ways to make constant expressions, and many more ways 
to use them.  But even there, the compiler will know things at compile 
time that are not syntactically constant in the language.  (If a function 
contains "if (x < 0) return; bool b = (x >= 0);", then the compiler can 
optimise in the knowledge that "b" is always "true", even though it is 
not a constant expression.)


> 
>> calls using compile-time constant data.  You can do a great deal more of 
> 
> "compile time constant data" -> literal
> 
>> this in C++ than in C ("static const int N = 10; int arr[N];" is valid 
>> in C++, but not in C).  Calculated section names might be useful for 
>> sections that later need to be sorted.
>>
>> To be fair, you can construct string literals by the preprocessor that 
>> would cover many cases.
> 
> OK. We are talking about convenience syntax that allows using any 
> "name" in C sources as a "const-literal", as long as it is rooted in 
> literals only. That's useful.
> 
> +2. :)
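As a rough sketch of the preprocessor route, a section name can be assembled from a numeric macro by stringification plus string-literal concatenation. The macro names and the ".timer_users." naming scheme are made up for illustration, and GCC (or Clang) on an ELF target is assumed.

```c
#include <assert.h>   /* static_assert */

/* Two-level stringification, so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical naming scheme: one input section per timer number.
   Adjacent string literals are concatenated in translation phase 6,
   before the attribute argument is parsed. */
#define TIMER_SECTION(n) ".timer_users." STR(n)

#define MY_TIMER 3

/* Assumes GCC or Clang targeting ELF; "used" keeps the otherwise
   unreferenced object in the output. */
static const int my_timer_user
    __attribute__((section(TIMER_SECTION(MY_TIMER)), used)) = MY_TIMER;

/* ".timer_users.3" is 14 characters plus the terminating null. */
static_assert(sizeof TIMER_SECTION(MY_TIMER) == 15, "unexpected section name");

const char *my_section_name(void) { return TIMER_SECTION(MY_TIMER); }
```

Each timer number then gets its own input section, which a linker script could collect or sort with a wildcard pattern such as ".timer_users.*".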
> 
>>
>> I can also add that generating linker symbols from compile-time 
>> constructed names could be useful, to use (abuse?) the linker to find 
>> issues across different source files.  Imagine you have a 
> 
> +1
> 
>> microcontroller with multiple timers, and several sources that all 
>> need to use timers.  A module that uses timer 1 could define a 
> [----------------------]
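The rest of the timer example is snipped above.  Purely as a hypothetical illustration of the general idea, a module can "claim" a timer by defining a global symbol whose name is built with token pasting, so that two claims of the same timer collide at link time.  The macro and symbol names are invented for this sketch.

```c
#include <assert.h>

/* Hypothetical macro: a module that uses timer n defines a global
   object whose name is built from n by token pasting.  In C, a const
   object at file scope has external linkage, so if two source files
   claim the same timer, the link fails with a "multiple definition"
   error and the conflict is caught at build time. */
#define CLAIM_TIMER(n) const unsigned char timer##n##_claimed = 1

CLAIM_TIMER(1);   /* this module uses timer 1 */
CLAIM_TIMER(4);   /* ... and timer 4 */
```

In C++ the macro would need an explicit "extern", since a const object at namespace scope has internal linkage there.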
>>>>
>>>>      __attribute__((section("jit_buffer,\"ax\"\n@")))
>>>
>>> I assume that adding an attribute should split a particular section 
>>> into "an old one" and "a new one with the new attribute", right?
>>
>> You can't have the same section name and multiple flags.  But you 
>> sometimes want to have unusual flag combinations, such as executable 
>> ram sections for "run from ram" functions.
> 
> Section flags reflect the "semantics" of the section (ro vs. rw are 
> different semantics at that level). So, how do you "merge" RAM (a 
> section called ".data"), one with the "!x" flag, and the other with 
> the "x" flag?
> 
> Conflicting flags of sections with the same name have to be taken into 
> consideration.
> 

It doesn't make sense to merge linker input sections with conflicting 
flags - this is (and should be) an error at link time.  So I am not 
asking for a way to make a piece of ".data" section with different flags 
from the standard ".data" section - I am asking about nicer ways to make 
different sections with different selections of flags.  (Input sections 
with different flags can be merged into one output section, as the 
semantic information is lost there.)

>>
>>>
>>> One would need to have linker logic (and linker script definitions) 
>>> altered, to follow that (other features so far wouldn't require any 
>>> changes to linkers, I think).
>>>
>>>> to add the flags manually, then a newline, then a line comment 
>>>> character (@ for ARM, but this varies according to target.)
>>>>
>>>> 6. Convenient support for non-initialised non-zeroed data sections 
>>>> in a standardised way, without having to specify sections manually 
>>>> in the source and linker setup.
>>>
>>> What do you gain with this, and under which circumstances? I mean, 
>>> why insist on keeping a memory region uninitialised, when clearing it 
>>> is just a one-shot action at load time?
>>>
>>
>> Very often you have buffers in your programs, which you want to have 
>> statically allocated in ram (so they have a fixed address, perhaps 
>> specially aligned, and so you have a full overview of your memory 
>> usage in your map files), but you don't care about the contents at 
>> startup. Clearing these to 0 is just a waste of processor time.
> 
> At startup? Really? Personally I wouldn't care if I waste those cycles.
> 

Usually it is not an issue, but it can be for some systems.  I've seen 
systems where a hardware watchdog has timed out while the startup code 
is clearing large buffers unnecessarily.  There are also some low-power 
systems that are halted until some external event triggers their reset - 
you want to get to the code that checks the reset source (reset pin or 
power on) as fast as possible, and you want much of your data to remain 
preserved over soft resets.

And maybe your buffers are allocated in external dynamic ram which is 
not accessible until you have configured the ram controller - and 
thereafter it is accessible as normal ram.  For one project I have at 
the moment, the chip's on-chip ram blocks can be allocated individually 
to data tightly coupled memory, instruction tightly coupled memory, or 
general-purpose ram - all at different addresses in the memory map.  You 
do not want anything cleared until the blocks have been re-mapped from 
their default settings to their final settings.
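A minimal sketch of a non-initialised buffer and of state preserved over soft resets, using only the existing "section" attribute.  It assumes a linker script that places ".noinit" input sections in a NOLOAD output section which the startup code neither copies nor clears; the variable names and the magic value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the linker script maps ".noinit" to a NOLOAD output
   section that startup code leaves alone.  On a hosted system this
   simply behaves like ordinary zero-filled data. */
#define NOINIT __attribute__((section(".noinit")))

/* Large buffer whose contents at startup do not matter. */
uint8_t rx_buffer[4096] NOINIT;

/* Hypothetical state kept over soft resets, validated by a magic word. */
#define BOOT_MAGIC 0xB007C0DEu
typedef struct {
    uint32_t magic;
    uint32_t soft_reset_count;
} boot_state_t;
boot_state_t boot_state NOINIT;

/* Call early in startup, before anything else touches boot_state. */
void boot_state_init(void)
{
    if (boot_state.magic != BOOT_MAGIC) {
        /* Contents are garbage (e.g. after power-on): start afresh. */
        boot_state.magic = BOOT_MAGIC;
        boot_state.soft_reset_count = 0;
    } else {
        /* Magic survived, so this is a soft reset. */
        boot_state.soft_reset_count++;
    }
}
```

Keeping the attribute in the C source at least documents the intent next to the variable, even though the linker script still has to cooperate.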


> And having that explicitly spelled out in the sources will, I think, 
> just make them harder for a maintainer to read.
> 

It is even harder to read if it is not explicit in the C sources, but 
only in the linker files!


> Otherwise, from my personal experience, it may or may not be desirable.
> 
>>
>>
>>>> 7. Convenient support for sections (or variables) placed at specific 
>>>> addresses, in a standardised way.
>>>
>>> Hmm... Frankly, I'm quite comfortable with current features of linker 
>>> script, and I do it like this:
>>> SECTIONS
>>> {
>>>      sfr_devices 0x40000000 (NOLOAD): {
>>>          . = ALIGN(1K);    PROVIDE(TIM2 =    .);
>>>          . = 0x00400;    PROVIDE(TIM3 =    .);
>>>          . = 0x00800;    PROVIDE(TIM4 =    .);
>>>      }
>>> }
>>>
>>> The only problem is that so far I'm not aware of any command-line 
>>> option to "supplement" the default linker script with such a fragment. 
>>> The "-T" option replaces it, which is a nuisance.
>>
>> These are ugly and hard to maintain in practice - the most common way 
>> to give fixed addresses is to use macros that cast the fixed address 
>> to pointers to volatile objects and structs.
> 
> Yes, I know that macros are traditionally used here, but personally I 
> think using them is just hideous. I've been using the above section 
> definitions for years and they keep my C sources nice and clean. And (in 
> particular with STM32) if I change the target device, I just change the 
> linker script and don't usually have to change the sources. That's 
> really nice. It's like effortless porting.
> 
> Having said that, I'm open to suggestions on how to do this better - like 
> having the compiler "talk to the linker" about those locations.
> 

There is always more than one way to do these things.  But I believe 
most programmers prefer to stick to the C (and/or C++) source files, and 
avoid anything involving linker files or assembly files.  We are looking 
for ideas that could suit a wide range of people, not just you or me 
personally :-)
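For comparison, this is roughly what the conventional macro approach looks like.  The register layout and base address are hypothetical (the address simply matches the TIM2 location in the linker-script fragment quoted above); real values come from the device's reference manual and headers.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical timer register layout, for illustration only. */
typedef struct {
    volatile uint32_t cr1;   /* 0x00: control register 1 */
    volatile uint32_t cr2;   /* 0x04: control register 2 */
    volatile uint32_t sr;    /* 0x08: status register */
    volatile uint32_t cnt;   /* 0x0C: counter */
} timer_regs;

/* The usual approach: cast a fixed address to a pointer to a block
   of volatile registers.  TIM2_BASE is an assumed address. */
#define TIM2_BASE 0x40000000u
#define TIM2 ((timer_regs *)TIM2_BASE)

/* Catch layout mistakes at compile time. */
static_assert(offsetof(timer_regs, cnt) == 0x0C, "timer layout mismatch");

/* Only meaningful on the target hardware; never call this on a host. */
uint32_t tim2_read_counter(void) { return TIM2->cnt; }
```

With the linker-script approach quoted above, the C side would instead declare something like "extern timer_regs TIM2;" and leave the address to the PROVIDE() line.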

>>
>> But sometimes it is nice to have sections at specific addresses, and 
>> it would be a significant gain for most people if these could be 
>> defined entirely in C (or C++), without editing linker files.  Many 
>> embedded toolchains support such features - "int reg @ 0x1234;", or 
>> similar syntax.  gcc has an "address" attribute for the AVR, but not 
>> as a common attribute.  (It is always annoying when one target has an 
>> attribute that would be useful on other ports, but only exists on the 
>> one target.)
> 
> Yes, I know that. Then again, (personally) I do prefer to be able to 
> tell the compiler "-mmcu=atmega128" ... and so have it select the 
> appropriate linker script, while NOT changing my sources, than do it 
> the other way around.
> 
> [----------------]
>>>
>>> Extrapolating from your words: are you thinking of sections whose 
>>> content you would have full control of at compilation, where it 
>>> isn't sufficient to do it like this:
>>> char private[] __attribute__((section("something"))) = {
>>>   0xFF, 0x01, 0x02, ....
>>> };
>>>
>>
>> You also need control of the allocation (or lack thereof).  This can 
>> be done using sections with flags and/or linker file setup, but again 
>> it would be good to have a standardised GCC extension for it.  It is 
>> far easier for people to use a GCC attribute than to learn about the 
>> messy details of section flags and linker files.
> 
> OK. But IMHO, if you move the functionality from the linker to GCC, 
> then all the "mess" just gets transferred upstairs. And knowing the 
> linker is a must if you do bare-metal programming anyway.
> 

I like having my messes in one place, rather than scattered around :-)

> Still, standardization is good, good, good. But how do you standardize 
> something "private" by definition?

You have to pick the right level of standardisation.  I don't believe 
any of this should be at the level of the C standards, for example.  But 
I think it should be possible to get a generalisation within GCC, so 
that it is "standard" across all targets rather than having 
target-specific attributes or extensions like named address spaces. 
It's fine for GCC to say that this feature is only guaranteed to work 
for binutils gas and ld, or compatible assemblers and linkers, with ELF 
outputs.  That gives you a "standard" for most use-cases.

> 
> [------------]
>>>> 11. Convenient support for building up tables where the contents are 
>>>> scattered across different source files, without having to manually 
>>>> edit the linker files.
>>>
>>> Do you have an example where that is useful?
>>
>> You might like to have a code organisation where source files could 
>> define structures for, say, threads.  Each of these would need an 
>> entry in a thread table holding priorities, run function pointer, 
>> etc.  If this table were built up as a single section where each 
>> thread declaration contributed their part of it, then the global 
>> thread table would be built at link time rather than traditional run 
>> time setup.  The advantages include a clear static measure of the 
>> number of threads (see point 9), clear memory usage, and 
>> smaller initialisation code.  (Obviously we are talking about 
>> statically defined threads here, not dynamically defined threads.)
> 
> I still don't get it. (Point 9 - sizes/locations of sections available 
> to the compiler? How is that relevant to this?)
> 
> Then again. I wouldn't aspire to understand everything. If that's 
> useful, let it be.
> 
> But I'd object to calling these constructs "a table". A programmer 
> should have control of how the compiler interprets his/her words. 
> "Table" has very well-defined semantics, and to have it the way you 
> propose ... it'd be better to have a different name/syntax for those 
> other objects.
> 

I don't think "table" /does/ have well defined semantics.  But I do 
think this would be a table!

When you use C++, you already get a table like this for global 
constructors and other initialisation code.  Sometimes the 
initialisation for a variable - especially class objects where there is 
a non-trivial constructor - requires some code to be run.  When 
compiling a C++ file, every time the compiler needs to run some 
initialisation code, it generates a little function, and then makes a 
".ctors.xxx" section containing a pointer to that function.  In the 
linker, there is a section like this:

             . = ALIGN(4);
             KEEP (*crtbegin.o(.ctors))
             KEEP (*(EXCLUDE_FILE (*crtend.o) .ctors))
             KEEP (*(SORT(.ctors.*)))
             KEEP (*crtend.o(.ctors))

The ".ctors" section in crtbegin.o defines a "start of constructors 
table" symbol, and the matching section in crtend.o has the end symbol. 
Linking collects all these constructor pointers into a table, and the 
C++ startup code can run through the table calling all the functions in 
order.
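The compiler side of this is already exposed in GCC through the "constructor" function attribute.  A small sketch with hypothetical function names: each function gets a pointer in the constructor table, and the optional priority controls the order.  On current GNU/Linux toolchains that table is normally ".init_array" rather than ".ctors", but the principle is the same.

```c
#include <assert.h>

/* Records the order in which the initialisation functions ran. */
static char init_log[3];
static int init_count;

/* GCC puts a pointer to each of these functions into the constructor
   table, and the startup code calls them before main().  Priorities
   0-100 are reserved for the implementation; lower numbers run first. */
static void __attribute__((constructor(102))) init_second(void)
{
    init_log[init_count++] = 'b';
}

static void __attribute__((constructor(101))) init_first(void)
{
    init_log[init_count++] = 'a';
}
```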

I want to be able to do something similar, with a convenient syntax, but 
with my own choice of tables and contents.
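Something close to this already works on ELF targets when the section name is a valid C identifier: GNU ld (and lld) then define "__start_<name>" and "__stop_<name>" symbols around the collected input sections.  A sketch of the thread table, assuming GCC and GNU ld; the descriptor layout and the DEFINE_THREAD macro are hypothetical.  The section holds pointers rather than the descriptors themselves, because the compiler may add alignment padding between larger objects, which would break indexing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a statically defined thread. */
struct thread_desc {
    const char *name;
    int priority;
    void (*run)(void);
};

/* Each use defines one descriptor and drops a pointer to it into the
   "thread_table" section, whichever source file it appears in.
   "used" stops the compiler discarding the unreferenced pointer. */
#define DEFINE_THREAD(id, prio, fn)                                    \
    static const struct thread_desc thread_desc_##id =                 \
        { #id, (prio), (fn) };                                         \
    static const struct thread_desc *const thread_ptr_##id             \
        __attribute__((section("thread_table"), used)) = &thread_desc_##id

/* Assumption: GNU ld or lld on an ELF target.  Both provide these
   symbols for a section whose name is a valid C identifier, so no
   linker-script edit is needed. */
extern const struct thread_desc *const __start_thread_table[];
extern const struct thread_desc *const __stop_thread_table[];

static void blink_run(void) {}
static void uart_run(void) {}

DEFINE_THREAD(blink, 2, blink_run);
DEFINE_THREAD(uart, 5, uart_run);

/* Number of threads, fixed at link time. */
size_t thread_count(void)
{
    return (size_t)(__stop_thread_table - __start_thread_table);
}

/* Entry order in the section is not guaranteed, so search by name. */
const struct thread_desc *find_thread(const char *name)
{
    for (const struct thread_desc *const *p = __start_thread_table;
         p != __stop_thread_table; ++p) {
        if (strcmp((*p)->name, name) == 0)
            return *p;
    }
    return NULL;
}
```

Each source file can use DEFINE_THREAD without touching a central list or the linker script.  It is still not the convenient, portable syntax asked for here: it depends on the section name being a C identifier, and it may need care when linking with --gc-sections.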