Greetings,

On 7/6/21 2:08 PM, Brian Inglis wrote:
> C has no assumptions or requirements for outdated von Neumann
> architectures, but it also may not assume it is not running on such
> architectures.
> Early C and Unix took advantage of the Harvard architecture of larger
> PDP-11s which supported split I/D space with mapping registers to
> support double the program size (64KB each 16b I and D) as well as
> splitting kernel/supervisor/runtime/user spaces [and similar features on
> other systems], and shared text (the original meaning of the 't' "sticky
> text" perm on executables), which is why there is wording in the
> standards about unmodifiable constant values.

Right, I went back and looked at the standard. There is no description of
what the abstract machine for the execution environment should be. I guess
my confusion came from the second paragraph in [1].

Harvard architectures still have the problem that you must specify whether a
pointer refers to something in program space or data space, and standard C
has no way of signaling this. AVR, for example, requires the LPM instruction
to perform loads from program memory, and SPM to store data into program
memory (the latter with limitations), whereas accesses to data memory are
performed using LDS/STS and family. The compiler will normally issue
LD*/ST* instructions when dereferencing pointers. If program memory is
desired, the PROGMEM macro must be applied to the variables that belong in
program memory, and a series of macros/intrinsics must be used to actually
load/store the data in that region [forcing the use of the LPM/SPM
instructions].

I/O space is also something that can be hard to handle in C. IA-32 requires
the IN/OUT instructions to talk to devices in the I/O space, and the
compiler will not emit this stuff unless you use some form of
macro/intrinsic.
AVR has an I/O space that is also memory mapped, so the compiler can
optimize the assembly output and use the I/O space when talking to some
addresses, avoiding load/store instructions (which are larger and take
longer to execute). The optimization is only possible, though, if the
address is known at compile time, or if it can be replaced with a symbol
address at link time. For the latter, the linker needs to check that the
instruction can be properly `patched' with the address being accessed.

This is what I meant by the von Neumann requirement: all pointers
dereference into the same address space. In a hosted system, the OS may be
able to perform some tricks to mask away different address spaces [trapping
on a memory access to a specific address range and emulating the access for
the program, for example]. In a freestanding system, your [statically
linked] C runtime is responsible for initializing the data section of the
program somehow. Usually this involves copying data from a read-only
location [a storage medium, such as memory-mapped flash] to a read/write
location [such as SRAM]. If you need to cross different address spaces when
performing this copy operation, standard C will not help you; you need a
language extension of some sort [or assembly].

Yes, most modern machines have split I/D caches, and the processor's
implementation is effectively that of a Harvard machine. Some CPUs have
multiple ports which can be used to service different memory access requests
to a unified memory system. Cortex-M processors, for example, have multiple
AMBA AXI ports [PPB, D-bus, I-bus, and S-bus], but the memory system is
still unified into a single address space, making the overall machine follow
the von Neumann paradigm. This is not the case with AVR, where address 0
means different things in the data space and the program space.
One thing, though: I was under the impression that the PDP-11 was a von
Neumann machine, with its unified memory model and all. Unix on the machine
made use of the memory management facilities provided by the hardware to
provide a virtual memory environment for executing processes [this being
more a function of the OS than of C itself]?

> One of the big issues with systems around the time C was developed was
> the unreliability resulting from systems, runtimes, and programs using
> self-modifying code "tricks", and programs ability to change constant
> data values used in programs (PI did not always stay π).
> That was one of the benefits touted for higher level system
> implementation languages such as Bliss, C, PL/S.
> In the face of failures frequently caused by arbitrary code and data
> modifications, branches and jumps, worries about trivial issues such as
> delimited strings, unsized APIs, dynamic memory management were decades
> in the future!
> Various implementations, ABIs, and systems require modifiable code and
> constant data, but it is *NOT* a C language assumption or requirement;
> in fact I would consider it to be a C anti-pattern, that has long
> limited the abilities of compilers and optimizers to do better jobs.
> Copying sections tends to be due to limitations of CPUs, systems,
> runtimes, or compilers to DTRT to make programs more reliable. ;^>

This latter part is really dependent on the system, and I would consider it
outside the scope of the C language proper and more a matter for the C
runtime environment, program loader, and dynamic linker [of which, as you
state, the standard makes no assumption]. For example, on systems that
support ASLR, binaries may need to be `patched' at load time to apply
relocation information. I believe DLLs on Windows have a preferred base
address at which to be loaded; otherwise the DLL must be `rewritten' in
memory as it is loaded, using information from one of the PE sections.
Not too sure how that works, or whether they still do it; I have not looked
at Windows internals in a long time. IA-32 code has some limitations there
too, given that %eip is not addressable/accessible without resorting to
hacks [which require assembly: call a stub, load the value of %eip from the
stack into a general-purpose register such as %eax, then return; the value
in %eax is now the address of the instruction you returned to]. AMD64
allows %rip to be used in addressing, which makes position-independent code
much easier to generate, although some relocation information may still be
needed [again, I am not too familiar with these aspects of the
architecture/ABI].

Cheers,
Orlando.

[1] https://www.nongnu.org/avr-libc/user-manual/pgmspace.html