Greetings,

Appending to this discussion:

> The full toolchain includes libc, libbfd, binutils, gcc, which would
> have to be bootstrapped by cross-compiling from a host system.
> 
> There is an AVR target which may support 8b and avr-libc which should
> support the toolchain.
> 
> There is also an Arduino toolchain based off the AVR toolchain which
> supports developing and loading sketches which run on pretty bare 8b MCUs.
> 
> You may be better off targeting the latter approach for simplicity.
> A lot may depend on how close a match you can find between your target
> architecture and some existing architecture.
> 

The Arduino toolchain is the same as the AVR toolchain. It is bundled
with the Wiring environment, which provides a `Board Support Package'
and hardware abstraction layer to the underlying microcontroller. The
main() function, for example, is provided by the runtime.
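
To illustrate what "main() is provided by the runtime" means, here is a
minimal sketch in C. The name runtime_main() and the bounded iteration
count are assumptions made so the example can run on a host; a real
runtime loops forever and also initializes the hardware first.

```c
/* Counters exist only so the behavior can be observed on a host. */
static unsigned setup_count;
static unsigned loop_count;

/* User-supplied hooks: this is all the sketch author writes. */
void setup(void) { setup_count++; }
void loop(void)  { loop_count++; }

/* Stand-in for the runtime-provided main() (hypothetical name).
   It calls setup() once and then loop() repeatedly. */
void runtime_main(unsigned iterations)
{
    setup();
    for (unsigned i = 0; i < iterations; i++)
        loop();
}
```

The user never sees the entry point; the runtime owns it and calls back
into the user's code.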

Now, specific to your case:

> I designed a Logisim schematic of a full system, able to run programs in
> the simulation, as shown in this video:
> 
> https://www.youtube.com/watch?v=UP6tO8x5I5A
> 
> Is based on a SAP-1 (Simplest as Possible) basis, containing 4 GP registers
> (8 bit), stack for function operations (up to 256 depth levels of
> recursivity), 24bit plain RAM component, and a simple ALU able to echo
> strings and perform not floating point math.
> 
> AFAIK, newlib is suitable for embedded devices, and I want to create the
> full toolchain for C/C++ language at least.
> 

You are getting ahead of yourself here. As others have mentioned, before
you go into the C library, you need to have a working assembler [and
possibly a bootstrapping compiler, depending on how far you want to go].
A linker is not required at this point if you're just looking to
bootstrap your toolchain. Usually, the GNU Assembler (gas) from binutils
is used for this scenario. Porting gas to a new architecture requires
adding a target backend to gas itself (gas/config/tc-<target>.c), the
opcode tables and disassembler to libopcodes, and object file support
to libbfd.
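
To give an idea of what an opcode table boils down to, here is a rough
sketch in C of a table-driven encoder. The mnemonics, opcode values,
and the "4-bit opcode plus optional 8-bit immediate" format are all
invented for illustration; they do not describe your CPU or the actual
data structures used by libopcodes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical instruction set: every value below is made up. */
struct opcode_entry {
    const char *mnemonic;
    uint8_t     opcode;       /* placed in the upper 4 bits of byte 0 */
    uint8_t     has_operand;  /* 1 if an 8-bit immediate follows */
};

static const struct opcode_entry opcode_table[] = {
    { "nop", 0x0, 0 },
    { "lda", 0x1, 1 },
    { "add", 0x2, 1 },
    { "hlt", 0xF, 0 },
};

/* Encode one instruction into out[]. Returns the number of bytes
   written (1 or 2), or 0 if the mnemonic is not in the table. */
size_t encode_insn(const char *mnemonic, uint8_t operand, uint8_t out[2])
{
    size_t n = sizeof opcode_table / sizeof opcode_table[0];

    for (size_t i = 0; i < n; i++) {
        const struct opcode_entry *e = &opcode_table[i];
        if (strcmp(e->mnemonic, mnemonic) != 0)
            continue;
        out[0] = (uint8_t)(e->opcode << 4);
        if (!e->has_operand)
            return 1;
        out[1] = operand;
        return 2;
    }
    return 0;
}
```

A disassembler walks the same table in the opposite direction, which is
why it pays to keep the encodings in one place.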

However, I would argue that before even going there you need to define
an application binary interface (ABI). Your CPU has 8-bit registers. The
C language requires the int type to be at least 16 bits wide. You need
to define how you plan to handle this scenario. AVR, for example, uses
register pairs to handle integers. You also need to define your calling
conventions. That is, how arguments are passed to functions, how return
values are handled, any stack alignment constraints... As examples, in
IA-32, arguments are normally passed on the stack; in AMD64 (System V
ABI) the first 6 integer arguments are passed in registers; RISC-V uses
the a0-a7 registers to pass arguments to functions. This kind of thing
is especially important when you are mixing languages such as C and
assembly.
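
As a sketch of the register-pair idea, the C below models a 16-bit int
held in two 8-bit registers and adds two of them the way an 8-bit ALU
would: low bytes first, then high bytes plus the carry. The struct
layout and the name add16() are assumptions for illustration, not part
of any existing ABI.

```c
#include <stdint.h>

/* A 16-bit value split across two 8-bit registers (hypothetical). */
struct reg_pair {
    uint8_t lo;
    uint8_t hi;
};

/* Add two 16-bit values using only 8-bit wide results plus a carry,
   mirroring an "add" followed by an "add with carry". */
struct reg_pair add16(struct reg_pair a, struct reg_pair b)
{
    struct reg_pair r;
    uint16_t sum_lo = (uint16_t)(a.lo + b.lo);  /* may exceed 0xFF */
    uint8_t  carry  = (uint8_t)(sum_lo >> 8);   /* 0 or 1 */

    r.lo = (uint8_t)sum_lo;
    r.hi = (uint8_t)(a.hi + b.hi + carry);      /* wraps modulo 256 */
    return r;
}
```

Your ABI document would state which registers form a pair and which
half holds the low byte, so that the compiler and hand-written assembly
agree.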

Keep in mind that C assumes you are working on a von Neumann
architecture. This means that the language expects both program and data
to be in the same address space. If your CPU [which I cannot really
check since I do not work with Logisim, I use Verilog/SystemVerilog/VHDL
myself] uses a Harvard or modified Harvard architecture, you need to
figure out a way for your C runtime to emulate a von Neumann machine.
AVR does this by ensuring that things that would go in .data, .bss, and
.rodata are copied over to the microcontroller's RAM before user code is
called [the toolchain provides ways to override this, but that is not
important at the moment]. They also extend the C language to allow for
data to be loaded from program memory under certain conditions [which is
also not important for the discussion at hand].
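
The copy-before-user-code step can be sketched in C as follows. Here
plain arrays stand in for program memory and RAM, and the function
names are hypothetical. In real startup code the addresses and lengths
normally come from symbols defined in the linker script, and reading
program memory on a Harvard machine may need a dedicated instruction
(LPM on AVR).

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of .data from program memory into RAM.
   Both pointers are ordinary host pointers in this sketch. */
void crt_copy_data(uint8_t *ram_dst, const uint8_t *rom_src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        ram_dst[i] = rom_src[i];
}

/* Clear .bss so that uninitialized statics start out as zero,
   as the C language requires. */
void crt_zero_bss(uint8_t *bss_start, size_t len)
{
    for (size_t i = 0; i < len; i++)
        bss_start[i] = 0;
}
```

Startup code would call both routines and only then jump to main().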

This brings me to the next point:
>  - How do I write CRT code? In standard ASSY? In my own assy, and somewhere
> I tell which binary format represents each mnemonic? In case of the first,
> how will  the toolchain know the opcodes and binary format of the
> instructions and data? In case of the second, could you please clarify the
> paths of those files in the docs? :)

You probably want to write your code in standard assembly. I looked over
your Java `assembler' and you do seem to have instruction encodings in
there. The encodings seem to be rather irregular, but we are not here
to evaluate computer architecture stuff, so I digress. If you can get
your Java assembler to behave the way gas behaves, you may be able to
use it in place of gas. I do not know how scalable that would be,
though.

> I want to push it further, and perhaps port some existing software to it as
> FreeDOS ... , as far as I can get. All ideas are welcome.

I think FreeDOS may be too much at the moment. I would suggest looking
at FreeRTOS or RTEMS instead. These operating systems have been designed
with embedded CPUs in mind, and FreeRTOS has support for AVR, which is
an 8-bit CPU. Keep in mind that porting an OS is quite a daunting task,
and will require you to have at the very least a working assembler,
compiler, and linker.

Cheers,
Orlando.