"A 24-bit or 31-bit virtual address is expanded to 64 bits by appending 40
or 33 zeros, respectively, on the left before it is translated by means of
the DAT process, and a 24-bit or 31-bit real address is similarly expanded
to 64 bits before it is transformed by prefixing. A 24-bit or 31-bit
absolute address is expanded to 64 bits before main storage is accessed."
IBM z/Arch POO page 3-6.
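
As a minimal C sketch, the expansion quoted above can be modelled as plain arithmetic. The helper name expand_address is made up for illustration and does not come from the POO; it models only the zero fill on the left, not DAT or prefixing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the POO: expand an address that has
   `bits` significant bits (24 or 31) to 64 bits by zero-filling on the left. */
uint64_t expand_address(uint32_t addr, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint64_t mask = (UINT64_C(1) << bits) - 1;  /* low `bits` bits set */
    return (uint64_t)addr & mask;               /* the high 64 - bits bits are zero */
}
```

In this model expand_address(0x00FFFFFF, 24) keeps all 24 bits and gains 40 zeros on the left, while expand_address(0x80000000, 31) yields 0, because the leftmost bit of a 32-bit register value is not part of a 31-bit address.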

I don't see 32 bits anywhere in that process. Unless and until IBM changes
the architecture definition to include 32 bits in address sizes, there is
no need for a -m32 switch.

Joe

On Sat, Jan 28, 2023 at 12:51 PM Paul Edwards <mutazilah@gmail.com> wrote:

> Hi Joe.
>
> Sorry for the delay (1 year and 4 months) in responding
> to this. There's a long and sad story as to what caused
> the delay, but we're here now.
>
> First of all, Hercules is a very important target. Even
> if gcc -m31 only allowed writing above 2 GiB on Hercules,
> that would still be an extremely important result, and
> justify changing the option to -m32, which is what it
> inherently is. Just because some arbitrary hardware
> masks bits at 24, or 31, or 32, or fails to even do a
> wrap at 64, doesn't alter the inherent fact that GCC
> is using 32-bit registers. Not 64. Not 31. Not 24.
>
> They are general purpose registers being used, so both
> address and data registers are 32 bits.
>
> If you have poorly-written assembler that only works if
> addresses are being masked to 24 bits, then there would
> be some justification in referring to that as a 24-bit
> program.
>
> If you have poorly-written assembler that only works if
> addresses are being masked to 31 bits, then there would
> be some justification in referring to that as a 31-bit
> program.
>
> But if you have a program that works in both of those
> AMODEs, ie what IBM calls "AMODE ANY", it would be a
> bit odd to call it an ANY-bit program, but that would
> be the exact name you need if you want to continue
> along that path. And an ANY-including-32-bit program
> if it is also capable of running as AM32 on any real,
> emulated, or theoretical environment.
>
> If you have a poorly-written operating system (like z/OS),
> that doesn't provide address masking (via DAT) to 32 bits
> for 32-bit programs, so your only option is to run them
> as AM31, where negative indexes work, or only run programs
> that don't use negative indexes (and ensure that the
> high 32 bits of 64 bit registers are 0), then there would
> be justification in calling this an AM64-intolerant
> program or AM64-tolerant program, respectively.
>
> z/OS has an additional problem that even in AM64, and
> even with an AM64-tolerant 32-bit program, there is no
> way to request memory in the 2 GiB - 4 GiB region other
> than via crapshoot (use_2g_to_32g or whatever), and
> even if you win the crapshoot, you can't have a nice
> display of the 2 GiB boundary being crossed in a single
> instruction. You could if you switched to supervisor
> mode/key zero and didn't mind clobbering what was already
> there, but you would probably still need to switch DAT off.
> And then because you don't know what damage you have
> done, you would need to freeze the system and re-IPL.
>
> Instead of attempting that, what I did was use a
> properly-written OS, z/PDOS, that uses DAT (virtual
> memory) to map the 4 GiB to 8 GiB region to 0 to 4 GiB,
> so that even in AM64, you effectively get AM32. This is
> the proper way to handle memory when you run 32-bit
> programs on a 64-bit system. 32 and 64-bit programs
> can run transparently with no mode switching required.
> The 4 GiB to 8 GiB virtual storage region is effectively
> dead.
>
> It is only used for negative indexes, which are a
> fundamental part of indexed addressing. Even positive
> indexes need wrapping. E.g. if you have an address at
> the 3.5 GiB mark and you wish to access memory at the
> 0.5 GiB mark, you would use a positive index of 1 GiB
> to get there. On an AM64 system, without a 32-bit mode
> in effect, this would index to location 4.5 GiB without
> an appropriate DAT mapping.
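
The wrap-around argument in the quoted paragraphs can be sketched in C. This is a simplified model under stated assumptions, not the architecture definition: the effective address is taken as base plus index, truncated to the addressing mode's width, and dat_alias is a hypothetical stand-in for the DAT setup described for z/PDOS (4 GiB to 8 GiB aliased onto 0 to 4 GiB). Both function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Simplified model (an assumption): effective address = base + index,
   truncated to the number of address bits of the mode (24, 31, 32 or 64). */
uint64_t effective_address(uint64_t base, uint64_t index, unsigned amode_bits)
{
    assert(amode_bits >= 1 && amode_bits <= 64);
    uint64_t sum = base + index;              /* 64-bit add */
    if (amode_bits == 64)
        return sum;                           /* no truncation in AM64 */
    return sum & ((UINT64_C(1) << amode_bits) - 1);
}

/* Hypothetical stand-in for the DAT mapping described above: virtual
   addresses from 4 GiB up to (not including) 8 GiB alias 0 to 4 GiB. */
uint64_t dat_alias(uint64_t vaddr)
{
    if (vaddr >= 4 * GIB && vaddr < 8 * GIB)
        return vaddr - 4 * GIB;
    return vaddr;
}
```

With a base of 0xE0000000 (3.5 GiB) and an index of 0x40000000 (1 GiB), the AM64 result is 0x120000000 (4.5 GiB), the 32-bit result is 0x20000000 (0.5 GiB), and applying dat_alias to the AM64 result gives the same 0.5 GiB. A negative index of -4 held in a 32-bit register whose upper half is zero (0xFFFFFFFC) behaves the same way.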
>
> Note that the index that would do such a thing may be
> in a variable (register) that is only known at runtime,
> so it is not something that you can change GCC to stop
> generating, and I was wrong to ask for that (for years).
>
> So, with that said, I have been able to satisfy your
> challenge, using real hardware. A real z114 using a
> real 3270 terminal. You can see that beautiful terminal here:
> https://groups.io/g/hercules-380/message/2391
> https://groups.io/g/hercules-380/message/2392
>
> The second photo of the first link shows the CPU (2818).
>
> z114 = 2818-M05/M10
>
> I can obtain a picture of the sticker if needed.
>
> No Hercules in sight.
>
> You could move the goal posts and say that running under
> z/VM doesn't count either.
>
> If you do that, I can run z/PDOS directly on an LPAR
> and run the memory test (in fact, this has already
> been done), but we don't know the procedure (and may
> not have permission) to use the HMC to display memory.
> z/PDOS can display its own memory, and this can show
> that the memory at 80000000 is different from location 0,
> if you accept z/PDOS reporting itself.
>
> But z/VM is the more "independent" way of displaying
> memory, so that there is no chance that z/PDOS can "cheat".
>
> Here is the test code in z/PDOS:
>
>         else if (memcmp(prog, "MEMTEST", 7) == 0)
>         {
>             printf("writing 4 bytes to address X'7FFFFFFE'\n");
>             memcpy((char *)0x7ffffffe, "\x01\x02\x03\x04", 4);
>             printf("done!\n");
>             *pdos->context->postecb = 0;
>             pdos->context->regs[15] = 0;
>         }
>
> and the memcpy generates a single MVC instruction:
>
>          MVC   0(4,2),0(3)
>
> Note that MVC is an instruction that has been available
> since the S/360 (in the 1960s). I am actually using the
> i370 target of GCC 3.2.3 for this test, but the principle
> is the same for s390 (as opposed to s390x) on the latest
> GCC. Both are 32-bit.
>
> Note that the i370 target was written by Jan Stein in 1989
> when he worked at Amdahl, long before AM64 existed.
>
> It only used S/370 instructions, so runs on anything from
> a S/370 up (thanks to upward compatibility).
>
> That MVC instruction works perfectly fine on z/Arch, as it
> does on S/370.
>
> Other instructions generated by GCC, such as BALR, have
> changed behavior slightly as they went from AM24 on S/370
> to AM31 on S/370 XA, and AM64 on z/Arch (and for that
> matter, AM32 on S/380 under Hercules/380, or I assume
> AM32 on a 360/67).
>
> The behavior changed in an upwardly-compatible way, so long
> as the program was written in a reasonable manner - ie to
> not be deliberately dependent on that AM24 or AM31 specific
> behavior. The code GCC generates has indeed been written
> in that "reasonable manner".
>
> Other instructions, such as BXLE, that, for certain use
> cases, break down at the top end of the lower half of the
> 32-bit address space, just as BXLEG breaks down at the
> top end of the lower half of the 64-bit address space, are
> not generated by GCC at all, so are not relevant.
>
> Bottom line - GCC generates 32-bit clean code, and as such,
> the option should be -m32, not -m31, not -m24, not -mANY.
> Keeping -m31 for compatibility reasons is obviously fine,
> as would be adding -m24. But both of those things obscure
> the fact that this is 32-bit clean code.
>
> Here is the rest of the context of the generated code:
>
>          MVC   88(4,13),=A(@@LC33)
>          LA    1,88(,13)
>          L     15,=A(@@7)
>          BALR  14,15
>          L     3,=A(@@LC34)
>          L     2,=F'2147483646'
>          MVC   0(4,2),0(3)
>          MVC   88(4,13),=A(@@LC35)
>          LA    1,88(,13)
>          L     15,=A(@@7)
>          BALR  14,15
>
>
> @@LC32   EQU   *
>          DC    C'MEMTEST'
>          DC    X'0'
> @@LC33   EQU   *
>          DC    C'writing 4 bytes to address X''7FFFFFFE'''
>          DC    X'15'
>          DC    X'0'
> @@LC34   EQU   *
>          DC    X'1'
>          DC    X'2'
>          DC    X'3'
>          DC    X'4'
>          DC    X'0'
> @@LC35   EQU   *
>          DC    C'done!'
>          DC    X'15'
>          DC    X'0'
>
> As you can see from the photo of the real 3270 terminal,
> that MVC instruction has successfully straddled the
> 2 GiB mark, even in a single instruction.
>
> As you can see from the photo in the second link above,
> the memory at location 0 is different (still contains
> the IPL PSW!) from the memory at location x'80000000'.
>
> Do you have any further objections, other than a logical
> fallacy such as argumentum ad populum or argumentum ad
> baculum, to oppose gcc having -m32 as an option for the
> S/390 target, or if the i370 code is added back in, for
> that too, given that that is the correct technical nature
> of the GCC-generated code?
>
> Thanks. Paul.
>
>
>
>
> "Simply switching off optimization made the negative
> indexes go away, allowing more than 2 GiB to be
> addressed in standard z/Arch, with "-m31"."
>
> Prove it on real hardware, not Hercules. Hercules doesn't count.
>
> Joe
>
> On Wed, Sep 29, 2021 at 7:09 PM Paul Edwards via Gcc <gcc@gcc.gnu.org>
> wrote:
>
> > We have fait accompli now:
> >
> > https://gcc.gnu.org/pipermail/gcc/2021-September/237456.html
> >
> > Simply switching off optimization made the negative
> > indexes go away, allowing more than 2 GiB to be
> > addressed in standard z/Arch, with "-m31".
> >
> > The above request is to add "-m32" as an alias for
> > "-m31", but I would like to add as a request for it to
> > work with optimization on.
> >
> > BFN. Paul.
> >
> > -----Original Message-----
> > From: Paul Edwards
> > Sent: Friday, September 3, 2021 11:12 PM
> > To: Jakub Jelinek
> > Cc: Ulrich Weigand ; gcc@gcc.gnu.org <gcc@gcc.gnu.org> ; Ulrich Weigand
> > Subject: Re: s390 port
> >
> > >> > This is not in one single place, but spread throughout the
> > >> > compiler, both common code and back-end.  I do not think it will
> > >> > be possible to get the compiler to generate correct code if
> > >> > you do not specify the address size correctly.
> >
> > >> 1. Is there any way to put a constraint on index
> > >> registers, to say that a particular machine can
> > >> only index in the range of –512 to +512 or some
> > >> other arbitrary set? If so, I can do 0 to 2 GiB.
> >
> > >> 2. Is there a way of saying a machine doesn’t
> > >> support indexing at all?
> >
> > > There is a way to do that, but it isn't about changing a single or a
> > > couple of spots, one needs to change a lot of *.md patterns, a lot of
> > > macros, target hooks and as Ulrich said, most important is to use the
> > > right Pmode which can differ from ptr_mode provided one e.g. defines
> > > ptr_extend pattern etc.
> >
> > Pardon? All that is required just to put a constraint
> > on an index register? If a range of a machine is
> > limited to -512 to +512, it shouldn't be necessary
> > to change md patterns etc etc.
> >
> > > Just look at the amount of work needed for the x32 or aarch64 ilp32
> > > support,
> >
> > That's different. That's because Intel stuffed up.
> > IBM didn't. IBM came within an ace of a perfect
> > architecture. It's as if Intel had created an x32
> > instead of an 80386 in 1986.
> >
> > IBM got it almost right in the 1960s.
> >
> > > and not just work spent one time on adding that support, but the
> > > continuous amount of work on maintaining it.  The initial work is
> > > certainly a few weeks if not months of work,
> >
> > I've been trying to figure out how to lift the 31-bit
> > restriction on mainframes since around 1987.
> >
> > If I have to pay someone for 2 month of work, at
> > this stage, I'm willing to do that, but:
> >
> > 1. I would like it done on GCC 3.2.3 plus maybe
> > GCC 3.4.6.
> >
> > 2. How much will it cost in US$?
> >
> > > then there needs to be somebody who regularly
> > > tests gcc trunk and branches in such configuration so that it doesn't
> > > bitrot, and not just that but somebody who actually fixes bugs in it.
> >
> > I'll take responsibility for giving the GCC 3.X.X
> > releases the TLC they deserve. And I'll encourage
> > my daughter to maintain them after I've kicked
> > the bucket.
> >
> > > If something doesn't fit into 2GB of address space,
> > > isn't it likely it won't fit into 4GB of address space
> > > in a year or two?
> >
> > Nope. 2 GiB is already a shitload of memory. It only
> > takes something like 23 MB for GCC 3.2.3 to recompile
> > itself, and I think 60 MB for GCC 3.4.6 to recompile
> > itself. That's the heaviest real workload I do. A 4 GiB
> > limitation instead of 2 GiB makes it just that much
> > less likely I'll ever hit a real limit.
> >
> > Someone told me that the only non-scientific application
> > they knew of that came close to hitting the 2 GiB limit
> > was IBM's C compiler. I doubt that IBM's C compiler
> > technology is evolving at such a rate that it only takes
> > 1-2 years for them to subsequently hit 4 GiB. Quite
> > apart from the fact that I don't really trust that even
> > IBM C is hitting a 2 GiB limit for what GCC can do in
> > 23 MiB. But it could be true - I'm not familiar with
> > compiler internals.
> >
> > BFN. Paul.
> >
>
>
>