"A 24-bit or 31-bit virtual address is expanded to 64 bits by appending 40
or 33 zeros, respectively, on the left before it is translated by means of
the DAT process, and a 24-bit or 31-bit real address is similarly expanded
to 64 bits before it is transformed by prefixing. A 24-bit or 31-bit
absolute address is expanded to 64 bits before main storage is accessed."

IBM z/Arch POO page 3-6.

I don't see 32 bits anywhere in that process. Unless and until IBM changes
the architecture definition to include 32 bits in address sizes, there is
no need for a -m32 switch.

Joe

On Sat, Jan 28, 2023 at 12:51 PM Paul Edwards wrote:

> Hi Joe.
>
> Sorry for the delay (1 year and 4 months) in responding
> to this. There's a long and sad story as to what caused
> the delay, but we're here now.
>
> First of all, Hercules is a very important target. Even
> if gcc -m31 only allowed writing above 2 GiB on Hercules,
> that would still be an extremely important result, and
> justify changing the option to -m32, which is what it
> inherently is. Just because some arbitrary hardware
> masks bits at 24, or 31, or 32, or fails to even do a
> wrap at 64, doesn't alter the inherent fact that GCC
> is using 32-bit registers. Not 64. Not 31. Not 24.
>
> They are general purpose registers being used, so both
> address and data registers are 32 bits.
>
> If you have poorly-written assembler that only works if
> addresses are being masked to 24 bits, then there would
> be some justification in referring to that as a 24-bit
> program.
>
> If you have poorly-written assembler that only works if
> addresses are being masked to 31 bits, then there would
> be some justification in referring to that as a 31-bit
> program.
>
> But if you have a program that works in both of those
> AMODEs, i.e. what IBM calls "AMODE ANY", it would be a
> bit odd to call it an ANY-bit program, but that would
> be the exact name you need if you want to continue
> along that path.
> And an ANY-including-32-bit program
> if it is also capable of running as AM32 on any real,
> emulated, or theoretical environment.
>
> If you have a poorly-written operating system (like z/OS),
> that doesn't provide address masking (via DAT) to 32 bits
> for 32-bit programs, so your only option is to run them
> as AM31, where negative indexes work, or only run programs
> that don't use negative indexes (and ensure that the
> high 32 bits of 64-bit registers are 0), then there would
> be justification in calling this an AM64-intolerant
> program or AM64-tolerant program, respectively.
>
> z/OS has an additional problem that even in AM64, and
> even with an AM64-tolerant 32-bit program, there is no
> way to request memory in the 2 GiB - 4 GiB region other
> than via crapshoot (use_2g_to_32g or whatever), and
> even if you win the crapshoot, you can't have a nice
> display of the 2 GiB boundary being crossed in a single
> instruction. You could if you switched to supervisor
> mode/key zero and didn't mind clobbering what was already
> there, but you would probably still need to switch DAT off.
> And then because you don't know what damage you have
> done, you would need to freeze the system and re-IPL.
>
> Instead of attempting that, what I did was use a
> properly-written OS, z/PDOS, that uses DAT (virtual
> memory) to map the 4 GiB to 8 GiB region to 0 to 4 GiB,
> so that even in AM64, you effectively get AM32. This is
> the proper way to handle memory when you run 32-bit
> programs on a 64-bit system. 32 and 64-bit programs
> can run transparently with no mode switching required.
> The 4 GiB to 8 GiB virtual storage region is effectively
> dead.
>
> It is only used for negative indexes, which are a
> fundamental part of indexed addressing. Even positive
> indexes need wrapping. E.g. if you have an address at
> the 3.5 GiB mark and you wish to access memory at the
> 0.5 GiB mark, you would use a positive index of 1 GiB
> to get there.
> On an AM64 system, without a 32-bit mode
> in effect, this would index to location 4.5 GiB without
> an appropriate DAT mapping.
>
> Note that the index that would do such a thing may be
> in a variable (register) that is only known at runtime,
> so it is not something that you can change GCC to stop
> generating, and I was wrong to ask for that (for years).
>
> So, with that said, I have been able to satisfy your
> challenge, using real hardware. A real z114 using a
> real 3270 terminal. You can see that beautiful terminal here:
> https://groups.io/g/hercules-380/message/2391
> https://groups.io/g/hercules-380/message/2392
>
> The second photo of the first link shows the CPU (2818)
>
> z114 = 2818-M05/M10
>
> I can obtain a picture of the sticker if needed.
>
> No Hercules in sight.
>
> You could move the goal posts and say that running under
> z/VM doesn't count either.
>
> If you do that, I can run z/PDOS directly on an LPAR
> and run the memory test (in fact, this has already
> been done), but we don't know the procedure (and may
> not have permission) to use the HMC to display memory.
> z/PDOS can display its own memory, and this can show
> that the memory at 80000000 is different from location 0,
> if you accept z/PDOS reporting itself.
>
> But z/VM is the more "independent" way of displaying
> memory, so that there is no chance that z/PDOS can "cheat".
>
> Here is the test code in z/PDOS:
>
>     else if (memcmp(prog, "MEMTEST", 7) == 0)
>     {
>         printf("writing 4 bytes to address X'7FFFFFFE'\n");
>         memcpy((char *)0x7ffffffe, "\x01\x02\x03\x04", 4);
>         printf("done!\n");
>         *pdos->context->postecb = 0;
>         pdos->context->regs[15] = 0;
>     }
>
> and the memcpy generates a single MVC instruction:
>
>          MVC   0(4,2),0(3)
>
> Note that MVC is an instruction that has been available
> since the S/360 (in the 1960s). I am actually using the
> i370 target of GCC 3.2.3 for this test, but the principle
> is the same for s390 (as opposed to s390x) on the latest
> GCC. Both are 32-bit.
>
> Note that the i370 target was written by Jan Stein in 1989
> when he worked at Amdahl, long before AM64 existed.
>
> It only used S/370 instructions, so runs on anything from
> an S/370 up (thanks to upward compatibility).
>
> That MVC instruction works perfectly fine on z/Arch, as it
> does on S/370.
>
> Other instructions generated by GCC, such as BALR, have
> changed behavior slightly as they went from AM24 on S/370
> to AM31 on S/370 XA, and AM64 on z/Arch (and for that
> matter, AM32 on S/380 under Hercules/380, or I assume
> AM32 on a 360/67).
>
> The behavior changed in an upwardly-compatible way, so long
> as the program was written in a reasonable manner - i.e. to
> not be deliberately dependent on that AM24 or AM31 specific
> behavior. The code GCC generates has indeed been written
> in that "reasonable manner".
>
> Other instructions, such as BXLE, that, for certain use
> cases, break down at the top end of the lower half of the
> 32-bit address space, just as BXLEG breaks down at the
> top end of the lower half of the 64-bit address space, are
> not generated by GCC at all, so are not relevant.
>
> Bottom line - GCC generates 32-bit clean code, and as such,
> the option should be -m32, not -m31, not -m24, not -mANY.
> Keeping -m31 for compatibility reasons is obviously fine,
> as would be adding -m24. But both of those things obscure
> the fact that this is 32-bit clean code.
>
> Here is the rest of the context of the generated code:
>
>          MVC   88(4,13),=A(@@LC33)
>          LA    1,88(,13)
>          L     15,=A(@@7)
>          BALR  14,15
>          L     3,=A(@@LC34)
>          L     2,=F'2147483646'
>          MVC   0(4,2),0(3)
>          MVC   88(4,13),=A(@@LC35)
>          LA    1,88(,13)
>          L     15,=A(@@7)
>          BALR  14,15
>
>
> @@LC32   EQU   *
>          DC    C'MEMTEST'
>          DC    X'0'
> @@LC33   EQU   *
>          DC    C'writing 4 bytes to address X''7FFFFFFE'''
>          DC    X'15'
>          DC    X'0'
> @@LC34   EQU   *
>          DC    X'1'
>          DC    X'2'
>          DC    X'3'
>          DC    X'4'
>          DC    X'0'
> @@LC35   EQU   *
>          DC    C'done!'
>          DC    X'15'
>          DC    X'0'
>
> As you can see from the photo of the real 3270 terminal,
> that MVC instruction has successfully straddled the
> 2 GiB mark, even in a single instruction.
>
> As you can see from the photo in the second link above,
> the memory at location 0 is different (still contains
> the IPL PSW!) from the memory at location x'80000000'.
>
> Do you have any further objections, other than a logical
> fallacy such as argumentum ad populum or argumentum ad
> baculum, to oppose gcc having -m32 as an option for the
> S/390 target, or if the i370 code is added back in, for
> that too, given that that is the correct technical nature
> of the GCC-generated code?
>
> Thanks. Paul.
>
>
> "Simply switching off optimization made the negative
> indexes go away, allowing more than 2 GiB to be
> addressed in standard z/Arch, with "-m31".
>
> Prove it on real hardware, not Hercules. Hercules doesn't count.
>
> Joe
>
> On Wed, Sep 29, 2021 at 7:09 PM Paul Edwards via Gcc
> wrote:
>
> > We have fait accompli now:
> >
> > https://gcc.gnu.org/pipermail/gcc/2021-September/237456.html
> >
> > Simply switching off optimization made the negative
> > indexes go away, allowing more than 2 GiB to be
> > addressed in standard z/Arch, with "-m31".
> >
> > The above request is to add "-m32" as an alias for
> > "-m31", but I would like to add as a request for it to
> > work with optimization on.
> >
> > BFN. Paul.
> >
> > -----Original Message-----
> > From: Paul Edwards
> > Sent: Friday, September 3, 2021 11:12 PM
> > To: Jakub Jelinek
> > Cc: Ulrich Weigand; gcc@gcc.gnu.org; Ulrich Weigand
> > Subject: Re: s390 port
> >
> > >> > This is not in one single place, but spread throughout the
> > >> > compiler, both common code and back-end. I do not think it will
> > >> > be possible to get the compiler to generate correct code if
> > >> > you do not specify the address size correctly.
> >
> > >> 1. Is there any way to put a constraint on index
> > >> registers, to say that a particular machine can
> > >> only index in the range of -512 to +512 or some
> > >> other arbitrary set? If so, I can do 0 to 2 GiB.
> >
> > >> 2. Is there a way of saying a machine doesn't
> > >> support indexing at all?
> >
> > > There is a way to do that, but it isn't about changing a single or a
> > > couple of spots, one needs to change a lot of *.md patterns, a lot of
> > > macros, target hooks and as Ulrich said, most important is to use the
> > > right Pmode which can differ from ptr_mode provided one e.g. defines
> > > ptr_extend pattern etc.
> >
> > Pardon? All that is required just to put a constraint
> > on an index register? If a range of a machine is
> > limited to -512 to +512, it shouldn't be necessary
> > to change md patterns etc etc.
> >
> > > Just look at the amount of work needed for the x32 or aarch64 ilp32
> > > support,
> >
> > That's different. That's because Intel stuffed up.
> > IBM didn't. IBM came within an ace of a perfect
> > architecture. It's as if Intel had created an x32
> > instead of an 80386 in 1986.
> >
> > IBM got it almost right in the 1960s.
> >
> > > and not just work spent one time on adding that support, but the
> > > continuous amount of work on maintaining it. The initial work is
> > > certainly a few weeks if not months of work,
> >
> > I've been trying to figure out how to lift the 31-bit
> > restriction on mainframes since around 1987.
> >
> > If I have to pay someone for 2 months of work, at
> > this stage, I'm willing to do that, but:
> >
> > 1. I would like it done on GCC 3.2.3 plus maybe
> > GCC 3.4.6.
> >
> > 2. How much will it cost in US$?
> >
> > > then there needs to be somebody who regularly
> > > tests gcc trunk and branches in such configuration so that it doesn't
> > > bitrot, and not just that but somebody who actually fixes bugs in it.
> >
> > I'll take responsibility for giving the GCC 3.X.X
> > releases the TLC they deserve. And I'll encourage
> > my daughter to maintain them after I've kicked
> > the bucket.
> >
> > > If something doesn't fit into 2GB of address space,
> > > isn't it likely it won't fit into 4GB of address space
> > > in a year or two?
> >
> > Nope. 2 GiB is already a shitload of memory. It only
> > takes something like 23 MB for GCC 3.2.3 to recompile
> > itself, and I think 60 MB for GCC 3.4.6 to recompile
> > itself. That's the heaviest real workload I do. A 4 GiB
> > limitation instead of 2 GiB makes it just that much
> > less likely I'll ever hit a real limit.
> >
> > Someone told me that the only non-scientific application
> > they knew of that came close to hitting the 2 GiB limit
> > was IBM's C compiler. I doubt that IBM's C compiler
> > technology is evolving at such a rate that it only takes
> > 1-2 years for them to subsequently hit 4 GiB. Quite
> > apart from the fact that I don't really trust that even
> > IBM C is hitting a 2 GiB limit for what GCC can do in
> > 23 MiB. But it could be true - I'm not familiar with
> > compiler internals.
> >
> > BFN. Paul.