From: Paul Edwards
Date: Sun, 29 Jan 2023 02:51:43 +0800
Subject: s390 port
To: GCC Development, Joe Monk

Hi Joe.

Sorry for the delay (1 year and 4 months) in responding
to this. There's a long and sad story as to what caused
the delay, but we're here now.

First of all, Hercules is a very important target. Even
if gcc -m31 only allowed writing above 2 GiB on Hercules,
that would still be an extremely important result, and
justify changing the option to -m32, which is what it
inherently is.

Just because some arbitrary hardware masks bits at 24, or
31, or 32, or fails to even do a wrap at 64, doesn't alter
the inherent fact that GCC is using 32-bit registers. Not
64. Not 31. Not 24. They are general purpose registers
being used, so both address and data registers are 32 bits.

If you have poorly-written assembler that only works if
addresses are being masked to 24 bits, then there would
be some justification in referring to that as a 24-bit
program.

If you have poorly-written assembler that only works if
addresses are being masked to 31 bits, then there would
be some justification in referring to that as a 31-bit
program.
But if you have a program that works in both of those
AMODEs, i.e. what IBM calls "AMODE ANY", it would be a bit
odd to call it an ANY-bit program, but that would be the
exact name you need if you want to continue along that
path. And an ANY-including-32-bit program if it is also
capable of running as AM32 on any real, emulated, or
theoretical environment.

If you have a poorly-written operating system (like z/OS),
that doesn't provide address masking (via DAT) to 32 bits
for 32-bit programs, so your only option is to run them as
AM31, where negative indexes work, or only run programs
that don't use negative indexes (and ensure that the high
32 bits of 64-bit registers are 0), then there would be
justification in calling this an AM64-intolerant program
or AM64-tolerant program, respectively.

z/OS has an additional problem that even in AM64, and even
with an AM64-tolerant 32-bit program, there is no way to
request memory in the 2 GiB - 4 GiB region other than via
crapshoot (use_2g_to_32g or whatever), and even if you win
the crapshoot, you can't have a nice display of the 2 GiB
boundary being crossed in a single instruction. You could
if you switched to supervisor mode/key zero and didn't mind
clobbering what was already there, but you would probably
still need to switch DAT off. And then because you don't
know what damage you have done, you would need to freeze
the system and re-IPL.

Instead of attempting that, what I did was use a
properly-written OS, z/PDOS, that uses DAT (virtual memory)
to map the 4 GiB to 8 GiB region to 0 to 4 GiB, so that
even in AM64, you effectively get AM32. This is the proper
way to handle memory when you run 32-bit programs on a
64-bit system. 32 and 64-bit programs can run transparently
with no mode switching required.

The 4 GiB to 8 GiB virtual storage region is effectively
dead. It is only used for negative indexes, which are a
fundamental part of indexed addressing.

Even positive indexes need wrapping. E.g.
if you have an address at the 3.5 GiB mark and you wish to
access memory at the 0.5 GiB mark, you would use a positive
index of 1 GiB to get there. On an AM64 system, without a
32-bit mode in effect, this would index to location 4.5 GiB
without an appropriate DAT mapping. Note that the index
that would do such a thing may be in a variable (register)
that is only known at runtime, so it is not something that
you can change GCC to stop generating, and I was wrong to
ask for that (for years).

So, with that said, I have been able to satisfy your
challenge, using real hardware. A real z114 using a real
3270 terminal. You can see that beautiful terminal here:

https://groups.io/g/hercules-380/message/2391
https://groups.io/g/hercules-380/message/2392

The second photo of the first link shows the CPU (2818)

z114 = 2818-M05/M10

I can obtain a picture of the sticker if needed.

No Hercules in sight.

You could move the goal posts and say that running under
z/VM doesn't count either. If you do that, I can run z/PDOS
directly on an LPAR and run the memory test (in fact, this
has already been done), but we don't know the procedure
(and may not have permission) to use the HMC to display
memory. z/PDOS can display its own memory, and this can
show that the memory at 80000000 is different from
location 0, if you accept z/PDOS reporting itself. But z/VM
is the more "independent" way of displaying memory, so that
there is no chance that z/PDOS can "cheat".

Here is the test code in z/PDOS:

    else if (memcmp(prog, "MEMTEST", 7) == 0)
    {
        printf("writing 4 bytes to address X'7FFFFFFE'\n");
        memcpy((char *)0x7ffffffe, "\x01\x02\x03\x04", 4);
        printf("done!\n");
        *pdos->context->postecb = 0;
        pdos->context->regs[15] = 0;
    }

and the memcpy generates a single MVC instruction:

         MVC   0(4,2),0(3)

Note that MVC is an instruction that has been available
since the S/360 (in the 1960s).
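As a rough illustration of the address arithmetic discussed
above, here is a small C sketch. It is not part of the z/PDOS
test; the function names (ea_am32, ea_am64, dat_alias,
crosses_2gib, check_examples) are made up for this example,
and the DAT mapping is modelled as nothing more than an alias
of 4 GiB - 8 GiB onto 0 - 4 GiB, which is an assumption rather
than a description of how z/PDOS sets up its tables.

```c
#include <assert.h>
#include <stdint.h>

#define GIB 0x40000000ULL  /* 1 GiB */

/* Effective address as a 32-bit (AM32) machine would form it:
   base + index, wrapping modulo 2^32. */
uint32_t ea_am32(uint32_t base, uint32_t index)
{
    return (uint32_t)(base + index);
}

/* Same addition in AM64 with the high halves of both registers zero:
   nothing wraps, so the result can land in the 4 GiB - 8 GiB region. */
uint64_t ea_am64(uint32_t base, uint32_t index)
{
    return (uint64_t)base + (uint64_t)index;
}

/* Toy model of a DAT setup that maps virtual 4 GiB - 8 GiB onto
   0 - 4 GiB (assumption: a plain alias, nothing else is modelled). */
uint64_t dat_alias(uint64_t vaddr)
{
    return (vaddr >= 4 * GIB && vaddr < 8 * GIB) ? vaddr - 4 * GIB : vaddr;
}

/* Nonzero if an access of len bytes starting at addr crosses 2 GiB. */
int crosses_2gib(uint32_t addr, uint32_t len)
{
    uint64_t last;

    if (len == 0)
        return 0;
    last = (uint64_t)addr + len - 1;
    return addr < 2 * GIB && last >= 2 * GIB;
}

/* Checks the figures used in this message. */
void check_examples(void)
{
    /* 3.5 GiB + 1 GiB: wraps to 0.5 GiB in AM32, is 4.5 GiB in AM64. */
    assert(ea_am32(0xE0000000u, 0x40000000u) == 0x20000000u);
    assert(ea_am64(0xE0000000u, 0x40000000u) == 0x120000000ULL);

    /* With the alias in place, the AM64 result names the same byte. */
    assert(dat_alias(ea_am64(0xE0000000u, 0x40000000u)) == 0x20000000ULL);

    /* A "negative" index of -4 is 0xFFFFFFFC in a 32-bit register. */
    assert(dat_alias(ea_am64(0x1000u, 0xFFFFFFFCu))
           == ea_am32(0x1000u, 0xFFFFFFFCu));

    /* The 4-byte MVC target at X'7FFFFFFE' covers 7FFFFFFE..80000001. */
    assert(crosses_2gib(0x7FFFFFFEu, 4));
}
```

Calling check_examples() from a main() exercises exactly the
numbers quoted in this message: the 3.5 GiB plus 1 GiB case,
a negative index, and the 4-byte store at X'7FFFFFFE'.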
I am actually using the i370 target of GCC 3.2.3 for this
test, but the principle is the same for s390 (as opposed
to s390x) on the latest GCC. Both are 32-bit.

Note that the i370 target was written by Jan Stein in 1989
when he worked at Amdahl, long before AM64 existed. It only
used S/370 instructions, so runs on anything from a S/370
up (thanks to upward compatibility).

That MVC instruction works perfectly fine on z/Arch, as
it does on S/370.

Other instructions generated by GCC, such as BALR, have
changed behavior slightly as they went from AM24 on S/370
to AM31 on S/370 XA, and AM64 on z/Arch (and for that
matter, AM32 on S/380 under Hercules/380, or I assume AM32
on a 360/67). The behavior changed in an upwardly-compatible
way, so long as the program was written in a reasonable
manner - i.e. to not be deliberately dependent on that AM24
or AM31 specific behavior. The code GCC generates has
indeed been written in that "reasonable manner".

Other instructions, such as BXLE, that, for certain use
cases, break down at the top end of the lower half of the
32-bit address space, just as BXLEG breaks down at the top
end of the lower half of the 64-bit address space, are not
generated by GCC at all, so are not relevant.

Bottom line - GCC generates 32-bit clean code, and as such,
the option should be -m32, not -m31, not -m24, not -mANY.

Keeping -m31 for compatibility reasons is obviously fine,
as would be adding -m24. But both of those things obscure
the fact that this is 32-bit clean code.

Here is the rest of the context of the generated code:

         MVC   88(4,13),=A(@@LC33)
         LA    1,88(,13)
         L     15,=A(@@7)
         BALR  14,15
         L     3,=A(@@LC34)
         L     2,=F'2147483646'
         MVC   0(4,2),0(3)
         MVC   88(4,13),=A(@@LC35)
         LA    1,88(,13)
         L     15,=A(@@7)
         BALR  14,15

@@LC32   EQU   *
         DC    C'MEMTEST'
         DC    X'0'
@@LC33   EQU   *
         DC    C'writing 4 bytes to address X''7FFFFFFE'''
         DC    X'15'
         DC    X'0'
@@LC34   EQU   *
         DC    X'1'
         DC    X'2'
         DC    X'3'
         DC    X'4'
         DC    X'0'
@@LC35   EQU   *
         DC    C'done!'
         DC    X'15'
         DC    X'0'

As you can see from the photo of the real 3270 terminal,
that MVC instruction has successfully straddled the 2 GiB
mark, even in a single instruction.

As you can see from the photo in the second link above,
the memory at location 0 is different (still contains the
IPL PSW!) from the memory at location x'80000000'.

Do you have any further objections, other than a logical
fallacy such as argumentum ad populum or argumentum ad
baculum, to oppose gcc having -m32 as an option for the
S/390 target, or if the i370 code is added back in, for
that too, given that that is the correct technical nature
of the GCC-generated code?

Thanks. Paul.


"Simply switching off optimization made the negative
indexes go away, allowing more than 2 GiB to be
addressed in standard z/Arch, with "-m31".

Prove it on real hardware, not Hercules. Hercules doesn't count.

Joe

On Wed, Sep 29, 2021 at 7:09 PM Paul Edwards via Gcc wrote:

> We have fait accompli now:
>
> https://gcc.gnu.org/pipermail/gcc/2021-September/237456.html
>
> Simply switching off optimization made the negative
> indexes go away, allowing more than 2 GiB to be
> addressed in standard z/Arch, with "-m31".
>
> The above request is to add "-m32" as an alias for
> "-m31", but I would like to add as a request for it to
> work with optimization on.
>
> BFN. Paul.
>
>
> -----Original Message-----
> From: Paul Edwards
> Sent: Friday, September 3, 2021 11:12 PM
> To: Jakub Jelinek
> Cc: Ulrich Weigand; gcc@gcc.gnu.org; Ulrich Weigand
> Subject: Re: s390 port
>
> >> > This is not in one single place, but spread throughout the
> >> > compiler, both common code and back-end. I do not think it will
> >> > be possible to get the compiler to generate correct code if
> >> > you do not specify the address size correctly.
>
> >> 1. Is there any way to put a constraint on index
> >> registers, to say that a particular machine can
> >> only index in the range of -512 to +512 or some
> >> other arbitrary set? If so, I can do 0 to 2 GiB.
>
> >> 2. Is there a way of saying a machine doesn't
> >> support indexing at all?
>
> > There is a way to do that, but it isn't about changing a single or a
> > couple of spots, one needs to change a lot of *.md patterns, a lot of
> > macros, target hooks and as Ulrich said, most important is to use the
> > right Pmode which can differ from ptr_mode provided one e.g. defines
> > ptr_extend pattern etc.
>
> Pardon? All that is required just to put a constraint
> on an index register? If a range of a machine is
> limited to -512 to +512, it shouldn't be necessary
> to change md patterns etc etc.
>
> > Just look at the amount of work needed for the x32 or aarch64 ilp32
> > support,
>
> That's different. That's because Intel stuffed up.
> IBM didn't. IBM came within an ace of a perfect
> architecture. It's as if Intel had created an x32
> instead of an 80386 in 1986.
>
> IBM got it almost right in the 1960s.
>
> > and not just work spent one time on adding that support, but the
> > continuous amount of work on maintaining it. The initial work is
> > certainly a few weeks if not months of work,
>
> I've been trying to figure out how to lift the 31-bit
> restriction on mainframes since around 1987.
>
> If I have to pay someone for 2 months of work, at
> this stage, I'm willing to do that, but:
>
> 1. I would like it done on GCC 3.2.3 plus maybe
> GCC 3.4.6.
>
> 2. How much will it cost in US$?
>
> > then there needs to be somebody who regularly
> > tests gcc trunk and branches in such configuration so that it doesn't
> > bitrot, and not just that but somebody who actually fixes bugs in it.
>
> I'll take responsibility for giving the GCC 3.X.X
> releases the TLC they deserve. And I'll encourage
> my daughter to maintain them after I've kicked
> the bucket.
>
> > If something doesn't fit into 2GB of address space,
> > isn't it likely it won't fit into 4GB of address space
> > in a year or two?
>
> Nope. 2 GiB is already a shitload of memory. It only
> takes something like 23 MB for GCC 3.2.3 to recompile
> itself, and I think 60 MB for GCC 3.4.6 to recompile
> itself. That's the heaviest real workload I do. A 4 GiB
> limitation instead of 2 GiB makes it just that much
> less likely I'll ever hit a real limit.
>
> Someone told me that the only non-scientific application
> they knew of that came close to hitting the 2 GiB limit
> was IBM's C compiler. I doubt that IBM's C compiler
> technology is evolving at such a rate that it only takes
> 1-2 years for them to subsequently hit 4 GiB. Quite
> apart from the fact that I don't really trust that even
> IBM C is hitting a 2 GiB limit for what GCC can do in
> 23 MiB. But it could be true - I'm not familiar with
> compiler internals.
>
> BFN. Paul.