public inbox for gcc@gcc.gnu.org
From: Paul Edwards <mutazilah@gmail.com>
To: Joe Monk <joemonk64@gmail.com>
Cc: GCC Development <gcc@gcc.gnu.org>
Subject: Re: s390 port
Date: Sun, 29 Jan 2023 22:30:49 +0800
Message-ID: <CAMi4NxavzZk-j2OOWikw_j5J_e2FYQ0jjmbYqDL7PW6Fr=TVDQ@mail.gmail.com>
In-Reply-To: <CAPcd4G-0iiGuyBKSKyJhA=1q9kx-s_riY+YwiGL17+aJ+ef=Xw@mail.gmail.com>


Hi Joe.

They aren't 24 or 31 bit addresses.

All that code I showed was running in AM64. The very
first thing that z/PDOS does when it IPLs is activate
z/Arch mode and enable AM64. That's about 10
assembler instructions and then it's pure 64-bit for
eternity.

Only the lower 32 bits of the 64-bit registers are actually
populated though, because at least at the application
level I am using pure S/370 instructions (from a machine
definition in 1989). Even in the OS (z/PDOS) it's almost
all GCC-generated code, ie S/370 instructions also. All
32-bit registers. All 32-bit clean.

That is why the code ... WORKS.

As opposed to your irrelevant quoting from the POP,
which is ... irrelevant.

BFN. Paul.



On Sun, 29 Jan 2023 at 21:08, Joe Monk <joemonk64@gmail.com> wrote:

> "A 24-bit or 31-bit virtual address is expanded to 64 bits by appending
> 40 or 33 zeros, respectively, on the left before it is translated by means
> of the DAT process, and a 24-bit or 31-bit real address is similarly
> expanded to 64 bits before it is transformed by prefixing. A 24-bit or
> 31-bit absolute address is expanded to 64 bits before main storage is
> accessed." IBM z/Arch POO page 3-6.
>
> I don't see 32 bits anywhere in that process. Unless and until IBM changes
> the architecture definition to include 32 bits in address sizes, there is
> no need for a -m32 switch.
>
> Joe
>
> On Sat, Jan 28, 2023 at 12:51 PM Paul Edwards <mutazilah@gmail.com> wrote:
>
>> Hi Joe.
>>
>> Sorry for the delay (1 year and 4 months) in responding
>> to this. There's a long and sad story as to what caused
>> the delay, but we're here now.
>>
>> First of all, Hercules is a very important target. Even
>> if gcc -m31 only allowed writing above 2 GiB on Hercules,
>> that would still be an extremely important result, and
>> justify changing the option to -m32, which is what it
>> inherently is. Just because some arbitrary hardware
>> masks bits at 24, or 31, or 32, or fails to even do a
>> wrap at 64, doesn't alter the inherent fact that GCC
>> is using 32-bit registers. Not 64. Not 31. Not 24.
>>
>> They are general purpose registers being used, so both
>> address and data registers are 32 bits.
>>
>> If you have poorly-written assembler that only works if
>> addresses are being masked to 24 bits, then there would
>> be some justification in referring to that as a 24-bit
>> program.
>>
>> If you have poorly-written assembler that only works if
>> addresses are being masked to 31 bits, then there would
>> be some justification in referring to that as a 31-bit
>> program.
>>
>> But if you have a program that works in both of those
>> AMODEs, ie what IBM calls "AMODE ANY", it would be a
>> bit odd to call it an ANY-bit program, but that would
>> be the exact name you need if you want to continue
>> along that path. And an ANY-including-32-bit program
>> if it is also capable of running as AM32 on any real,
>> emulated, or theoretical environment.
>>
>> If you have a poorly-written operating system (like z/OS),
>> that doesn't provide address masking (via DAT) to 32 bits
>> for 32-bit programs, so your only option is to run them
>> as AM31, where negative indexes work, or only run programs
>> that don't use negative indexes (and ensure that the
>> high 32 bits of 64 bit registers are 0), then there would
>> be justification in calling this an AM64-intolerant
>> program or AM64-tolerant program, respectively.
>>
>> z/OS has an additional problem that even in AM64, and
>> even with an AM64-tolerant 32-bit program, there is no
>> way to request memory in the 2 GiB - 4 GiB region other
>> than via crapshoot (use_2g_to_32g or whatever), and
>> even if you win the crapshoot, you can't have a nice
>> display of the 2 GiB boundary being crossed in a single
>> instruction. You could if you switched to supervisor
>> mode/key zero and didn't mind clobbering what was already
>> there, but you would probably still need to switch DAT off.
>> And then because you don't know what damage you have
>> done, you would need to freeze the system and re-IPL.
>>
>> Instead of attempting that, what I did was use a
>> properly-written OS, z/PDOS, that uses DAT (virtual
>> memory) to map the 4 GiB to 8 GiB region to 0 to 4 GiB,
>> so that even in AM64, you effectively get AM32. This is
>> the proper way to handle memory when you run 32-bit
>> programs on a 64-bit system. 32 and 64-bit programs
>> can run transparently with no mode switching required.
>> The 4 GiB to 8 GiB virtual storage region is effectively
>> dead.
>>
>> It is only used for negative indexes, which are a
>> fundamental part of indexed addressing. Even positive
>> indexes need wrapping. E.g. if you have an address at
>> the 3.5 GiB mark and you wish to access memory at the
>> 0.5 GiB mark, you would use a positive index of 1 GiB
>> to get there. On an AM64 system, without a 32-bit mode
>> in effect, this would index to location 4.5 GiB without
>> an appropriate DAT mapping.
>>
>> Note that the index that would do such a thing may be
>> in a variable (register) that is only known at runtime,
>> so it is not something that you can change GCC to stop
>> generating, and I was wrong to ask for that (for years).
>>
>> So, with that said, I have been able to satisfy your
>> challenge, using real hardware. A real z114 using a
>> real 3270 terminal. You can see that beautiful terminal here:
>> https://groups.io/g/hercules-380/message/2391
>> https://groups.io/g/hercules-380/message/2392
>>
>> The second photo of the first link shows the CPU (2818).
>>
>> z114 = 2818-M05/M10
>>
>> I can obtain a picture of the sticker if needed.
>>
>> No Hercules in sight.
>>
>> You could move the goal posts and say that running under
>> z/VM doesn't count either.
>>
>> If you do that, I can run z/PDOS directly on an LPAR
>> and run the memory test (in fact, this has already
>> been done), but we don't know the procedure (and may
>> not have permission) to use the HMC to display memory.
>> z/PDOS can display its own memory, and this can show
>> that the memory at 80000000 is different from location 0,
>> if you accept z/PDOS reporting itself.
>>
>> But z/VM is the more "independent" way of displaying
>> memory, so that there is no chance that z/PDOS can "cheat".
>>
>> Here is the test code in z/PDOS:
>>
>>         else if (memcmp(prog, "MEMTEST", 7) == 0)
>>         {
>>             printf("writing 4 bytes to address X'7FFFFFFE'\n");
>>             memcpy((char *)0x7ffffffe, "\x01\x02\x03\x04", 4);
>>             printf("done!\n");
>>             *pdos->context->postecb = 0;
>>             pdos->context->regs[15] = 0;
>>         }
>>
>> and the memcpy generates a single MVC instruction:
>>
>>          MVC   0(4,2),0(3)
>>
>> Note that MVC is an instruction that has been available
>> since the S/360 (in the 1960s). I am actually using the
>> i370 target of GCC 3.2.3 for this test, but the principle
>> is the same for s390 (as opposed to s390x) on the latest
>> GCC. Both are 32-bit.
>>
>> Note that the i370 target was written by Jan Stein in 1989
>> when he worked at Amdahl, long before AM64 existed.
>>
>> It only used S/370 instructions, so runs on anything from
>> a S/370 up (thanks to upward compatibility).
>>
>> That MVC instruction works perfectly fine on z/Arch, as it
>> does on S/370.
>>
>> Other instructions generated by GCC, such as BALR, have
>> changed behavior slightly as they went from AM24 on S/370
>> to AM31 on S/370 XA, and AM64 on z/Arch (and for that
>> matter, AM32 on S/380 under Hercules/380, or I assume
>> AM32 on a 360/67).
>>
>> The behavior changed in an upwardly-compatible way, so long
>> as the program was written in a reasonable manner - ie to
>> not be deliberately dependent on that AM24 or AM31 specific
>> behavior. The code GCC generates has indeed been written
>> in that "reasonable manner".
>>
>> Other instructions, such as BXLE, that, for certain use
>> cases, break down at the top end of the lower half of the
>> 32-bit address space, just as BXLEG breaks down at the
>> top end of the lower half of the 64-bit address space, are
>> not generated by GCC at all, so are not relevant.
>>
>> Bottom line - GCC generates 32-bit clean code, and as such,
>> the option should be -m32, not -m31, not -m24, not -mANY.
>> Keeping -m31 for compatibility reasons is obviously fine,
>> as would be adding -m24. But both of those things obscure
>> the fact that this is 32-bit clean code.
>>
>> Here is the rest of the context of the generated code:
>>
>>          MVC   88(4,13),=A(@@LC33)
>>          LA    1,88(,13)
>>          L     15,=A(@@7)
>>          BALR  14,15
>>          L     3,=A(@@LC34)
>>          L     2,=F'2147483646'
>>          MVC   0(4,2),0(3)
>>          MVC   88(4,13),=A(@@LC35)
>>          LA    1,88(,13)
>>          L     15,=A(@@7)
>>          BALR  14,15
>>
>>
>> @@LC32   EQU   *
>>          DC    C'MEMTEST'
>>          DC    X'0'
>> @@LC33   EQU   *
>>          DC    C'writing 4 bytes to address X''7FFFFFFE'''
>>          DC    X'15'
>>          DC    X'0'
>> @@LC34   EQU   *
>>          DC    X'1'
>>          DC    X'2'
>>          DC    X'3'
>>          DC    X'4'
>>          DC    X'0'
>> @@LC35   EQU   *
>>          DC    C'done!'
>>          DC    X'15'
>>          DC    X'0'
>>
>> As you can see from the photo of the real 3270 terminal,
>> that MVC instruction has successfully straddled the
>> 2 GiB mark, even in a single instruction.
>>
>> As you can see from the photo in the second link above,
>> the memory at location 0 is different (still contains
>> the IPL PSW!) from the memory at location x'80000000'.
>>
>> Do you have any further objections, other than a logical
>> fallacy such as argumentum ad populum or argumentum ad
>> baculum, to oppose gcc having -m32 as an option for the
>> S/390 target, or if the i370 code is added back in, for
>> that too, given that that is the correct technical nature
>> of the GCC-generated code?
>>
>> Thanks. Paul.
>>
>>
>>
>>
>> "Simply switching off optimization made the negative
>> indexes go away, allowing more than 2 GiB to be
>> addressed in standard z/Arch, with "-m31"."
>>
>> Prove it on real hardware, not Hercules. Hercules doesn't count.
>>
>> Joe
>>
>> On Wed, Sep 29, 2021 at 7:09 PM Paul Edwards via Gcc <gcc@gcc.gnu.org>
>> wrote:
>>
>> > We have fait accompli now:
>> >
>> > https://gcc.gnu.org/pipermail/gcc/2021-September/237456.html
>> >
>> > Simply switching off optimization made the negative
>> > indexes go away, allowing more than 2 GiB to be
>> > addressed in standard z/Arch, with "-m31".
>> >
>> > The above request is to add "-m32" as an alias for
>> > "-m31", but I would like to add as a request for it to
>> > work with optimization on.
>> >
>> > BFN. Paul.
>> >
>> > -----Original Message-----
>> > From: Paul Edwards
>> > Sent: Friday, September 3, 2021 11:12 PM
>> > To: Jakub Jelinek
>> > Cc: Ulrich Weigand ; gcc@gcc.gnu.org ; Ulrich Weigand
>> > Subject: Re: s390 port
>> >
>> > >> > This is not in one single place, but spread throughout the
>> > >> > compiler, both common code and back-end.  I do not think it will
>> > >> > be possible to get the compiler to generate correct code if
>> > >> > you do not specify the address size correctly.
>> >
>> > >> 1. Is there any way to put a constraint on index
>> > >> registers, to say that a particular machine can
>> > >> only index in the range of -512 to +512 or some
>> > >> other arbitrary set? If so, I can do 0 to 2 GiB.
>> >
>> > >> 2. Is there a way of saying a machine doesn’t
>> > >> support indexing at all?
>> >
>> > > There is a way to do that, but it isn't about changing a single or a
>> > > couple of spots, one needs to change a lot of *.md patterns, a lot of
>> > > macros, target hooks and as Ulrich said, most important is to use the
>> > > right Pmode which can differ from ptr_mode provided one e.g. defines
>> > > ptr_extend pattern etc.
>> >
>> > Pardon? All that is required just to put a constraint
>> > on an index register? If a range of a machine is
>> > limited to -512 to +512, it shouldn't be necessary
>> > to change md patterns etc etc.
>> >
>> > > Just look at the amount of work needed for the x32 or aarch64 ilp32
>> > > support,
>> >
>> > That's different. That's because Intel stuffed up.
>> > IBM didn't. IBM came within an ace of a perfect
>> > architecture. It's as if Intel had created an x32
>> > instead of an 80386 in 1986.
>> >
>> > IBM got it almost right in the 1960s.
>> >
>> > > and not just work spent one time on adding that support, but the
>> > > continuous amount of work on maintaining it.  The initial work is
>> > > certainly a few weeks if not months of work,
>> >
>> > I've been trying to figure out how to lift the 31-bit
>> > restriction on mainframes since around 1987.
>> >
>> > If I have to pay someone for 2 months of work, at
>> > this stage, I'm willing to do that, but:
>> >
>> > 1. I would like it done on GCC 3.2.3 plus maybe
>> > GCC 3.4.6.
>> >
>> > 2. How much will it cost in US$?
>> >
>> > > then there needs to be somebody who regularly
>> > > tests gcc trunk and branches in such configuration so that it doesn't
>> > > bitrot, and not just that but somebody who actually fixes bugs in it.
>> >
>> > I'll take responsibility for giving the GCC 3.X.X
>> > releases the TLC they deserve. And I'll encourage
>> > my daughter to maintain them after I've kicked
>> > the bucket.
>> >
>> > > If something doesn't fit into 2GB of address space,
>> > > isn't it likely it won't fit into 4GB of address space
>> > > in a year or two?
>> >
>> > Nope. 2 GiB is already a shitload of memory. It only
>> > takes something like 23 MB for GCC 3.2.3 to recompile
>> > itself, and I think 60 MB for GCC 3.4.6 to recompile
>> > itself. That's the heaviest real workload I do. A 4 GiB
>> > limitation instead of 2 GiB makes it just that much
>> > less likely I'll ever hit a real limit.
>> >
>> > Someone told me that the only non-scientific application
>> > they knew of that came close to hitting the 2 GiB limit
>> > was IBM's C compiler. I doubt that IBM's C compiler
>> > technology is evolving at such a rate that it only takes
>> > 1-2 years for them to subsequently hit 4 GiB. Quite
>> > apart from the fact that I don't really trust that even
>> > IBM C is hitting a 2 GiB limit for what GCC can do in
>> > 23 MiB. But it could be true - I'm not familiar with
>> > compiler internals.
>> >
>> > BFN. Paul.
>>
>>
>>
