From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
From: Paul Edwards <mutazilah@gmail.com>
Date: Sun, 29 Jan 2023 02:51:43 +0800
Message-ID: <CAMi4NxZqTAw7Q9ASWQkhMcDu=MUh-=MpAFyBvKL6_BwFAm-ZHg@mail.gmail.com>
Subject: s390 port
To: GCC Development <gcc@gcc.gnu.org>, Joe Monk <joemonk64@gmail.com>
Content-Type: multipart/alternative; boundary="000000000000dc753a05f3577971"
List-Id: <gcc.gcc.gnu.org>

--000000000000dc753a05f3577971
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Joe.

Sorry for the delay (1 year and 4 months) in responding
to this. There's a long and sad story as to what caused
the delay, but we're here now.

First of all, Hercules is a very important target. Even
if gcc -m31 only allowed writing above 2 GiB on Hercules,
that would still be an extremely important result, and
justify changing the option to -m32, which is what it
inherently is. Just because some arbitrary hardware
masks bits at 24, or 31, or 32, or fails to even do a
wrap at 64, doesn't alter the inherent fact that GCC
is using 32-bit registers. Not 64. Not 31. Not 24.

They are general purpose registers being used, so both
address and data registers are 32 bits.

If you have poorly-written assembler that only works if
addresses are being masked to 24 bits, then there would
be some justification in referring to that as a 24-bit
program.

If you have poorly-written assembler that only works if
addresses are being masked to 31 bits, then there would
be some justification in referring to that as a 31-bit
program.

But if you have a program that works in both of those
AMODEs, ie what IBM calls "AMODE ANY", it would be a
bit odd to call it an ANY-bit program, but that would
be the exact name you need if you want to continue
along that path. And an ANY-including-32-bit program
if it is also capable of running as AM32 in any real,
emulated, or theoretical environment.

If you have a poorly-written operating system (like z/OS),
that doesn't provide address masking (via DAT) to 32 bits
for 32-bit programs, so your only option is to run them
as AM31, where negative indexes work, or only run programs
that don't use negative indexes (and ensure that the
high 32 bits of 64 bit registers are 0), then there would
be justification in calling this an AM64-intolerant
program or AM64-tolerant program, respectively.
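
To make the masking argument concrete, here is a minimal C sketch of
what happens to base + index under each addressing mode. The helper
names amode_mask and effective are made up for illustration; they
model the address arithmetic only and are not part of GCC, z/OS, or
z/PDOS.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low `bits` bits of an address,
   as an addressing mode of 24, 31, 32 or 64 bits would. */
uint64_t amode_mask(uint64_t addr, unsigned bits)
{
    if (bits >= 64)
        return addr;
    return addr & ((UINT64_C(1) << bits) - 1);
}

/* Hypothetical helper: effective address of base + index, where the
   index is a 32-bit value held in a 64-bit register whose high half
   is zero (so a negative index is NOT sign-extended). */
uint64_t effective(uint32_t base, int32_t index, unsigned bits)
{
    uint64_t sum = (uint64_t)base + (uint64_t)(uint32_t)index;
    return amode_mask(sum, bits);
}
```

With a base of X'1000' and an index of -4, effective() gives X'FFC'
for 24, 31 and 32 bits, but X'100000FFC' for 64 bits. That is why a
negative index needs either address masking or a suitable DAT mapping.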

z/OS has an additional problem that even in AM64, and
even with an AM64-tolerant 32-bit program, there is no
way to request memory in the 2 GiB - 4 GiB region other
than via crapshoot (use_2g_to_32g or whatever), and
even if you win the crapshoot, you can't have a nice
display of the 2 GiB boundary being crossed in a single
instruction. You could if you switched to supervisor
mode/key zero and didn't mind clobbering what was already
there, but you would probably still need to switch DAT off.
And then because you don't know what damage you have
done, you would need to freeze the system and re-IPL.

Instead of attempting that, what I did was use a
properly-written OS, z/PDOS, that uses DAT (virtual
memory) to map the 4 GiB to 8 GiB region to 0 to 4 GiB,
so that even in AM64, you effectively get AM32. This is
the proper way to handle memory when you run 32-bit
programs on a 64-bit system. 32 and 64-bit programs
can run transparently with no mode switching required.
The 4 GiB to 8 GiB virtual storage region is effectively
dead.

It is only used for negative indexes, which are a
fundamental part of indexed addressing. Even positive
indexes need wrapping. E.g. if you have an address at
the 3.5 GiB mark and you wish to access memory at the
0.5 GiB mark, you would use a positive index of 1 GiB
to get there. On an AM64 system with no 32-bit mode in
effect and no appropriate DAT mapping, this would
instead index to location 4.5 GiB.
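
The arithmetic in that example can be sketched in C. This is only a
model of the addresses involved, assuming the 4 GiB to 8 GiB alias
described above; index64 and dat_alias are hypothetical names, not
z/PDOS functions.

```c
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Address reached by base + index in a 64-bit register when nothing
   masks or wraps the result (plain AM64). */
uint64_t index64(uint64_t base, uint64_t index)
{
    return base + index;
}

/* Model of the mapping: virtual addresses from 4 GiB up to (but not
   including) 8 GiB alias the 0 to 4 GiB region. Other addresses are
   returned unchanged. */
uint64_t dat_alias(uint64_t vaddr)
{
    if (vaddr >= 4 * GIB && vaddr < 8 * GIB)
        return vaddr - 4 * GIB;
    return vaddr;
}
```

Here index64(7 * GIB / 2, GIB) is 4.5 GiB, and dat_alias() of that
value is 0.5 GiB, the same result that 32-bit wraparound would give.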

Note that the index that would do such a thing may be
in a variable (register) that is only known at runtime,
so it is not something that you can change GCC to stop
generating, and I was wrong to ask for that (for years).

So, with that said, I have been able to satisfy your
challenge, using real hardware. A real z114 using a
real 3270 terminal. You can see that beautiful terminal here:
https://groups.io/g/hercules-380/message/2391
https://groups.io/g/hercules-380/message/2392

The second photo of the first link shows the CPU (2818)

z114 = 2818-M05/M10

I can obtain a picture of the sticker if needed.

No Hercules in sight.

You could move the goal posts and say that running under
z/VM doesn't count either.

If you do that, I can run z/PDOS directly on an LPAR
and run the memory test (in fact, this has already
been done), but we don't know the procedure (and may
not have permission) to use the HMC to display memory.
z/PDOS can display its own memory, and this can show
that the memory at 80000000 is different from location 0,
if you accept z/PDOS reporting itself.

But z/VM is the more "independent" way of displaying
memory, so that there is no chance that z/PDOS can "cheat".

Here is the test code in z/PDOS:

        else if (memcmp(prog, "MEMTEST", 7) == 0)
        {
            printf("writing 4 bytes to address X'7FFFFFFE'\n");
            memcpy((char *)0x7ffffffe, "\x01\x02\x03\x04", 4);
            printf("done!\n");
            *pdos->context->postecb = 0;
            pdos->context->regs[15] = 0;
        }

and the memcpy generates a single MVC instruction:

         MVC   0(4,2),0(3)

Note that MVC is an instruction that has been available
since the S/360 (in the 1960s). I am actually using the
i370 target of GCC 3.2.3 for this test, but the principle
is the same for s390 (as opposed to s390x) on the latest
GCC. Both are 32-bit.

Note that the i370 target was written by Jan Stein in 1989
when he worked at Amdahl, long before AM64 existed.

It only used S/370 instructions, so runs on anything from
a S/370 up (thanks to upward compatibility).

That MVC instruction works perfectly fine on z/Arch, as it
does on S/370.

Other instructions generated by GCC, such as BALR, have
changed behavior slightly as they went from AM24 on S/370
to AM31 on S/370 XA, and AM64 on z/Arch (and for that
matter, AM32 on S/380 under Hercules/380, or I assume
AM32 on a 360/67).

The behavior changed in an upwardly-compatible way, so long
as the program was written in a reasonable manner - ie to
not be deliberately dependent on that AM24 or AM31 specific
behavior. The code GCC generates has indeed been written
in that "reasonable manner".

Other instructions, such as BXLE, that, for certain use
cases, break down at the top end of the lower half of the
32-bit address space, just as BXLEG breaks down at the
top end of the lower half of the 64-bit address space, are
not generated by GCC at all, so are not relevant.
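
For anyone wondering why BXLE breaks down there, here is a simplified
C model of its branch decision, on the understanding that the
instruction adds the increment, ignores overflow, and compares the
sum with the comparand as signed 32-bit values. The function name
bxle_taken is made up, and the model ignores register pairing.

```c
#include <stdint.h>

/* Simplified model of the BXLE branch decision. Returns nonzero when
   the branch would be taken (the loop continues).
   The casts to int32_t assume two's complement conversion, which is
   what mainstream compilers do. */
int bxle_taken(uint32_t index, uint32_t increment, uint32_t comparand)
{
    uint32_t sum = index + increment;      /* wraps modulo 2^32 */
    return (int32_t)sum <= (int32_t)comparand;
}
```

bxle_taken(0x7FFFFFF0, 0x10, 0x7FFFFFFF) is nonzero, because the sum
X'80000000' reads as negative, so a loop that should stop carries on.
bxle_taken(0x7FFFFFF0, 8, 0x80000010) is zero, because the comparand
reads as negative, so a loop over a table straddling 2 GiB stops early.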

Bottom line - GCC generates 32-bit clean code, and as such,
the option should be -m32, not -m31, not -m24, not -mANY.
Keeping -m31 for compatibility reasons is obviously fine,
as would be adding -m24. But both of those things obscure
the fact that this is 32-bit clean code.

Here is the rest of the context of the generated code:

         MVC   88(4,13),=A(@@LC33)
         LA    1,88(,13)
         L     15,=A(@@7)
         BALR  14,15
         L     3,=A(@@LC34)
         L     2,=F'2147483646'
         MVC   0(4,2),0(3)
         MVC   88(4,13),=A(@@LC35)
         LA    1,88(,13)
         L     15,=A(@@7)
         BALR  14,15


@@LC32   EQU   *
         DC    C'MEMTEST'
         DC    X'0'
@@LC33   EQU   *
         DC    C'writing 4 bytes to address X''7FFFFFFE'''
         DC    X'15'
         DC    X'0'
@@LC34   EQU   *
         DC    X'1'
         DC    X'2'
         DC    X'3'
         DC    X'4'
         DC    X'0'
@@LC35   EQU   *
         DC    C'done!'
         DC    X'15'
         DC    X'0'

As you can see from the photo of the real 3270 terminal,
that MVC instruction has successfully straddled the
2 GiB mark, even in a single instruction.
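
As a sanity check on the numbers, the following C sketch counts how
many bytes of a move land at or above a boundary. The function name
bytes_at_or_above is hypothetical and this is only address arithmetic,
not a model of MVC itself.

```c
#include <stdint.h>

/* Number of bytes of a len-byte move starting at dest that land at
   or above the given boundary (0 if the move ends below it). */
uint32_t bytes_at_or_above(uint32_t dest, uint32_t len,
                           uint32_t boundary)
{
    uint64_t end = (uint64_t)dest + len;   /* one past the last byte */

    if (end <= boundary)
        return 0;
    if (dest >= boundary)
        return len;
    return (uint32_t)(end - boundary);
}
```

The literal F'2147483646' in the generated code is X'7FFFFFFE', and
bytes_at_or_above(0x7FFFFFFE, 4, 0x80000000) is 2: two bytes go
below the 2 GiB mark and two go above it.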

As you can see from the photo in the second link above,
the memory at location 0 is different (still contains
the IPL PSW!) from the memory at location x'80000000'.

Do you have any further objections, other than a logical
fallacy such as argumentum ad populum or argumentum ad
baculum, to oppose gcc having -m32 as an option for the
S/390 target, or if the i370 code is added back in, for
that too, given that that is the correct technical nature
of the GCC-generated code?

Thanks. Paul.


"Simply switching off optimization made the negative
indexes go away, allowing more than 2 GiB to be
addressed in standard z/Arch, with "-m31".

Prove it on real hardware, not Hercules. Hercules doesn't count.

Joe

On Wed, Sep 29, 2021 at 7:09 PM Paul Edwards via Gcc <gcc@gcc.gnu.org>
wrote:

> We have fait accompli now:
>
> https://gcc.gnu.org/pipermail/gcc/2021-September/237456.html
>
> Simply switching off optimization made the negative
> indexes go away, allowing more than 2 GiB to be
> addressed in standard z/Arch, with "-m31".
>
> The above request is to add "-m32" as an alias for
> "-m31", but I would like to add as a request for it to
> work with optimization on.
>
> BFN. Paul.
>
> -----Original Message-----
> From: Paul Edwards
> Sent: Friday, September 3, 2021 11:12 PM
> To: Jakub Jelinek
> Cc: Ulrich Weigand ; gcc@gcc.gnu.org <gcc@gcc.gnu.org> ; Ulrich Weigand
> Subject: Re: s390 port
>
> >> > This is not in one single place, but spread throughout the
> >> > compiler, both common code and back-end.  I do not think it will
> >> > be possible to get the compiler to generate correct code if
> >> > you do not specify the address size correctly.
>
> >> 1. Is there any way to put a constraint on index
> >> registers, to say that a particular machine can
> >> only index in the range of -512 to +512 or some
> >> other arbitrary set? If so, I can do 0 to 2 GiB.
>
> >> 2. Is there a way of saying a machine doesn't
> >> support indexing at all?
>
> > There is a way to do that, but it isn't about changing a single or a
> > couple of spots, one needs to change a lot of *.md patterns, a lot of
> > macros, target hooks and as Ulrich said, most important is to use the
> > right Pmode which can differ from ptr_mode provided one e.g. defines
> > ptr_extend pattern etc.
>
> Pardon? All that is required just to put a constraint
> on an index register? If a range of a machine is
> limited to -512 to +512, it shouldn't be necessary
> to change md patterns etc etc.
>
> > Just look at the amount of work needed for the x32 or aarch64 ilp32
> > support,
>
> That's different. That's because Intel stuffed up.
> IBM didn't. IBM came within an ace of a perfect
> architecture. It's as if Intel had created an x32
> instead of an 80386 in 1986.
>
> IBM got it almost right in the 1960s.
>
> > and not just work spent one time on adding that support, but the
> > continuous amount of work on maintaining it.  The initial work is
> > certainly a few weeks if not months of work,
>
> I've been trying to figure out how to lift the 31-bit
> restriction on mainframes since around 1987.
>
> If I have to pay someone for 2 month of work, at
> this stage, I'm willing to do that, but:
>
> 1. I would like it done on GCC 3.2.3 plus maybe
> GCC 3.4.6.
>
> 2. How much will it cost in US$?
>
> > then there needs to be somebody who regularly
> > tests gcc trunk and branches in such configuration so that it doesn't
> > bitrot, and not just that but somebody who actually fixes bugs in it.
>
> I'll take responsibility for giving the GCC 3.X.X
> releases the TLC they deserve. And I'll encourage
> my daughter to maintain them after I've kicked
> the bucket.
>
> > If something doesn't fit into 2GB of address space,
> > isn't it likely it won't fit into 4GB of address space
> > in a year or two?
>
> Nope. 2 GiB is already a shitload of memory. It only
> takes something like 23 MB for GCC 3.2.3 to recompile
> itself, and I think 60 MB for GCC 3.4.6 to recompile
> itself. That's the heaviest real workload I do. A 4 GiB
> limitation instead of 2 GiB makes it just that much
> less likely I'll ever hit a real limit.
>
> Someone told me that the only non-scientific application
> they knew of that came close to hitting the 2 GiB limit
> was IBM's C compiler. I doubt that IBM's C compiler
> technology is evolving at such a rate that it only takes
> 1-2 years for them to subsequently hit 4 GiB. Quite
> apart from the fact that I don't really trust that even
> IBM C is hitting a 2 GiB limit for what GCC can do in
> 23 MiB. But it could be true - I'm not familiar with
> compiler internals.
>
> BFN. Paul.

--000000000000dc753a05f3577971--