From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 8DBD33890435; Tue, 26 May 2020 13:14:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8DBD33890435
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
 s=default; t=1590498884;
 bh=KIc7n2F5vrRTbBr4l5DhcjMKfisvlrAvhLTuSoDMCmo=;
 h=From:To:Subject:Date:In-Reply-To:References:From;
 b=uYhNhPd+i1lkl6Pfs585W3zf1Y0dWbjYgz99KxwD/sO3KBujvCPWeA4tRN63a0yHq
 cRojMPNdnnZBl5UZdig/cmYTBryVxvvVnTeyM86pSxvB5ZMBjeRGicS/PeaRce802F
 8928kHBqCFwKoRw+PRyvJ37F/nqd1/6dqtCN2oMo=
From: "wilco at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/95285] AArch64:aarch64 medium code model proposal
Date: Tue, 26 May 2020 13:14:44 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: wilco at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-95285-4-d4mVCPk9b1@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-95285-4@http.gcc.gnu.org/bugzilla/>
References: <bug-95285-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <http://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <http://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 26 May 2020 13:14:44 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org
--- Comment #2 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Bu Le from comment #0)
> Created attachment 48584 [details]
> proposed patch
>
> I would like to propose an implementation of the medium code model for
> aarch64. A prototype is attached; it passes bootstrap and the regression
> tests.
>
> -mcmodel=medium is a code model that is missing from the aarch64 port but
> supported on x86. In this code model, small data is addressed using the
> small code model while large data is addressed using the large code
> model. The official description of the medium code model is on page 34 of
> the x86-64 ABI: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf
>
> The key difference between x86 and aarch64 is that x86 can use lea+movabs
> instructions to implement a dynamically relocatable large code model.
> Currently, the large code model in AArch64 relocates the symbol using an
> ldr instruction, which can only be statically linked. However, the small
> code model uses adrp + ldr instructions, which can be dynamically linked.
> Therefore, the medium code model cannot be implemented by simply setting a
> threshold. As a result, a dynamically relocatable large code model is
> needed first for a functional medium code model.
>
> I met this problem when compiling CESM, a climate forecast software
> package that is widely used in the HPC field. In some configurations that
> manipulate large arrays, the large code model with dynamic relocation is
> needed. The following case is abstracted from CESM for this scenario.
>
> program main
>  common/baz/a,b,c
>  real a,b,c
>  b = 1.0
>  call foo()
>  print*, b
>  end
>
>  subroutine foo()
>  common/baz/a,b,c
>  real a,b,c
>
>  integer, parameter :: nx = 1024
>  integer, parameter :: ny = 1024
>  integer, parameter :: nz = 1024
>  integer, parameter :: nf = 1
>  real :: bar(nf,nx*ny*nz)
>  real :: bar1(nf,nx*ny*nz)
>  bar = 0.0
>  bar1 = 0.0
>  b = bar(1,1024*1024*100)
>  b = bar1(1,1)
>
>  return
>  end
>
> Compiling with -mcmodel=small -fPIC gives the following error due to the
> access to the bar1 array:
> test.f90:(.text+0x28): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
> test.f90:(.text+0x6c): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
>
> Compiling with -mcmodel=large -fPIC gives an unsupported error:
> f951: sorry, unimplemented: code model ‘large’ with ‘-fPIC’
>
> As discussed at the beginning, to tackle this problem we first have to
> solve the static large code model problem. My solution here is to use
> R_AARCH64_MOVW_PREL_Gx group relocations together with instructions that
> calculate the current PC value.
>
> Before change (-mcmodel=small):
> adrp    x0, bar1.2782
> add     x0, x0, :lo12:bar1.2782
>
> After change (-mcmodel=medium, proposed):
> movz    x0, :prel_g3:bar1.2782
> movk    x0, :prel_g2_nc:bar1.2782
> movk    x0, :prel_g1_nc:bar1.2782
> movk    x0, :prel_g0_nc:bar1.2782
> adr     x1, .
> sub     x1, x1, 0x4
> add     x0, x0, x1
>
> The first four instructions (movz plus three movk) compute the 64-bit
> offset between bar1 and the last movk instruction, which fulfils the
> requirement of the large code model (64-bit relocation).
> The adr+sub instructions compute the PC address of the last movk
> instruction. By adding the offset to that PC address, bar1 can be
> located dynamically.
>
> Because this relocation is time consuming, a threshold is used to classify
> the data to be relocated by size, as on x86. The default value of the
> threshold is 65536, which is the maximum relocation capability of the
> small code model.
> This implementation also requires amending the linker in binutils so that
> all four mov instructions compute the same PC offset, relative to the last
> movk instruction.
>
> The good side of this implementation is that it can use existing
> relocation types to prototype a medium code model.
>
> This implementation also has drawbacks.
> For a start, the four mov instructions and the adr instruction must be
> emitted in this order. No other instruction may be inserted into the
> sequence, since that would lead to a wrong symbol address. This might
> impede instruction scheduling optimizations.
> Secondly, the linker needs to change correspondingly so that every mov
> instruction computes the same PC offset. For example, in my
> implementation, the first movz instruction needs to add 12 to the result
> of ":prel_g3:bar1.2782" to make up the PC offset.
>
> I haven't figured out a suitable solution for these problems yet. You are
> most welcome to leave your suggestions regarding these issues.

Is the main usage scenario huge arrays? If so, these could easily be allocated
via malloc at startup rather than using bss. It means an extra indirection in
some cases (to load the pointer), but it should be much more efficient than
using a large code model with all the overheads.
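
A minimal sketch of that suggestion in C, assuming the two arrays from the
quoted test case are the only oversized objects. The names bar_heap,
bar1_heap, init_big_arrays, read_bar1 and free_big_arrays are hypothetical
and come from neither the patch nor CESM. In Fortran, allocatable arrays
would presumably play the same role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical replacements for the static arrays bar and bar1.
   Only these two pointers are static objects; the array data itself
   is allocated on the heap at run time. */
static float *bar_heap;
static float *bar1_heap;

/* Allocate both arrays with n zero-initialised elements each.
   Intended to be called once at startup.
   Returns 0 on success, -1 if either allocation fails. */
int init_big_arrays(size_t n)
{
    assert(n > 0);
    bar_heap = calloc(n, sizeof *bar_heap);
    bar1_heap = calloc(n, sizeof *bar1_heap);
    if (bar_heap == NULL || bar1_heap == NULL) {
        free(bar_heap);
        free(bar1_heap);
        bar_heap = bar1_heap = NULL;
        return -1;
    }
    return 0;
}

/* Each access first loads the pointer (the extra indirection mentioned
   above) and then indexes into the heap block. */
float read_bar1(size_t i)
{
    return bar1_heap[i];
}

/* Release both arrays and reset the pointers. */
void free_big_arrays(void)
{
    free(bar_heap);
    free(bar1_heap);
    bar_heap = bar1_heap = NULL;
}
```

For the sizes in the test case a caller would pass
(size_t)1024 * 1024 * 1024 elements per array. Since only two pointers
remain as static data, the addresses the compiler has to materialise stay
small regardless of how large the arrays are.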