From: "wilco at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/95285] AArch64:aarch64 medium code model proposal
Date: Tue, 26 May 2020 13:14:44 +0000
Message-ID: <bug-95285-4-d4mVCPk9b1@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-95285-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #2 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Bu Le from comment #0)
> Created attachment 48584
> proposed patch
> 
> I would like to propose an implementation of the medium code model for
> AArch64. A prototype is attached; it passes bootstrap and the regression
> tests.
> 
> -mcmodel=medium is a code model that x86 supports but AArch64 currently
> lacks. Under this model, small data is addressed with small code model
> relocations while large data is addressed with large code model
> relocations. The official description of the medium code model is on page
> 34 of the x86-64 ABI: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf
> 
> The key difference between x86 and AArch64 is that x86 can use a lea +
> movabs sequence to implement a dynamically relocatable large code model.
> Currently, the large code model on AArch64 addresses symbols with an ldr
> instruction, which only works with static linking. The small code model,
> however, uses adrp + ldr, which works with dynamic linking. Therefore, the
> medium code model cannot be implemented simply by setting a threshold: a
> dynamically relocatable large code model is needed first.
> 
> I met this problem when compiling CESM, a climate forecasting package that
> is widely used in the HPC field. In some configurations that manipulate
> large arrays, a large code model with dynamic relocation is needed. The
> following test case is extracted from CESM to show this scenario.
> 
> program main
>  common/baz/a,b,c
>  real a,b,c
>  b = 1.0
>  call foo()
>  print*, b
>  end
> 
>  subroutine foo()
>  common/baz/a,b,c
>  real a,b,c
> 
>  integer, parameter :: nx = 1024
>  integer, parameter :: ny = 1024
>  integer, parameter :: nz = 1024
>  integer, parameter :: nf = 1
>  real :: bar(nf,nx*ny*nz)
>  real :: bar1(nf,nx*ny*nz)
>  bar = 0.0
>  bar1 =0.0
>  b = bar(1,1024*1024*100)
>  b = bar1(1,1)
> 
>  return
>  end
> 
> Compiling with -mcmodel=small -fPIC gives the following errors, due to the
> access to the bar1 array:
> test.f90:(.text+0x28): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
> test.f90:(.text+0x6c): relocation truncated to fit:
> R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
> 
> Compiling with -mcmodel=large -fPIC gives an "unimplemented" error:
> f951: sorry, unimplemented: code model ‘large’ with ‘-fPIC’
> 
> As discussed at the beginning, to tackle this problem we first have to
> solve the problem that the large code model only supports static linking.
> My solution is to use the R_AARCH64_MOVW_PREL_Gx group relocations,
> together with instructions that compute the current PC value.
> 
> Before change (-mcmodel=small):
> adrp    x0, bar1.2782
> add     x0, x0, :lo12:bar1.2782
> 
> After change (proposed -mcmodel=medium):
> movz    x0, :prel_g3:bar1.2782
> movk    x0, :prel_g2_nc:bar1.2782
> movk    x0, :prel_g1_nc:bar1.2782
> movk    x0, :prel_g0_nc:bar1.2782
> adr     x1, .
> sub     x1, x1, 0x4
> add     x0, x0, x1
> 
> The movz and the three movk instructions compute the 64-bit offset between
> bar1 and the last movk instruction, which fulfils the requirement of the
> large code model (64-bit relocation).
> The adr + sub instructions compute the PC address of the last movk
> instruction. Adding the offset to that PC address gives the address of
> bar1, so it can be located dynamically.
> 
> Because this sequence is costly, a threshold on the size of the data
> decides which data is addressed this way, as on x86. The default value of
> the threshold is 65536, which is the maximum relocation capability of the
> small code model.
> This implementation also needs a linker change in binutils so that the four
> mov instructions all compute their offset relative to the PC of the last
> movk instruction.
> 
> The advantage of this implementation is that it can use existing relocation
> types to prototype a medium code model.
> 
> This implementation also has drawbacks.
> First, the four mov instructions and the adr instruction must be emitted in
> exactly this order. No other instruction may be inserted into the sequence,
> otherwise the computed symbol address is wrong. This might impede
> instruction scheduling.
> Second, the linker needs a corresponding change so that every mov
> instruction uses the same PC offset. For example, in my implementation, the
> first movz instruction needs 12 added to the result of
> ":prel_g3:bar1.2782" to make up the PC offset.
> 
> I haven't figured out a suitable solution for these problems yet. You are
> most welcome to leave suggestions regarding these issues.

Is the main usage scenario huge arrays? If so, these could easily be allocated
via malloc at startup rather than using bss. It means an extra indirection in
some cases (to load the pointer), but it should be much more efficient than
using a large code model with all the overheads.
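
As a rough illustration of the malloc approach, here is a minimal C sketch
(C rather than Fortran only to keep it short). The names alloc_arrays,
free_arrays and the element-count parameter n are hypothetical and not taken
from the test case or from CESM. Only the two pointers remain in .bss, so the
small code model can still reach them, and each array access costs one extra
load of the pointer:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the Fortran arrays bar and bar1. Only these two pointers
   live in .bss; the array storage itself comes from the heap. */
float *bar;
float *bar1;

/* Hypothetical startup hook: allocate both arrays once.
   Returns 0 on success and -1 on failure. */
int alloc_arrays(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof *bar)
        return -1;                      /* reject empty or overflowing sizes */
    bar = malloc(n * sizeof *bar);
    bar1 = malloc(n * sizeof *bar1);
    if (bar == NULL || bar1 == NULL) {
        free(bar);
        free(bar1);
        bar = bar1 = NULL;
        return -1;
    }
    return 0;
}

/* Release both arrays and reset the pointers. */
void free_arrays(void)
{
    free(bar);
    free(bar1);
    bar = bar1 = NULL;
}

/* Roughly mirrors subroutine foo: zero both arrays, then read one element.
   Each access first loads the pointer (the extra indirection). */
float foo(size_t n)
{
    assert(bar != NULL && bar1 != NULL);
    for (size_t i = 0; i < n; i++) {
        bar[i] = 0.0f;
        bar1[i] = 0.0f;
    }
    float b = bar[n - 1];
    b = bar1[0];
    return b;
}
```

With the sizes from the test case this would be called once at startup as
alloc_arrays((size_t)1024 * 1024 * 1024), followed by foo with the same count.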
