public inbox for binutils@sourceware.org
From: MegaIng <trampchamp@hotmail.de>
To: Michael Matz <matz@suse.de>
Cc: binutils@sourceware.org
Subject: Re: Problems with relocations for a custom ISA
Date: Tue, 8 Aug 2023 16:35:41 +0200
Message-ID: <DU0PR03MB972916B4EA49090DBED3A80DA20DA@DU0PR03MB9729.eurprd03.prod.outlook.com>
In-Reply-To: <alpine.LSU.2.20.2308081359530.25429@wotan.suse.de>

Hi,

On 2023-08-08 Michael Matz wrote:
> Hello,
>
> On Tue, 8 Aug 2023, MegaIng via Binutils wrote:
>
>> I am currently in the process of porting binutils to a custom architecture I
>> design with a few others (Spec [1], Start of our Port [2]). An interesting
>> quirk of this ISA is that it's highly modular, starting with fixed-size 16-bit
>> opcodes, but with extensions supporting variable-length instructions similar
>> in power to what x86 has with its addressing modes. The base ISA is fixed
>> 16-bit word, but there are extensions for 32- and 64-bit words.
>>
>> Most of the basics I already managed to implement, i.e. I can generate simple
>> workable ELF files. However, I am running into problems with relocations for
>> "load immediate" instructions. Without extensions, we want to potentially emit
>> long chains of instructions (3 to 8 instructions is realistic), but with proper
>> extensions it can get down to only 1 instruction of 3 or 4 bytes. I am unsure
>> how to best represent such variable-length relocations in BFD and ELF.
> The normal way would be to not do that.  It seems the assembler will
> already see either a long chain of small insns, or a single large insn,
> right?

Our idea was that the user can use a simple pseudo-instruction to represent
the entire process of loading a symbol (or any immediate, for that matter).
Maybe this is a misguided idea?

> So at that point you can already emit the correct relocs.  For
> example, if I have three insns: setlo, sethi and setall, setting the low
> 16 bits, the high 16 bits, or all 32 bits of a 32bit immediate, then I
> also would have three reloc types: LOW16, HIGH16 and ABS32, which the
> assembler would appropriately emit:
>
>     setlo %r1, lo(sym)  --> RELOC_LOW16, symbol 'sym'
>     sethi %r1, hi(sym)  --> RELOC_HIGH16, symbol 'sym'
>     setall %r1, sym     --> RELOC_ABS32, symbol 'sym'
>
> (obviously details will differ, your 16bit insns won't be able to quite
> set all 16 bits :) ).
> If you really want to optimize these sequences also at link time (but
> why?) then all of this becomes more complicated, but remains essentially
> the same.  The secret will then be in linking from one of the small relocs
> (say, the high16 one) to the other, for the linker to easily recognize the
> whole insn pair and appropriately do something about those byte sequences.
> In that scheme you need to distinguish between relocations applied to
> relaxable code and relocations applied to random non-relaxable data.  E.g.
> you probably need two variants of the RELOC_LOW16 relocation.
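
For concreteness, the three-reloc scheme quoted above can be sketched in a
few lines of Python. This is only an illustration, not BFD code: the helper
name apply_reloc and the plain unsigned 16/16 split are assumptions, and
only the reloc names are taken from the example.

```python
# Illustrative only: models the field value each example reloc type would
# patch into the instruction. apply_reloc is a hypothetical helper, not a
# BFD function.

def apply_reloc(kind: str, sym_value: int) -> int:
    """Return the bits of sym_value selected by the given reloc type."""
    if kind == "RELOC_LOW16":
        return sym_value & 0xFFFF            # low 16 bits, for setlo
    if kind == "RELOC_HIGH16":
        return (sym_value >> 16) & 0xFFFF    # high 16 bits, for sethi
    if kind == "RELOC_ABS32":
        return sym_value & 0xFFFFFFFF        # all 32 bits, for setall
    raise ValueError(f"unknown reloc type: {kind}")


sym = 0x12345678
lo = apply_reloc("RELOC_LOW16", sym)
hi = apply_reloc("RELOC_HIGH16", sym)

# The setlo/sethi pair and the single setall end up with the same value.
assert (hi << 16) | lo == apply_reloc("RELOC_ABS32", sym)
```

Each reloc only touches its own instruction, so the assembler can emit them
independently and the linker needs no knowledge of the surrounding sequence.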

Not sure if you took a look at our instruction set: the way you would
load an arbitrary 16-bit word is via a sequence of `slo` (shift left by 5,
then OR) instructions, which use a 5-bit immediate (the largest we have in
base). So breaking it up into two RELOC_LOW_16 or similar wouldn't quite
work. It would have to be 3-4 relocations such as RELOC_BITS_0_4,
RELOC_BITS_5_9, RELOC_BITS_10_15 or something like that. And you couldn't
exactly remove one of those without changing the others. But of course, we
don't always need all 4 instructions; sometimes we can get away with only
two or three. For example, if it's only an 8-bit value, we only need 2
instructions. We would like to optimize these cases somewhere.

After a bit more discussion we came to the idea of having many relocations
that each potentially cover multiple instructions, so that the entire
load-immediate sequence can be covered by one relocation. But this amounts
to quite a large number of relocations.
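
To illustrate the chain lengths mentioned above, here is a minimal Python
sketch of how a value splits into `slo` immediates. It rests on assumptions
not spelled out in this mail: values are unsigned, the target register
starts at zero, and `slo rd, imm5` computes rd = (rd << 5) | imm5 truncated
to the word size. The names slo_chunks and run_slo_chain are made up for
the example.

```python
# Sketch under assumptions: unsigned values, register initially 0, and
# `slo rd, imm5` meaning rd = (rd << 5) | imm5, truncated to the word size.

IMM_BITS = 5
IMM_MASK = (1 << IMM_BITS) - 1


def slo_chunks(value: int) -> list[int]:
    """Split value into 5-bit immediates, most significant chunk first."""
    if value < 0:
        raise ValueError("this sketch handles unsigned values only")
    chunks = [value & IMM_MASK]
    value >>= IMM_BITS
    while value:
        chunks.append(value & IMM_MASK)
        value >>= IMM_BITS
    return chunks[::-1]


def run_slo_chain(chunks: list[int], word_bits: int = 16) -> int:
    """Emulate executing one slo per chunk on a zeroed register."""
    word_mask = (1 << word_bits) - 1
    reg = 0
    for imm in chunks:
        reg = ((reg << IMM_BITS) | imm) & word_mask
    return reg


# An 8-bit value fits in two instructions, a full 16-bit value needs four.
assert len(slo_chunks(0xAB)) == 2
assert len(slo_chunks(0xFFFF)) == 4
assert run_slo_chain(slo_chunks(0xBEEF)) == 0xBEEF
```

Because every chunk depends on how many chunks follow it, dropping one
instruction from the chain changes the immediates of all the others, which
is the coupling described above.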

> Some bfd targets chose to limit themselves to only simple sequences of
> relaxable instructions, e.g. if the low16/high16 setter always comes in
> sequence directly after each other (the compiler or asm author will need
> to ensure this if it wants to benefit from relaxation then), then one
> reloc doesn't need to link to the other.
>
> I wouldn't go that way if I were you: it seems the assembler/compiler
> needs to know if targeting the extended ISA or not anyway, so generating
> the right instructions and relocations from the start in the assembler
> seems the right choice, and then doesn't need any relax complications at
> link time.

As long as the range (or even the exact value) of the symbol is known at
assembly time, this is of course true, but what about situations where
nothing about the range of the value is known? It seems like other assembler
targets truncate the values in those cases? If we went for the minimal
representation, we would basically limit external symbols to 5 bits, which
isn't exactly ideal. And from what I can tell, growing a relocation also
isn't really something bfd is designed to deal with, right?
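
The trade-off can be put into numbers with a small Python sketch. It shows
one conceivable assembler policy, not anything bfd provides: reserve the
worst-case chain when the value is unknown, and the minimal chain when it
is known. The names chain_length and truncate_to_one_insn are hypothetical,
and values are assumed to be unsigned.

```python
import math
from typing import Optional

IMM_BITS = 5  # immediate width of the base `slo` instruction


def chain_length(value: Optional[int], word_bits: int = 16) -> int:
    """Number of slo instructions to reserve for an unsigned value.

    value is None when the symbol is unresolved at assembly time; in that
    case this sketch reserves the worst case for the word size.
    """
    worst_case = math.ceil(word_bits / IMM_BITS)
    if value is None:
        return worst_case
    return max(1, math.ceil(value.bit_length() / IMM_BITS))


def truncate_to_one_insn(value: int) -> int:
    """What survives if a symbol is forced into a single 5-bit field."""
    return value & ((1 << IMM_BITS) - 1)


assert chain_length(0xAB) == 2       # known 8-bit value: two instructions
assert chain_length(None) == 4       # unknown value, 16-bit word: four
assert truncate_to_one_insn(0x1234) == 0x14  # upper bits are lost
```

Under this policy an unresolved symbol always costs the full chain, so any
later optimization would only ever shrink a sequence, never grow one.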

Many thanks already,
MegaIng


