Problems with relocations for a custom ISA

* Problems with relocations for a custom ISA
@ 2023-08-07 23:08 MegaIng
  2023-08-08 14:13 ` Michael Matz
  0 siblings, 1 reply; 6+ messages in thread
From: MegaIng @ 2023-08-07 23:08 UTC
  To: binutils

Hello,

I am currently in the process of porting binutils to a custom
architecture I am designing with a few others (Spec [1], Start of our Port
[2]). An interesting quirk of this ISA is that it's highly modular,
starting with fixed-size 16-bit opcodes, but with extensions supporting
variable-length instructions similar in power to what x86 has with its
addressing modes. The base ISA uses a fixed 16-bit word, but there are
extensions for 32- and 64-bit words.

Most of the basics I have already managed to implement, i.e. I can
generate simple workable ELF files. However, I am running into problems
with relocations for "load immediate" instructions. Without extensions,
we potentially want to emit long chains of instructions (3 to 8
instructions is realistic), but with the proper extensions it can get
down to only 1 instruction of 3 or 4 bytes. I am unsure how best to
represent such variable-length relocations in BFD and ELF. It seems like
those always assume fixed-size relocations that get relaxed away in their
entirety if no longer needed. Is the best solution really to emit multiple
relocations and treat them as one in our custom elf_relocate_section
function?

In a similar vein I noticed that it seems impossible to teach 
bfd_perform_relocation to correctly perform the non-trivial 
transformation required to encode the signed offsets of jumps (since 
they are non-consecutive bitfields), which means that we get garbage if 
that function is called, for example when `--oformat` is not elf. Is 
this really unavoidable? I tried using special_function, but since it's 
also called from bfd_install_relocation, I couldn't figure out what the 
correct behavior inside of it would be.

Thanks in advance,

MegaIng

1. https://github.com/ETC-A/etca-spec
2. https://github.com/ETC-A/etca-binutils-gdb

* Re: Problems with relocations for a custom ISA
  2023-08-07 23:08 Problems with relocations for a custom ISA MegaIng
@ 2023-08-08 14:13 ` Michael Matz
  2023-08-08 14:35   ` MegaIng
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Matz @ 2023-08-08 14:13 UTC
  To: MegaIng; +Cc: binutils

Hello,

On Tue, 8 Aug 2023, MegaIng via Binutils wrote:

> I am currently in the process of porting binutils to a custom architecture I
> am designing with a few others (Spec [1], Start of our Port [2]). An
> interesting quirk of this ISA is that it's highly modular, starting with
> fixed-size 16-bit opcodes, but with extensions supporting variable-length
> instructions similar in power to what x86 has with its addressing modes. The
> base ISA uses a fixed 16-bit word, but there are extensions for 32- and
> 64-bit words.
> 
> Most of the basics I already managed to implement, i.e. I can generate simple
> workable ELF files. However, I am running into problems with relocations for
> "load immediate" instructions. Without extensions, we want to potentially emit
> long chains of instructions (3 to 8 instructions is realistic), but with proper
> extensions it can get down to only 1 instruction of 3 or 4 bytes. I am unsure
> how to best represent such variable length relocations in BFD and ELF.

The normal way would be to not do that.  It seems the assembler will 
already see either a long chain of small insns, or a single large insn, 
right?  So at that point you can already emit the correct relocs.  For 
example, if I have three insns: setlo, sethi and setall, setting the low 
16 bits, the high 16 bits, or all 32 bits of a 32bit immediate, then I  
also would have three reloc types: LOW16, HIGH16 and ABS32, which the 
assembler would appropriately emit:

   setlo %r1, lo(sym)  --> RELOC_LOW16, symbol 'sym'
   sethi %r1, hi(sym)  --> RELOC_HIGH16, symbol 'sym'
   setall %r1, sym     --> RELOC_ABS32, symbol 'sym'

(obviously details will differ, your 16bit insns won't be able to quite 
set all 16 bits :) ).
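
As a rough illustration of the three relocation types in the example
above, here is a minimal Python sketch of the value each one would place
into its instruction. The function names and the assumption of an unsigned
32-bit symbol value are illustrative only and are not taken from any real
BFD port:

```python
# Hypothetical relocation kinds matching the setlo/sethi/setall example.
# Names and bit ranges are illustrative assumptions, not a real BFD API.

def reloc_low16(value: int) -> int:
    """Bits 0..15 of the symbol value (what setlo would receive)."""
    return value & 0xFFFF


def reloc_high16(value: int) -> int:
    """Bits 16..31 of the symbol value (what sethi would receive)."""
    return (value >> 16) & 0xFFFF


def reloc_abs32(value: int) -> int:
    """All 32 bits of the symbol value (what setall would receive)."""
    return value & 0xFFFFFFFF


sym = 0x12345678
lo, hi = reloc_low16(sym), reloc_high16(sym)
# The two half relocations together carry the same bits as the single ABS32.
assert (hi << 16) | lo == reloc_abs32(sym)
print(hex(lo), hex(hi))
```

The point is only that each relocation type selects a fixed slice of the
same symbol value, so the assembler can pick the types that match the
instructions it emitted.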

If you really want to optimize these sequences also at link time (but 
why?) then all of this becomes more complicated, but remains essentially 
the same.  The secret will then be in linking from one of the small relocs 
(say, the high16 one) to the other, for the linker to easily recognize the 
whole insn pair and appropriately do something about those byte sequences.  
In that scheme you need to distinguish between relocations applied to
relaxable code and relocations applied to random non-relaxable data.  E.g.
you probably need two variants of the RELOC_LOW16 relocation.

Some bfd targets choose to limit themselves to only simple sequences of
relaxable instructions, e.g. if the low16/high16 setters always come
directly after each other (the compiler or asm author will need to
ensure this if they want to benefit from relaxation), then one reloc
doesn't need to link to the other.

I wouldn't go that way if I were you: it seems the assembler/compiler 
needs to know if targeting the extended ISA or not anyway, so generating 
the right instructions and relocations from the start in the assembler 
seems the right choice, and then doesn't need any relax complications at 
link time.

Ciao,
Michael.

* Re: Problems with relocations for a custom ISA
  2023-08-08 14:13 ` Michael Matz
@ 2023-08-08 14:35   ` MegaIng
  2023-08-08 14:55     ` Xi Ruoyao
  2023-08-08 15:35     ` Michael Matz
  0 siblings, 2 replies; 6+ messages in thread
From: MegaIng @ 2023-08-08 14:35 UTC
  To: Michael Matz; +Cc: binutils

Hi,

On 2023-08-08 Michael Matz wrote:
> Hello,
>
> On Tue, 8 Aug 2023, MegaIng via Binutils wrote:
>
>> I am currently in the process of porting binutils to a custom architecture I
>> am designing with a few others (Spec [1], Start of our Port [2]). An
>> interesting quirk of this ISA is that it's highly modular, starting with
>> fixed-size 16-bit opcodes, but with extensions supporting variable-length
>> instructions similar in power to what x86 has with its addressing modes. The
>> base ISA uses a fixed 16-bit word, but there are extensions for 32- and
>> 64-bit words.
>>
>> Most of the basics I already managed to implement, i.e. I can generate simple
>> workable ELF files. However, I am running into problems with relocations for
>> "load immediate" instructions. Without extensions, we want to potentially emit
>> long chains of instructions (3 to 8 instructions is realistic), but with proper
>> extensions it can get down to only 1 instruction of 3 or 4 bytes. I am unsure
>> how to best represent such variable length relocations in BFD and ELF.
> The normal way would be to not do that.  It seems the assembler will
> already see either a long chain of small insns, or a single large insn,
> right?

Our idea was that the user can use a simple pseudo instruction to
represent the entire process of loading a symbol (or any immediate for
that matter). Maybe this is a misguided idea?

> So at that point you can already emit the correct relocs.  For
> example, if I have three insns: setlo, sethi and setall, setting the low
> 16 bits, the high 16 bits, or all 32 bits of a 32bit immediate, then I
> also would have three reloc types: LOW16, HIGH16 and ABS32, which the
> assembler would appropriately emit:
>
>     setlo %r1, lo(sym)  --> RELOC_LOW16, symbol 'sym'
>     sethi %r1, hi(sym)  --> RELOC_HIGH16, symbol 'sym'
>     setall %r1, sym     --> RELOC_ABS32, symbol 'sym'
>
> (obviously details will differ, your 16bit insns won't be able to quite
> set all 16 bits :) ).
> If you really want to optimize these sequences also at link time (but
> why?) then all of this becomes more complicated, but remains essentially
> the same.  The secret will then be in linking from one of the small relocs
> (say, the high16 one) to the other, for the linker to easily recognize the
> whole insn pair and appropriately do something about those byte sequences.
> In that scheme you need to distinguish between relocations applied to relaxable
> code and relocations applied to random non-relaxable data.  E.g. you
> probably need two variants of the RELOC_LOW16 relocation.

Not sure if you took a look at our instruction set: the way you would
load an arbitrary 16-bit word is via a sequence of `slo` (shift left 5
and or) instructions, which use a 5-bit immediate (the largest we have in
base). So breaking it up into two RELOC_LOW_16 or similar wouldn't quite
work. It would have to be 3-4 relocations, RELOC_BITS_0_4, RELOC_BITS_5_9,
RELOC_BITS_10_15 or something like that. And you couldn't exactly remove
one of those without changing the others. But of course, we don't always
need all 4 instructions; sometimes we can get away with only two or
three. For example, if it's only an 8-bit value, we only need 2
instructions. We would like to optimize these cases somewhere.

After a bit more discussion we came to the idea of having many relocations
that potentially cover multiple instructions, so that the entire
load-immediate sequence can be covered by one relocation, but this is quite
a large number of relocations.
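
To make the instruction counts concrete, here is a minimal Python sketch
of the chunking. It assumes a simplified model in which the register starts
at zero and each `slo` computes reg = (reg << 5) | imm5; sign extension and
the actual first instruction of the sequence are ignored, and the helper
names are hypothetical:

```python
def chunks5(value: int, bits: int) -> list[int]:
    """Split the low `bits` bits of value into 5-bit chunks, highest first.

    Leading all-zero chunks are dropped, but at least one chunk is returned.
    """
    value &= (1 << bits) - 1
    count = max(1, -(-value.bit_length() // 5))  # ceil(bit_length / 5)
    return [(value >> (5 * i)) & 0x1F for i in reversed(range(count))]


def run_slo(chunks: list[int]) -> int:
    """Simulate the sequence: each step shifts left by 5 and ORs in a chunk."""
    reg = 0  # assumption: the register starts cleared
    for imm in chunks:
        reg = (reg << 5) | imm
    return reg


for value, bits in ((0xAB, 16), (0xFFFF, 16), (0xFFFFFFFF, 32)):
    seq = chunks5(value, bits)
    assert run_slo(seq) == value
    print(hex(value), len(seq), seq)
```

Under this model an 8-bit value needs 2 instructions, a full 16-bit value
needs 4, and a full 32-bit value needs 7.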

> Some bfd targets choose to limit themselves to only simple sequences of
> relaxable instructions, e.g. if the low16/high16 setters always come
> directly after each other (the compiler or asm author will need to
> ensure this if they want to benefit from relaxation), then one reloc
> doesn't need to link to the other.
>
> I wouldn't go that way if I were you: it seems the assembler/compiler
> needs to know if targeting the extended ISA or not anyway, so generating
> the right instructions and relocations from the start in the assembler
> seems the right choice, and then doesn't need any relax complications at
> link time.

As long as the range (or even the exact value) of the symbol is known at
assembly time, this is of course true, but what about situations where
nothing about the range of the value is known? It seems like other
assembler targets truncate the values in those cases? If we went for the
minimal representation we would basically limit external symbols to 5 bits,
which isn't exactly ideal. And from what I can tell, growing a relocation
also isn't really something bfd is designed to deal with, right?

Many thanks already,
MegaIng

* Re: Problems with relocations for a custom ISA
  2023-08-08 14:35   ` MegaIng
@ 2023-08-08 14:55     ` Xi Ruoyao
  2023-08-08 15:35     ` Michael Matz
  1 sibling, 0 replies; 6+ messages in thread
From: Xi Ruoyao @ 2023-08-08 14:55 UTC
  To: MegaIng, Michael Matz; +Cc: binutils

On Tue, 2023-08-08 at 16:35 +0200, MegaIng via Binutils wrote:
> Our idea was that the user can use a simple pseudo instruction to
> represent the entire process of loading a symbol (or any immediate for
> that matter). Maybe this is a misguided idea?

I'd say it's a bad idea.  A stack-based pseudo instruction reloc
approach was used for LoongArch.  But the "pseudo instruction" was
never implemented completely (doing so is just impossible unless
you rewrite the entire libbfd), so we could actually only handle some
special cases, and this approach caused much more trouble than
benefit.  We have now deprecated these nasty things and will
remove support for them in a future Binutils release.

See https://github.com/loongson/LoongArch-Documentation/issues/9 for the
"much more trouble".

> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9 
> RELOC_BITS_10_15 or something like that

Yes, we now use some traditional reloc types like those for LoongArch.
This approach works much better than the previous stack-based one.
We now really wish we'd never tried the stack-based approach at all.

ELF allows 2^31-1 reloc types, so a larger reloc type set is not an
issue.

> but what about situations where nothing about the range of the value
> is known

You need to design some code models (sets of assumptions for the
ranges), like other BFD ports do.  If you really need the marginal
performance gain by exploiting the range limitations, you can also
implement linker relaxation.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

* Re: Problems with relocations for a custom ISA
  2023-08-08 14:35   ` MegaIng
  2023-08-08 14:55     ` Xi Ruoyao
@ 2023-08-08 15:35     ` Michael Matz
  2023-08-08 17:26       ` MegaIng
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Matz @ 2023-08-08 15:35 UTC
  To: MegaIng; +Cc: binutils

Hello,

On Tue, 8 Aug 2023, MegaIng wrote:

> > > Most of the basics I already managed to implement, i.e. I can generate
> > > simple workable ELF files. However, I am running into problems with
> > > relocations for "load immediate" instructions. Without extensions, we
> > > want to potentially emit long chains of instructions (3 to 8
> > > instructions is realistic), but with proper extensions it can get down
> > > to only 1 instruction of 3 or 4 bytes. I am unsure how to best
> > > represent such variable length relocations in BFD and ELF.
> > The normal way would be to not do that.  It seems the assembler will
> > already see either a long chain of small insns, or a single large insn,
> > right?
> 
> Our idea was that the user can use a simple pseudo instruction to 
> represent the entire process of loading a symbol (or any immediate for 
> that matter).

Pseudo instruction makes sense.  But then it would still be the assembler 
that expands it to either a couple base insns or a single extended insn.
The linker would see only one or the other, and hence also only the base 
or the extended relocs.

Or did you really want to reserve some specific byte encoding for this 
pseudo instruction to transfer it from assembler via object file to linker 
and let only the linker replace that by one or the other variant?  That 
seems an unnecessarily complicated scheme.  It depends on whether the
assembler knows if it can target the extended insns or only the base
ones.  I would definitely suggest that the assembler, at the latest,
should know this.

> > (obviously details will differ, your 16bit insns won't be able to quite
> > set all 16 bits :) ).
> > If you really want to optimize these sequences also at link time (but
> > why?) then all of this becomes more complicated, but remains essentially
> > the same.  The secret will then be in linking from one of the small relocs
> > (say, the high16 one) to the other, for the linker to easily recognize the
> > whole insn pair and appropriately do something about those byte sequences.
> > In that scheme you need to distinguish between relocations applied to relaxable
> > code and relocations applied to random non-relaxable data.  E.g. you
> > probably need two variants of the RELOC_LOW16 relocation.
> 
> Not sure if you took a look at our instruction set: The way you would load an
> arbitrary 16bit word is via a sequence of `slo` (shift left 5 and or)
> instructions which use a 5bit immediate (the largest we have in base). So
> breaking it up into two RELOC_LOW_16 or similar wouldn't quite work.

Sure, as I said above: "obviously details will differ".

> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9 RELOC_BITS_10_15 
> or something like that. And you couldn't exactly remove one of those 
> without changing the others.

Yes, this is the usual way to express that.  There are many architectures 
which have similar ISA restrictions and they all do it essentially the 
same way: "select X bits from value, put them into Y bits of field", for 
potentially many combinations of (not necessarily consecutive) X and Y.
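
A minimal Python sketch of that "select X bits from value, put them into Y
bits of field" idea follows. The mapping format, the function name and the
example field layout are hypothetical and only illustrate how
non-consecutive fields can be filled; they do not describe any real ISA or
the BFD howto structure:

```python
def apply_bit_reloc(insn: int, value: int,
                    mapping: list[tuple[int, int, int]]) -> int:
    """Patch `insn` with bits of `value`.

    Each mapping entry is (value_lsb, field_lsb, width): take `width` bits of
    `value` starting at bit `value_lsb` and store them in `insn` starting at
    bit `field_lsb`, leaving every other instruction bit untouched.
    """
    for value_lsb, field_lsb, width in mapping:
        mask = (1 << width) - 1
        bits = (value >> value_lsb) & mask
        insn = (insn & ~(mask << field_lsb)) | (bits << field_lsb)
    return insn


# Assumed layout: value bits 0..2 go to insn bits 0..2,
# and value bits 3..7 go to insn bits 8..12 (a split, non-consecutive field).
SPLIT_IMM8 = [(0, 0, 3), (3, 8, 5)]

patched = apply_bit_reloc(0xE0F8, 0xB5, SPLIT_IMM8)
assert patched == 0xF6FD
print(hex(patched))
```

A relocation type is then little more than a name for one such mapping,
plus whatever overflow checking the port wants to perform.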

> But of course, we don't always need all 4
> instructions, sometimes we can get away with only two or three, for 
> example if it's only an 8bit value, we only need 2 instructions. We 
> would like to optimize these cases somewhere.

I see.  Yeah, that will ultimately need some linker relaxation as only 
that one will know for sure which values symbols have, and hence if they 
do or do not fit certain constraints.

> After a bit more 
> discussion we came to the idea of having many relocations that 
> potentially cover multiple instructions so that the entire 
> load-immediate sequence can be covered by one relocation,

As you have only such a short immediate field in the base ISA this seems 
like a sensible idea, as otherwise, as you say, you need 7 relocations 
(and insns) for a full 32bit load.

> but this is quite a large number of relocations.

Hmm?  I don't understand this remark.  If you cover a range of 
instructions by one relocation you necessarily need fewer relocs than if 
you use one reloc per insn?

> > I wouldn't go that way if I were you: it seems the assembler/compiler
> > needs to know if targeting the extended ISA or not anyway, so generating
> > the right instructions and relocations from the start in the assembler
> > seems the right choice, and then doesn't need any relax complications at
> > link time.
> 
> As long as the range (or even the exact value) of the symbol is known at
> assembly time, this is of course true, but what about situations where nothing
> about the range of the value is known?

The compiler/assembler would always emit the full sequence (e.g. assumes 
that the symbol in question happens to be full 32bit).  If you want to 
optimize this use in case the symbol happens to need fewer bits, then yes, 
you do need linker relaxation.  As said, you then need a way in the linker 
to recognize an insn sequence that "belongs" together, so that you can 
appropriately optimize this, either by referring from one to the next 
reloc in such a chain, or by simply assuming that such sequences are 
always done in a certain order (i.e. a simple pattern match; unrecognized 
patterns would remain unrelaxed/unoptimized).

The basic form of relocations doesn't depend on that, though.  You still 
need to distinguish between the lowest N bits of the requested value, the next
N bits, the next N bits, and so on, so you do need roundup(32/N) reloc 
types either way.

By restricting certain insn sequences and flexibility you can get away 
with fewer relocations than this.  E.g. with your idea of covering 
multiple insns with one reloc.  Say, if you require that the low 10 bits 
of a value are always set in this way (and given your ISA that makes 
sense):

   shiftset5 %r1, bit04(sym)
   shiftset5 %r1, bit59(sym)

and never with another insn in between, and never in a different order,
then of course you can get away with a relocation (say) RELOC_SHIFTSET10,
that takes the low 10 bits of 'sym' and appropriately distributes those 10
bits into the right 5-bit fields of the instructions.  It would implicitly
cover both instructions, i.e. a 32-bit place in the code section.
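
A minimal Python sketch of such a two-instruction relocation is shown
below. It assumes little-endian 16-bit instruction words with the 5-bit
immediate in bits 0..4, which is an invented layout for illustration and
not the real encoding. It also places the upper chunk in the first
instruction, which is what a shift-left-and-or sequence would need; swap
the two chunks if the instruction semantics differ:

```python
import struct

IMM_MASK = 0x1F  # assumption: 5-bit immediate in bits 0..4 of each insn


def apply_shiftset10(section: bytearray, offset: int, value: int) -> None:
    """Patch two adjacent 16-bit insns at `offset` with the low 10 bits."""
    first, second = struct.unpack_from("<HH", section, offset)
    hi = (value >> 5) & IMM_MASK  # bits 5..9, loaded first
    lo = value & IMM_MASK         # bits 0..4, loaded second
    first = (first & ~IMM_MASK & 0xFFFF) | hi
    second = (second & ~IMM_MASK & 0xFFFF) | lo
    struct.pack_into("<HH", section, offset, first, second)


# Two placeholder instructions with an empty immediate field.
code = bytearray(struct.pack("<HH", 0x4C20, 0x4C20))
apply_shiftset10(code, 0, 0x2A7)
assert struct.unpack("<HH", code) == (0x4C35, 0x4C27)
```

The relocation record itself only names one offset; the fact that it
touches 4 bytes is part of the relocation type's definition.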

If you extend this idea to cover seven instructions of the base ISA you 
can get away with a single reloc that is able to set the whole 32bit of a 
value (at the expense of not being able to place unrelated instructions 
between those seven).

> It seems like other assembler targets truncate the values in those 
> cases? If we went for the minimal representation we would basically 
> limit external symbols to 5bit, which isn't exactly ideal. And from what 
> I can tell, growing a relocation also isn't really something bfd is 
> designed to deal with, right?

I'm not super fluent in the actual implementation of bfd linker 
relaxation.  But I don't see why it can't also grow sections.  It's true 
that the usual relaxation shrinks sizes, and it's probably better to 
follow that as well, but in principle enlarging is no problem either (if
you enlarge _and_ shrink in your relaxation you can run into 
endless oscillation between the two, so that needs to be watched for).

But one thing about terminology: relocations themselves don't grow or
shrink.  A relocation in principle applies to a certain address without 
range.  The semantics of a specific relocation type will usually say that 
these-and-those bits in a field will be changed by it, and you can say 
that that's the size of a relocation.  But not all relocations are like 
that, and nothing really prevents you from either changing the relocation 
type when you want something else (in linker relaxation), or even defining 
a funny type that applies to either (say) a byte or a word, as needed.  
You need to implement special functions for such relocs then, and can't 
use the generic simple BFD reloc howto model, but still.

Just to expand on this: in principle one could invent a relocation type 
that says "when the symbol has value '1' change the byte 45 bytes 
from here to 42, when it has another value then encode that one into the 
word 7 bytes from here".  Those are obviously crazy semantics for a
relocation, but nothing inherently prevents you from that.  (Of course,
making sure that there actually _is_ something 45 bytes from the reloc's
place is a problem :) )  The "size" of such a relocation wouldn't be
well-defined anymore (or be 46), but what I'm saying is, that this is 
okayish.

What does grow or shrink is the section content, and hence distance 
between labels might change during relaxation, which requires delaying 
resolving jumps until relaxation time as well.  This can get quite slow at 
link time (riscv is plagued by this).  Just to make you aware :)
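
The interaction between shrinking code and moving labels can be sketched
as a small fixed-point loop. The following Python model is a toy under
stated assumptions, not how the BFD relaxation pass is structured: a section
is a list of fixed-size blobs, labels, and load-immediate sequences of
2-byte instructions, and a load needs ceil(bit_length / 5) instructions. It
only ever shrinks, which is what rules out oscillation:

```python
def needed_insns(value: int) -> int:
    """Number of 5-bit chunks needed for value (at least one)."""
    return max(1, -(-value.bit_length() // 5))


def layout(items):
    """Return {label: address} for the current instruction counts."""
    addr, labels = 0, {}
    for item in items:
        if item[0] == "label":
            labels[item[1]] = addr
        elif item[0] == "insn":      # ("insn", size_in_bytes)
            addr += item[1]
        else:                        # ("load", symbol, insn_count)
            addr += 2 * item[2]
    return labels


def relax(items):
    """Shrink load sequences until nothing changes; return final state."""
    items = [list(item) for item in items]
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        labels = layout(items)       # addresses as of the start of this pass
        for item in items:
            if item[0] == "load":
                want = needed_insns(labels[item[1]])
                if want < item[2]:   # shrink only, never grow
                    item[2] = want
                    changed = True
    return items, layout(items), passes


section = [("load", "far", 7), ("insn", 20), ("label", "near"),
           ("load", "near", 7), ("insn", 1000), ("label", "far")]
final, labels, passes = relax(section)
assert labels == {"near": 26, "far": 1028}
assert passes == 3
```

In this example the first pass moves `near` from 34 to 26, which in turn
lets the load of `near` shrink again in the second pass; that dependency is
why label addresses cannot be resolved before relaxation has finished.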

One remark: you _really_ should think long and hard about your immediate 
size in the base ISA.  5 bits is terribly small.  Maybe you can snatch 
away some bits here and there in your 16bit insns to make this 8 bits 
(something that divides 32 would be ideal), but even 6 would bring the 
full-32-bit sequence from 7 to 6 instructions.
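
The arithmetic behind those numbers is just roundup(32/N). A tiny Python
check, assuming one N-bit immediate per instruction and no other tricks:

```python
import math


def insns_for(bits: int, imm_width: int) -> int:
    """Instructions needed to build `bits` bits from imm_width-bit pieces."""
    return math.ceil(bits / imm_width)


for width in (5, 6, 8):
    print(width, insns_for(32, width))  # 5 -> 7, 6 -> 6, 8 -> 4
```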

Ciao,
Michael.

* Re: Problems with relocations for a custom ISA
  2023-08-08 15:35     ` Michael Matz
@ 2023-08-08 17:26       ` MegaIng
  0 siblings, 0 replies; 6+ messages in thread
From: MegaIng @ 2023-08-08 17:26 UTC
  To: Michael Matz; +Cc: binutils


On 2023-08-08 at 17:35, Michael Matz wrote:
> Hello,
>
> On Tue, 8 Aug 2023, MegaIng wrote:
>
>>>> Most of the basics I already managed to implement, i.e. I can generate
>>>> simple workable ELF files. However, I am running into problems with
>>>> relocations for "load immediate" instructions. Without extensions, we
>>>> want to potentially emit long chains of instructions (3 to 8
>>>> instructions is realistic), but with proper extensions it can get down
>>>> to only 1 instruction of 3 or 4 bytes. I am unsure how to best
>>>> represent such variable length relocations in BFD and ELF.
>>> The normal way would be to not do that.  It seems the assembler will
>>> already see either a long chain of small insns, or a single large insn,
>>> right?
>> Our idea was that the user can use a simple pseudo instruction to
>> represent the entire process of loading a symbol (or any immediate for
>> that matter).
> Pseudo instruction makes sense.  But then it would still be the assembler
> that expands it to either a couple base insns or a single extended insn.
> The linker would see only one or the other, and hence also only the base
> or the extended relocs.
>
> Or did you really want to reserve some specific byte encoding for this
> pseudo instruction to transfer it from assembler via object file to linker
> and let only the linker replace that by one or the other variant?  That
> seems an unnecessarily complicated scheme.  It depends on whether the
> assembler knows if it can target the extended insns or only the base
> ones.  I would definitely suggest that the assembler, at the latest,
> should know this.

It wasn't our idea to have a specific bit pattern reserved for that; that
would be quite weird, I agree :-) I think the linker needs knowledge about
which extensions are available; for that we would use an attributes section
similar to what RISC-V seems to use (although maybe we don't need it if we
have many relocation types).

>>> (obviously details will differ, your 16bit insns won't be able to quite
>>> set all 16 bits :) ).
>>> If you really want to optimize these sequences also at link time (but
>>> why?) then all of this becomes more complicated, but remains essentially
>>> the same.  The secret will then be in linking from one of the small relocs
>>> (say, the high16 one) to the other, for the linker to easily recognize the
>>> whole insn pair and appropriately do something about those byte sequences.
>>> In that scheme you need to distinguish between relocations applied to relaxable
>>> code and relocations applied to random non-relaxable data.  E.g. you
>>> probably need two variants of the RELOC_LOW16 relocation.
>> Not sure if you took a look at our instruction set: The way you would load an
>> arbitrary 16bit word is via a sequence of `slo` (shift left 5 and or)
>> instructions which use a 5bit immediate (the largest we have in base). So
>> breaking it up into two RELOC_LOW_16 or similar wouldn't quite work.
> Sure, as I said above: "obviously details will differ".
>
>> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9 RELOC_BITS_10_15
>> or something like that. And you couldn't exactly remove one of those
>> without changing the others.
> Yes, this is the usual way to express that.  There are many architectures
> which have similar ISA restrictions and they all do it essentially the
> same way: "select X bits from value, put them into Y bits of field", for
> potentially many combinations of (not necessarily consecutive) X and Y.
>
>> But of course, we don't always need all 4
>> instructions, sometimes we can get away with only two or three, for
>> example if it's only an 8bit value, we only need 2 instructions. We
>> would like to optimize these cases somewhere.
> I see.  Yeah, that will ultimately need some linker relaxation as only
> that one will know for sure which values symbols have, and hence if they
> do or do not fit certain constraints.
>
>> After a bit more
>> discussion we came to the idea of having many relocations that
>> potentially cover multiple instructions so that the entire
>> load-immediate sequence can be covered by one relocation,
> As you have only such a short immediate field in the base ISA this seems
> like a sensible idea, as otherwise, as you say, you need 7 relocations
> (and insns) for a full 32bit load.
>
>> but this is quite a large number of relocations.
> Hmm?  I don't understand this remark.  If you cover a range of
> instructions by one relocation you necessarily need fewer relocs than if
> you use one reloc per insn?

I was considering a large number of relocation types as a drawback, but I
now realize that this can't be avoided no matter which path we choose. We
are now going to have the large multi-instruction relocations that can be
relaxed one instruction at a time instead of the bit-selection relocations.

>>> I wouldn't go that way if I were you: it seems the assembler/compiler
>>> needs to know if targeting the extended ISA or not anyway, so generating
>>> the right instructions and relocations from the start in the assembler
>>> seems the right choice, and then doesn't need any relax complications at
>>> link time.
>> As long as the range (or even the exact value) of the symbol is known at
>> assembly time, this is of course true, but what about situations where nothing
>> about the range of the value is known?
> The compiler/assembler would always emit the full sequence (e.g. assumes
> that the symbol in question happens to be full 32bit).  If you want to
> optimize this use in case the symbol happens to need fewer bits, then yes,
> you do need linker relaxation.  As said, you then need a way in the linker
> to recognize an insn sequence that "belongs" together, so that you can
> appropriately optimize this, either by referring from one to the next
> reloc in such a chain, or by simply assuming that such sequences are
> always done in a certain order (i.e. a simple pattern match; unrecognized
> patterns would remain unrelaxed/unoptimized).
>
> The basic form of relocations doesn't depend on that, though.  You still
> need to distinguish between the lowest N bits of the requested value, the next
> N bits, the next N bits, and so on, so you do need roundup(32/N) reloc
> types either way.
>
> By restricting certain insn sequences and flexibility you can get away
> with fewer relocations than this.  E.g. with your idea of covering
> multiple insns with one reloc.  Say, if you require that the low 10 bits
> of a value are always set in this way (and given your ISA that makes
> sense):
>
>     shiftset5 %r1, bit04(sym)
>     shiftset5 %r1, bit59(sym)
>
> and never with another insn in between, and never in a different order,
> then of course you can get away with a relocation (say) RELOC_SHIFTSET10,
> that takes the low 10 bits of 'sym' and appropriately distributes those 10
> bits into the right 5-bit fields of the instructions.  It would implicitly
> cover both instructions, i.e. a 32-bit place in the code section.
>
> If you extend this idea to cover seven instructions of the base ISA you
> can get away with a single reloc that is able to set the whole 32bit of a
> value (at the expense of not being able to place unrelated instructions
> between those seven).

My primary interest is to support the load-immediate pseudo opcode, so
I am not going to worry about stuff users could manually write. I don't
think there could ever be a benefit to putting instructions in the middle
of that sequence.

However, we might have to split it into multiple relocations, since bfd
sets an upper limit on the number of bytes a relocation can cover by using
a 4-bit-wide bitfield for that.

>> It seems like other assembler targets truncate the values in those
>> cases? If we went for the minimal representation we would basically
>> limit external symbols to 5bit, which isn't exactly ideal. And from what
>> I can tell, growing a relocation also isn't really something bfd is
>> designed to deal with, right?
> I'm not super fluent in the actual implementation of bfd linker
> relaxation.  But I don't see why it can't also grow sections.  It's true
> that the usual relaxation shrinks sizes, and it's probably better to
> follow that as well, but in principle enlarging is no problem either (if
> you enlarge _and_ shrink in your relaxation you can run into
> endless oscillation between the two, so that needs to be watched for).
>
> But one thing about terminology: relocations themselves don't grow or
> shrink.  A relocation in principle applies to a certain address without
> range.  The semantics of a specific relocation type will usually say that
> these-and-those bits in a field will be changed by it, and you can say
> that that's the size of a relocation.  But not all relocations are like
> that, and nothing really prevents you from either changing the relocation
> type when you want something else (in linker relaxation), or even defining
> a funny type that applies to either (say) a byte or a word, as needed.
> You need to implement special functions for such relocs then, and can't
> use the generic simple BFD reloc howto model, but still.
>
> Just to expand on this: in principle one could invent a relocation type
> that says "when the symbol has value '1' change the byte 45 bytes
> from here to 42, when it has another value then encode that one into the
> word 7 bytes from here".  Those are obviously crazy semantics for a
> relocation, but nothing inherently prevents you from that.  (Of course,
> making sure that there actually _is_ something 45 bytes from the reloc's
> place is a problem :) )  The "size" of such a relocation wouldn't be
> well-defined anymore (or be 46), but what I'm saying is, that this is
> okayish.
>
> What does grow or shrink is the section content, and hence distance
> between labels might change during relaxation, which requires delaying
> resolving jumps until relaxation time as well.  This can get quite slow at
> link time (riscv is plagued by this).  Just to make you aware :)

Yeah, thank you, my word choice was a bit confused. The speed penalty is
something we are probably not going to worry about for the moment, but we
will keep it in mind.

> One remark: you _really_ should think long and hard about your immediate
> size in the base ISA.  5 bits is terribly small.  Maybe you can snatch
> away some bits here and there in your 16bit insns to make this 8 bits
> (something that divides 32 would be ideal), but even 6 would bring the
> full-32-bit sequence from 7 to 6 instructions.

This is something we have discussed a few times, and we came to the
conclusion that we prefer the current encoding. We wanted 16-bit opcodes
and byte-aligned sections, and from there the choices get quite limited.
We also wanted a simple encoding, so we didn't want too many complex
tricks.

>
> Ciao,
> Michael.

Thank you for taking the time :-)
