Date: Tue, 8 Aug 2023 15:35:24 +0000 (UTC)
From: Michael Matz
To: MegaIng
cc: binutils@sourceware.org
Subject: Re: Problems with relocations for a custom ISA

Hello,

On Tue, 8 Aug 2023, MegaIng wrote:

> > > Most of the basics I already managed to implement, i.e. I can
> > > generate simple workable ELF files. However, I am running into
> > > problems with relocations for "load immediate" instructions.
> > > Without extensions, we want to potentially emit long chains of
> > > instructions (3 to 8 instructions is realistic), but with proper
> > > extensions it can get down to only 1 instruction of 3 or 4 bytes.
> > > I am unsure how to best represent such variable length
> > > relocations in BFD and ELF.
> >
> > The normal way would be to not do that. It seems the assembler will
> > already see either a long chain of small insns, or a single large
> > insn, right?
>
> Our idea was that the user can use a simple pseudo instruction to
> represent the entire process of loading a symbol (or any immediate for
> that matter).

A pseudo instruction makes sense. But then it would still be the
assembler that expands it to either a couple of base insns or a single
extended insn. The linker would see only one or the other, and hence
also only the base or the extended relocs. Or did you really want to
reserve some specific byte encoding for this pseudo instruction, to
transfer it from the assembler via the object file to the linker, and
let only the linker replace it by one or the other variant? That seems
an unnecessarily complicated scheme.
It depends on whether the assembler knows if it can target the extended
insns or only the base ones. I would definitely suggest that the
assembler, at the latest, should know this.

> > (obviously details will differ, your 16bit insns won't be able to
> > quite set all 16 bits :) ).
> >
> > If you really want to optimize these sequences also at link time
> > (but why?) then all of this becomes more complicated, but remains
> > essentially the same. The secret will then be in linking from one of
> > the small relocs (say, the high16 one) to the other, for the linker
> > to easily recognize the whole insn pair and appropriately do
> > something about those byte sequences. In that scheme you need to
> > distinguish between relocations applied to relaxable code and
> > relocations applied to random non-relaxable data. E.g. you probably
> > need two variants of the RELOC_LOW16 relocation.
>
> Not sure if you took a look at our instruction set: The way you would
> load an arbitrary 16bit word is via a sequence of `slo` (shift left 5
> and or) instructions which use a 5bit immediate (the largest we have
> in base). So breaking it up into two RELOC_LOW_16 or similar wouldn't
> quite work.

Sure, as I said above: "obviously details will differ".

> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9,
> RELOC_BITS_10_15 or something like that. And you couldn't exactly
> remove one of those without changing the others.

Yes, this is the usual way to express that. There are many architectures
which have similar ISA restrictions and they all do it essentially the
same way: "select X bits from value, put them into Y bits of field", for
potentially many combinations of (not necessarily consecutive) X and Y.

> But of course, we don't always need all 4 instructions, sometimes we
> can get away with only two or three, for example if it's only an 8bit
> value, we only need 2 instructions. We would like to optimize these
> cases somewhere.

I see.
Yeah, that will ultimately need some linker relaxation, as only the
linker knows for sure which values symbols have, and hence whether they
do or do not fit certain constraints.

> After a bit more discussion we came to the idea of having many
> relocations that potentially cover multiple instructions so that the
> entire load-immediate sequence can be covered by one relocation,

As you have only such a short immediate field in the base ISA this seems
like a sensible idea, as otherwise, as you say, you need 7 relocations
(and insns) for a full 32bit load.

> but this is quite a large amount of relocations.

Hmm? I don't understand this remark. If you cover a range of
instructions by one relocation you necessarily need fewer relocs than if
you use one reloc per insn?

> > I wouldn't go that way if I were you: it seems the
> > assembler/compiler needs to know if targeting the extended ISA or
> > not anyway, so generating the right instructions and relocations
> > from the start in the assembler seems the right choice, and then
> > doesn't need any relax complications at link time.
>
> As long as the range (or even the exact value) of the symbol is known
> at assembly time, this is of course true, but what about situations
> where nothing about the range of the value is known?

The compiler/assembler would always emit the full sequence (i.e. it
assumes that the symbol in question may need the full 32 bits). If you
want to optimize this for the case where the symbol happens to need
fewer bits, then yes, you do need linker relaxation. As said, you then
need a way in the linker to recognize an insn sequence that "belongs"
together, so that you can appropriately optimize it, either by referring
from one reloc to the next in such a chain, or by simply assuming that
such sequences are always done in a certain order (i.e. a simple pattern
match; unrecognized patterns would remain unrelaxed/unoptimized).

The basic form of relocations doesn't depend on that, though.
You still need to distinguish between the lowest N bits of the requested
value, the next N bits, the next N bits, and so on, so you do need
roundup(32/N) reloc types either way.

By restricting certain insn sequences and flexibility you can get away
with fewer relocations than this, e.g. with your idea of covering
multiple insns with one reloc. Say, if you require that the low 10 bits
of a value are always set in this way (and given your ISA that makes
sense):

  shiftset5 %r1, bit04(sym)
  shiftset5 %r1, bit59(sym)

and never with another insn in between, and never in a different order,
then of course you can get away with a relocation (say)
RELOC_SHIFTSET10, that takes the low 10 bits of 'sym' and appropriately
distributes those 10 bits into the right 5-bit fields of the two
instructions. It would implicitly cover both instructions, i.e. a 32bit
place in the code section. If you extend this idea to cover seven
instructions of the base ISA you can get away with a single reloc that
is able to set the whole 32 bits of a value (at the expense of not being
able to place unrelated instructions between those seven).

> It seems like other assembler targets truncate the values in those
> cases? If we went for the minimal representation we would basically
> limit external symbols to 5bit, which isn't exactly ideal. And from
> what I can tell, growing a relocation also isn't really something bfd
> is designed to deal with, right?

I'm not super fluent in the actual implementation of bfd linker
relaxation. But I don't see why it can't also grow sections. It's true
that the usual relaxation shrinks sizes, and it's probably better to
follow that as well, but in principle enlarging is no problem either (if
you enlarge _and_ shrink in your relaxation you can run into endless
oscillation between the two, so that needs to be watched for).

But one thing about terminology: relocations themselves don't grow or
shrink. A relocation in principle applies to a certain address without
range.
The semantics of a specific relocation type will usually say that
these-and-those bits in a field will be changed by it, and you can say
that that's the size of a relocation. But not all relocations are like
that, and nothing really prevents you from either changing the
relocation type when you want something else (in linker relaxation), or
even defining a funny type that applies to either (say) a byte or a
word, as needed. You need to implement special functions for such relocs
then, and can't use the generic simple BFD reloc howto model, but still.

Just to expand on this: in principle one could invent a relocation type
that says "when the symbol has value '1' change the byte 45 bytes from
here to 42, when it has another value then encode that one into the word
7 bytes from here". Those are obviously crazy semantics for a
relocation, but nothing inherently prevents you from that. (Of course,
making sure that there actually _is_ something 45 bytes from the reloc's
place is a problem :) ) The "size" of such a relocation wouldn't be
well-defined anymore (or be 46), but what I'm saying is that this is
okayish.

What does grow or shrink is the section content, and hence the distance
between labels might change during relaxation, which requires delaying
resolving jumps until relaxation time as well. This can get quite slow
at link time (riscv is plagued by this). Just to make you aware :)

One remark: you _really_ should think long and hard about your immediate
size in the base ISA. 5 bits is terribly small. Maybe you can snatch
away some bits here and there in your 16bit insns to make this 8 bits
(something that divides 32 would be ideal), but even 6 would bring the
full-32-bit sequence from 7 to 6 instructions.

Ciao,
Michael.
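
For illustration, the RELOC_SHIFTSET10 idea and the insn-count arithmetic
from the mail can be sketched as follows. This is only a minimal sketch,
not BFD code: the function names, the little-endian 16-bit insn words and
the position of the 5-bit immediate (insn bits 0..4) are all assumptions
made up for the example, since the thread does not give the real
encoding. The chunk order follows the shiftset5 example above.

```python
import struct

IMM_BITS = 5                    # immediate width of the base ISA, per the thread
IMM_MASK = (1 << IMM_BITS) - 1
IMM_SHIFT = 0                   # ASSUMPTION: immediate sits in insn bits 0..4


def insns_needed(value_bits: int, imm_bits: int = IMM_BITS) -> int:
    """roundup(value_bits / imm_bits): insns (and simple relocs) per value."""
    return -(-value_bits // imm_bits)   # ceiling division


def patch_imm(insn: int, imm: int) -> int:
    """Return the 16-bit insn word with its immediate field replaced."""
    cleared = insn & ~(IMM_MASK << IMM_SHIFT) & 0xFFFF
    return cleared | ((imm & IMM_MASK) << IMM_SHIFT)


def apply_shiftset10(code: bytearray, offset: int, value: int) -> None:
    """Hypothetical RELOC_SHIFTSET10: one reloc covering two adjacent insns.

    Bits 0..4 of value go into the first insn and bits 5..9 into the
    second, mirroring the bit04(sym)/bit59(sym) pair in the example.
    ASSUMPTION: insn words are stored little-endian.
    """
    first, second = struct.unpack_from("<HH", code, offset)
    first = patch_imm(first, value)                # bits 0..4
    second = patch_imm(second, value >> IMM_BITS)  # bits 5..9
    struct.pack_into("<HH", code, offset, first, second)
```

With these assumptions, insns_needed(32) gives 7 and insns_needed(32, 6)
gives 6, matching the counts mentioned in the mail. A real implementation
would live in a special reloc function of the BFD backend rather than in
a generic howto entry, as the mail notes.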