From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=GU9v=DZ=hotmail.de=trampchamp@sourceware.org>
Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05olkn2034.outbound.protection.outlook.com [40.92.89.34])
	by sourceware.org (Postfix) with ESMTPS id 0527C3858D20
	for <binutils@sourceware.org>; Tue,  8 Aug 2023 17:26:47 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0527C3858D20
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=hotmail.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=hotmail.de
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=i9/IGpVVqOg98HPIWYgUoLk07eHmG0xzZMPPezpJby+Hs307nPs9e6XMXBYJvNcsb81u+Wof8+pO04hAh5XaSkDmfsSOjnbjNWXenwAT7qfz3B0uH4f9/jb0TsaSU4zAsvEznWz2aIEsttsqJjPot+62gSjB5/zpJl0y/Wi3QflZjLGox7PjnaxuNHIl4zMxFt5SGG8KKrtH3PFZVsccZb/usUDLeCstb1LfXoNzFhDduwGlxXOKhydGd3IchVLH0rBF7A8jdAbEIoNqU4GV0ztqrIEEvcgiBF4qcLYJkv+soVa5H68OIQ5ABgsXmT3WvgJco53YfwxnZ3wr0Xx8HQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=TMgKnARTjG6ZNvvfqWHjq0Dp8z/fYOnWLJaVaKEN1RQ=;
 b=aaGU9sKQlzPM1NHFjB20ER5hWp5qrqKchyAT1Cqbu8DWn1pL3TWbo16Hl8ZmC8K51zpFhvG8Z84IfuHHIYJeX+xKL78GjEFUIPK0bJQhL+KBfQq4sHzP/p/lkZaG34azJz3KR7j9b9xTLlHGB+ylAWgl6j3oTZYSc3++mP5RV0i4PBZwTzQePRi7l7pbiwBUwyoOGsrnzcoX/aM+rcpWc7YGht56mXoddHeeeLq2R3S3Zh4FH5NuahewkUqBCtorsfdPT6Ob6pL2olajtja7scAuy3vUN6fMXOYhKnUTl3iSa8/GuffOybO2XEEhAQEZNTCWtfIsw+bN4CuWc5j4QA==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none;
 dkim=none; arc=none
Received: from DU0PR03MB9729.eurprd03.prod.outlook.com (2603:10a6:10:44f::14)
 by AS2PR03MB9877.eurprd03.prod.outlook.com (2603:10a6:20b:546::21) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6652.27; Tue, 8 Aug
 2023 17:26:44 +0000
Received: from DU0PR03MB9729.eurprd03.prod.outlook.com
 ([fe80::db56:e4e2:dff0:1]) by DU0PR03MB9729.eurprd03.prod.outlook.com
 ([fe80::db56:e4e2:dff0:1%5]) with mapi id 15.20.6652.026; Tue, 8 Aug 2023
 17:26:44 +0000
Message-ID:
 <DU0PR03MB972945AC19E39D4883E65318A20DA@DU0PR03MB9729.eurprd03.prod.outlook.com>
Date: Tue, 8 Aug 2023 19:26:42 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.14.0
Subject: Re: Problems with relocations for a custom ISA
Content-Language: de-DE, en-US
To: Michael Matz <matz@suse.de>
References: <DU0PR03MB972945947ABA24B8B484C16FA20CA@DU0PR03MB9729.eurprd03.prod.outlook.com>
 <alpine.LSU.2.20.2308081359530.25429@wotan.suse.de>
 <DU0PR03MB972916B4EA49090DBED3A80DA20DA@DU0PR03MB9729.eurprd03.prod.outlook.com>
 <alpine.LSU.2.20.2308081444200.25429@wotan.suse.de>
From: MegaIng <trampchamp@hotmail.de>
Cc: binutils@sourceware.org
In-Reply-To: <alpine.LSU.2.20.2308081444200.25429@wotan.suse.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-TMN: [0gGiIw+H6Y7wX4vtdW/Uo8YgKMAKk9HU]
X-ClientProxiedBy: FR2P281CA0101.DEUP281.PROD.OUTLOOK.COM
 (2603:10a6:d10:9c::7) To DU0PR03MB9729.eurprd03.prod.outlook.com
 (2603:10a6:10:44f::14)
X-Microsoft-Original-Message-ID:
 <63653cc4-2c25-841f-449e-bce56b4947f6@hotmail.de>
MIME-Version: 1.0
X-MS-Exchange-MessageSentRepresentingType: 1
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: DU0PR03MB9729:EE_|AS2PR03MB9877:EE_
X-MS-Office365-Filtering-Correlation-Id: 16e0396e-854c-4daa-b98d-08db9834a2f1
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info:
	5PLFan9p55fH2YsuT+dTsDgU/u8MtrEPJEUOGp3VYVyFufTrVp1OSKUZQFOlVVuXpvnI2KlX4srU2Qb+0kfreFgO5wf7Xxm/kyaWxqt9lgaQzEXBVCwfpqrYcFwZ/JN7jTHFNfSsLTwAfmXVMPhgvyzBSHPwWGV9797iTC3zGCd1/4hdftryq21wCQbVsgxL7hgtiHM37BhEq0NKD4lr0F7L/+AeajPEKloqfiSaD9Oly14CGQkwV5uJj4S6cf70XDVPWd8AWTJJE1knrMHJNk/6Aybh877UXYGq97Abis/3ZIE9GwsTgj3qIxq3saG6PJqcOj0NfFGjktOwYyO0ycnttScZJVz8weJPaept/22mfMA6bma4r37USdYhx+5csU2hNgynnGuE4w9bWQta+8SE38x3qXG+mpQaRSG0lZkgab99CQsLi5zA2Aq7r0EEAkB7g1OAAOtwGpLhfrTAYPdrZRkK/8D1U75z8rbJL1lXBfqYosIUOW9PZKe2L5ZhkJgiazrs031olekHUUmHk6Xao3gagOV7k9sz1h3oMqcTIpQpN+C3p3r8s29e/yCl
X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0:
	=?utf-8?B?OVorSDl6UG5PdFh0ZmpZeThpZnRzalBwWlZ2YkdhWW1aY2tnU0V3UnRaZ0Yz?=
 =?utf-8?B?UldRUkRUYUZIS09QdFhWVk9xbW5xaDB4ZTRETk5qbVZ0Y0pySzFocjhDL05m?=
 =?utf-8?B?UnhiNzlpOCtKdngydzk0WWpxZHo4WTJEM2hmanJhUFEzRFZ6WW1ydU5oOFBl?=
 =?utf-8?B?cVl0Qm14MWUrMTBOUUo5Y1JGQms3a2JKMW8zekh5SHM2ZkFzTFQ1WXM1Z1FJ?=
 =?utf-8?B?cnZlSmhZME83Vk1rbVBuWmNHS0VPTFU3U2d4dXVMV2E2U2FRaHZOL0JrOStq?=
 =?utf-8?B?Z0ZCc0FiQWIrTHlueEVyUGlsSGFzSytPTW9FN0ZzaE1SSjlZcHBLSXc4VWlX?=
 =?utf-8?B?aWMzOVVBQUFRMk1mT3YvRlY0OTlFUSt1VWxmczBrTk5vNVJjVlhCLzlGMThz?=
 =?utf-8?B?cVJEMTFFL0U1TG9PYm54STR6cTAzdTVJVHZxMWp0NzVDMlBIbG5mRmswRThl?=
 =?utf-8?B?TGdYMzQzd3c3alJQZWd4ZmFJTlFqbE53LzE1cFE4aitWbnRYUTl6TWJieXJK?=
 =?utf-8?B?ZDBiOEQ2UFBVZmhEQ0tBVVg3SGNwVk5lUEkyeW5FNVpMeVR1STlFbytEbmNZ?=
 =?utf-8?B?VEFQbm9FbE5GUTQrY216d0taVG1oZXZndUV3V2dxc2VZS2VCK3lBVjJBSjNz?=
 =?utf-8?B?aXN2eXJXWE5nOXVNUGV4RWI1ZStNNjBDWnZ0YlpmUy9LOGZ6K2lLMjY5ZzFu?=
 =?utf-8?B?ZVM5SDF2TFJURUp0eW9VYlN1WmZHbEpQTTIvaHBSaFFBUWtDZG5pOTgxRzZl?=
 =?utf-8?B?cUl2MStiM21Ra3R0cWFWL1M2VG0vU0F2Q1l5WTlUWlZ6czliSmlhMDluejR0?=
 =?utf-8?B?OFVVa0dJcjVaTWZpdzNXdG8yOCtrbGY3OGs5b2VHOUFHbFREZHl4QkhKcGFW?=
 =?utf-8?B?bEM0S0lFQUFwWFNJRXB2ejE5R2EvRVJtbVBTNFRkNXJYejBGaHVNMDhkb3Vj?=
 =?utf-8?B?WXZ6cEEyRTc1em1JNWh6WVFRYTUxOUthWWZuWXF5c3FnWSt2YjllSkdMb0Uw?=
 =?utf-8?B?U29Nb0x5L0o3anNFb2xuaVQwZTA5TlJtc2lkaGR1Z1pPT0VLcG4vek42REF1?=
 =?utf-8?B?bTBSQlZZS2Q3N2JkakxnWFlBbmk3NXgweXc0ZFVWUjlBdVdhUnQxVE9EYWxW?=
 =?utf-8?B?SEttbnRxMVlvNUZmd1cySld6RjFlL2lBNVJtMFl3NDF0ZXduakY1VS9iZDlP?=
 =?utf-8?B?UEZsOWRlUjZzNzY1VVB6aUp2ZGppK1ZMcWgyNjZOc1BZczZRU1JuODNTT2oy?=
 =?utf-8?B?Mi9mM1BadTY5a0FLanNxUS80MnZyS3pLR1NpRUZpZG5sSHViWTd1YzFad3J0?=
 =?utf-8?B?WFNNb2VDbnhVdHdQWmFSNzh5MnFVbThlaUVlVWxkcHpmZklBRm55b3lrRHNn?=
 =?utf-8?B?VjVLaTM2UmliZXFVZ255bmNJV1VLK1k0QldudmdGeEFSb1lLUnJwSFE2a21o?=
 =?utf-8?B?RjhpdkRqcHBVODJXdVpRWld5RUx0M0lXZlJ1V3dIVHJjOFFBTCt3aUlmU3d3?=
 =?utf-8?B?MzRUWXpTSG4rK3JEdzkrV1VQK242eGpzc0l4T2hRMHFjcDVCTjJLdG9BODY0?=
 =?utf-8?B?YjUzblpqZXpxL1h5R2JtUk02Wm52WmVxQ2NOTzZLaFRiYmZKdTJzNjhIdHJB?=
 =?utf-8?B?Y1FHUHdFeGVXV3VZbzcydTYzalpxQmc9PQ==?=
X-OriginatorOrg: sct-15-20-4755-11-msonline-outlook-76d7b.templateTenant
X-MS-Exchange-CrossTenant-Network-Message-Id: 16e0396e-854c-4daa-b98d-08db9834a2f1
X-MS-Exchange-CrossTenant-AuthSource: DU0PR03MB9729.eurprd03.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 08 Aug 2023 17:26:44.5478
 (UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa
X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg:
	00000000-0000-0000-0000-000000000000
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR03MB9877
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,FORGED_MUA_MOZILLA,FREEMAIL_FROM,KAM_DMARC_STATUS,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <binutils.sourceware.org>


Am 2023-08-08 um 17:35 schrieb Michael Matz:
> Hello,
>
> On Tue, 8 Aug 2023, MegaIng wrote:
>
>>>> Most of the basics I already managed to implement, i.e. I can generate
>>>> simple
>>>> workable ELF files. However, I am running into problems with relocations
>>>> for
>>>> "load immediate" instructions. Without extensions, we want to potentially
>>>> emit
>>>> long chains of instruction (3 to 8 instructions is realistic), but with
>>>> proper
>>>> extensions in can get down to only 1 instruction of 3 or 4 bytes. I am
>>>> unsure
>>>> how to best represent such variable length relocations in BFD and ELF.
>>> The normal way would be to not do that.  It seems the assembler will
>>> already see either a long chain of small insns, or a single large insn,
>>> right?
>> Our idea was that the user can use a simple pseudo instruction to
>> represent the entire process of loading a symbol (or any immediate for
>> that matter).
> Pseudo instruction makes sense.  But then it would still be the assembler
> that expands it to either a couple base insns or a single extended insn.
> The linker would see only one or the other, and hence also only the base
> or the extended relocs.
>
> Or did you really want to reserve some specific byte encoding for this
> pseudo instruction to transfer it from assembler via object file to linker
> and let only the linker replace that by one or the other variant?  That
> seems an unnecessarily complicated scheme.  It depends on if the assembler
> does or doesn't know if it can target the extended insns, or only the base
> ones.  I would definitely suggest that the assembler at latest should know
> this.

It wasn't our idea to have a specific bit pattern reserved for that, that
would be quite weird, I agree :-) I think the linker needs knowlegde about
which extensions are available, for that we would use an attributes section
similar to what RISC-V seems to use. (although, maybe we don't need it if we
have many relocation types)

>>> (obviously details will differ, your 16bit insns won't be able to quite
>>> set all 16 bits :) ).
>>> If you really want to optimize these sequences also at link time (but
>>> why?) then all of this becomes more complicated, but remains essentially
>>> the same.  The secret will then be in linking from one of the small relocs
>>> (say, the high16 one) to the other, for the linker to easily recognize the
>>> whole insn pair and appropriately do something about those byte sequences.
>>> In that scheme you need to differ between relocations applied to relaxable
>>> code and relocation applied to random non-relaxable data.  E.g. you
>>> probably need two variants of the RELOC_LOW16 relocation.
>> Not sure if you took a look at our instruction set: The way you would load an
>> arbitrary 16bit word is via a sequence of `slo` (shift left 5 and or)
>> instructions which use a 5bit immediate (the largest we have in base). So
>> breaking it up into two RELOC_LOW_16 or similar wouldn't quite work.
> Sure, as I said above: "obviously details will differ".
>
>> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9 RELOC_BITS_10_15
>> or something like that. And you couldn't exactly remove one of those
>> without changing the others.
> Yes, this is the usual way to express that.  There are many architectures
> which have similar ISA restrictions and they all do it essentially the
> same way: "select X bits from value, put them into Y bits of field", for
> potentially many combinations of (not necessarily consecutive) X and Y.
>
>> But ofcourse, we don't always need all 4
>> instructions, sometimes we can get away with only two or three, for
>> example if it's only an 8bit value, we only need 2 instructions. We
>> would like to optimize these cases somewhere.
> I see.  Yeah, that will ultimately need some linker relaxation as only
> that one will know for sure which values symbols have, and hence if they
> do or do not fit certain constraints.
>
>> After a bit more
>> discussion we came to the idea of having many relocations that
>> potentially cover multiple instructions so that the entire
>> load-immediate sequence can be covered by one relocation,
> As you have only such a short immediate field in the base ISA this seems
> like a sensible idea, as otherwise, as you say, you need 7 relocations
> (and insns) for a full 32bit load.
>
>> but this is quite a large amount of relocations.
> Hmm?  I don't understand this remark.  If you cover a range of
> instructions by one relocation you necessarily need fewer relocs than if
> you use one reloc per insn?

I was considering a large amount of relocation types as a drawback, but 
I now realize
that this can't be avoided no matter which path we chose. We are now 
going to have the
large multi-instruction relocations that can be relaxed one instruction 
at a time instead
of the bit-selection relocations.

>>> I wouldn't go that way if I were you: it seems the assembler/compiler
>>> needs to know if targeting the extended ISA or not anyway, so generating
>>> the right instructions and relocations from the start in the assembler
>>> seems the right choice, and then doesn't need any relax complications at
>>> link time.
>> As long as the range (or even the exact value) of the symbol is known at
>> assembly time, this is ofcourse true, but what about situations where nothing
>> about the range of the value is known?
> The compiler/assembler would always emit the full sequence (e.g. assumes
> that the symbol in question happens to be full 32bit).  If you want to
> optimize this use in case the symbol happens to need fewer bits, then yes,
> you do need linker relaxation.  As said, you then need a way in the linker
> to recognize an insn sequence that "belongs" together, so that you can
> appropriately optimize this, either by referring from one to the next
> reloc in such a chain, or by simply assuming that such sequences are
> always done in a certain order (i.e. a simple pattern match; unrecognized
> patterns would remain unrelaxed/unoptimized).
>
> The basic form of relocations doesn't depend on that, though.  You still
> need to differ between the lowest N bits of the requested value, the next
> N bits, the next N bits, and so on, so you do need roundup(32/N) reloc
> types either way.
>
> By restricting certain insn sequences and flexibility you can get away
> with fewer relocations than this.  E.g. with your idea of covering
> multiple insns with one reloc.  Say, if you require that the low 10 bits
> of a value are always set in this way (and given your ISA that makes
> sense):
>
>     shiftset5 %r1, bit04(sym)
>     shiftset5 %r1, bit59(sym)
>
> and never with another insn in between, and never in a difference order,
> then of course you can get away with a relocation (say) RELOC_SHIFTSET10,
> that takes the low 10 bits of 'sym' and appropriate distributes those 10
> bits into the right 5 bit field of the instruction.  It would implicitely
> cover both instructions, i.e. a 32bit place in the code section.
>
> If you extend this idea to cover seven instructions of the base ISA you
> can get away with a single reloc that is able to set the whole 32bit of a
> value (at the expense of not being able to place unrelated instructions
> between those seven).
My primary interested is to support to load-immediate pseudo opcode, so
I am not going to worry about stuff users could manually write. I don't 
think
there could ever be a benefit to put instruction in the middle of that, so I
am not gonna worry about that.
Although, we might have to split into multiple relocations since bfd 
set's an
upper limit on the amount of bytes a relocation can cover by using a 4-wide
bitfield for that.
>> It seems like other assembler targets truncate the values in those
>> cases? If we went for the minimal representation we would basically
>> limit external symbols to 5bit, which isn't exactly ideal. And from what
>> I can tell, growing a relocation also isn't really something bfd is
>> designed to deal with, right?
> I'm not super fluent in the actual implementation of bfd linker
> relaxation.  But I don't see why it can't also grow sections.  It's true
> that the usual relaxation shrinks sizes, and it's probably better to
> follow that as well, but in principle enlarging is no proble either (if
> you enlarge _and_ shrink in your relaxation you can run into
> endless oscillation between the two, so that needs to be watched for).
>
> But one thing about terminology: relocations themself don't grow or
> shrink.  A relocation in principle applies to a certain address without
> range.  The semantics of a specific relocation type will usually say that
> these-and-those bits in a field will be changed by it, and you can say
> that that's the size of a relocation.  But not all relocations are like
> that, and nothing really prevents you from either changing the relocation
> type when you want something else (in linker relaxation), or even defining
> a funny type that applies to either (say) a byte or a word, as needed.
> You need to implement special functions for such relocs then, and can't
> use the generic simple BFD reloc howto model, but still.
>
> Just to expand on this: in principle one could invent a relocation type
> that says "when the symbol has value '1' change the byte 45 bytes
> from here to 42, when it has another value then encode that one into the
> word 7 bytes from here".  That's obviously a crazy semantics for a
> relocation, but nothing inherently prevents you from that.  (Of course,
> making sure that there actually _is_ something 45 bytes from the relocs
> place is a problem :) )  The "size" of such relocation wouldn't be
> well-defined anymore (or be 46), but what I'm saying is, that this is
> okayish.
>
> What does grow or shrink is the section content, and hence distance
> between labels might change during relaxation, which requires delaying
> resolving jumps until relaxation time as well.  This can get quite slow at
> link time (riscv is plagued by this).  Just to make you aware :)
Yeah, thank you, my word choice was a bit confused. The speed penalty is 
something
we are probably not gonna worry about for the moment, but we will keep 
it in mind.
> One remark: you _really_ should think long and hard about your immediate
> size in the base ISA.  5 bits is terribly small.  Maybe you can snatch
> away some bits here and there in your 16bit insns to make this 8 bits
> (something that divides 32 would be ideal), but even 6 would bring the
> full-32-bit sequence from 7 to 6 instructions.
This is something we had discussed a few times and came to the conclusion
that we prefer the current encoding. We wanted 16bit opcodes and 
byte-aligned
sections and from there the choices do get quite limited. We also wanted
a simple encoding, so we didn't want to have too many complex tricks.
>
> Ciao,
> Michael.
Thank you for taking your time :-)