From: MegaIng
To: Michael Matz
Cc: binutils@sourceware.org
Subject: Re: Problems with relocations for a custom ISA
Date: Tue, 8 Aug 2023 19:26:42 +0200

On 2023-08-08 at 17:35, Michael Matz wrote:
> Hello,
>
> On Tue, 8 Aug 2023, MegaIng wrote:
>
>>>> Most of the basics I already managed to implement, i.e. I can generate
>>>> simple workable ELF files. However, I am running into problems with
>>>> relocations for "load immediate" instructions. Without extensions, we
>>>> want to potentially emit long chains of instructions (3 to 8
>>>> instructions is realistic), but with proper extensions it can get down
>>>> to only 1 instruction of 3 or 4 bytes. I am unsure how to best
>>>> represent such variable length relocations in BFD and ELF.
>>> The normal way would be to not do that. It seems the assembler will
>>> already see either a long chain of small insns, or a single large insn,
>>> right?
>> Our idea was that the user can use a simple pseudo instruction to
>> represent the entire process of loading a symbol (or any immediate for
>> that matter).
> Pseudo instruction makes sense. But then it would still be the assembler
> that expands it to either a couple base insns or a single extended insn.
> The linker would see only one or the other, and hence also only the base
> or the extended relocs.
>
> Or did you really want to reserve some specific byte encoding for this
> pseudo instruction to transfer it from assembler via object file to linker
> and let only the linker replace that by one or the other variant? That
> seems an unnecessarily complicated scheme. It depends on if the assembler
> does or doesn't know if it can target the extended insns, or only the base
> ones. I would definitely suggest that the assembler at latest should know
> this.

It wasn't our idea to have a specific bit pattern reserved for that; that
would be quite weird, I agree :-) I think the linker needs knowledge about
which extensions are available. For that we would use an attributes section
similar to what RISC-V seems to use (although maybe we don't need it if we
have many relocation types).

>>> (obviously details will differ, your 16bit insns won't be able to quite
>>> set all 16 bits :) ).
>>> If you really want to optimize these sequences also at link time (but
>>> why?) then all of this becomes more complicated, but remains essentially
>>> the same. The secret will then be in linking from one of the small relocs
>>> (say, the high16 one) to the other, for the linker to easily recognize the
>>> whole insn pair and appropriately do something about those byte sequences.
>>> In that scheme you need to differ between relocations applied to relaxable
>>> code and relocation applied to random non-relaxable data. E.g. you
>>> probably need two variants of the RELOC_LOW16 relocation.
>> Not sure if you took a look at our instruction set: The way you would load an
>> arbitrary 16bit word is via a sequence of `slo` (shift left 5 and or)
>> instructions which use a 5bit immediate (the largest we have in base). So
>> breaking it up into two RELOC_LOW_16 or similar wouldn't quite work.
> Sure, as I said above: "obviously details will differ".
>
>> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9 RELOC_BITS_10_15
>> or something like that. And you couldn't exactly remove one of those
>> without changing the others.
> Yes, this is the usual way to express that. There are many architectures
> which have similar ISA restrictions and they all do it essentially the
> same way: "select X bits from value, put them into Y bits of field", for
> potentially many combinations of (not necessarily consecutive) X and Y.
>
>> But of course, we don't always need all 4
>> instructions, sometimes we can get away with only two or three, for
>> example if it's only an 8bit value, we only need 2 instructions. We
>> would like to optimize these cases somewhere.
> I see. Yeah, that will ultimately need some linker relaxation as only
> that one will know for sure which values symbols have, and hence if they
> do or do not fit certain constraints.
>
>> After a bit more
>> discussion we came to the idea of having many relocations that
>> potentially cover multiple instructions so that the entire
>> load-immediate sequence can be covered by one relocation,
> As you have only such a short immediate field in the base ISA this seems
> like a sensible idea, as otherwise, as you say, you need 7 relocations
> (and insns) for a full 32bit load.
>
>> but this is quite a large amount of relocations.
> Hmm? I don't understand this remark. If you cover a range of
> instructions by one relocation you necessarily need fewer relocs than if
> you use one reloc per insn?

I was considering a large number of relocation types as a drawback, but I
now realize that this can't be avoided no matter which path we choose. We
are now going to use the large multi-instruction relocations, which can be
relaxed one instruction at a time, instead of the bit-selection relocations.
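
To make the instruction counts in this part of the thread concrete, here is
a minimal Python sketch of how a value breaks down into `slo` immediates. It
is an illustration only, not taken from any real toolchain: the helper names
`slo_chunks` and `simulate_slo` are hypothetical, and it assumes unsigned
values and a destination register that starts out as zero.

```python
def slo_chunks(value, imm_bits=5, width=32):
    """Split an unsigned value into imm_bits-wide immediates, most
    significant chunk first, dropping leading all-zero chunks."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    mask = (1 << imm_bits) - 1
    chunks = []
    while True:
        chunks.append(value & mask)   # take the lowest imm_bits bits
        value >>= imm_bits
        if value == 0:
            break
    chunks.reverse()                  # emit the highest chunk first
    return chunks


def simulate_slo(chunks, imm_bits=5):
    """Model a chain of 'shift left and or' steps on a register that is
    assumed to start at zero."""
    reg = 0
    for imm in chunks:
        reg = (reg << imm_bits) | imm
    return reg


if __name__ == "__main__":
    for v in (0xFF, 0xFFFF, 0xFFFFFFFF):
        print(hex(v), len(slo_chunks(v)), "instruction(s)")
```

Under these assumptions an 8bit value needs 2 instructions, a full 16bit
value needs 4, and a full 32bit value needs 7; with a 6bit immediate
(`imm_bits=6`) the 32bit case drops to 6.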

>>> I wouldn't go that way if I were you: it seems the assembler/compiler
>>> needs to know if targeting the extended ISA or not anyway, so generating
>>> the right instructions and relocations from the start in the assembler
>>> seems the right choice, and then doesn't need any relax complications at
>>> link time.
>> As long as the range (or even the exact value) of the symbol is known at
>> assembly time, this is of course true, but what about situations where nothing
>> about the range of the value is known?
> The compiler/assembler would always emit the full sequence (e.g. assumes
> that the symbol in question happens to be full 32bit). If you want to
> optimize this use in case the symbol happens to need fewer bits, then yes,
> you do need linker relaxation. As said, you then need a way in the linker
> to recognize an insn sequence that "belongs" together, so that you can
> appropriately optimize this, either by referring from one to the next
> reloc in such a chain, or by simply assuming that such sequences are
> always done in a certain order (i.e. a simple pattern match; unrecognized
> patterns would remain unrelaxed/unoptimized).
>
> The basic form of relocations doesn't depend on that, though. You still
> need to differ between the lowest N bits of the requested value, the next
> N bits, the next N bits, and so on, so you do need roundup(32/N) reloc
> types either way.
>
> By restricting certain insn sequences and flexibility you can get away
> with fewer relocations than this. E.g. with your idea of covering
> multiple insns with one reloc.
> Say, if you require that the low 10 bits
> of a value are always set in this way (and given your ISA that makes
> sense):
>
>    shiftset5 %r1, bit04(sym)
>    shiftset5 %r1, bit59(sym)
>
> and never with another insn in between, and never in a different order,
> then of course you can get away with a relocation (say) RELOC_SHIFTSET10,
> that takes the low 10 bits of 'sym' and appropriately distributes those 10
> bits into the right 5 bit field of the instruction. It would implicitly
> cover both instructions, i.e. a 32bit place in the code section.
>
> If you extend this idea to cover seven instructions of the base ISA you
> can get away with a single reloc that is able to set the whole 32bit of a
> value (at the expense of not being able to place unrelated instructions
> between those seven).

My primary interest is supporting the load-immediate pseudo opcode, so I am
not going to worry about sequences that users could write manually. I don't
think there could ever be a benefit to putting instructions in the middle of
that sequence, so I am not going to worry about that either. However, we
might have to split the sequence into multiple relocations, since bfd sets
an upper limit on the number of bytes a relocation can cover by using a
4-bit-wide bitfield for that.

>> It seems like other assembler targets truncate the values in those
>> cases? If we went for the minimal representation we would basically
>> limit external symbols to 5bit, which isn't exactly ideal. And from what
>> I can tell, growing a relocation also isn't really something bfd is
>> designed to deal with, right?
> I'm not super fluent in the actual implementation of bfd linker
> relaxation. But I don't see why it can't also grow sections. It's true
> that the usual relaxation shrinks sizes, and it's probably better to
> follow that as well, but in principle enlarging is no problem either (if
> you enlarge _and_ shrink in your relaxation you can run into
> endless oscillation between the two, so that needs to be watched for).
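
The point about oscillation can be illustrated with a toy model. The Python
sketch below is not how bfd implements relaxation; every name in it
(`relax`, `layout`, the `"li"`/`"label"`/`"data"` item kinds) is made up for
the example. It assumes 2-byte base instructions, a 5bit immediate, a section
that starts at address 0, and unsigned symbol values. Every load-immediate
site starts at the worst case of 7 instructions and is only ever shrunk, so
the loop cannot oscillate. Shrinking is safe in this model because addresses
can only get smaller as earlier sites shrink.

```python
INSN_BYTES = 2   # assumed size of one base instruction
IMM_BITS = 5     # assumed immediate width of the base ISA
MAX_INSNS = 7    # worst case for a 32bit value: ceil(32 / 5)


def insns_needed(value):
    """Number of shift-and-or steps needed for an unsigned value."""
    return max(1, (value.bit_length() + IMM_BITS - 1) // IMM_BITS)


def layout(items, sizes):
    """Assign an address to every label given the current site sizes."""
    addr, labels = 0, {}
    for i, (kind, arg) in enumerate(items):
        if kind == "label":
            labels[arg] = addr
        elif kind == "data":
            addr += arg                      # arg is a byte count
        else:                                # "li": load address of arg
            addr += sizes[i] * INSN_BYTES
    return labels


def relax(items):
    """Shrink-only relaxation: iterate until no site can get shorter."""
    sizes = {i: MAX_INSNS for i, (kind, _) in enumerate(items) if kind == "li"}
    changed = True
    while changed:
        changed = False
        labels = layout(items, sizes)
        for i in sizes:
            need = insns_needed(labels[items[i][1]])
            if need < sizes[i]:              # never grow a site again
                sizes[i] = need
                changed = True
    return sizes, layout(items, sizes)


if __name__ == "__main__":
    program = [("li", "far"), ("li", "near"), ("label", "near"),
               ("data", 40), ("label", "far")]
    print(relax(program))
```

Each pass either shrinks at least one site or stops, and no site can go below
one instruction, so the loop always terminates.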
>
> But one thing about terminology: relocations themselves don't grow or
> shrink. A relocation in principle applies to a certain address without
> range. The semantics of a specific relocation type will usually say that
> these-and-those bits in a field will be changed by it, and you can say
> that that's the size of a relocation. But not all relocations are like
> that, and nothing really prevents you from either changing the relocation
> type when you want something else (in linker relaxation), or even defining
> a funny type that applies to either (say) a byte or a word, as needed.
> You need to implement special functions for such relocs then, and can't
> use the generic simple BFD reloc howto model, but still.
>
> Just to expand on this: in principle one could invent a relocation type
> that says "when the symbol has value '1' change the byte 45 bytes
> from here to 42, when it has another value then encode that one into the
> word 7 bytes from here". That's obviously a crazy semantics for a
> relocation, but nothing inherently prevents you from that. (Of course,
> making sure that there actually _is_ something 45 bytes from the relocs
> place is a problem :) ) The "size" of such relocation wouldn't be
> well-defined anymore (or be 46), but what I'm saying is, that this is
> okayish.
>
> What does grow or shrink is the section content, and hence distance
> between labels might change during relaxation, which requires delaying
> resolving jumps until relaxation time as well. This can get quite slow at
> link time (riscv is plagued by this). Just to make you aware :)

Yeah, thank you, my word choice was a bit confused. The speed penalty is
something we are probably not going to worry about for the moment, but we
will keep it in mind.

> One remark: you _really_ should think long and hard about your immediate
> size in the base ISA. 5 bits is terribly small.
> Maybe you can snatch
> away some bits here and there in your 16bit insns to make this 8 bits
> (something that divides 32 would be ideal), but even 6 would bring the
> full-32-bit sequence from 7 to 6 instructions.

This is something we had discussed a few times and came to the conclusion
that we prefer the current encoding. We wanted 16bit opcodes and
byte-aligned sections and from there the choices do get quite limited. We
also wanted a simple encoding, so we didn't want to have too many complex
tricks.

>
> Ciao,
> Michael.

Thank you for taking your time :-)