From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <559aaea5-bf16-f48e-fc0e-e750a2795b99@suse.com>
Date: Wed, 4 May 2022 09:58:01 +0200
Subject: Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
To: Andrew Burgess
Cc: binutils@sourceware.org, "H.J. Lu"
References: <388c1dd1235a3c95aefc7caee5726b869b6894e0.1651239378.git.aburgess@redhat.com> <87zgjyn4k1.fsf@redhat.com>
From: Jan Beulich
In-Reply-To: <87zgjyn4k1.fsf@redhat.com>
List-Id: Binutils mailing list

On 03.05.2022 15:12, Andrew Burgess wrote:
> Jan Beulich via Binutils writes:
>
>> On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
>>> The i386 disassembler is pretty complex. Most disassembly is done
>>> indirectly; operands are built into buffers within a struct instr_info
>>> instance, before finally being printed later in the disassembly
>>> process.
>>>
>>> Sometimes the operand buffers are built in a different order to the
>>> order in which they will eventually be printed.
>>>
>>> Each operand can contain multiple components, e.g. multiple registers,
>>> immediates, other textual elements (commas, brackets, etc).
>>>
>>> When looking for how to apply styling I guess the ideal solution would
>>> be to move away from the operands being a single string that is built
>>> up, and instead have each operand be a list of "parts", where each
>>> part is some text and a style. Then, when we eventually print the
>>> operand we would loop over the parts and print each part with the
>>> correct style.
>>>
>>> But it feels like a huge amount of work to move from where we are
>>> now to that potentially ideal solution. Plus, the above solution
>>> would be pretty complex.
>>>
>>> So, instead I propose a .... different solution here, one that works
>>> with the existing infrastructure.
>>>
>>> As each operand is built up, piece by piece, we pass through style
>>> information. This style information is then encoded into the operand
>>> buffer (see below for details). After this the code can continue to
>>> operate as it does right now in order to manage the set of operand
>>> buffers.
>>>
>>> Then, as each operand is printed we can split the operand buffer into
>>> chunks at the style marker boundaries, with each chunk being printed
>>> in the correct style.
>>>
>>> For encoding the style information I use the format "~%x~". As far as
>>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>>> this should serve as a unique marker. To speed up writing and then
>>> reading the style markers, I take advantage of the fact that there are
>>> less than 16 styles so I know the '%x' will only ever be a single hex
>>> character.
>>
>> Like H.J. I'd like to ask that you avoid ~ here (I actually have plans
>> to use it to make at least some 64-bit constants better recognizable);
>> I'm not sure about using non-ASCII though, as that may cause issues with
>> compilers treating non-ASCII wrong. I'd soften this to non-alnum,
>> non-operator characters (perhaps more generally non-printable). Otoh I
>> guess about _any_ character could be used in symbol names, so I'm not
>> convinced such an escaping model can be generally conflict free.
>
> Hi Jan,
>
> I've addressed all the simple feedback from H.J. and Vladimir, and I
> just need to figure out something for the escaping mechanism.
>
> I'm still keen to try and go with an escaping based solution, my
> reasoning is that I think that this is the solution least likely to
> introduce latent disassembler bugs.
>
> However, that position is based on my belief that there's no exhaustive
> test for the i386 based disassembler, i.e. one that tests every single
> valid instruction disassembles correctly. If there was such a test then
> I might be more tempted to try something more radical...
>
> That said, if I was going to stick with an escaping scheme, then I have
> some ideas for moving forward.
>
> The current scheme relies on the fact that symbols are not printed
> directly from the i386 disassembler, instead the i386 disassembler calls
> back into the driver application (objdump, gdb) to print the symbol. As
> a result, symbols don't go through the instr_info::obuf buffer. This
> means that we never try to interpret a symbol name for escape
> characters.

Hmm, indeed. I have to admit that I view it as a significant shortcoming
of the disassembler that it doesn't resolve addresses in the output. So
I'd like to at least not see the road being closed towards improving this.

> This means we avoid one of the issues that you raised, what if the
> escape character appears in a symbol name; the answer is, I just don't
> need to worry about this!
>
> So, I only need to ensure that the escape character is:
>
> (a) not a character that the disassembler currently tries to directly
> print itself, and
>
> (b) not something that will ever be printed as part of an immediate.
Or, more generally, as part of any kind of operand.

> Clearly my choice passes both right now, but looks like it will not pass
> (b) forever.
>
> One possible solution would be to replace all the remaining places where
> we directly write to instr_info::obuf with calls to oappend_char.

I guess this might be troublesome. The way the disassembler works is a
little quirky here and there, and hence one needs to play tricks every
now and then to half-way reasonably deal with certain special cases.

> I
> could then extend the oappend API such that we do "real" escaping, that
> is (assuming the continued use of '~' for now): '~X' would indicate a
> style marker, with X being the style number, and '~~' would indicate a
> literal '~' character. In this way we really wouldn't care which
> character we used (though we'd probably pick one that didn't crop up too
> often just for ease of parsing the buffers).
>
> An alternative solution would be to pick a non-printable character,
> e.g. \001, and use this as the escape character in place of the current
> '~'. This seems to pass the (a) and (b) tests above, and if such a
> character does ever appear in a symbol name, then, as I've said above, I
> don't believe this would cause us any problems.

I suppose \001 (or a character very close to this, as iirc \001 has some
meaning internally in gas, and I'm not entirely certain none of these
uses can ever "escape" gas) is good to start with. Provided it is
properly abstracted so it can, if necessary, be _very_ easily changed
(by modifying exactly one line, or - if you need both a single-quoted
and a double-quoted instance - two adjacent ones). Albeit, thinking of
this last aspect, maybe it would be better to only have a double-quoted
instance in the first place, and allow for the escape to be more than a
single character if need be ...
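To illustrate what "properly abstracted" could look like, here is a
minimal standalone sketch in C. It is not taken from the patch: the names
STYLE_MARKER, append_styled, split_styled and render_chunk are all
hypothetical, and "\002" is only a placeholder for whichever
non-printable sequence ends up being chosen. The marker is defined once,
as a double-quoted string, so it may be longer than one character, and
both the writer and the reader derive everything from that one line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; this is not the binutils API.  */

/* The single line to edit if the escape has to change.  It is a string,
   not a character, so it may be more than one byte long.  "\002" is an
   arbitrary placeholder.  */
#define STYLE_MARKER "\002"
#define STYLE_MARKER_LEN (sizeof (STYLE_MARKER) - 1)

/* Append STYLE_MARKER, one hex digit encoding STYLE, and TEXT to the
   NUL-terminated string in BUF, which has room for SIZE bytes.
   Return 0 on success, -1 if the result would not fit.  */
int
append_styled (char *buf, size_t size, unsigned style, const char *text)
{
  size_t used = strlen (buf);
  size_t need = STYLE_MARKER_LEN + 1 + strlen (text) + 1;

  assert (style < 16);		/* One hex digit only.  */
  if (need > size - used)
    return -1;
  snprintf (buf + used, size - used, "%s%x%s", STYLE_MARKER, style, text);
  return 0;
}

/* Return the value of lower-case hex digit C, or -1 if C is not one.  */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

typedef void (*chunk_fn) (unsigned style, const char *text, size_t len,
			  void *data);

/* Split BUF at each STYLE_MARKER and call EMIT once per non-empty chunk
   with the style in effect.  Text before the first marker is reported
   with DEFAULT_STYLE.  A marker not followed by a hex digit is dropped
   and leaves the current style unchanged.  */
void
split_styled (const char *buf, unsigned default_style, chunk_fn emit,
	      void *data)
{
  unsigned style = default_style;
  const char *p = buf;

  while (*p != '\0')
    {
      const char *m = strstr (p, STYLE_MARKER);
      size_t len = m != NULL ? (size_t) (m - p) : strlen (p);

      if (len > 0)
	emit (style, p, len, data);
      if (m == NULL)
	break;
      p = m + STYLE_MARKER_LEN;
      if (hex_value (*p) >= 0)
	style = (unsigned) hex_value (*p++);
    }
}

/* Demo callback: append "[style:text]" to DATA, assumed here to be a
   NUL-terminated buffer of 128 bytes.  */
void
render_chunk (unsigned style, const char *text, size_t len, void *data)
{
  char *out = data;
  size_t used = strlen (out);

  snprintf (out + used, 128 - used, "[%x:%.*s]", style, (int) len, text);
}
```

With this shape, appending "%eax" in style 1, "," in style 0 and "$0x10"
in style 2, then calling split_styled with render_chunk, yields
"[1:%eax][0:,][2:$0x10]". Switching to a different or longer escape would
mean editing only the STYLE_MARKER line.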
And yes - if a symbol name was possible to hit and if that symbol name
contained such an escape sequence, aiui the worst that would happen is
bogus coloring? IOW the escape would not be looked for and replaced /
processed when coloring is disabled?

Jan