From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR02-VI1-obe.outbound.protection.outlook.com (mail-vi1eur02on2056.outbound.protection.outlook.com [40.107.241.56]) by sourceware.org (Postfix) with ESMTPS id 4FB993858D39 for ; Thu, 2 Nov 2023 11:39:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4FB993858D39 Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4FB993858D39 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.241.56 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1698925189; cv=pass; b=krpTRxInIw2Sm7fBWG3uVUp/b2FfahNYFHDMGv18iyIPP2LdWNt9fz+rR4SPV3M/8baexTZTMBPn2/LH5ANrfY8vSp1ceJ+2kubnnwhdMJyWcByBVuA/n+3xrsgRXxmbBDOGINH7Y5rYfizcoR4e4N/OnzfviFOseuiEXWsG5nY= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1698925189; c=relaxed/simple; bh=cX5lF33IMOJM84Jlz4Q2WGjImpzojQ9Pnwf306OEI8c=; h=DKIM-Signature:Message-ID:Date:Subject:To:From:MIME-Version; b=kDjbIa4NpGWSIsS3CDA3THcD8wiELzpLJmve7XfF2MCkc5lm2DiYNKIpOsS0u0+bFGCFKFESdPPjty7lDiKrVOFgM2wDHxEzDextFgrkR8Ys2ufVo9wAoZ0nBZDBnS4HlWxrBw3rcrFGK+bg9ps2eA7gMnWAuVeuo60dpJUECV4= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ERzEtFAr7wH/ocuhMYKFT8pRBPZcV/Xmh18665aQPiuBZKSEVsQqzq3Vnqtx7sNUofVIxmHDbRbYlp/GHHDDwa2ocC5bS9JTt8axiw/6KAqFu9GwVhR67zoppgCOVUrn7dmsnwFo2gR2xIY5tNhviS0XitKT2C1NA1wc5PCE7UowSrP0zHmLqWuY4Z6jVqQK8rHNVjKX7uYRMgLx6P8oAwJn2mpFLEv8SJGrcGm95NSEGeTpImXfABn2iw8r4CBfKmFLHqJOz/KabNeie44VyFCPPXehk08c0ar9bWsoz30ExyQmYDjQbWJYvnoudBeMeXM+mV/c6poUxQ9b97XbAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=7XjLGzYwfiANazqOHdt5OoBSrn+haYmJSEOPxffUWFY=; b=N4ZLVKs3C4VEcFsEe2P/q3E+5c/mX5cXj91IYwm/JNZHvGpSaAH5i4bECuHiKEMiDvkEgTkQ6fUuSob0xtwoq72LWl7ZYgZvbfGhTKfU3GfSsjKTOBlhHC9F4w5AjqwKL1203RUGGGGes4aaSWQpDlL8W0UH+jtW5oT0VaDyYoYdFpmtpwPvMoCFZ0B3dZvV/HdPhG9RIs/gqjFgUJDFGP9f+zq8YfX3PB7dAjVhA+QqBhYsdIGn6OR6Yuy9zHPL0dIKnvJebLFc/OTu6uD6yKcbVHOzjBBqpXTBIysAe4732a8fA0LEGaq09pW46yx5oll5FfyvNPG0Fu37PuYvSg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=7XjLGzYwfiANazqOHdt5OoBSrn+haYmJSEOPxffUWFY=; b=hkXCfq6jgpBdqSfQf8Ekdeay02E3k12Lv0dNDz2enW+S3S93BRij54o3v7tQML8FZ8JNj6Q5YbJMqFtZdbmctNZ+k6AvoLhnHMxvcy75i66rro2ijezfEPlbOS7TdT1g+HeyZvSs9oeLLbweAkmfRqCWrGU+60oOzNQXPxYcrV+nVLckvsnOlaVu1Wd3QLnHj3/yMWDMR0H9L4LK3ufIUgPu6j76RmGdYzEcOGvr2dp+MTGXMny5VEJ8ouNUxrFjrXE6dcGbXjtFSnBlgsN/oen4fb/VmDS2AMcRfW+e6LXsIfRB87smi9i4ES8L+SFOC0Pr4EOukPhvOVtKgdi3Fg== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com; Received: from DU2PR04MB8790.eurprd04.prod.outlook.com (2603:10a6:10:2e1::23) by DU2PR04MB8789.eurprd04.prod.outlook.com (2603:10a6:10:2e0::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6954.19; Thu, 2 Nov 2023 11:39:35 +0000 Received: from DU2PR04MB8790.eurprd04.prod.outlook.com ([fe80::d924:b650:a2ad:7b25]) by DU2PR04MB8790.eurprd04.prod.outlook.com ([fe80::d924:b650:a2ad:7b25%3]) with mapi id 15.20.6954.019; Thu, 2 Nov 2023 11:39:34 +0000 Message-ID: <060736aa-3a4f-320c-d0f6-37512be2e216@suse.com> Date: Thu, 2 Nov 2023 12:39:32 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 Subject: Re: [PATCH, V2 07/10] gas: synthesize CFI for hand-written asm Content-Language: en-US To: Indu Bhagat Cc: binutils@sourceware.org References: <20231030165137.2570939-1-indu.bhagat@oracle.com> <20231030165137.2570939-8-indu.bhagat@oracle.com> <7a101e5c-fe1d-4b3f-afdb-0ee11cf6a7c9@oracle.com> From: Jan Beulich In-Reply-To: <7a101e5c-fe1d-4b3f-afdb-0ee11cf6a7c9@oracle.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-ClientProxiedBy: FR4P281CA0004.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:c8::16) To DU2PR04MB8790.eurprd04.prod.outlook.com (2603:10a6:10:2e1::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DU2PR04MB8790:EE_|DU2PR04MB8789:EE_ X-MS-Office365-Filtering-Correlation-Id: 2f3aa745-aa79-4477-8830-08dbdb986315 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 5GgUj9au3DzFJjzTBeGBfmktjO6y22MPjubC+7lwIqktQd0aFNhOovK/GBGTP/84but7/m2FQ2EPDBrpdubP8JrgHnhUhQvhMz9Atj4rtRkJfcuQ2DRTW08rFeoVZAbWy/0jokMckt5fC6EQGiiPlOzMFxDB6IV31whAdSJFOOTsc8V+T7u87CT2Vzzo1/rA6R2wnMg7lUvVpaqx7S4wco5tOIol4mmCT5sgMh7iSFAzluOTjMHGFBrA1IrOs4ge9xGxxYkxDDDtQa5Sj2WbxMtzfajusiRHdaPZgjX/1BqacsOYebcExtSXqTmpDDk+q+bi20xX3MVQbAeEHAMrM48M8uclYiKxZVsFd70CF3hoDfAts9JP//eKneJOaae3p6kzrQWhSeoh21nKecpZ6DxsGtstOCrIBi2IYWSTFK58BKjSIejZDN8PTETUGRSxjQmSsImbPMTnarMgW6gd4w2qOyLrpRyGjz1guJxvi5dKgzRx+JSBQ1GR6/GdAxtpPjjKYmEI2tX7IKJEiKbP9piGk9Em6CPpMeu6ZuI/4DBBA2W/FUhSpwFndc+3/7UYu6x0mc1UeJ2cCCRhLWUWTisMiDsm4/Kp07OZwGni1U/wCvhNs6B5HF85jvDxW25Z1rRvDzoo7AOaefegANZrpA== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DU2PR04MB8790.eurprd04.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(39860400002)(136003)(346002)(366004)(396003)(376002)(230922051799003)(1800799009)(186009)(451199024)(64100799003)(8936002)(30864003)(5660300002)(31686004)(6486002)(4326008)(8676002)(41300700001)(2906002)(6916009)(66899024)(66476007)(66556008)(66946007)(6512007)(86362001)(31696002)(316002)(478600001)(2616005)(26005)(53546011)(36756003)(6506007)(83380400001)(38100700002)(45980500001)(43740500002);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?OWxOdjJha0dqTnlkeDJVYjdKVE40RkE5Q1FQVk1NZVhoTWJqcjZhS0pNNXpq?= =?utf-8?B?MFhTNmhNTlFSalIvS0RmL2c3QkFDNzZOQk4xUmtSdjFhOVd6Ty9NdkIyRjhY?= =?utf-8?B?dGNEL0hnMHNJWTEyV1hROXhFem1TWTVveGJRUmU2bGNIOXQ3N3NZTllqZkhi?= =?utf-8?B?d1ZtSjB4OHFFbTQwalNSeXFmTVN3TTRHbHQ5dDRIQVBpVmRYOE4yOWdNalVS?= =?utf-8?B?VEovWVBrUUordlhLVGpKN3RyRGRyMUUwYm1TSzN4NTNiN01KNXJWSnFPV2JV?= =?utf-8?B?L2c5SXhEZGpaeVh4TkpWVXcwbGNLNmw4NjZ3MjRiR3V1VWVadWxZekVTQ2JQ?= =?utf-8?B?a1ViVGVpTmdXdmdDK0NGVDU0RmhlM2hlams5MmE1eUIzVFhiR0RlQ3g3YXdj?= =?utf-8?B?M3Y4ZUhvTGpRZDBUcHF6M1VDUVFmRjJWQ241alM4aVJVc2hSTzBxcUhLK3N5?= =?utf-8?B?VkRoL3d3Y0o1bTJtckIyMXJ3dGJYSTJRVUJwNW9hcmp6ZmdOdDJPNnZzYWw1?= =?utf-8?B?KzBpTVF5bnZ5TTJVR3VCVnI3S1ZFTThPRXlOMG4vbTNONHQrbHVMOXE1QlFj?= =?utf-8?B?MVI3aWhhMXlvd05yblhkdlJDUGxkMCtoaVVsTXVDcUFzKzVEUzNJN1R3TXpl?= =?utf-8?B?SjBINE91WEpBblRXWVU2Y3pTM3BIdDdDMEJRbXduanRZU3JkekNVcTZKTnRh?= =?utf-8?B?WjNnTG9sVi9WS1JyTlQ1QVR6N0FPaEI5OUU2UndkY0ozbWtvcDZQem94QnA3?= =?utf-8?B?S2Q5VElXc3BwRUpuaTNaUkFZODdqQW9hWkRnUWV6Ty9FM3JHSVBnNDZoN2JE?= =?utf-8?B?WEQ3NG9HTWdRUUx0K0pNemQ5ZFZ6M0QwSzQyS0s0NkYvdzJqd1FwOW53OFVq?= =?utf-8?B?aks5Z1A3MG1TSkFSOGhHSDRKWG43RFp0bDJsNnhtREZKeVdGbUJaY0FwSWtl?= =?utf-8?B?QVh5bFZaaVV1Zll6MENWR1plcGc4Z25jWW5rU3JlOE90ZGkwaC9yclY3YVYw?= =?utf-8?B?MkthVUR5ZGJUdW9INWFWWGlYZ2ZvSUM0YW5ScFhUQnFxNTJsbklHeFAxalpv?= =?utf-8?B?SVp1bzhKQmw2RVlrV3RCQzhESTV6WWNlU3dpU3c1aGhsOFNRbUR3TytEU09R?= =?utf-8?B?ODh6QzdoQUh1ZE1ORmFWaXJSN0VObEszZG90amtNaVRMdWZaYWMrd1FtQXF3?= =?utf-8?B?K2FJcXJTUGt2ZUxHR2dnVFliaVRoc09sODJvN1psZU02OG9PZmZ6RU5jMVF1?= =?utf-8?B?Z0tDRmhDU0VXWVB5aVFXYklkNGZLTHEycXF0a0pWdmttM2RKcXFUS1RiTGV2?= =?utf-8?B?NHdiS3RUV3pmWjBIdkFtRjBDUlArWXV2VHdsY2lkYlRSWEw1YUxzeWtTcDR0?= =?utf-8?B?dWFFaW1wUlA0bUg1bHBHaGkwd3k2MVZ4TGlCZ3V1UVVkSGJKWHQ3OS9QcW5x?= =?utf-8?B?ZTNybFU0cHMveGw1T1N3czh0TGtXRU00Z01MM0NwdDEzcUE3WVIvWkM1Vmsx?= =?utf-8?B?TVMyS0dCd0dLRVh5cHduTzF1Y2RZbTRMWXR0dXFPV2tQSlJVd1VpMHVWUE42?= =?utf-8?B?Y0R3WkIzQ1JKY3JyRUxxTDVucGxQSWNocURYVmUycFVyUnZMSi8vbXYzaWNX?= =?utf-8?B?QnN4Sk1LT3I0cmZVTURNMzhGTE8wSWRSaEpHOUpzcGJOajVWS3dGc3pvcjlO?= =?utf-8?B?MWxvQXB3SEMxOTFnb0VFNHpGazdDdmdKbFJabGZBMllxekl0Y0lUNjUwdEVk?= =?utf-8?B?dDg5MHJLbTREMFlKMEZGcCtCaHFoQk9qd0svbHJtL3ZGZTBGT0oybWM4UUhu?= =?utf-8?B?UEFab0NvSncrcnZDTDBkSDFjMHpXTmM3cHRjNDd4SGNraUZYL0p1RHBoY3NI?= =?utf-8?B?cWs4T3BESUY1RDZVS3VKTTBKc0hORGp6Y3pYWEZJZ3JvYTZ1YWN4em1mcXhk?= =?utf-8?B?OURHMzIyazFWdE9hUjQ2c21qU0J4eXY5NWNybFZYek1zeFBVdm1MNUdCbjBh?= =?utf-8?B?RHpXWEdTWUlkbkJtSUM3V1l2c3dOYkVnWW81L0t2cTNFdWdYVSs1NldWcmhM?= =?utf-8?B?cC9WNEtsMmdBYzl6VDFqbklQUTJIcDZVM3Rvd0c4WHZISXM0OEQ0cU9IeTN3?= =?utf-8?Q?nDoQTjYZA3bzJMoV8aFyq+aWr?= X-OriginatorOrg: suse.com X-MS-Exchange-CrossTenant-Network-Message-Id: 2f3aa745-aa79-4477-8830-08dbdb986315 X-MS-Exchange-CrossTenant-AuthSource: DU2PR04MB8790.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 02 Nov 2023 11:39:34.8599 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: f7a17af6-1c5c-4a36-aa8b-f5be247aa4ba X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: vaITtkYWl4Q4vZzHxK3HXmeGM0Dz24/RLX038LvPB1vPbKGIJ1yZrW/pcZSdWa3RNRYWu/V06Ub49HSEIXa2rQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: DU2PR04MB8789 X-Spam-Status: No, score=-3028.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 02.11.2023 09:15, Indu Bhagat wrote: > On 10/31/23 07:10, Jan Beulich wrote: >> On 30.10.2023 17:51, Indu Bhagat wrote: >>> gas/ >>> * Makefile.am: Add new files. >>> * Makefile.in: Regenerated. >>> * as.c (defined): Handle documentation and listing option for >>> ginsns and SCFI. >>> * config/obj-elf.c (obj_elf_size): Invoke ginsn_data_end. >>> (obj_elf_type): Invoke ginsn_data_begin. >>> * config/tc-i386.c (ginsn_new): New functionality to generate >>> ginsns. >>> (x86_scfi_callee_saved_p): New function. >>> (ginsn_dw2_regnum): Likewise. >>> (ginsn_set_where): Likewise. >>> (x86_ginsn_alu): Likewise. >>> (x86_ginsn_move): Likewise. >>> (x86_ginsn_lea): Likewise. >>> (x86_ginsn_jump): Likewise. >>> (x86_ginsn_jump_cond): Likewise. >>> (md_assemble): Invoke ginsn_new. >>> (s_insn): Likewise. >>> (i386_target_format): Add hard error for usage of --scfi with non AMD64 ABIs. >>> * config/tc-i386.h (TARGET_USE_GINSN): New definition. >>> (TARGET_USE_SCFI): Likewise. >>> (SCFI_NUM_REGS): Likewise. >>> (REG_FP): Likewise. >>> (REG_SP): Likewise. >>> (SCFI_INIT_CFA_OFFSET): Likewise. >>> (SCFI_CALLEE_SAVED_REG_P): Likewise. >>> (x86_scfi_callee_saved_p): Likewise. >> >> For this arch-specific code there's a fundamental question of maintenance >> cost here: It doesn't look very reasonable to me to demand of people adding >> support for new ISA extensions to also take into consideration all of this >> new machinery. Yet if any such addition affects SCFI, things will go out- >> of-sync without updating this code as well. It may not be very often that >> such updating is necessary, but right now there's APX work in progress >> which I expect will need dealing with here as well. >> > > I understand your concerns. FWIW, for SCFI, we will need to translate > only a subset of instructions into ginsns (change of flow insns, > save/restore and arith on REG_SP/REG_FP). Considering what you say further down regarding untraceability, I'm afraid you will need to care about all insns which have SP/FP as their destination. > For APX, I see that there is > are new instructions that fall in this set unfortunately. Hopefully the > set remains small. But in general, for future additions (and for APX > currently), there will be time to act as SCFI is for hand-written asm > for which synthesizing CFI is desired, not for all code that GAS > necessarily has to deal with. I understand that, but if a new ISA extension is supported in a new gas version, I'd expect it to be legitimate for people to expect that SCFI is then also supported for them right away. Else detecting what may or may not be used with particular versions of gas is going to become a nightmare. >>> +static ginsnS * >>> +x86_ginsn_alu (i386_insn insn, symbolS *insn_end_sym) >>> +{ >>> + offsetT src_imm; >>> + uint32_t dw2_regnum; >>> + enum ginsn_src_type src_type; >>> + enum ginsn_dst_type dst_type; >>> + ginsnS *ginsn = NULL; >>> + >>> + /* FIXME - create ginsn for REG_SP target only ? */ >>> + /* Map for insn.tm.extension_opcode >>> + 000 ADD 100 AND >>> + 001 OR 101 SUB >>> + 010 ADC 110 XOR >>> + 011 SBB 111 CMP */ >>> + >>> + /* add/sub imm, %reg. >>> + and imm, %reg only at this time for SCFI. */ >>> + if (!(insn.tm.extension_opcode == 0 >>> + || insn.tm.extension_opcode == 4 >>> + || insn.tm.extension_opcode == 5)) >>> + return ginsn; >> >> Why is AND permitted, but OR/XOR aren't? >> >> Also this is about ALU insns with immediate operands only, yet that >> fact isn't reflected in the function name. >> > > OR/XOR should be handled. I will fix this. > > This function is handling opcodes with imm only, true. ADD/SUB with reg > are handled elsewhere (with opcode == 0x1 and opcode == 0x29). I saw that, but this doesn't eliminate my remark regarding the name of this function. Also the two other opcodes you mention aren't quite enough. >>> + /* TBD_GINSN_REPRESENTATION_LIMIT: There is no representation for when a >>> + symbol is used as an operand, like so: >>> + addq $simd_cmp_op+8, %rdx >>> + Skip generating any ginsn for this. */ >>> + if (insn.imm_operands == 1 >>> + && insn.op[0].imms->X_op == O_symbol) >>> + return ginsn; >>> + >>> + gas_assert (insn.imm_operands == 1 >>> + && insn.op[0].imms->X_op == O_constant); >> >> Perhaps less fragile if you use != O_constant in the preceding if()? >> The remaining half could the move ahead of that if(), allowing half >> of its condition to also be dropped. >> >>> + src_imm = insn.op[0].imms->X_add_number; >>> + /* The second operand may be a register or indirect access. */ >>> + if (insn.mem_operands == 1 && insn.base_reg) >>> + { >>> + dw2_regnum = ginsn_dw2_regnum (insn.base_reg); >>> + src_type = GINSN_SRC_INDIRECT; >>> + dst_type = GINSN_DST_INDIRECT; >> >> The possibly in use index register isn't of interest in this case? >> Nor the displacement in the memory operand, ... >> > > Not of interest, No. When src_type and dst_type is INDIRECT. It's not clear to me what you mean with your reply. If the addressing form used isn't relevant, why do you handle base-only and ... >>> + } >>> + else if (insn.mem_operands == 1 && insn.index_reg) >>> + { >>> + dw2_regnum = ginsn_dw2_regnum (insn.index_reg); >>> + src_type = GINSN_SRC_INDIRECT; >>> + dst_type = GINSN_DST_INDIRECT; >> >> ... similarly applicable here? ... index-only separately in the first place? >> Nor a segment override? > > Segment override probably is not of interest for SCFI. > >> >>> + } >>> + else >>> + { >>> + gas_assert (insn.reg_operands == 1); >> >> Afaict this will trigger when the memory operand has neither base >> nor index. >> > > memory operand with neither base nor index ? An asm instruction example > will be helpful. add $1, symbol add %eax, symbol Maybe you really don't need to consider those (if only GPR destinations are of interest), but then a comment should be saying so and you'd probably want to bail from the function early on in such cases. >>> +static ginsnS * >>> +x86_ginsn_lea (i386_insn insn, symbolS *insn_end_sym) >>> +{ >>> + offsetT src_disp = 0; >>> + ginsnS *ginsn = NULL; >>> + uint32_t base_reg; >>> + uint32_t index_reg; >>> + offsetT index_scale; >>> + uint32_t dst_reg; >>> + >>> + if (!insn.index_reg && !insn.base_reg) >>> + { >>> + /* lea symbol, %rN. */ >>> + dst_reg = ginsn_dw2_regnum (insn.op[1].regs); >>> + /* FIXME - Skip encoding information about the symbol. >>> + This is TBD_GINSN_INFO_LOSS, but it is fine if the mode is >>> + GINSN_GEN_SCFI. */ >>> + ginsn = ginsn_new_mov (insn_end_sym, false, >>> + GINSN_SRC_IMM, 0xf /* arbitrary const. */, 0, >>> + GINSN_DST_REG, dst_reg, 0); >>> + } >>> + else if (insn.base_reg && !insn.index_reg) >>> + { >>> + /* lea -0x2(%base),%dst. */ >>> + base_reg = ginsn_dw2_regnum (insn.base_reg); >>> + dst_reg = ginsn_dw2_regnum (insn.op[1].regs); >>> + >>> + if (insn.disp_operands) >>> + src_disp = insn.op[0].disps->X_add_number; >> >> What if the displacement expression is other than O_constant? >> > > For SCFI, a "complex" lea instruction will imply some untraceable change > to the destination reg. For SCFI, the extent of untraceable change is > not of interest, hence, in general, some information loss in ginsn is > tolerable. As a fundamental request: For anything that's of no interest, can you please try to cover this with as little (and thus clear) code as possible? No taking apart of sub-cases when those are indifferent in the end anyway. > I will have to take a look at such instructions to see what the options > are. If you have a example handy, it will be helpful. lea symbol(%rax), %rbp >>> + if (src_disp) >>> + /* Generate an ADD ginsn. */ >>> + ginsn = ginsn_new_add (insn_end_sym, true, >>> + GINSN_SRC_REG, base_reg, 0, >>> + GINSN_SRC_IMM, 0, src_disp, >>> + GINSN_DST_REG, dst_reg, 0); >>> + else >>> + /* Generate a MOV ginsn. */ >>> + ginsn = ginsn_new_mov (insn_end_sym, true, >>> + GINSN_SRC_REG, base_reg, 0, >>> + GINSN_DST_REG, dst_reg, 0); >>> + } >>> + else if (!insn.base_reg && insn.index_reg) >>> + { >>> + /* lea (,%index,imm), %dst. */ >>> + /* FIXME - Skip encoding an explicit multiply operation, instead use >>> + GINSN_TYPE_OTHER. This is TBD_GINSN_INFO_LOSS, but it is fine if >>> + the mode is GINSN_GEN_SCFI. */ >>> + index_scale = insn.log2_scale_factor; >>> + index_reg = ginsn_dw2_regnum (insn.index_reg); >>> + dst_reg = ginsn_dw2_regnum (insn.op[1].regs); >>> + ginsn = ginsn_new_other (insn_end_sym, true, >>> + GINSN_SRC_REG, index_reg, >>> + GINSN_SRC_IMM, index_scale, >>> + GINSN_DST_REG, dst_reg); >>> + } >>> + else >>> + { >>> + /* lea disp(%base,%index,imm) %dst. */ >>> + /* FIXME - Skip encoding information about the disp and imm for index >>> + reg. This is TBD_GINSN_INFO_LOSS, but it is fine if the mode is >>> + GINSN_GEN_SCFI. */ >>> + base_reg = ginsn_dw2_regnum (insn.base_reg); >>> + index_reg = ginsn_dw2_regnum (insn.index_reg); >>> + dst_reg = ginsn_dw2_regnum (insn.op[1].regs); >>> + /* Generate an ADD ginsn. */ >>> + ginsn = ginsn_new_add (insn_end_sym, true, >>> + GINSN_SRC_REG, base_reg, 0, >>> + GINSN_SRC_REG, index_reg, 0, >>> + GINSN_DST_REG, dst_reg, 0); >> >> The comment mentions the displacement, but it's not passed on? In the earlier >> "else if" block you also pay attention to the scla factor, but here you don't? >> That said, I think I'm confused anyway about the FIXME comments. >> > > Ah, This is what I was hinting at in the previous comment. > > So whats going on here is the following - > > First, An "add reg1, reg2, reg3" (reg3 being the destination) ginsn is > only interesting for SCFI if the reg3 is either REG_FP or REG_SP. We > still generate them all nevertheless for now. > > Second, We say that "add reg1, reg2, reg3" makes reg3 "untraceable". > This is based on the observation that static stack usage in a function > will use imm based arithmetic insns. That is, you only intend to support that way of allocating stack space (i.e. one of the apparently many constraints to using this machinery). > So, if reg3 is REG_SP, REG_FP, > these instructions make REG_SP / REG_FP "untraceable". > > Note that, We are not allowing all registers for CFA tracking. (Now > _this_ is a constraint, yes. And which is why we dont support DRAP usage > currently. But its a specific pattern that we can accommodate in future.) > > Hence, tracking more information for such (untraceable) instructions is > unnecessary for SCFI purposes. If and when GAS uses ginsn (in the i386 > backend) for other purposes, such information loss will need to be > revisited. I guess there wants to be a central comment (either in common code, or once per arch) which other comments then can refer to. To the reader of such code it needs to be clear what bits of information are relevant and which ones can be ignored (for now). >>> + ginsn = ginsn_new_jump (insn_end_sym, true, >>> + GINSN_SRC_SYMBOL, 0, src_symbol); >>> + >>> + ginsn_set_where (ginsn); >>> + } >>> + >>> + return ginsn; >>> +} >> >> Further, what about XABORT transferring control to what XBEGIN has >> supplied? (XBEGIN can, in a sense, also be considered a [conditional] >> branch.) >> > > My plan after writing this x86 -> ginsn translator was: > There are and will remain many machine instructions that may need to be > supported depending on hand-written assmebly programmers use. This is > the reason I had left some asserts towards the end in the ginsn_new () > API: if any instructions which affect the control flow, or where the > destination is REG_FP / REG_SP, or push* / pop* are not handled, we > will have an assert fail and GAS will need to handle it. Elsewhere you mention you want to use this for some of the kernel's assembly files. If this feature went in with incomplete coverage, the kernel build system detecting its availability will likely be misled. Furthermore I'm of the opinion that at the time this series goes in, assertions should remain only to check that state is as expected. They should not be used to check that user input is unexpected. Such may only be communicated back by ordinary diagnostics; the assembler should not crash (and even deliberately) on certain input. Besides such being a bug, it also gets in the way of fuzzing. >>> +static ginsnS * >>> +x86_ginsn_jump_cond (i386_insn insn, symbolS *insn_end_sym) >>> +{ >>> + ginsnS *ginsn = NULL; >>> + symbolS *src_symbol; >>> + >>> + /* TBD_GINSN_GEN_NOT_SCFI: Ignore move to or from xmm reg for mode. */ >>> + if (i.tm.opcode_space == SPACE_0F) >>> + return ginsn; >> >> What is the comment about? And what about SPACE_0F-encoded conditional >> jumps (Jcc )? And where are LOOP and J{E,R}CXZ handled? >> > > These need to be handled. I will take a look. Just to mention it: The Jcc forms should appear only during relaxation, while - aiui - all your code runs earlier than that. I was misled by you (elsewhere) handling both 0xE9 and 0xEB as JMP opcodes, despite the form there - iirc - also only appearing during relaxation. >>> + gas_assert (insn.disp_operands == 1); >>> + >>> + if (insn.op[0].disps->X_op == O_symbol) >>> + { >>> + src_symbol = insn.op[0].disps->X_add_symbol; >>> + /* The jump target is expected to be a symbol with 0 addend. >>> + Assert for now to see if this assumption is true. */ >>> + gas_assert (insn.op[0].disps->X_add_number == 0); >>> + ginsn = ginsn_new_jump_cond (insn_end_sym, true, >>> + GINSN_SRC_SYMBOL, 0, src_symbol); >>> + ginsn_set_where (ginsn); >>> + } >>> + else >>> + /* Catch them for now so we know what we are dealing with. */ >>> + gas_assert (0); >> >> I'm going from the assumption that this (and alike) will be addressed >> before this series is committed? > > I will work on the ones you have raised above in the review. My plan was > to exercise this code in different ways after commmitting and address > the failures as I run into them. Well, as said, I think full insn coverage is a prereq to committing, or else there needs to be a way to easily detect an assumed fully functional implementation in a later gas version, no matter that a partially functional one may have been present in a number of earlier versions already. But of course Nick in particular may have a different view on this, and his view would trump mine. >>> +{ >>> + uint16_t opcode; >>> + uint32_t dw2_regnum; >>> + uint32_t src2_dw2_regnum; >>> + int32_t gdisp = 0; >>> + ginsnS *ginsn = NULL; >>> + ginsnS *ginsn_next = NULL; >>> + ginsnS *ginsn_last = NULL; >>> + >>> + /* FIXME - Need a way to check whether the decoding is sane. The specific >>> + checks around i.tm.opcode_space were added as issues were seen. Likely >>> + insufficient. */ >> >> Furthermore opcode encoding space (SPACE_...) need to be taken into >> account in all cases. >> >>> + /* Currently supports generation of selected ginsns, sufficient for >>> + the use-case of SCFI only. To remove this condition will require >>> + work on this target-specific process of creation of ginsns. Some >>> + of such places are tagged with TBD_GINSN_GEN_NOT_SCFI to serve as >>> + examples. */ >>> + if (gmode != GINSN_GEN_SCFI) >>> + return ginsn; >>> + >>> + opcode = i.tm.base_opcode; >>> + >>> + switch (opcode) >>> + { >>> + case 0x1: >>> + /* add reg, reg. */ >>> + dw2_regnum = ginsn_dw2_regnum (i.op[0].regs); >> >> You don't care about opcode 0 (byte operation). Then what about 16-bit >> operand size? Or, since we're talking of a 64-bit-ABI-only feature, >> even 32-bit operand size? >> > > Operand size in some cases does not affect SCFI. So we dont keep > information about operand sizes. The case that I should handle operand > size is when they are pushed to / popped from stack. So you care about recognizing when %rbp is overwritten by an insn. And you also care about the same for %ebp and %bp. In that sense operand size may indeed be unnecessary to determine. Except that then you also want to deal with byte operations, i.e. %bpl being the destination. >> Also what about opcode 0x3? > > LSL instruction ? It doesnt look like it can affect DWARF CFI of a > function. Please correct me if this is wrong. LSL is 0f03, i.e. opcode 0x3 in SPACE_0F. What my comment was about is ADD with inversed operands (ModRM.rm soure, ModRM.reg destination). All four of these flavors of ADD are covered by a single template, using the D and W attributes to establish opcode bits 0 and 1 based on actual operands (and possibly pseudo-prefixes). >>> + if (i.reg_operands == 2) >>> + { >>> + src2_dw2_regnum = ginsn_dw2_regnum (i.op[1].regs); >>> + ginsn = ginsn_new_add (insn_end_sym, true, >>> + GINSN_SRC_REG, dw2_regnum, 0, >>> + GINSN_SRC_REG, src2_dw2_regnum, 0, >>> + GINSN_DST_REG, src2_dw2_regnum, 0); >>> + ginsn_set_where (ginsn); >>> + } >>> + else if (i.mem_operands && i.base_reg) >>> + { >>> + src2_dw2_regnum = ginsn_dw2_regnum (i.base_reg); >>> + if (i.disp_operands == 1) >>> + gdisp = i.op[1].disps->X_add_number; >>> + >>> + ginsn = ginsn_new_add (insn_end_sym, true, >>> + GINSN_SRC_REG, dw2_regnum, 0, >>> + GINSN_SRC_INDIRECT, src2_dw2_regnum, gdisp, >>> + GINSN_DST_INDIRECT, src2_dw2_regnum, gdisp); >>> + ginsn_set_where (ginsn); >>> + } >>> + else >>> + /* Catch them for now so we know what we are dealing with. */ >>> + gas_assert (0); >>> + >>> + break; >>> + case 0x29: >>> + /* If opcode_space == SPACE_0F, this is a movaps insn. Skip it >>> + for GINSN_GEN_SCFI. */ >>> + if (i.tm.opcode_space == SPACE_0F) >>> + break; >> >> Extending on the earlier related comment: Why would you exclude just SPACE_0F? >> > > Perhaps here I should check that space == SPACE_BASE and only allow > that? And also for others. Indeed. >>> + dw2_regnum = ginsn_dw2_regnum (i.op[0].regs); >>> + /* push fs / push gs. */ >>> + ginsn = ginsn_new_sub (insn_end_sym, false, >>> + GINSN_SRC_REG, REG_SP, 0, >>> + GINSN_SRC_IMM, 0, 8, >> >> Note how the 8 here assumes flag_code == CODE_64BIT. (You further assume >> no operand size override here.) >> > > Hmm. Yes, I do assume no operand size override here. I will need to > understand how to calculate the operand size here. Looks like I need to > check for instruction prefixes 66H or REX.W. Is there an API in the > backend that be used for this ? i.prefixes[] and (depending on the phase of assembly) i.rex should be telling you. >>> + dw2_regnum = ginsn_dw2_regnum (i.base_reg); >>> + ginsn = ginsn_new_load (insn_end_sym, false, >>> + GINSN_SRC_INDIRECT, REG_SP, 0, >>> + GINSN_DST_INDIRECT, dw2_regnum); >> >> When both operands are "indirect", what's the difference between >> move, load, and store? IOW if the above is permitted, can't all >> three be folded into a single ginsn kind? >> > > In theory, any of mov/load/store ginsn type may be used if both operands > are indirect mem access. Which suggests that things can be simplified then. >>> + case 0xc2: >>> + case 0xc3: >>> + /* Near ret. */ >>> + ginsn = ginsn_new_return (insn_end_sym, true); >>> + ginsn_set_where (ginsn); >>> + break; >> >> Where did the immediate operand of 0xC2 go? And what about far return >> (and more generally far branches)? >> > > A return ginsn for SCFI can afford this information loss. Hmm, right, I guess by how much the stack pointer is incremented at the return point isn't of any interest anymore. Unless of course you intended to support co-routines. >>> + case 0xc9: >>> + /* The 'leave' instruction copies the contents of the RBP register >>> + into the RSP register to release all stack space allocated to the >>> + procedure. */ >>> + ginsn = ginsn_new_mov (insn_end_sym, false, >>> + GINSN_SRC_REG, REG_FP, 0, >>> + GINSN_DST_REG, REG_SP, 0); >>> + ginsn_set_where (ginsn); >>> + >>> + /* Then it restores the old value of the RBP register from the stack. */ >>> + ginsn_next = ginsn_new_load (insn_end_sym, false, >>> + GINSN_SRC_INDIRECT, REG_SP, 0, >>> + GINSN_DST_REG, REG_FP); >>> + ginsn_set_where (ginsn_next); >>> + >>> + gas_assert (!ginsn_link_next (ginsn, ginsn_next)); >>> + ginsn_last = ginsn_new_add (insn_end_sym, false, >>> + GINSN_SRC_REG, REG_SP, 0, >>> + GINSN_SRC_IMM, 0, 8, >>> + GINSN_DST_REG, REG_SP, 0); >>> + ginsn_set_where (ginsn_next); >>> + >>> + gas_assert (!ginsn_link_next (ginsn_next, ginsn_last)); >>> + break; >> >> You handle LEAVE, but not ENTER? >> > > I should have. From the three variants ([Enter imm16,0], [ENTER imm16,1] > and [ENTER imm16, imm8], the last one is not super clear currently. Is > it possible for you to pass down some understanding to me on that ? I can't say more than what the CPU doc says: The second operand tells how many earlier frame pointers are copied to the local frame. Perhaps you want to simply treat non-zero values as untraceable SP/FP? Jan