From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x32e.google.com (mail-ot1-x32e.google.com [IPv6:2607:f8b0:4864:20::32e]) by sourceware.org (Postfix) with ESMTPS id BACA03858C42 for ; Mon, 4 Dec 2023 04:34:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BACA03858C42 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org BACA03858C42 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::32e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701664453; cv=none; b=rzsASnQuNTdg+WN/ohwRA1onJC94lf0YIjeAp7t4mlH77u9Cyvygd0S0tFOFuf2xwXA5zehho5btPMDpfdcfi9S8LnmViZtw2ZPVlHewuvKI73TTxovOb1wFyNtg+JXbSX5pTZCJIXJf+Oa9aKFaoSV0z5lwnhyoUQbOkZEG7Hg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701664453; c=relaxed/simple; bh=29gz33iXyE8OHeXah+Bq7JhDMaOGn7+5QKwkTMDZ03E=; h=DKIM-Signature:Message-ID:Date:MIME-Version:From:To:Subject; b=G3hvnSQzRBeyIodfyx5XTqH5oqYlh+2/D/Ar6qIWUkCdMigDm2NJK+5+1rliudN99JR04C21VoqQZC6WE5RxQqirHj+3esckisOTQ4i5/QF7rVy29E8rM0MxQsUN6oCxfmeaAK+PD72K14bvnmeX34uwSBZQcIpqcKz7ugxhtcc= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ot1-x32e.google.com with SMTP id 46e09a7af769-6d7fa93afe9so2000459a34.2 for ; Sun, 03 Dec 2023 20:34:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701664445; x=1702269245; darn=sourceware.org; h=subject:to:content-language:from:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=D686oQw42DgyAD5spWA4/LOj0ZP3IbGybl6mietMXQE=; b=iPFDY+R36+Ok9XJKtLkkwlIQlvIxY8kD5bp3bJrx3B3D0X2IQZlzSMucZEKDkTSg0x NoCqFNoQ8tQbBgprX/oPE0/MPKEsDHjL9ROJKOWzEDcNbD/Fxl8FI3YRHlbwc037Tvop ewUamNZp645GgX3NFwkwqtZA7WNgoU0ckDavhJW1Bab0BTTlkL2e7KvpCJFMrJvMabtL 
0RO2kEnyLuPnlQAN1Qgt3ylfMyFsAIBB5nWZGVKHGqUTkeVokQiT/wVicWNt2yOSByjg KMWdOpcw9Xvmi3Y3yNbpYJHTKzi8zucivmVPjjnJaKwnmSwaIb2Ei9d2wV4ZSxCB2hpP 55jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701664445; x=1702269245; h=subject:to:content-language:from:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=D686oQw42DgyAD5spWA4/LOj0ZP3IbGybl6mietMXQE=; b=O2bvwh2ecsmoNIhl8AFWj5fvr5d/vy2NYu60vTIeH+pQ4KGAOKM0v/oT9lWDjfIfwx 3i1PeeLf+5LVeKHrb94/zoeIuRuC0RTXrD+clqJ4+nwn/V2NtgVo63IGBEMzews2jJgQ IUQFennP4tSHDkBl8N5pk66JgYaR7WLe7esZZILQzYKfnqbwDrievKq5w/PClm9frVWv 6/keFcgmFxcNKlNWkWr1Opved5x/XvY0icPHdaBaYe3iS1ZeDJbaUjnwMIKXMi0Pduuq dVYqxpG6vRM+vVd559NNenfudfT8W1VkJy74NzmdJZ4SBfEiFiwAqguwE6ukXuYczUuG WRQg== X-Gm-Message-State: AOJu0YwFqjIf9x88VR04Z59hUw5bRuc8DUJNYSFudEjQd+w5G7jL2WMc B+GyIgFyiSXIN1B7c47LhoNf/EQyu94ynw== X-Google-Smtp-Source: AGHT+IEyQyV3SLTeIXi8Lpymejv3wKx9SeJROdeOyX55m/5tMlvrjYWe6ChJplOiG7zUmLAsdGukeA== X-Received: by 2002:a05:6870:6b07:b0:1fb:2e1:bf6a with SMTP id mt7-20020a0568706b0700b001fb02e1bf6amr3636833oab.5.1701664444898; Sun, 03 Dec 2023 20:34:04 -0800 (PST) Received: from [172.31.0.109] ([136.36.130.248]) by smtp.gmail.com with ESMTPSA id k28-20020a635a5c000000b00528db73ed70sm6705322pgm.3.2023.12.03.20.34.02 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 03 Dec 2023 20:34:04 -0800 (PST) Content-Type: multipart/mixed; boundary="------------daqEO2XCvpwqgA0o2JQwBc5W" Message-ID: <41f2930c-60cd-4992-baec-f8bfee1439e4@gmail.com> Date: Sun, 3 Dec 2023 21:34:00 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird From: Jeff Law Content-Language: en-US To: gdb-patches@sourceware.org Subject: Improve performance of the H8 simulator X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE 
autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id:

This is a multi-part message in MIME format.
--------------daqEO2XCvpwqgA0o2JQwBc5W
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Running the H8 port through the GCC testsuite currently takes 4h 30m on my fastest server -- that's roughly 1.5 hours per multilib tested, and many tests are disabled for various reasons. To put that 1.5 hours/multilib in perspective, that's roughly 3X the time for other embedded targets. Clearly something isn't working as well as it should.

A bit of digging with perf shows that we're spending a crazy amount of time decoding instructions in the H8 simulator. It's not hard to see why -- basically we take a blob of instruction data, then try to match it against every instruction in the H8 opcode table, starting at the beginning. That table has ~8000 entries (each different addressing mode is considered a different instruction in the table).

Naturally my first thought was to sort the table and use a binary search to find the right entry. That's made excessively complex by the encoding on the H8. Just getting the sort right would be much more complex than I'd consider advisable.

Another thought was to build a mapping to the right entry for all the instructions that can be disambiguated based on the first nibble (4 bits) of instruction data, and a mapping for those which can be disambiguated based on the first byte of instruction data. That seemed feasible until I realized that the H8/SX did some truly horrid things with encoding branches in the 0x4XYY opcode space. It uses an "always zero" bit in the offset to encode new semantic information. So we can't select on just 0x4X. Ugh!

We could always do a custom decoder. I've done several through the years, and they can be very fast. But there's no way I can justify the time to do that.

So what I settled on was to first sort the opcode table by the first nibble, then find the index of the first instruction for each nibble. Decoding uses that index to start its search. This cuts the overall build/test time by more than half.

Next I adjusted the sort so that instructions that are not available on the current sub-architecture are put at the end of the table. This shaves another ~15% off the total cycle time.

The net of the two changes is that on my fastest server we've gone from 4:30 to 1:40 running the GCC testsuite. Same test results before/after, of course. It's still not fast, but it's a hell of a lot better.

OK for the trunk?

Thanks,
Jeff

--------------daqEO2XCvpwqgA0o2JQwBc5W
Content-Type: text/plain; charset=UTF-8; name="P"
Content-Disposition: attachment; filename="P"
Content-Transfer-Encoding: base64

CSogaDgzMDAvY29tcGlsZS5jIChuaWJfaW5kaWNlcyk6IE5ldyB2YXJpYWJsZS4KCShpbnN0 cnVjdGlvbl9jb21wYXJhdG9yKTogTmV3IGZ1bmN0aW9uLgoJKHNvcnRfb3Bjb2Rlc19hbmRf c2V0dXBfbmliYmxlX2luZGljZXMpOiBOZXcgZnVuY3Rpb24uCgkoaW5pdF9wb2ludGVycyk6 IENhbGwgc29ydF9vcGNvZGVzX2FuZF9zZXR1cF9uaWJibGVfaW5kaWNlcy4KCShkZWNvZGUp OiBVc2UgbmliX2luZGljZXMgdG8gYXZvaWQgc29tZSB1c2VsZXNzIHRhYmxlIHNlYXJjaGlu Zy4KCmRpZmYgLS1naXQgYS9zaW0vaDgzMDAvY29tcGlsZS5jIGIvc2ltL2g4MzAwL2NvbXBp bGUuYwppbmRleCBhNGIzOWFlMzM4MC4uY2M5NGU2M2MwZmQgMTAwNjQ0Ci0tLSBhL3NpbS9o ODMwMC9jb21waWxlLmMKKysrIGIvc2ltL2g4MzAwL2NvbXBpbGUuYwpAQCAtNDQsNiArNDQs MTEgQEAKIAogaW50IGRlYnVnOwogCisvKiBFYWNoIGVudHJ5IGluIHRoaXMgYXJyYXkgaXMg YW4gaW5kZXggaW50byB0aGUgbWFpbiBvcGNvZGUKKyAgIGFycmF5IGZvciB0aGUgZmlyc3Qg aW5zdHJ1Y3Rpb24gc3RhcnRpbmcgd2l0aCB0aGUgZ2l2ZW4KKyAgIDQgYml0IG5pYmJsZS4g ICovCitzdGF0aWMgaW50IG5pYl9pbmRpY2VzWzE2XTsKKwogc3RhdGljIGludCBtZW1vcnlf c2l6ZTsKIAogI2RlZmluZSBYKG9wLCBzaXplKSAgKG9wICogNCArIHNpemUpCkBAIC0zODgs MTQgKzM5MywyMSBAQCBkZWNvZGUgKFNJTV9ERVNDIHNkLCBzaW1fY3B1ICpjcHUsIGludCBh ZGRyLCB1bnNpZ25lZCBjaGFyICpkYXRhLCBkZWNvZGVkX2luc3QgKgogICBpbnQgcmVnWzNd ICAgPSB7MCwgMCwgMH07CiAgIGludCByZGlzcFszXSA9IHswLCAwLCAwfTsKICAgaW50IG9w
bnVtOworICBpbnQgaW5kZXg7CiAgIGNvbnN0IHN0cnVjdCBoOF9vcGNvZGUgKnE7CiAKICAg ZHN0LT5kc3QudHlwZSA9IC0xOwogICBkc3QtPnNyYy50eXBlID0gLTE7CiAgIGRzdC0+b3Az LnR5cGUgPSAtMTsKIAotICAvKiBGaW5kIHRoZSBleGFjdCBvcGNvZGUvYXJnIGNvbWJvLiAg Ki8KLSAgZm9yIChxID0gaDhfb3Bjb2RlczsgcS0+bmFtZTsgcSsrKQorICAvKiBXZSBzcGVl ZCB1cCBpbnN0cnVjdGlvbiBkZWNvZGluZyBieSBjYWNoaW5nIGFuIGluZGV4IGludG8KKyAg ICAgdGhlIG1haW4gb3Bjb2RlIGFycmF5IGZvciB0aGUgZmlyc3QgaW5zdHJ1Y3Rpb24gd2l0 aCB0aGUKKyAgICAgZ2l2ZW4gNCBiaXQgbmliYmxlLiAgKi8KKyAgaW5kZXggPSBuaWJfaW5k aWNlc1soZGF0YVswXSAmIDB4ZjApID4+IDRdOworCisgIC8qIEZpbmQgdGhlIGV4YWN0IG9w Y29kZS9hcmcgY29tYm8sIHN0YXJ0aW5nIHdpdGggdGhlIHByZWNvbXB1dGVkCisgICAgIGlu ZGV4LiAgTm90ZSB0aGlzIGxvb3AgaXMgcGVyZm9ybWFuY2Ugc2Vuc2l0aXZlLiAgKi8KKyAg Zm9yIChxID0gJmg4X29wY29kZXNbaW5kZXhdOyBxLT5uYW1lOyBxKyspCiAgICAgewogICAg ICAgY29uc3Qgb3BfdHlwZSAqbmliID0gcS0+ZGF0YS5uaWI7CiAgICAgICB1bnNpZ25lZCBp bnQgbGVuID0gMDsKQEAgLTE1NTcsNiArMTU2OSw4NSBAQCBzdG9yZTIgKFNJTV9ERVNDIHNk LCBlYV90eXBlICphcmcsIGludCBuKQogICByZXR1cm4gc3RvcmVfMSAoc2QsIGFyZywgbiwg MSk7CiB9CiAKKy8qIENhbGxiYWNrIGZvciBxc29ydC4gIFdlIHNvcnQgZmlyc3QgYmFzZWQg b24gYXZhaWxhYmxpdHkKKyAgIChhdmFpbGFibGUgaW5zdHJ1Y3Rpb25zIHNvcnQgbG93ZXIp LiAgV2hlbiBhdmFpbGFiaWxpdHkgc3RhdGUKKyAgIGlzIHRoZSBzYW1lLCB0aGVuIHdlIHVz ZSB0aGUgZmlyc3QgNCBiaXQgbmliYmxlIGFzIGEgc2Vjb25kYXJ5CisgICBzb3J0IGtleS4K KworICAgV2UgZG9uJ3QgcmVhbGx5IGNhcmUgYWJvdXQgMTAwJSBzdGFiaWxpdHkgaGVyZSwg anVzdCB0aGF0IHRoZQorICAgYXZhaWxhYmxlIGluc3RydWN0aW9ucyBjb21lIGZpcnN0IGFu ZCBhbGwgaW5zdHJ1dGlvbnMgd2l0aAorICAgdGhlIHNhbWUgc3RhcnRpbmcgbmliYmxlIGFy ZSBjb25zZWN1dGl2ZS4KKworICAgV2UgY291bGQgZG8gZXZlbiBiZXR0ZXIgYnkgcmVjb3Jk aW5nIGZyZXF1ZW5jeSBpbmZvcm1hdGlvbiBpbnRvIHRoZQorICAgbWFpbiB0YWJsZSBhbmQg dXNpbmcgdGhhdCB0byBzb3J0IHdpdGhpbiBhIG5pYmJsZSdzIGdyb3VwIHdpdGggdGhlCisg ICBoaWdoZXN0IGZyZXF1ZW5jeSBpbnN0cnVjdGlvbnMgYXBwZWFyaW5nIGZpcnN0LiAgKi8K Kworc3RhdGljIGludAoraW5zdHJ1Y3Rpb25fY29tcGFyYXRvciAoY29uc3Qgdm9pZCAqcDFf LCBjb25zdCB2b2lkICpwMl8pCit7CisgIHN0cnVjdCBoOF9vcGNvZGUgKnAxID0gKHN0cnVj 
dCBoOF9vcGNvZGUgKilwMV87CisgIHN0cnVjdCBoOF9vcGNvZGUgKnAyID0gKHN0cnVjdCBo OF9vcGNvZGUgKilwMl87CisKKyAgLyogVGhlIDFzdCBzb3J0IGtleSBpcyBiYXNlZCBvbiB3 aGV0aGVyIG9yIG5vdCB0aGUKKyAgICAgaW5zdHJ1Y3Rpb24gaXMgZXZlbiBhdmFpbGFibGUu ICBUaGlzIHJlZHVjZXMgdGhlCisgICAgIG51bWJlciBvZiBlbnRyaWVzIHdlIGhhdmUgdG8g bG9vayBhdCBpbiB0aGUgY29tbW9uCisgICAgIGNhc2UuICAqLworICBib29sIHAxX2F2YWls YWJsZSA9ICEoKHAxLT5hdmFpbGFibGUgPT0gQVZfSDhTWCAmJiAhaDgzMDBzeG1vZGUpCisJ CQl8fCAocDEtPmF2YWlsYWJsZSA9PSBBVl9IOFMgICYmICFoODMwMHNtb2RlKQorCQkJfHwg KHAxLT5hdmFpbGFibGUgPT0gQVZfSDhIICAmJiAhaDgzMDBobW9kZSkpOworCisgIGJvb2wg cDJfYXZhaWxhYmxlID0gISgocDItPmF2YWlsYWJsZSA9PSBBVl9IOFNYICYmICFoODMwMHN4 bW9kZSkKKwkJCXx8IChwMi0+YXZhaWxhYmxlID09IEFWX0g4UyAgJiYgIWg4MzAwc21vZGUp CisJCQl8fCAocDItPmF2YWlsYWJsZSA9PSBBVl9IOEggICYmICFoODMwMGhtb2RlKSk7CisK KyAgLyogU29ydCBzbyB0aGF0IGF2YWlsYWJsZSBpbnN0cnVjdGlvbnMgY29tZSBiZWZvcmUg dW5hdmFpbGFibGUKKyAgICAgaW5zdHJ1Y3Rpb25zLiAgKi8KKyAgaWYgKHAxX2F2YWlsYWJs ZSAhPSBwMl9hdmFpbGFibGUpCisgICAgcmV0dXJuIHAyX2F2YWlsYWJsZSAtIHAxX2F2YWls YWJsZTsKKworICAvKiBTZWNvbmRhcmlseSBzb3J0IGJhc2VkIG9uIHRoZSBmaXJzdCBvcGNv ZGUgbmliYmxlLiAgKi8KKyAgcmV0dXJuIHAxLT5kYXRhLm5pYlswXSAtIHAyLT5kYXRhLm5p YlswXTsKK30KKworCisvKiBPUFMgaXMgdGhlIG9wY29kZSBhcnJheSwgd2hpY2ggaXMgaW5p dGlhbGx5IHNvcnRlZCBieSBtbmVub21pYy4KKworICAgU29ydCB0aGUgYXJyYXkgc28gdGhh dCB0aGUgaW5zdHJ1Y3Rpb25zIGZvciB0aGUgc3ViLWFyY2hpdGVjdHVyZQorICAgYXJlIGF0 IHRoZSBzdGFydCBhbmQgdW5hdmFpbGFibGUgaW5zdHJ1Y3Rpb25zIGFyZSBhdCB0aGUgZW5k LgorCisgICBXaXRoaW4gdGhlIHNldCBvZiBhdmFpbGFibGUgaW5zdHJ1Y3Rpb25zLCBmdXJ0 aGVyIHNvcnQgdGhlbSBiYXNlZAorICAgb24gdGhlIGZpcnN0IDQgYml0IG5pYmJsZS4KKwor ICAgVGhlbiBmaW5kIHRoZSBmaXJzdCBpbmRleCBpbnRvIE9QUyBmb3IgZWFjaCBvZiB0aGUg MTYgcG9zc2libGUKKyAgIG5pYmJsZXMgYW5kIHJlY29yZCB0aGF0IGludG8gTklCX0lORElD RVMgdG8gc3BlZWQgdXAgZGVjb2RpbmcuICAqLworCitzdGF0aWMgdm9pZAorc29ydF9vcGNv ZGVzX2FuZF9zZXR1cF9uaWJibGVfaW5kaWNlcyAoc3RydWN0IGg4X29wY29kZSAqb3BzKQor eworICBjb25zdCBzdHJ1Y3QgaDhfb3Bjb2RlICpxOworICBpbnQgKmluZGljZXM7CisgIGlu 
dCBpOworCisgIC8qIEZpcnN0IHNvcnQgdGhlIE9QUyBhcnJheS4gICovCisgIGZvciAoaSA9 IDAsIHEgPSBvcHM7IHEtPm5hbWU7IHErKywgaSsrKQorICAgIDsKKyAgcXNvcnQgKG9wcywg aSwgc2l6ZW9mIChzdHJ1Y3QgaDhfb3Bjb2RlKSwgaW5zdHJ1Y3Rpb25fY29tcGFyYXRvcik7 CisKKyAgLyogTm93IHdhbGsgdGhlIGFycmF5IGNhY2hpbmcgdGhlIGluZGV4IG9mIHRoZSBm aXJzdAorICAgICBvY2N1cnJlbmNlIG9mIGVhY2ggNCBiaXQgbmliYmxlLiAgKi8KKyAgbWVt c2V0IChuaWJfaW5kaWNlcywgLTEsIHNpemVvZiAoaW50KSAqIDE2KTsKKyAgZm9yIChpID0g MCwgcSA9IG9wczsgcS0+bmFtZTsgcSsrLCBpKyspCisgICAgeworICAgICAgaW50IG5pYiA9 IHEtPmRhdGEubmliWzBdOworCisgICAgICAvKiBSZWNvcmQgdGhlIGxvY2F0aW9uIG9mIHRo ZSBmaXJzdCBlbnRyeSB3aXRoIHRoZSByaWdodAorCSBuaWJibGUgY291bnQuICAqLworICAg ICAgaWYgKG5pYl9pbmRpY2VzW25pYl0gPT0gLTEpCisJbmliX2luZGljZXNbbmliXSA9IGk7 CisgICAgfQorfQorCisKIC8qIEZsYWcgdG8gYmUgc2V0IHdoZW5ldmVyIGEgbmV3IFNJTV9E RVNDIG9iamVjdCBpcyBjcmVhdGVkLiAgKi8KIHN0YXRpYyBpbnQgaW5pdF9wb2ludGVyc19u ZWVkZWQgPSAxOwogCkBAIC0xNjM5LDYgKzE3MzAsOSBAQCBpbml0X3BvaW50ZXJzIChTSU1f REVTQyBzZCkKIAkgIGg4X3NldF9yZWcgKGNwdSwgaSwgMCk7CiAJfQogCisgICAgICAvKiBT b3J0IHRoZSBvcGNvZGUgdGFibGUgYW5kIGNyZWF0ZSBpbmRpY2VzIHRvIHNwZWVkIHVwIGRl Y29kZS4gICovCisgICAgICBzb3J0X29wY29kZXNfYW5kX3NldHVwX25pYmJsZV9pbmRpY2Vz IChvcHMpOworCiAgICAgICBpbml0X3BvaW50ZXJzX25lZWRlZCA9IDA7CiAgICAgfQogfQo= --------------daqEO2XCvpwqgA0o2JQwBc5W--