From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Baldwin <jhb@FreeBSD.org>
To: gdb-patches@sourceware.org
Subject: [PATCH v6 15/15] gdbserver: Simplify handling of ZMM registers.
Date: Fri, 14 Jul 2023 08:51:51 -0700
Message-Id: <20230714155151.21723-16-jhb@FreeBSD.org>
X-Mailer: git-send-email 2.40.0
In-Reply-To: <20230714155151.21723-1-jhb@FreeBSD.org>
References: <20230714155151.21723-1-jhb@FreeBSD.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

- Reuse num_xmm_registers directly for the count of ZMM0-15 registers
  as is already done for the YMM registers for AVX, rather than using
  a new variable that is always the same.

- Replace 3 identical variables for the count of upper ZMM16-31
  registers with a single variable.
  Make use of this to merge various loops working on the ZMM XSAVE
  region so that all of the handling for the various sub-registers in
  this region is always handled in a single loop.

- While here, fix some bugs in i387_cache_to_xsave where, if
  X86_XSTATE_ZMM was set on i386 (e.g. a 32-bit process on a 64-bit
  kernel), the -1 register nums would wrap around and store the value
  of GPRs in the XSAVE area.  This should be harmless, but is
  definitely odd.  Instead, check num_zmm_high_registers directly when
  checking X86_XSTATE_ZMM and skip the ZMM region handling entirely if
  the register count is 0.
---
 gdbserver/i387-fp.cc | 132 +++++++++++++++----------------------------
 1 file changed, 46 insertions(+), 86 deletions(-)

diff --git a/gdbserver/i387-fp.cc b/gdbserver/i387-fp.cc
index f53a6cfc477..91d3a0b8ca3 100644
--- a/gdbserver/i387-fp.cc
+++ b/gdbserver/i387-fp.cc
@@ -267,12 +267,8 @@ i387_cache_to_xsave (struct regcache *regcache, void *buf)
 
   /* Amd64 has 16 xmm regs; I386 has 8 xmm regs.  */
   int num_xmm_registers = amd64 ? 16 : 8;
-  /* AVX512 extends the existing xmm/ymm registers to a wider mode: zmm.  */
-  int num_avx512_zmmh_low_registers = num_xmm_registers;
-  /* AVX512 adds 16 extra regs in Amd64 mode, but none in I386 mode.*/
-  int num_avx512_zmmh_high_registers = amd64 ? 16 : 0;
-  int num_avx512_ymmh_registers = amd64 ? 16 : 0;
-  int num_avx512_xmm_registers = amd64 ? 16 : 0;
+  /* AVX512 adds 16 extra ZMM regs in Amd64 mode, but none in I386 mode.*/
+  int num_zmm_high_registers = amd64 ? 16 : 0;
 
   /* The supported bits in `xstat_bv' are 8 bytes.  Clear part in
      vector registers if its bit in xstat_bv is zero.  */
@@ -321,18 +317,12 @@ i387_cache_to_xsave (struct regcache *regcache, void *buf)
 	  memset (fp->k_space () + i * 8, 0, 8);
 
       if ((clear_bv & X86_XSTATE_ZMM_H))
-	for (i = 0; i < num_avx512_zmmh_low_registers; i++)
+	for (i = 0; i < num_xmm_registers; i++)
 	  memset (fp->zmmh_space () + i * 32, 0, 32);
 
       if ((clear_bv & X86_XSTATE_ZMM))
-	{
-	  for (i = 0; i < num_avx512_zmmh_high_registers; i++)
-	    memset (fp->zmm_space () + 32 + i * 64, 0, 32);
-	  for (i = 0; i < num_avx512_xmm_registers; i++)
-	    memset (fp->zmm_space () + i * 64, 0, 16);
-	  for (i = 0; i < num_avx512_ymmh_registers; i++)
-	    memset (fp->zmm_space () + 16 + i * 64, 0, 16);
-	}
+	for (i = 0; i < num_zmm_high_registers; i++)
+	  memset (fp->zmm_space () + i * 64, 0, 64);
 
       if ((clear_bv & X86_XSTATE_PKRU))
 	for (i = 0; i < num_pkeys_registers; i++)
@@ -446,7 +436,7 @@ i387_cache_to_xsave (struct regcache *regcache, void *buf)
     {
       int zmm0h_regnum = find_regno (regcache->tdesc, "zmm0h");
 
-      for (i = 0; i < num_avx512_zmmh_low_registers; i++)
+      for (i = 0; i < num_xmm_registers; i++)
 	{
 	  collect_register (regcache, i + zmm0h_regnum, raw);
 	  p = fp->zmmh_space () + i * 32;
@@ -458,55 +448,35 @@ i387_cache_to_xsave (struct regcache *regcache, void *buf)
 	}
     }
 
-  /* Check if any of ZMM16H-ZMM31H registers are changed.  */
-  if ((x86_xcr0 & X86_XSTATE_ZMM))
+  /* Check if any of ZMM16-ZMM31 registers are changed.  */
+  if ((x86_xcr0 & X86_XSTATE_ZMM) && num_zmm_high_registers != 0)
     {
-      int zmm16h_regnum = (num_avx512_zmmh_high_registers == 0
-			   ? -1
-			   : find_regno (regcache->tdesc, "zmm16h"));
+      int zmm16h_regnum = find_regno (regcache->tdesc, "zmm16h");
+      int ymm16h_regnum = find_regno (regcache->tdesc, "ymm16h");
+      int xmm16_regnum = find_regno (regcache->tdesc, "xmm16");
 
-      for (i = 0; i < num_avx512_zmmh_high_registers; i++)
+      for (i = 0; i < num_zmm_high_registers; i++)
 	{
-	  collect_register (regcache, i + zmm16h_regnum, raw);
-	  p = fp->zmm_space () + 32 + i * 64;
-	  if (memcmp (raw, p, 32) != 0)
-	    {
-	      xstate_bv |= X86_XSTATE_ZMM;
-	      memcpy (p, raw, 32);
-	    }
-	}
-    }
-
-  /* Check if any XMM_AVX512 registers are changed.  */
-  if ((x86_xcr0 & X86_XSTATE_ZMM))
-    {
-      int xmm_avx512_regnum = (num_avx512_xmm_registers == 0
-			       ? -1
-			       : find_regno (regcache->tdesc, "xmm16"));
-
-      for (i = 0; i < num_avx512_xmm_registers; i++)
-	{
-	  collect_register (regcache, i + xmm_avx512_regnum, raw);
 	  p = fp->zmm_space () + i * 64;
-	  if (memcmp (raw, p, 16) != 0)
+
+	  /* ZMMH sub-register.  */
+	  collect_register (regcache, i + zmm16h_regnum, raw);
+	  if (memcmp (raw, p + 32, 32) != 0)
+	    {
+	      xstate_bv |= X86_XSTATE_ZMM;
+	      memcpy (p + 32, raw, 32);
+	    }
+
+	  /* YMMH sub-register.  */
+	  collect_register (regcache, i + ymm16h_regnum, raw);
+	  if (memcmp (raw, p + 16, 16) != 0)
 	    {
 	      xstate_bv |= X86_XSTATE_ZMM;
-	      memcpy (p, raw, 16);
+	      memcpy (p + 16, raw, 16);
 	    }
-	}
-    }
 
-  /* Check if any YMMH_AVX512 registers are changed.  */
-  if ((x86_xcr0 & X86_XSTATE_ZMM))
-    {
-      int ymmh_avx512_regnum = (num_avx512_ymmh_registers == 0
-				? -1
-				: find_regno (regcache->tdesc, "ymm16h"));
-
-      for (i = 0; i < num_avx512_ymmh_registers; i++)
-	{
-	  collect_register (regcache, i + ymmh_avx512_regnum, raw);
-	  p = fp->zmm_space () + 16 + i * 64;
+	  /* XMM sub-register.  */
+	  collect_register (regcache, i + xmm16_regnum, raw);
 	  if (memcmp (raw, p, 16) != 0)
 	    {
 	      xstate_bv |= X86_XSTATE_ZMM;
 	      memcpy (p, raw, 16);
 	    }
 	}
     }
@@ -732,12 +702,8 @@ i387_xsave_to_cache (struct regcache *regcache, const void *buf)
 
   /* Amd64 has 16 xmm regs; I386 has 8 xmm regs.  */
   int num_xmm_registers = amd64 ? 16 : 8;
-  /* AVX512 extends the existing xmm/ymm registers to a wider mode: zmm.  */
-  int num_avx512_zmmh_low_registers = num_xmm_registers;
-  /* AVX512 adds 16 extra regs in Amd64 mode, but none in I386 mode.*/
-  int num_avx512_zmmh_high_registers = amd64 ? 16 : 0;
-  int num_avx512_ymmh_registers = amd64 ? 16 : 0;
-  int num_avx512_xmm_registers = amd64 ? 16 : 0;
+  /* AVX512 adds 16 extra ZMM regs in Amd64 mode, but none in I386 mode.*/
+  int num_zmm_high_registers = amd64 ? 16 : 0;
 
   /* The supported bits in `xstat_bv' are 8 bytes.  Clear part in
      vector registers if its bit in xstat_bv is zero.  */
@@ -854,47 +820,41 @@ i387_xsave_to_cache (struct regcache *regcache, const void *buf)
 
       if ((clear_bv & X86_XSTATE_ZMM_H) != 0)
 	{
-	  for (i = 0; i < num_avx512_zmmh_low_registers; i++)
+	  for (i = 0; i < num_xmm_registers; i++)
 	    supply_register_zeroed (regcache, i + zmm0h_regnum);
 	}
       else
 	{
 	  p = fp->zmmh_space ();
-	  for (i = 0; i < num_avx512_zmmh_low_registers; i++)
+	  for (i = 0; i < num_xmm_registers; i++)
 	    supply_register (regcache, i + zmm0h_regnum, p + i * 32);
 	}
     }
 
-  if ((x86_xcr0 & X86_XSTATE_ZMM) != 0)
+  if ((x86_xcr0 & X86_XSTATE_ZMM) != 0 && num_zmm_high_registers != 0)
     {
-      int zmm16h_regnum = (num_avx512_zmmh_high_registers == 0
-			   ? -1
-			   : find_regno (regcache->tdesc, "zmm16h"));
-      int ymm16h_regnum = (num_avx512_ymmh_registers == 0
-			   ? -1
-			   : find_regno (regcache->tdesc, "ymm16h"));
-      int xmm16_regnum = (num_avx512_xmm_registers == 0
-			  ? -1
-			  : find_regno (regcache->tdesc, "xmm16"));
+      int zmm16h_regnum = find_regno (regcache->tdesc, "zmm16h");
+      int ymm16h_regnum = find_regno (regcache->tdesc, "ymm16h");
+      int xmm16_regnum = find_regno (regcache->tdesc, "xmm16");
 
       if ((clear_bv & X86_XSTATE_ZMM) != 0)
 	{
-	  for (i = 0; i < num_avx512_zmmh_high_registers; i++)
-	    supply_register_zeroed (regcache, i + zmm16h_regnum);
-	  for (i = 0; i < num_avx512_ymmh_registers; i++)
-	    supply_register_zeroed (regcache, i + ymm16h_regnum);
-	  for (i = 0; i < num_avx512_xmm_registers; i++)
-	    supply_register_zeroed (regcache, i + xmm16_regnum);
+	  for (i = 0; i < num_zmm_high_registers; i++)
+	    {
+	      supply_register_zeroed (regcache, i + zmm16h_regnum);
+	      supply_register_zeroed (regcache, i + ymm16h_regnum);
+	      supply_register_zeroed (regcache, i + xmm16_regnum);
+	    }
 	}
       else
 	{
 	  p = fp->zmm_space ();
-	  for (i = 0; i < num_avx512_zmmh_high_registers; i++)
-	    supply_register (regcache, i + zmm16h_regnum, p + 32 + i * 64);
-	  for (i = 0; i < num_avx512_ymmh_registers; i++)
-	    supply_register (regcache, i + ymm16h_regnum, p + 16 + i * 64);
-	  for (i = 0; i < num_avx512_xmm_registers; i++)
-	    supply_register (regcache, i + xmm16_regnum, p + i * 64);
+	  for (i = 0; i < num_zmm_high_registers; i++)
+	    {
+	      supply_register (regcache, i + zmm16h_regnum, p + 32 + i * 64);
+	      supply_register (regcache, i + ymm16h_regnum, p + 16 + i * 64);
+	      supply_register (regcache, i + xmm16_regnum, p + i * 64);
+	    }
 	}
     }
-- 
2.40.0