From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 00E323858C35 for ; Thu, 25 Jan 2024 12:08:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 00E323858C35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 00E323858C35 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706184508; cv=none; b=kuhz1Ahuc6KpZRzwSbA5DYGhLKhTzUSnObPJ+qSiiBaWtwPffSHAneBdDUJDuB6QlivdSMMhzinbRsQS0uwe34KojnC01DOLHRYCHih/5sOWQEIttva4QrkMNCHg282yjeZD11QeAsdSJZzSDp/3juv5lSu+/dHo/nNN4SVe7XE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706184508; c=relaxed/simple; bh=Jn8RL7yEWCar1RYCwQ9E3cRFSPsXoj5GSSiQvhfNLqc=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=mK7Z+MYgIC9KDhE7gxUnw4bsXnEncrXgtDyvj2n5Sqe4T85ughmTW/wtbq7PhWYma+CiJa7cBUONPi3afDTAKlUS+cSUQPfLnzD2wOEKq/pfgqOgjtW5V6qvRB4PUrF8B1VIuNDKmQKnKcachrTKXd+yZkDvf3oVLLiO3sXIpX8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 29B911FB; Thu, 25 Jan 2024 04:09:09 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2D3A13F73F; Thu, 25 Jan 2024 04:08:24 -0800 (PST) From: Richard Sandiford To: Andrew Pinski Mail-Followup-To: Andrew Pinski ,, richard.sandiford@arm.com Cc: Subject: Re: [PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550] References: <20240124070103.3800874-1-quic_apinski@quicinc.com> Date: Thu, 25 Jan 2024 12:08:22 +0000 In-Reply-To: <20240124070103.3800874-1-quic_apinski@quicinc.com> (Andrew Pinski's message of "Tue, 23 Jan 2024 23:01:03 -0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-21.2 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Andrew Pinski writes: > The split for movv8di is not ready to handle the case where the setting > register overlaps with the address of the memory that is being load. > Fixing the split than just making the output constraint as an early clobber > for this alternative. The split would first need to figure out which register > is overlapping with the address and then only emit that move last. I was curious how strained that detection would be in practice, and in the end it didn't seem too bad. I pushed the following variant after testing on aarch64-linux-gnu. Thanks, Richard The LS64 movv8di pattern didn't handle loads that overlapped with the address register (unless the overlap happened to be in the last subload). gcc/ PR target/113550 * config/aarch64/aarch64-simd.md: In the movv8di splitter, check whether each split instruction is a load that clobbers the source address. Emit that instruction last if so. gcc/testsuite/ PR target/113550 * gcc.target/aarch64/pr113550.c: New test. --- gcc/config/aarch64/aarch64-simd.md | 18 ++++++-- gcc/testsuite/gcc.target/aarch64/pr113550.c | 48 +++++++++++++++++++++ 2 files changed, 62 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113550.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 48f0741e7d0..f036f6ce997 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -8221,11 +8221,21 @@ (define_split || (memory_operand (operands[0], V8DImode) && register_operand (operands[1], V8DImode))) { + std::pair last_pair = {}; for (int offset = 0; offset < 64; offset += 16) - emit_move_insn (simplify_gen_subreg (TImode, operands[0], - V8DImode, offset), - simplify_gen_subreg (TImode, operands[1], - V8DImode, offset)); + { + std::pair pair = { + simplify_gen_subreg (TImode, operands[0], V8DImode, offset), + simplify_gen_subreg (TImode, operands[1], V8DImode, offset) + }; + if (register_operand (pair.first, TImode) + && reg_overlap_mentioned_p (pair.first, pair.second)) + last_pair = pair; + else + emit_move_insn (pair.first, pair.second); + } + if (last_pair.first) + emit_move_insn (last_pair.first, last_pair.second); DONE; } else diff --git a/gcc/testsuite/gcc.target/aarch64/pr113550.c b/gcc/testsuite/gcc.target/aarch64/pr113550.c new file mode 100644 index 00000000000..0ff3c7b5c00 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr113550.c @@ -0,0 +1,48 @@ +/* { dg-options "-O" } */ +/* { dg-do run } */ + +#pragma GCC push_options +#pragma GCC target "+ls64" +#pragma GCC aarch64 "arm_acle.h" +#pragma GCC pop_options + +#define DEF_FUNCTION(NAME, ARGS) \ + __attribute__((noipa)) \ + __arm_data512_t \ + NAME ARGS \ + { \ + return *ptr; \ + } + +DEF_FUNCTION (f0, (__arm_data512_t *ptr)) +DEF_FUNCTION (f1, (int x0, __arm_data512_t *ptr)) +DEF_FUNCTION (f2, (int x0, int x1, __arm_data512_t *ptr)) +DEF_FUNCTION (f3, (int x0, int x1, int x2, __arm_data512_t *ptr)) +DEF_FUNCTION (f4, (int x0, int x1, int x2, int x3, __arm_data512_t *ptr)) +DEF_FUNCTION (f5, (int x0, int x1, int x2, int x3, int x4, + __arm_data512_t *ptr)) +DEF_FUNCTION (f6, (int x0, int x1, int x2, int x3, int x4, int x5, + __arm_data512_t *ptr)) +DEF_FUNCTION (f7, (int x0, int x1, int x2, int x3, int x4, int x5, int x6, + __arm_data512_t *ptr)) + +int +main (void) +{ + __arm_data512_t x = { 0, 10, 20, 30, 40, 50, 60, 70 }; + __arm_data512_t res[8] = + { + f0 (&x), + f1 (0, &x), + f2 (0, 1, &x), + f3 (0, 1, 2, &x), + f4 (0, 1, 2, 3, &x), + f5 (0, 1, 2, 3, 4, &x), + f6 (0, 1, 2, 3, 4, 5, &x), + f7 (0, 1, 2, 3, 4, 5, 6, &x) + }; + for (int i = 0; i < 8; ++i) + if (__builtin_memcmp (&x, &res[i], sizeof (x)) != 0) + __builtin_abort (); + return 0; +} -- 2.25.1