From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] aarch64: Split aarch64_combinev16qi before RA [PR115258]
Date: Wed, 29 May 2024 16:44:35 +0100

Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
purpose is to put the two input data vectors into consecutive registers.
This aarch64_combinev16qi was then split after reload into individual
moves (from the first input to the first half of the output, and from
the second input to the second half of the output).

In the worst case, the RA might allocate things so that the destination
of the aarch64_combinev16qi is the second input followed by the first
input.  In that case, the split form of aarch64_combinev16qi uses three
eors to swap the registers around.

This PR is about a test where this worst case occurred.  And given the
insn description, that allocation doesn't seem unreasonable.

early-ra should (hopefully) mean that we're now better at allocating
subregs of vector registers.  The upcoming RA subreg patches should
improve things further.

The best fix for the PR therefore seems to be to split the combination
before RA, so that the RA can see the underlying moves.

Perhaps it even makes sense to do this at expand time, avoiding the need
for aarch64_combinev16qi entirely.  That deserves more experimentation
though.
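For reference, the three-eor swap mentioned above is just the classic
xor-swap idiom applied to the two vector registers.  A minimal scalar
sketch of the idea (illustrative only; xor_swap is not code from the
patch):

  /* Swap two values with three exclusive-ors, which is what the
     splitter has to emit (as vector EORs) when the allocated
     destination halves come out reversed.  */
  static inline void
  xor_swap (unsigned int *a, unsigned int *b)
  {
    *a ^= *b;	/* first eor */
    *b ^= *a;	/* second eor: *b now holds the original *a */
    *a ^= *b;	/* third eor: *a now holds the original *b */
  }

Splitting before RA instead lets the allocator put each input straight
into the right half of the destination, so neither the swap nor the
extra moves should be needed.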
Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
	PR target/115258
	* config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow
	the split before reload.
	* config/aarch64/aarch64.cc (aarch64_split_combinev16qi): Generalize
	into a form that handles pseudo registers.

gcc/testsuite/
	PR target/115258
	* gcc.target/aarch64/pr115258.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md          |  2 +-
 gcc/config/aarch64/aarch64.cc               | 29 ++++++++++-----------
 gcc/testsuite/gcc.target/aarch64/pr115258.c | 19 ++++++++++++++
 3 files changed, 34 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr115258.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c311888e4bd..868f4486218 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -8474,7 +8474,7 @@ (define_insn_and_split "aarch64_combinev16qi"
 			UNSPEC_CONCAT))]
   "TARGET_SIMD"
   "#"
-  "&& reload_completed"
+  "&& 1"
   [(const_int 0)]
 {
   aarch64_split_combinev16qi (operands);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ee12d8897a8..13191ec8e34 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25333,27 +25333,26 @@ aarch64_output_sve_ptrues (rtx const_unspec)
 void
 aarch64_split_combinev16qi (rtx operands[3])
 {
-  unsigned int dest = REGNO (operands[0]);
-  unsigned int src1 = REGNO (operands[1]);
-  unsigned int src2 = REGNO (operands[2]);
   machine_mode halfmode = GET_MODE (operands[1]);
-  unsigned int halfregs = REG_NREGS (operands[1]);
-  rtx destlo, desthi;
 
   gcc_assert (halfmode == V16QImode);
 
-  if (src1 == dest && src2 == dest + halfregs)
+  rtx destlo = simplify_gen_subreg (halfmode, operands[0],
+				    GET_MODE (operands[0]), 0);
+  rtx desthi = simplify_gen_subreg (halfmode, operands[0],
+				    GET_MODE (operands[0]),
+				    GET_MODE_SIZE (halfmode));
+
+  bool skiplo = rtx_equal_p (destlo, operands[1]);
+  bool skiphi = rtx_equal_p (desthi, operands[2]);
+
+  if (skiplo && skiphi)
     {
       /* No-op move.  Can't split to nothing; emit something.  */
       emit_note (NOTE_INSN_DELETED);
       return;
     }
 
-  /* Preserve register attributes for variable tracking.  */
-  destlo = gen_rtx_REG_offset (operands[0], halfmode, dest, 0);
-  desthi = gen_rtx_REG_offset (operands[0], halfmode, dest + halfregs,
-			       GET_MODE_SIZE (halfmode));
-
   /* Special case of reversed high/low parts.  */
   if (reg_overlap_mentioned_p (operands[2], destlo)
       && reg_overlap_mentioned_p (operands[1], desthi))
@@ -25366,16 +25365,16 @@ aarch64_split_combinev16qi (rtx operands[3])
     {
       /* Try to avoid unnecessary moves if part of the result
	 is in the right place already.  */
-      if (src1 != dest)
+      if (!skiplo)
 	emit_move_insn (destlo, operands[1]);
-      if (src2 != dest + halfregs)
+      if (!skiphi)
 	emit_move_insn (desthi, operands[2]);
     }
   else
     {
-      if (src2 != dest + halfregs)
+      if (!skiphi)
 	emit_move_insn (desthi, operands[2]);
-      if (src1 != dest)
+      if (!skiplo)
 	emit_move_insn (destlo, operands[1]);
     }
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258.c b/gcc/testsuite/gcc.target/aarch64/pr115258.c
new file mode 100644
index 00000000000..9a489d4604c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr115258.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** fun:
+**	(ldr|adrp)	[^\n]+
+**	(ldr|adrp)	[^\n]+
+**	(ldr|adrp)	[^\n]+
+**	(ldr|adrp)	[^\n]+
+**	tbl	v[0-9]+.16b, {v[0-9]+.16b - v[0-9]+.16b}, v[0-9]+.16b
+**	str	[^\n]+
+**	ret
+*/
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+void fun (veci *a, veci *b, veci *c) {
+  *c = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
+}
+
+/* { dg-final { scan-assembler-not {\teor\t} } } */
-- 
2.25.1