From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] aarch64: Improve handling of accumulators in early-ra
Date: Thu, 14 Dec 2023 13:46:56 +0000

Being very simplistic, early-ra just models an allocno's live range as
a single interval.  This doesn't work well for single-register
accumulators that are updated multiple times in a loop, since in SSA
form, each intermediate result will be a separate SSA name and will
remain separate from the accumulator even after out-of-ssa.  This
means that in something like:

    for (;;)
      {
        x = x + ...;
        x = x + ...;
      }

the first definition of x and the second use will be a separate pseudo
from the "main" loop-carried pseudo.

A real RA would fix this by keeping general, segmented live ranges.
But that feels like a slippery slope in this context.  This patch
instead looks for sharability at a more local level, as described in
the comments.  It's a bit hackish, but hopefully not too much.

The patch also contains some small tweaks that are needed to make the
new and existing tests pass:

- fix a case where a pseudo that was only moved was wrongly not
  treated as an FPR candidate

- fix some bookkeeping related to is_strong_copy_src

- use the number of FPR preferences as a tiebreaker when sorting colors

I fully expect that we'll need to be more aggressive at skipping the
early-ra allocation.  For example, it probably makes sense to refuse
any allocation that involves an FPR move.  But I'd like to keep
collecting examples of where things go wrong first, so that hopefully
we can improve the cases with strided registers or structures.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
	* config/aarch64/aarch64-early-ra.cc (allocno_info::is_equiv): New
	member variable.
	(allocno_info::equiv_allocno): Replace with...
	(allocno_info::related_allocno): ...this member variable.
	(allocno_info::chain_prev): Put into a union with...
	(allocno_info::last_use_point): ...this new member variable.
	(color_info::num_fpr_preferences): New member variable.
	(early_ra::m_shared_allocnos): Likewise.
	(allocno_info::is_shared): New member function.
	(allocno_info::is_equiv_to): Likewise.
	(early_ra::dump_allocnos): Dump sharing information.  Tweak column
	widths.
	(early_ra::fpr_preference): Check ALLOWS_NONFPR before returning -2.
	(early_ra::start_new_region): Handle m_shared_allocnos.
	(early_ra::create_allocno_group): Set related_allocno rather than
	equiv_allocno.
	(early_ra::record_allocno_use): Likewise.  Detect multiple calls
	for the same program point.  Update last_use_point and is_equiv.
	Clear is_strong_copy_src rather than is_strong_copy_dest.
	(early_ra::record_allocno_def): Use related_allocno rather than
	equiv_allocno.  Update last_use_point.
	(early_ra::valid_equivalence_p): Replace with...
	(early_ra::find_related_start): ...this new function.
	(early_ra::record_copy): Look for cases where a destination copy
	chain can be shared with the source allocno.
	(early_ra::find_strided_accesses): Update for equiv_allocno->
	related_allocno change.  Only call consider_strong_copy_src_chain
	at the head of a copy chain.
	(early_ra::is_chain_candidate): Skip shared allocnos.  Update for
	new representation of equivalent allocnos.
	(early_ra::chain_allocnos): Update for new representation of
	equivalent allocnos.
	(early_ra::try_to_chain_allocnos): Likewise.
	(early_ra::merge_fpr_info): New function, split out from...
	(early_ra::set_single_color_rep): ...here.
	(early_ra::form_chains): Handle shared allocnos.
	(early_ra::process_copies): Count the number of FPR preferences.
	(early_ra::cmp_decreasing_size): Rename to...
	(early_ra::cmp_allocation_order): ...this.  Sort equal-sized groups
	by the number of FPR preferences.
	(early_ra::finalize_allocation): Handle shared allocnos.
	(early_ra::process_region): Reset chain_prev as well as chain_next.
gcc/testsuite/
	* gcc.target/aarch64/sve/accumulators_1.c: New test.
	* gcc.target/aarch64/sve/acle/asm/create2_1.c: Allow the moves to
	be in any order.
	* gcc.target/aarch64/sve/acle/asm/create3_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/create4_1.c: Likewise.
---
 gcc/config/aarch64/aarch64-early-ra.cc        | 283 ++++++++++++++----
 .../gcc.target/aarch64/sve/accumulators_1.c   | 200 +++++++++++++
 .../aarch64/sve/acle/asm/create2_1.c          |  65 ++++
 .../aarch64/sve/acle/asm/create3_1.c          |  72 ++---
 .../aarch64/sve/acle/asm/create4_1.c          |  96 +++---
 5 files changed, 573 insertions(+), 143 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/accumulators_1.c

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc b/gcc/config/aarch64/aarch64-early-ra.cc index f05869b5cf2..484db94513d 100644 --- a/gcc/config/aarch64/aarch64-early-ra.cc +++ b/gcc/config/aarch64/aarch64-early-ra.cc @@ -256,6 +256,8 @@ private: struct allocno_info { allocno_group_info *group (); + bool is_shared (); + bool is_equiv_to (unsigned int); // The allocno's unique identifier. unsigned int id; @@ -292,6 +294,10 @@ private: // so that it cannot be tied to the destination of the instruction. unsigned int is_earlyclobbered : 1; + // True if this allocno is known to be equivalent to related_allocno + // for the whole of this allocno's lifetime. + unsigned int is_equiv : 1; + // The inclusive range of program points spanned by the allocno. // START_POINT >= END_POINT. unsigned int start_point; @@ -302,9 +308,15 @@ private: // See callers of record_copy for what counts as a copy. unsigned int copy_dest; - // If this field is not INVALID_ALLOCNO, this allocno is known to be - // equivalent to EQUIV_ALLOCNO for the whole of this allocno's lifetime. - unsigned int equiv_allocno; + // If this field is not INVALID_ALLOCNO, it indicates one of two things: + // + // - if is_equiv, this allocno is equivalent to related_allocno for + // the whole of this allocno's lifetime.
+ // + // - if !is_equiv, this allocno's live range is a subrange of + // related_allocno's and we have committed to making this allocno + // share whatever register related_allocno uses. + unsigned int related_allocno; union { @@ -319,9 +331,18 @@ private: unsigned int chain_next; }; - // The previous chained allocno in program order (i.e. at higher - // program points), or INVALID_ALLOCNO if none. - unsigned int chain_prev; + union + { + // The program point before start_point at which the allocno was + // last used, or END_OF_REGION if none. This is only used temporarily + // while recording allocnos; after that, chain_prev below is used + // instead. + unsigned int last_use_point; + + // The previous chained allocno in program order (i.e. at higher + // program points), or INVALID_ALLOCNO if none. + unsigned int chain_prev; + }; }; // Information about a full allocno group or a subgroup of it. @@ -380,6 +401,9 @@ private: // The clique's representative group. allocno_group_info *group; + // The number of FPR preferences recorded in fpr_preferences. + unsigned int num_fpr_preferences; + // Weights in favor of choosing each FPR as the first register for GROUP. 
int8_t fpr_preferences[32]; }; @@ -415,7 +439,7 @@ private: void record_fpr_def (unsigned int); void record_allocno_use (allocno_info *); void record_allocno_def (allocno_info *); - bool valid_equivalence_p (allocno_info *, allocno_info *); + allocno_info *find_related_start (allocno_info *, allocno_info *, bool); void record_copy (rtx, rtx, bool = false); void record_constraints (rtx_insn *); void record_artificial_refs (unsigned int); @@ -431,6 +455,8 @@ private: int rate_chain (allocno_info *, allocno_info *); static int cmp_chain_candidates (const void *, const void *); void chain_allocnos (unsigned int &, unsigned int &); + void merge_fpr_info (allocno_group_info *, allocno_group_info *, + unsigned int); void set_single_color_rep (allocno_info *, allocno_group_info *, unsigned int); void set_color_rep (allocno_group_info *, allocno_group_info *, @@ -445,7 +471,7 @@ private: void process_copies (); - static int cmp_decreasing_size (const void *, const void *); + static int cmp_allocation_order (const void *, const void *); void allocate_colors (); allocno_info *find_independent_subchain (allocno_info *); color_info *find_oldest_color (unsigned int, unsigned int); @@ -527,6 +553,9 @@ private: // All allocnos, by increasing START_POINT. auto_vec m_sorted_allocnos; + // Allocnos for which is_shared is true. + auto_vec m_shared_allocnos; + // All colors, by index. auto_vec m_colors; @@ -704,6 +733,22 @@ early_ra::allocno_info::group () return reinterpret_cast (chain_end - group_size) - 1; } +// Return true if this allocno's live range is a subrange of related_allocno's +// and if we have committed to making this allocno share whatever register +// related_allocno uses. +inline bool +early_ra::allocno_info::is_shared () +{ + return related_allocno != INVALID_ALLOCNO && !is_equiv; +} + +// Return true if this allocno is known to be equivalent to ALLOCNO. 
+inline bool +early_ra::allocno_info::is_equiv_to (unsigned int allocno) +{ + return is_equiv && related_allocno == allocno; +} + // Return the allocnos in the subgroup. inline array_slice early_ra::allocno_subgroup::allocnos () @@ -859,8 +904,8 @@ early_ra::dump_allocnos () } fprintf (dump_file, "\nAllocno chains:\n"); - fprintf (dump_file, " %5s %12s %12s %5s %5s %5s %5s\n", - "Id", "Regno", "Range ", "Src", "Dest", "Equiv", "FPR"); + fprintf (dump_file, " %5s %12s %12s %6s %5s %5s %6s %5s\n", + "Id", "Regno", "Range ", "Src", "Dest", "Equiv", "Shared", "FPR"); for (unsigned int ai = 0; ai < m_allocnos.length (); ++ai) { auto *allocno = m_allocnos[ai]; @@ -877,7 +922,7 @@ early_ra::dump_allocnos () fprintf (dump_file, " %12s", buffer); snprintf (buffer, sizeof (buffer), "[%d,%d]", allocno->start_point, allocno->end_point); - fprintf (dump_file, " %11s%s %5s", buffer, + fprintf (dump_file, " %11s%s %6s", buffer, allocno->is_earlyclobbered ? "*" : " ", allocno->is_strong_copy_dest ? "Strong" : allocno->is_copy_dest ? 
"Yes" : "-"); @@ -885,10 +930,14 @@ early_ra::dump_allocnos () fprintf (dump_file, " %5s", "-"); else fprintf (dump_file, " %5d", allocno->copy_dest); - if (allocno->equiv_allocno != INVALID_ALLOCNO) - fprintf (dump_file, " %5d", allocno->equiv_allocno); + if (allocno->is_equiv) + fprintf (dump_file, " %5d", allocno->related_allocno); else fprintf (dump_file, " %5s", "-"); + if (allocno->is_shared ()) + fprintf (dump_file, " %6d", allocno->related_allocno); + else + fprintf (dump_file, " %6s", "-"); if (allocno->hard_regno == FIRST_PSEUDO_REGISTER) fprintf (dump_file, " %5s", "-"); else @@ -1151,7 +1200,7 @@ early_ra::fpr_preference (unsigned int regno) return 3; else if (flags & NEEDS_FPR32) return 2; - else if (!(flags & ALLOWS_FPR32)) + else if (!(flags & ALLOWS_FPR32) && (flags & ALLOWS_NONFPR)) return -2; else if ((flags & HAS_FPR_COPY) && !(flags & HAS_NONFPR_COPY)) return 1; @@ -1230,6 +1279,7 @@ early_ra::start_new_region () m_allocno_copies.truncate (0); m_allocnos.truncate (0); m_sorted_allocnos.truncate (0); + m_shared_allocnos.truncate (0); m_colors.truncate (0); m_insn_ranges.truncate (0); for (auto &fpr_ranges : m_fpr_ranges) @@ -1291,7 +1341,7 @@ early_ra::create_allocno_group (unsigned int regno, unsigned int size) allocno->start_point = END_OF_REGION; allocno->end_point = START_OF_REGION; allocno->copy_dest = INVALID_ALLOCNO; - allocno->equiv_allocno = INVALID_ALLOCNO; + allocno->related_allocno = INVALID_ALLOCNO; allocno->chain_next = INVALID_ALLOCNO; allocno->chain_prev = INVALID_ALLOCNO; m_allocnos.safe_push (allocno); @@ -1423,16 +1473,24 @@ early_ra::record_fpr_def (unsigned int regno) void early_ra::record_allocno_use (allocno_info *allocno) { + if (allocno->start_point == m_current_point) + return; + + gcc_checking_assert (!allocno->is_shared ()); bitmap_set_bit (m_live_allocnos, allocno->id); if (allocno->end_point > m_current_point) { allocno->end_point = m_current_point; allocno->last_def_point = START_OF_REGION; + allocno->last_use_point 
= END_OF_REGION; } + else + allocno->last_use_point = allocno->start_point; allocno->start_point = m_current_point; allocno->is_copy_dest = false; - allocno->is_strong_copy_dest = false; - allocno->equiv_allocno = INVALID_ALLOCNO; + allocno->is_strong_copy_src = false; + allocno->related_allocno = INVALID_ALLOCNO; + allocno->is_equiv = false; } // Record a definition of the allocno with index AI at the current program @@ -1441,37 +1499,89 @@ early_ra::record_allocno_use (allocno_info *allocno) void early_ra::record_allocno_def (allocno_info *allocno) { + gcc_checking_assert (!allocno->is_shared ()); + allocno->last_use_point = allocno->start_point; allocno->last_def_point = m_current_point; allocno->start_point = m_current_point; allocno->num_defs = MIN (allocno->num_defs + 1, 2); gcc_checking_assert (!allocno->is_copy_dest - && !allocno->is_strong_copy_dest); + && !allocno->is_strong_copy_src); if (!bitmap_clear_bit (m_live_allocnos, allocno->id)) gcc_unreachable (); } -// Return true if a move from SRC_ALLOCNO to DEST_ALLOCNO could be treated -// as an equivalence. -bool -early_ra::valid_equivalence_p (allocno_info *dest_allocno, - allocno_info *src_allocno) +// SRC_ALLOCNO is copied or tied to DEST_ALLOCNO; IS_EQUIV is true if the +// two allocnos are known to be equal. See whether we can mark a chain of +// allocnos ending at DEST_ALLOCNO as related to SRC_ALLOCNO. Return the +// start of the chain if so, otherwise return null. +// +// If IS_EQUIV, a chain that contains just DEST_ALLOCNO should be treated +// as an equivalence. Otherwise the chain should be shared with SRC_ALLOCNO. +// +// Sharing chains are a rather hacky workaround for the fact that we +// don't collect segmented live ranges, and that in the end we want to do +// simple interval graph coloring. 
+early_ra::allocno_info * +early_ra::find_related_start (allocno_info *dest_allocno, + allocno_info *src_allocno, bool is_equiv) { - if (src_allocno->end_point > dest_allocno->end_point) - // The src allocno dies first. - return false; - - if (src_allocno->num_defs != 0) + allocno_info *res = nullptr; + for (;;) { - if (dest_allocno->end_point < m_current_bb_point) - // We don't currently track enough information to handle multiple - // definitions across basic block boundaries. - return false; + if (src_allocno->end_point > dest_allocno->end_point) + // The src allocno dies first. + return res; - if (src_allocno->last_def_point >= dest_allocno->end_point) - // There is another definition during the destination's live range. - return false; + if (src_allocno->num_defs != 0) + { + if (dest_allocno->end_point < m_current_bb_point) + // We don't currently track enough information to handle multiple + // definitions across basic block boundaries. + return res; + + if (src_allocno->last_def_point >= dest_allocno->end_point) + // There is another definition during the destination's live range. + return res; + } + if (is_equiv) + { + if (dest_allocno->num_defs == 1) + // dest_allocno is equivalent to src_allocno for dest_allocno's + // entire live range. Fall back to that if we can't establish + // a sharing chain. + res = dest_allocno; + } + else + { + if (src_allocno->last_use_point >= dest_allocno->end_point) + // src_allocno is live during dest_allocno's live range, + // and the two allocnos do not necessarily have the same value. + return res; + } + + if (dest_allocno->group_size != 1 + || DF_REG_DEF_COUNT (dest_allocno->group ()->regno) != 1) + // Currently only single allocnos that are defined once can + // share registers with non-equivalent allocnos. This could be + // relaxed, but at the time of writing, aggregates are not valid + // SSA names and so generally only use a single pseudo throughout + // their lifetime. 
+ return res; + + if (dest_allocno->copy_dest == src_allocno->id) + // We've found a complete and valid sharing chain. + return dest_allocno; + + if (dest_allocno->copy_dest == INVALID_ALLOCNO) + return res; + + auto *next_allocno = m_allocnos[dest_allocno->copy_dest]; + if (!is_chain_candidate (dest_allocno, next_allocno)) + return res; + + dest_allocno = next_allocno; + is_equiv = false; } - return dest_allocno->num_defs == 1; } // Record any relevant allocno-related information for an actual or imagined @@ -1558,9 +1668,21 @@ early_ra::record_copy (rtx dest, rtx src, bool from_move_p) src_allocno->hard_regno = dest_allocno->hard_regno; dest_allocno->is_copy_dest = 1; } - else if (from_move_p - && valid_equivalence_p (dest_allocno, src_allocno)) - dest_allocno->equiv_allocno = src_allocno->id; + else if (auto *start_allocno = find_related_start (dest_allocno, + src_allocno, + from_move_p)) + { + auto *next_allocno = dest_allocno; + for (;;) + { + next_allocno->related_allocno = src_allocno->id; + next_allocno->is_equiv = (start_allocno == dest_allocno + && from_move_p); + if (next_allocno == start_allocno) + break; + next_allocno = m_allocnos[next_allocno->copy_dest]; + } + } } } } @@ -1876,13 +1998,13 @@ early_ra::find_strided_accesses () { // This function forms a graph of allocnos, linked by equivalences and // natural copy chains. It temporarily uses chain_next to record the - // reverse of equivalence edges (equiv_allocno) and chain_prev to record + // reverse of equivalence edges (related_allocno) and chain_prev to record // the reverse of copy edges (copy_dest). unsigned int allocno_info::*links[] = { &allocno_info::chain_next, &allocno_info::chain_prev, &allocno_info::copy_dest, - &allocno_info::equiv_allocno + &allocno_info::related_allocno }; // Set up the temporary reverse edges. Check for strong copy chains. 
@@ -1891,12 +2013,12 @@ early_ra::find_strided_accesses () auto *allocno1 = m_allocnos[i]; if (allocno1->copy_dest != INVALID_ALLOCNO) m_allocnos[allocno1->copy_dest]->chain_prev = allocno1->id; - if (allocno1->equiv_allocno != INVALID_ALLOCNO) - m_allocnos[allocno1->equiv_allocno]->chain_next = allocno1->id; + if (allocno1->related_allocno != INVALID_ALLOCNO) + m_allocnos[allocno1->related_allocno]->chain_next = allocno1->id; if (allocno1->is_strong_copy_src - && (allocno1->is_copy_dest - || !consider_strong_copy_src_chain (allocno1))) + && !allocno1->is_copy_dest + && !consider_strong_copy_src_chain (allocno1)) allocno1->is_strong_copy_src = false; } @@ -2062,11 +2184,14 @@ early_ra::cmp_increasing (const void *allocno1_ptr, const void *allocno2_ptr) bool early_ra::is_chain_candidate (allocno_info *allocno1, allocno_info *allocno2) { - if (allocno1->equiv_allocno != INVALID_ALLOCNO) - allocno1 = m_allocnos[allocno1->equiv_allocno]; + if (allocno2->is_shared ()) + return false; + + if (allocno1->is_equiv) + allocno1 = m_allocnos[allocno1->related_allocno]; if (allocno2->start_point >= allocno1->end_point - && allocno2->equiv_allocno != allocno1->id) + && !allocno2->is_equiv_to (allocno1->id)) return false; if (allocno2->is_strong_copy_dest) @@ -2156,12 +2281,12 @@ early_ra::chain_allocnos (unsigned int &headi1, unsigned int &headi2) && head1->chain_prev == INVALID_ALLOCNO && head2->chain_prev == INVALID_ALLOCNO); - if (head1->equiv_allocno != INVALID_ALLOCNO - && m_allocnos[head1->equiv_allocno]->copy_dest == headi2) + if (head1->is_equiv + && m_allocnos[head1->related_allocno]->copy_dest == headi2) { head1->is_copy_dest = head2->is_copy_dest; head1->is_strong_copy_dest = head2->is_strong_copy_dest; - m_allocnos[head1->equiv_allocno]->copy_dest = headi1; + m_allocnos[head1->related_allocno]->copy_dest = headi1; } head1->chain_next = headi2; head2->chain_prev = headi1; @@ -2170,6 +2295,18 @@ early_ra::chain_allocnos (unsigned int &headi1, unsigned int &headi2) } } 
+// Add GROUP2's FPR information to GROUP1's, given that GROUP2 starts +// OFFSET allocnos into GROUP2. +void +early_ra::merge_fpr_info (allocno_group_info *group1, + allocno_group_info *group2, + unsigned int offset) +{ + group1->fpr_size = std::max (group1->fpr_size, group2->fpr_size); + group1->fpr_candidates &= (group2->fpr_candidates + >> (offset * group1->stride)); +} + // Set the color representative of ALLOCNO's group to REP, such that ALLOCNO // ends being at allocno offset REP_OFFSET from the start of REP. void @@ -2185,9 +2322,7 @@ early_ra::set_single_color_rep (allocno_info *allocno, allocno_group_info *rep, unsigned int factor = group->stride / rep->stride; gcc_checking_assert (rep_offset >= allocno->offset * factor); group->color_rep_offset = rep_offset - allocno->offset * factor; - rep->fpr_size = std::max (rep->fpr_size, group->fpr_size); - rep->fpr_candidates &= (group->fpr_candidates - >> (group->color_rep_offset * rep->stride)); + merge_fpr_info (rep, group, group->color_rep_offset); } // REP1 and REP2 are color representatives. Change REP1's color representative @@ -2299,7 +2434,7 @@ early_ra::try_to_chain_allocnos (allocno_info *allocno1, auto *head2 = m_allocnos[headi2]; if (head1->chain_next != INVALID_ALLOCNO) return false; - if (head2->equiv_allocno != head1->id + if (!head2->is_equiv_to (head1->id) && head1->end_point <= head2->start_point) return false; } @@ -2429,6 +2564,18 @@ early_ra::form_chains () group1->fpr_candidates &= ~fprs >> allocno1->offset; } + if (allocno1->is_shared ()) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " Allocno %d shares the same hard register" + " as allocno %d\n", allocno1->id, + allocno1->related_allocno); + auto *allocno2 = m_allocnos[allocno1->related_allocno]; + merge_fpr_info (allocno2->group (), group1, allocno2->offset); + m_shared_allocnos.safe_push (allocno1); + continue; + } + // Find earlier allocnos (in processing order) that could be chained // to this one. 
candidates.truncate (0); @@ -2470,6 +2617,9 @@ early_ra::form_chains () for (unsigned int hi = m_sorted_allocnos.length (); hi-- > 0; ) { auto *allocno = m_sorted_allocnos[hi]; + if (allocno->is_shared ()) + continue; + auto *rep = allocno->group ()->color_rep (); if (rep->has_color) continue; @@ -2582,19 +2732,27 @@ early_ra::process_copies () auto *color = m_colors[group->color_rep ()->color]; color->fpr_preferences[fpr] = MIN (color->fpr_preferences[fpr] + copy.weight, 127); + color->num_fpr_preferences += copy.weight; } } // Compare the colors at *COLOR1_PTR and *COLOR2_PTR and return a <=> -// result that puts colors in order of decreasing size. +// result that puts colors in allocation order. int -early_ra::cmp_decreasing_size (const void *color1_ptr, const void *color2_ptr) +early_ra::cmp_allocation_order (const void *color1_ptr, const void *color2_ptr) { auto *color1 = *(color_info *const *) color1_ptr; auto *color2 = *(color_info *const *) color2_ptr; + // Allocate bigger groups before smaller groups. if (color1->group->size != color2->group->size) return color1->group->size > color2->group->size ? -1 : 1; + + // Allocate groups with stronger FPR preferences before groups with weaker + // FPR preferences. + if (color1->num_fpr_preferences != color2->num_fpr_preferences) + return color1->num_fpr_preferences > color2->num_fpr_preferences ? -1 : 1; + return (color1->id < color2->id ? -1 : color1->id == color2->id ? 
0 : 1); } @@ -2610,7 +2768,7 @@ early_ra::allocate_colors () auto_vec sorted_colors; sorted_colors.safe_splice (m_colors); - sorted_colors.qsort (cmp_decreasing_size); + sorted_colors.qsort (cmp_allocation_order); for (unsigned int i = 0; i < 32; ++i) if (!crtl->abi->clobbers_full_reg_p (V0_REGNUM + i)) @@ -2810,12 +2968,16 @@ early_ra::finalize_allocation () { for (auto *allocno : m_allocnos) { + if (allocno->is_shared ()) + continue; auto *group = allocno->group (); auto *rep = group->color_rep (); auto rep_regno = m_colors[rep->color]->hard_regno; auto group_regno = rep_regno + group->color_rep_offset; allocno->hard_regno = group_regno + allocno->offset * group->stride; } + for (auto *allocno : m_shared_allocnos) + allocno->hard_regno = m_allocnos[allocno->related_allocno]->hard_regno; } // Replace any allocno references in REFS with the allocated register. @@ -3094,7 +3256,10 @@ void early_ra::process_region () { for (auto *allocno : m_allocnos) - allocno->chain_next = INVALID_ALLOCNO; + { + allocno->chain_next = INVALID_ALLOCNO; + allocno->chain_prev = INVALID_ALLOCNO; + } if (dump_file && (dump_flags & TDF_DETAILS)) { diff --git a/gcc/testsuite/gcc.target/aarch64/sve/accumulators_1.c b/gcc/testsuite/gcc.target/aarch64/sve/accumulators_1.c new file mode 100644 index 00000000000..bdb97d285ab --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/accumulators_1.c @@ -0,0 +1,200 @@ +/* { dg-options "-O2 -fno-rename-registers" } */ + +#include + +#define ACCUMULATE(VAR, OP) \ + do \ + { \ + VAR = OP (pg, VAR, svld1 (pg, ptr1), svld1 (pg, ptr2)); \ + ptr1 += svcntw (); \ + ptr2 += svcntw (); \ + } \ + while (0) + +svint32_t +f1 (svint32_t x, int32_t *ptr1, int32_t *ptr2) +{ + svbool_t pg = svptrue_b8 (); + for (int i = 0; i < 100; ++i) + ACCUMULATE (x, svmla_x); + return x; +} + +svint32_t +f2 (svint32_t x, int32_t *ptr1, int32_t *ptr2) +{ + svbool_t pg = svptrue_b8 (); + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x, svmla_x); + ACCUMULATE (x, svmls_x); + } 
+ return x; +} + +svint32_t +f3 (svint32_t x, int32_t *ptr1, int32_t *ptr2) +{ + svbool_t pg = svptrue_b8 (); + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x, svmla_x); + ACCUMULATE (x, svmls_x); + ACCUMULATE (x, svmad_x); + ACCUMULATE (x, svmsb_x); + } + return x; +} + +void +f4 (svint32_t x0, svint32_t x1, svint32_t x2, svint32_t x3, + int32_t *ptr1, int32_t *ptr2) +{ + svbool_t pg = svptrue_b8 (); + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x0, svmla_x); + ACCUMULATE (x1, svmla_x); + ACCUMULATE (x2, svmla_x); + ACCUMULATE (x3, svmla_x); + } + svst1_vnum (pg, ptr1, 0, x0); + svst1_vnum (pg, ptr1, 1, x1); + svst1_vnum (pg, ptr1, 2, x2); + svst1_vnum (pg, ptr1, 3, x3); +} + +void +f5 (svint32_t x0, svint32_t x1, svint32_t x2, svint32_t x3, + int32_t *ptr1, int32_t *ptr2) +{ + svbool_t pg = svptrue_b8 (); + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x0, svmla_x); + ACCUMULATE (x1, svmla_x); + ACCUMULATE (x2, svmla_x); + ACCUMULATE (x3, svmla_x); + } + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x0, svmls_x); + ACCUMULATE (x1, svmls_x); + ACCUMULATE (x2, svmls_x); + ACCUMULATE (x3, svmls_x); + } + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x0, svmad_x); + ACCUMULATE (x1, svmad_x); + ACCUMULATE (x2, svmad_x); + ACCUMULATE (x3, svmad_x); + } + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x0, svmsb_x); + ACCUMULATE (x1, svmsb_x); + ACCUMULATE (x2, svmsb_x); + ACCUMULATE (x3, svmsb_x); + } + svst1_vnum (pg, ptr1, 0, x0); + svst1_vnum (pg, ptr1, 1, x1); + svst1_vnum (pg, ptr1, 2, x2); + svst1_vnum (pg, ptr1, 3, x3); +} + +void +f6 (svint32_t x0, svint32_t x1, svint32_t x2, svint32_t x3, + int32_t *ptr1, int32_t *ptr2) +{ + svbool_t pg = svptrue_b8 (); + for (int i = 0; i < 100; ++i) + { + ACCUMULATE (x0, svmla_x); + ACCUMULATE (x1, svmla_x); + ACCUMULATE (x2, svmla_x); + ACCUMULATE (x3, svmla_x); + ACCUMULATE (x0, svmls_x); + ACCUMULATE (x1, svmls_x); + ACCUMULATE (x2, svmls_x); + ACCUMULATE (x3, svmls_x); + } + for (int i = 0; i < 100; ++i) + { 
+      ACCUMULATE (x0, svmad_x);
+      ACCUMULATE (x1, svmad_x);
+      ACCUMULATE (x2, svmad_x);
+      ACCUMULATE (x3, svmad_x);
+      ACCUMULATE (x0, svmsb_x);
+      ACCUMULATE (x1, svmsb_x);
+      ACCUMULATE (x2, svmsb_x);
+      ACCUMULATE (x3, svmsb_x);
+    }
+  for (int i = 0; i < 100; ++i)
+    {
+      ACCUMULATE (x0, svmad_x);
+      ACCUMULATE (x1, svmad_x);
+      ACCUMULATE (x2, svmad_x);
+      ACCUMULATE (x3, svmad_x);
+      ACCUMULATE (x0, svmsb_x);
+      ACCUMULATE (x1, svmsb_x);
+      ACCUMULATE (x2, svmsb_x);
+      ACCUMULATE (x3, svmsb_x);
+    }
+  svst1_vnum (pg, ptr1, 0, x0);
+  svst1_vnum (pg, ptr1, 1, x1);
+  svst1_vnum (pg, ptr1, 2, x2);
+  svst1_vnum (pg, ptr1, 3, x3);
+}
+
+void
+f7 (svint32_t x0, svint32_t x1, svint32_t x2, svint32_t x3,
+    int32_t *ptr1, int32_t *ptr2)
+{
+  svbool_t pg = svptrue_b8 ();
+  for (int i = 0; i < 100; ++i)
+    {
+      ACCUMULATE (x0, svmla_x);
+      ACCUMULATE (x1, svmla_x);
+      ACCUMULATE (x2, svmla_x);
+      ACCUMULATE (x3, svmla_x);
+      ACCUMULATE (x0, svmls_x);
+      ACCUMULATE (x1, svmls_x);
+      ACCUMULATE (x2, svmls_x);
+      ACCUMULATE (x3, svmls_x);
+      ACCUMULATE (x0, svmad_x);
+      ACCUMULATE (x1, svmad_x);
+      ACCUMULATE (x2, svmad_x);
+      ACCUMULATE (x3, svmad_x);
+      ACCUMULATE (x0, svmsb_x);
+      ACCUMULATE (x1, svmsb_x);
+      ACCUMULATE (x2, svmsb_x);
+      ACCUMULATE (x3, svmsb_x);
+    }
+  for (int i = 0; i < 100; ++i)
+    {
+      ACCUMULATE (x0, svmla_x);
+      ACCUMULATE (x1, svmla_x);
+      ACCUMULATE (x2, svmla_x);
+      ACCUMULATE (x3, svmla_x);
+      ACCUMULATE (x0, svmls_x);
+      ACCUMULATE (x1, svmls_x);
+      ACCUMULATE (x2, svmls_x);
+      ACCUMULATE (x3, svmls_x);
+      ACCUMULATE (x0, svmad_x);
+      ACCUMULATE (x1, svmad_x);
+      ACCUMULATE (x2, svmad_x);
+      ACCUMULATE (x3, svmad_x);
+      ACCUMULATE (x0, svmsb_x);
+      ACCUMULATE (x1, svmsb_x);
+      ACCUMULATE (x2, svmsb_x);
+      ACCUMULATE (x3, svmsb_x);
+    }
+  svst1_vnum (pg, ptr1, 0, x0);
+  svst1_vnum (pg, ptr1, 1, x1);
+  svst1_vnum (pg, ptr1, 2, x2);
+  svst1_vnum (pg, ptr1, 3, x3);
+}
+
+/* { dg-final { scan-assembler-not {\tmov\tz} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c
index 3b9245e199f..7e7d8901d21 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create2_1.c
@@ -4,8 +4,13 @@
 
 /*
 ** create2_s8:
+** (
 **	mov	z0\.d, z6\.d
 **	mov	z1\.d, z4\.d
+** |
+**	mov	z1\.d, z4\.d
+**	mov	z0\.d, z6\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_s8, svint8x2_t, svint8_t,
@@ -14,8 +19,13 @@ TEST_CREATE (create2_s8, svint8x2_t, svint8_t,
 
 /*
 ** create2_u8:
+** (
 **	mov	z0\.d, z4\.d
 **	mov	z1\.d, z6\.d
+** |
+**	mov	z1\.d, z6\.d
+**	mov	z0\.d, z4\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_u8, svuint8x2_t, svuint8_t,
@@ -24,8 +34,13 @@ TEST_CREATE (create2_u8, svuint8x2_t, svuint8_t,
 
 /*
 ** create2_s16:
+** (
 **	mov	z0\.d, z6\.d
 **	mov	z1\.d, z4\.d
+** |
+**	mov	z1\.d, z4\.d
+**	mov	z0\.d, z6\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_s16, svint16x2_t, svint16_t,
@@ -34,8 +49,13 @@ TEST_CREATE (create2_s16, svint16x2_t, svint16_t,
 
 /*
 ** create2_u16:
+** (
 **	mov	z0\.d, z6\.d
 **	mov	z1\.d, z5\.d
+** |
+**	mov	z1\.d, z5\.d
+**	mov	z0\.d, z6\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_u16, svuint16x2_t, svuint16_t,
@@ -44,8 +64,13 @@ TEST_CREATE (create2_u16, svuint16x2_t, svuint16_t,
 
 /*
 ** create2_bf16:
+** (
 **	mov	z0\.d, z4\.d
 **	mov	z1\.d, z5\.d
+** |
+**	mov	z1\.d, z5\.d
+**	mov	z0\.d, z4\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_bf16, svbfloat16x2_t, svbfloat16_t,
@@ -54,8 +79,13 @@ TEST_CREATE (create2_bf16, svbfloat16x2_t, svbfloat16_t,
 
 /*
 ** create2_f16:
+** (
 **	mov	z0\.d, z4\.d
 **	mov	z1\.d, z5\.d
+** |
+**	mov	z1\.d, z5\.d
+**	mov	z0\.d, z4\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_f16, svfloat16x2_t, svfloat16_t,
@@ -64,8 +94,13 @@ TEST_CREATE (create2_f16, svfloat16x2_t, svfloat16_t,
 
 /*
 ** create2_s32:
+** (
 **	mov	z0\.d, z6\.d
 **	mov	z1\.d, z7\.d
+** |
+**	mov	z1\.d, z7\.d
+**	mov	z0\.d, z6\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_s32, svint32x2_t, svint32_t,
@@ -74,8 +109,13 @@ TEST_CREATE (create2_s32, svint32x2_t, svint32_t,
 
 /*
 ** create2_u32:
+** (
 **	mov	z0\.d, z7\.d
 **	mov	z1\.d, z5\.d
+** |
+**	mov	z1\.d, z5\.d
+**	mov	z0\.d, z7\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_u32, svuint32x2_t, svuint32_t,
@@ -84,8 +124,13 @@ TEST_CREATE (create2_u32, svuint32x2_t, svuint32_t,
 
 /*
 ** create2_f32:
+** (
 **	mov	z0\.d, z7\.d
 **	mov	z1\.d, z4\.d
+** |
+**	mov	z1\.d, z4\.d
+**	mov	z0\.d, z7\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_f32, svfloat32x2_t, svfloat32_t,
@@ -94,8 +139,13 @@ TEST_CREATE (create2_f32, svfloat32x2_t, svfloat32_t,
 
 /*
 ** create2_s64:
+** (
 **	mov	z0\.d, z5\.d
 **	mov	z1\.d, z7\.d
+** |
+**	mov	z1\.d, z7\.d
+**	mov	z0\.d, z5\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_s64, svint64x2_t, svint64_t,
@@ -104,8 +154,13 @@ TEST_CREATE (create2_s64, svint64x2_t, svint64_t,
 
 /*
 ** create2_u64:
+** (
 **	mov	z0\.d, z7\.d
 **	mov	z1\.d, z6\.d
+** |
+**	mov	z1\.d, z6\.d
+**	mov	z0\.d, z7\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_u64, svuint64x2_t, svuint64_t,
@@ -114,8 +169,13 @@ TEST_CREATE (create2_u64, svuint64x2_t, svuint64_t,
 
 /*
 ** create2_f64:
+** (
 **	mov	z0\.d, z5\.d
 **	mov	z1\.d, z4\.d
+** |
+**	mov	z1\.d, z4\.d
+**	mov	z0\.d, z5\.d
+** )
 **	ret
 */
 TEST_CREATE (create2_f64, svfloat64x2_t, svfloat64_t,
@@ -132,8 +192,13 @@ TEST_CREATE_B (create2_b_0, svboolx2_t,
 
 /*
 ** create2_b_1:
+** (
 **	mov	p0\.b, p2\.b
 **	mov	p1\.b, p3\.b
+** |
+**	mov	p1\.b, p3\.b
+**	mov	p0\.b, p2\.b
+** )
 **	ret
 */
 TEST_CREATE_B (create2_b_1, svboolx2_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c
index 6f1afb7722a..0bea95195b8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create3_1.c
@@ -4,9 +4,9 @@
 
 /*
 ** create3_s8:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_s8, svint8x3_t, svint8_t,
@@ -15,9 +15,9 @@ TEST_CREATE (create3_s8, svint8x3_t, svint8_t,
 
 /*
 ** create3_u8:
-**	mov	z0\.d, z4\.d
-**	mov	z1\.d, z6\.d
-**	mov	z2\.d, z5\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_u8, svuint8x3_t, svuint8_t,
@@ -26,9 +26,9 @@ TEST_CREATE (create3_u8, svuint8x3_t, svuint8_t,
 
 /*
 ** create3_s16:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z5\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_s16, svint16x3_t, svint16_t,
@@ -37,9 +37,9 @@ TEST_CREATE (create3_s16, svint16x3_t, svint16_t,
 
 /*
 ** create3_u16:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z4\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_u16, svuint16x3_t, svuint16_t,
@@ -48,9 +48,9 @@ TEST_CREATE (create3_u16, svuint16x3_t, svuint16_t,
 
 /*
 ** create3_bf16:
-**	mov	z0\.d, z4\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z6\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_bf16, svbfloat16x3_t, svbfloat16_t,
@@ -59,9 +59,9 @@ TEST_CREATE (create3_bf16, svbfloat16x3_t, svbfloat16_t,
 
 /*
 ** create3_f16:
-**	mov	z0\.d, z4\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z6\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_f16, svfloat16x3_t, svfloat16_t,
@@ -70,9 +70,9 @@ TEST_CREATE (create3_f16, svfloat16x3_t, svfloat16_t,
 
 /*
 ** create3_s32:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z7\.d
-**	mov	z2\.d, z4\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_s32, svint32x3_t, svint32_t,
@@ -81,9 +81,9 @@ TEST_CREATE (create3_s32, svint32x3_t, svint32_t,
 
 /*
 ** create3_u32:
-**	mov	z0\.d, z7\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z6\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_u32, svuint32x3_t, svuint32_t,
@@ -92,9 +92,9 @@ TEST_CREATE (create3_u32, svuint32x3_t, svuint32_t,
 
 /*
 ** create3_f32:
-**	mov	z0\.d, z7\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z6\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_f32, svfloat32x3_t, svfloat32_t,
@@ -103,9 +103,9 @@ TEST_CREATE (create3_f32, svfloat32x3_t, svfloat32_t,
 
 /*
 ** create3_s64:
-**	mov	z0\.d, z5\.d
-**	mov	z1\.d, z7\.d
-**	mov	z2\.d, z6\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_s64, svint64x3_t, svint64_t,
@@ -114,9 +114,9 @@ TEST_CREATE (create3_s64, svint64x3_t, svint64_t,
 
 /*
 ** create3_u64:
-**	mov	z0\.d, z7\.d
-**	mov	z1\.d, z6\.d
-**	mov	z2\.d, z4\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_u64, svuint64x3_t, svuint64_t,
@@ -125,9 +125,9 @@ TEST_CREATE (create3_u64, svuint64x3_t, svuint64_t,
 
 /*
 ** create3_f64:
-**	mov	z0\.d, z5\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create3_f64, svfloat64x3_t, svfloat64_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c
index a3866286e4b..b5ffd4e6aaf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/create4_1.c
@@ -4,10 +4,10 @@
 
 /*
 ** create4_s8:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z7\.d
-**	mov	z3\.d, z5\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_s8, svint8x4_t, svint8_t,
@@ -16,10 +16,10 @@ TEST_CREATE (create4_s8, svint8x4_t, svint8_t,
 
 /*
 ** create4_u8:
-**	mov	z0\.d, z4\.d
-**	mov	z1\.d, z6\.d
-**	mov	z2\.d, z5\.d
-**	mov	z3\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_u8, svuint8x4_t, svuint8_t,
@@ -28,10 +28,10 @@ TEST_CREATE (create4_u8, svuint8x4_t, svuint8_t,
 
 /*
 ** create4_s16:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z5\.d
-**	mov	z3\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_s16, svint16x4_t, svint16_t,
@@ -40,10 +40,10 @@ TEST_CREATE (create4_s16, svint16x4_t, svint16_t,
 
 /*
 ** create4_u16:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z4\.d
-**	mov	z3\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_u16, svuint16x4_t, svuint16_t,
@@ -52,10 +52,10 @@ TEST_CREATE (create4_u16, svuint16x4_t, svuint16_t,
 
 /*
 ** create4_bf16:
-**	mov	z0\.d, z4\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z6\.d
-**	mov	z3\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_bf16, svbfloat16x4_t, svbfloat16_t,
@@ -64,10 +64,10 @@ TEST_CREATE (create4_bf16, svbfloat16x4_t, svbfloat16_t,
 
 /*
 ** create4_f16:
-**	mov	z0\.d, z4\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z6\.d
-**	mov	z3\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_f16, svfloat16x4_t, svfloat16_t,
@@ -76,10 +76,10 @@ TEST_CREATE (create4_f16, svfloat16x4_t, svfloat16_t,
 
 /*
 ** create4_s32:
-**	mov	z0\.d, z6\.d
-**	mov	z1\.d, z7\.d
-**	mov	z2\.d, z4\.d
-**	mov	z3\.d, z5\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_s32, svint32x4_t, svint32_t,
@@ -88,10 +88,10 @@ TEST_CREATE (create4_s32, svint32x4_t, svint32_t,
 
 /*
 ** create4_u32:
-**	mov	z0\.d, z7\.d
-**	mov	z1\.d, z5\.d
-**	mov	z2\.d, z6\.d
-**	mov	z3\.d, z7\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_u32, svuint32x4_t, svuint32_t,
@@ -100,10 +100,10 @@ TEST_CREATE (create4_u32, svuint32x4_t, svuint32_t,
 
 /*
 ** create4_f32:
-**	mov	z0\.d, z7\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z6\.d
-**	mov	z3\.d, z4\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_f32, svfloat32x4_t, svfloat32_t,
@@ -112,10 +112,10 @@ TEST_CREATE (create4_f32, svfloat32x4_t, svfloat32_t,
 
 /*
 ** create4_s64:
-**	mov	z0\.d, z5\.d
-**	mov	z1\.d, z7\.d
-**	mov	z2\.d, z6\.d
-**	mov	z3\.d, z6\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_s64, svint64x4_t, svint64_t,
@@ -124,10 +124,10 @@ TEST_CREATE (create4_s64, svint64x4_t, svint64_t,
 
 /*
 ** create4_u64:
-**	mov	z0\.d, z7\.d
-**	mov	z1\.d, z6\.d
-**	mov	z2\.d, z4\.d
-**	mov	z3\.d, z5\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_u64, svuint64x4_t, svuint64_t,
@@ -136,10 +136,10 @@ TEST_CREATE (create4_u64, svuint64x4_t, svuint64_t,
 
 /*
 ** create4_f64:
-**	mov	z0\.d, z5\.d
-**	mov	z1\.d, z4\.d
-**	mov	z2\.d, z7\.d
-**	mov	z3\.d, z6\.d
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
+**	mov	[^\n]+
 **	ret
 */
 TEST_CREATE (create4_f64, svfloat64x4_t, svfloat64_t,
-- 
2.25.1