From: Evandro Menezes
Date: Tue, 18 Apr 2023 16:41:12 -0500
Subject: [PATCH] aarch64: Add the scheduling model for Neoverse N1
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford, Kyrylo Tkachov
Reply-To: evandro+gcc-patches@gcc.gnu.org
Message-Id: <4F18DDA2-F71C-45FB-A927-7B5D2CA586B4@icloud.com>
In-Reply-To: <8E0E3524-094D-43CD-93B1-B99D26ABD724@icloud.com>

This patch adds the scheduling model for Neoverse N1, based on the
information from the "Arm Neoverse N1 Software Optimization Guide".

-- 
Evandro Menezes

================================================================================

gcc/ChangeLog:

	* config/aarch64/aarch64-cores.def: Use the Neoverse N1 scheduling model.
	* config/aarch64/aarch64.md: Include `neoverse-n1.md`.
	* config/aarch64/neoverse-n1.md: New file.

Signed-off-by: Evandro Menezes
---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64.md        |   1 +
 gcc/config/aarch64/neoverse-n1.md    | 711 +++++++++++++++++++++++++++
 3 files changed, 713 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/neoverse-n1.md

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index e352e4077b1..cc842c4e22c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -116,7 +116,7 @@ AARCH64_CORE("cortex-a65ae", cortexa65ae, cortexa53, V8_2A, (F16, RCPC, DOTPRO
 AARCH64_CORE("cortex-x1", cortexx1, cortexa57, V8_2A, (F16, RCPC, DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
 AARCH64_CORE("cortex-x1c", cortexx1c, cortexa57, V8_2A, (F16, RCPC, DOTPROD, SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
 AARCH64_CORE("ares", ares, cortexa57, V8_2A, (F16, RCPC, DOTPROD, PROFILE), cortexa76, 0x41, 0xd0c, -1)
-AARCH64_CORE("neoverse-n1", neoversen1, cortexa57, V8_2A, (F16, RCPC, DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("neoverse-n1", neoversen1, neoversen1, V8_2A, (F16, RCPC, DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1", neoversee1, cortexa53, V8_2A, (F16, RCPC, DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
 
 /* Cavium ('C') cores. */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 022eef80bc1..6cb9e31259b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -471,6 +471,7 @@
 (include "../arm/cortex-a57.md")
 (include "../arm/exynos-m1.md")
 (include "falkor.md")
+(include "neoverse-n1.md")
 (include "saphira.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
diff --git a/gcc/config/aarch64/neoverse-n1.md b/gcc/config/aarch64/neoverse-n1.md
new file mode 100644
index 00000000000..d66fa10c330
--- /dev/null
+++ b/gcc/config/aarch64/neoverse-n1.md
@@ -0,0 +1,711 @@
+;; Arm Neoverse N1 pipeline description
+;; (Based on the "Arm Neoverse N1 Software Optimization Guide")
+;;
+;; Copyright (C) 2014-2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; The Neoverse N1 core is modelled as a multiple issue pipeline that has
+;; the following functional units.
+
+(define_automaton "neoverse_n1")
+
+;; 1 - Two pipelines for integer operations: SX1, SX2.
+
+(define_cpu_unit "neon1_sx1_issue" "neoverse_n1")
+(define_reservation "neon1_sx1" "neon1_sx1_issue")
+
+(define_cpu_unit "neon1_sx2_issue" "neoverse_n1")
+(define_reservation "neon1_sx2" "neon1_sx2_issue")
+
+;; 2 - One pipeline for complex integer operations: MX.
+
+(define_cpu_unit "neon1_mx_issue"
+                 "neoverse_n1")
+(define_reservation "neon1_mx" "neon1_mx_issue")
+(define_reservation "neon1_m_block" "neon1_mx_issue")
+
+;; 3 - Two asymmetric pipelines for Neon and FP operations: CX1, CX2.
+(define_automaton "neoverse_n1_cx")
+
+(define_cpu_unit "neon1_cx1_issue"
+                 "neoverse_n1_cx")
+(define_cpu_unit "neon1_cx2_issue"
+                 "neoverse_n1_cx")
+
+(define_reservation "neon1_cx1" "neon1_cx1_issue")
+(define_reservation "neon1_cx2" "neon1_cx2_issue")
+(define_reservation "neon1_v0_block" "neon1_cx1_issue")
+
+;; 4 - One pipeline for branch operations: BX.
+
+(define_cpu_unit "neon1_bx_issue" "neoverse_n1")
+(define_reservation "neon1_bx" "neon1_bx_issue")
+
+;; 5 - Two pipelines for load and store operations: LS1, LS2.
+
+(define_cpu_unit "neon1_ls1_issue" "neoverse_n1")
+(define_reservation "neon1_ls1" "neon1_ls1_issue")
+
+(define_cpu_unit "neon1_ls2_issue" "neoverse_n1")
+(define_reservation "neon1_ls2" "neon1_ls2_issue")
+
+;; Block all issue queues.
+
+(define_reservation "neon1_block" "neon1_sx1_issue + neon1_sx2_issue
+                                  + neon1_mx_issue
+                                  + neon1_cx1_issue + neon1_cx2_issue
+                                  + neon1_ls1_issue + neon1_ls2_issue")
+
+;; Issue groups.
+
+(define_reservation "neon1_b" "neon1_bx")
+(define_reservation "neon1_i" "(neon1_sx1 | neon1_sx2 | neon1_mx)")
+(define_reservation "neon1_m" "neon1_mx")
+(define_reservation "neon1_d" "(neon1_sx2 | neon1_mx)")
+(define_reservation "neon1_l" "(neon1_ls1 | neon1_ls2)")
+(define_reservation "neon1_v" "(neon1_cx1 | neon1_cx2)")
+(define_reservation "neon1_v0" "neon1_cx1")
+(define_reservation "neon1_v1" "neon1_cx2")
+
+;; Instruction resources.
+
+;; Block.
+(define_insn_reservation "neoverse_n1_block" 1
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "block"))
+  "neon1_block")
+
+;; Branches
+;; No latency as there is no result.
+(define_insn_reservation "neoverse_n1_branch" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "branch"))
+  "neon1_b")
+
+;; Calls
+;; No latency as there is no result.
+(define_insn_reservation "neoverse_n1_call" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "call"))
+  "neon1_i + neon1_b")
+
+;; ALU with no or simple shift.
+;; TODO: there should also be "alus_shift_imm_lsl_1to4".
+(define_insn_reservation "neoverse_n1_alu" 1
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "alu_imm, alu_shift_imm_lsl_1to4, alu_sreg, \
+                        alus_imm, alus_sreg, \
+                        csel, \
+                        logic_imm, logic_reg, logic_shift_imm, \
+                        logics_imm, logics_reg, \
+                        mov_reg"))
+  "neon1_i")
+
+;; ALU with extension or complex shift.
+;; TODO: there should also be "alus_shift_imm_other".
+(define_insn_reservation "neoverse_n1_alu_shift" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "alu_ext, alu_shift_imm_other, alu_shift_reg, \
+                        alus_shift_imm, alus_shift_reg, \
+                        logic_shift_reg, logics_shift_imm, logics_shift_reg, \
+                        crc"))
+  "neon1_m")
+
+;; Miscellaneous ALU.
+;; TODO: model 2-register "extr", "bfi", variable shifts.
+(define_insn_reservation "neoverse_n1_alu_misc" 1
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "adr, rotate_imm, bfm, clz, mov_imm, rbit, rev"))
+  "neon1_i")
+
+;; Integer divide.
+;; Divisions are not pipelined.
+(define_insn_reservation "neoverse_n1_div" 12
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "udiv, sdiv"))
+  "neon1_m, (neon1_m_block * 12)")
+
+;; Narrow multiply.
+(define_insn_reservation "neoverse_n1_mul" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "mla, mul"))
+  "neon1_m")
+
+;; Wide multiply.
+;; TODO: model multiply high.
+(define_insn_reservation "neoverse_n1_mull" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "smull, umull"))
+  "neon1_m")
+
+;; Integer load.
+(define_insn_reservation "neoverse_n1_ld" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "load_byte, load_4, load_8"))
+  "neon1_l")
+
+(define_insn_reservation "neoverse_n1_ld16" 5
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "load_16"))
+  "neon1_l * 2")
+
+;; Integer store.
+(define_insn_reservation "neoverse_n1_st" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "store_4, store_8"))
+  "neon1_d, neon1_l")
+
+(define_insn_reservation "neoverse_n1_stp" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "store_16"))
+  "neon1_i, (neon1_l * 2)")
+
+;; FP arithmetic.
+(define_insn_reservation "neoverse_n1_fp_alu" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_minmaxd, f_minmaxs, \
+                        faddd, fadds, \
+                        fconstd, fconsts, \
+                        fcsel, \
+                        ffarithd, ffariths, \
+                        fmov"))
+  "neon1_v")
+
+;; FP compare.
+(define_insn_reservation "neoverse_n1_fp_cmp" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "fcmpd, fcmps, fccmpd, fccmps"))
+  "neon1_v0")
+
+;; FP round.
+(define_insn_reservation "neoverse_n1_fp_rint" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_rintd, f_rints"))
+  "neon1_v0")
+
+;; FP divide & square-root.
+;; Divisions are not pipelined.
+(define_insn_reservation "neoverse_n1_fp_divd" 15
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "fdivd, fsqrtd"))
+  "neon1_v0, (neon1_v0_block * 15)")
+
+(define_insn_reservation "neoverse_n1_fp_divs" 10
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "fdivs, fsqrts"))
+  "neon1_v0, (neon1_v0_block * 10)")
+
+;; FP multiply.
+(define_insn_reservation "neoverse_n1_fp_mul" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "fmuld, fmuls"))
+  "neon1_v")
+
+(define_insn_reservation "neoverse_n1_fp_mac" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "fmacd, fmacs"))
+  "neon1_v")
+
+;; FP convert.
+(define_insn_reservation "neoverse_n1_fp_cvt" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_cvt"))
+  "neon1_v0")
+
+(define_insn_reservation "neoverse_n1_fp_cvti2f" 6
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_cvti2f"))
+  "neon1_m + neon1_v0")
+
+(define_insn_reservation "neoverse_n1_fp_cvtf2i" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_cvtf2i"))
+  "neon1_v0 + neon1_v1")
+
+;; FP move.
+(define_insn_reservation "neoverse_n1_fp_mov" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "fconstd, fconsts, \
+                        fmov"))
+  "neon1_v")
+
+(define_insn_reservation "neoverse_n1_fp_movi2f" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_mcr"))
+  "neon1_m")
+
+(define_insn_reservation "neoverse_n1_fp_movf2i" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_mrc, \
+                        neon_to_gp, neon_to_gp_q"))
+  "neon1_v1")
+
+;; FP load.
+(define_insn_reservation "neoverse_n1_fp_ld" 5
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_loadd, f_loads"))
+  "neon1_i, neon1_l")
+
+(define_insn_reservation "neoverse_n1_fp_ldp" 5
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_ldp"))
+  "neon1_i, (neon1_l * 2)")
+
+(define_insn_reservation "neoverse_n1_fp_ldp_q" 7
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_ldp_q"))
+  "neon1_i, (neon1_l * 2)")
+
+;; FP store.
+(define_insn_reservation "neoverse_n1_fp_st" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "f_stored, f_stores"))
+  "neon1_i, neon1_l")
+
+(define_insn_reservation "neoverse_n1_fp_stp" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_stp"))
+  "neon1_l + neon1_v")
+
+(define_insn_reservation "neoverse_n1_fp_stp_q" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_stp_q"))
+  "(neon1_l * 2) + neon1_v")
+
+;; ASIMD arithmetic.
+(define_insn_reservation "neoverse_n1_asimd_abd_long" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_abd_long"))
+  "neon1_v1")
+
+(define_insn_reservation "neoverse_n1_asimd_alu" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_abd, neon_abd_q, \
+                        neon_abs, neon_abs_q, \
+                        neon_add, neon_add_q, \
+                        neon_add_halve, neon_add_halve_q, \
+                        neon_add_halve_narrow_q, \
+                        neon_add_long, neon_add_widen, \
+                        neon_bsl, neon_bsl_q, \
+                        neon_cls, neon_cls_q, \
+                        neon_compare, neon_compare_q, \
+                        neon_compare_zero, neon_compare_zero_q, \
+                        neon_dot, neon_dot_q, \
+                        neon_dup, neon_dup_q, \
+                        neon_ext, neon_ext_q, \
+                        neon_ins, neon_ins_q, \
+                        neon_logic, neon_logic_q, \
+                        neon_minmax, neon_minmax_q, \
+                        neon_move, neon_move_q, \
+                        neon_move_narrow_q, \
+                        neon_neg, neon_neg_q, \
+                        neon_permute, neon_permute_q, \
+                        neon_qabs, neon_qabs_q, \
+                        neon_qadd, neon_qadd_q, \
+                        neon_qneg, neon_qneg_q, \
+                        neon_qsub, neon_qsub_q, \
+                        neon_rbit, neon_rbit_q, \
+                        neon_reduc_add, neon_reduc_add_q, \
+                        neon_rev, neon_rev_q, \
+                        neon_sub, neon_sub_q, \
+                        neon_sub_halve, neon_sub_halve_q, \
+                        neon_sub_halve_narrow_q, \
+                        neon_sub_widen, neon_sub_long, \
+                        neon_tbl1, neon_tbl1_q, \
+                        neon_tbl2, neon_tbl2_q"))
+  "neon1_v")
+
+(define_insn_reservation "neoverse_n1_asimd_arith_acc" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_arith_acc"))
+  "neon1_v1")
+
+(define_insn_reservation "neoverse_n1_asimd_shift_acc_q" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_shift_acc_q"))
+  "neon1_v1")
+
+(define_insn_reservation "neoverse_n1_asimd_reduc" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_reduc_add_long, \
+                        neon_reduc_minmax, neon_reduc_minmax_q"))
+  "neon1_v1")
+
+
+;; ASIMD multiply.
+(define_insn_reservation "neoverse_n1_asimd_mla" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_mla_b, neon_mla_b_long, \
+                        neon_mla_h, neon_mla_h_long, \
+                        neon_mla_h_scalar, neon_mla_h_scalar_long, \
+                        neon_mla_s, neon_mla_s_long, \
+                        neon_mla_s_scalar, neon_mla_s_scalar_long"))
+  "neon1_v0")
+
+(define_insn_reservation "neoverse_n1_asimd_mla_q" 5
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_mla_b_q,
+                        neon_mla_h_q, neon_mla_h_scalar_q, \
+                        neon_mla_s_q, neon_mla_s_scalar_q"))
+  "neon1_v0 * 2")
+
+(define_insn_reservation "neoverse_n1_asimd_mul" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_mul_b, neon_mul_b_long, \
+                        neon_mul_h, neon_mul_h_long, \
+                        neon_mul_s, neon_mul_s_long,
+                        neon_sat_mul_b, neon_sat_mul_b_long,
+                        neon_sat_mul_h, neon_sat_mul_h_long, \
+                        neon_sat_mul_h_scalar, neon_sat_mul_h_scalar_long,
+                        neon_sat_mul_s, neon_sat_mul_s_long, \
+                        neon_sat_mul_s_scalar, neon_sat_mul_s_scalar_long"))
+  "neon1_v0")
+
+(define_insn_reservation "neoverse_n1_asimd_mul_q" 5
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_mul_b_q, neon_mul_h_q, neon_mul_s_q, \
+                        neon_sat_mul_b_q, \
+                        neon_sat_mul_h_q, neon_sat_mul_h_scalar_q, \
+                        neon_sat_mul_s_q, neon_sat_mul_s_scalar_q"))
+  "neon1_v0 * 2")
+
+(define_insn_reservation "neoverse_n1_asimd_sat_mla" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_sat_mla_b_long, \
+                        neon_sat_mla_h_long, neon_sat_mla_h_scalar_long, \
+                        neon_sat_mla_s_long, neon_sat_mla_s_scalar_long"))
+  "neon1_v0")
+
+;; ASIMD shift.
+(define_insn_reservation "neoverse_n1_asimd_shift" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_shift_imm, neon_shift_imm_q, neon_shift_imm_long, \
+                        neon_shift_reg, neon_shift_reg_q"))
+  "neon1_v1")
+
+(define_insn_reservation "neoverse_n1_asimd_shift_q" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_sat_shift_imm, neon_sat_shift_imm_q, \
+                        neon_sat_shift_imm_narrow_q, \
+                        neon_sat_shift_reg, neon_sat_shift_reg_q, \
+                        neon_shift_imm_narrow_q"))
+  "neon1_v1")
+
+;; ASIMD FP arithmetic.
+(define_insn_reservation "neoverse_n1_asimd_fp_alu" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_abd_d, neon_fp_abd_d_q, \
+                        neon_fp_abd_s, neon_fp_abd_s_q, \
+                        neon_fp_abs_d, neon_fp_abs_d_q, \
+                        neon_fp_abs_s, neon_fp_abs_s_q, \
+                        neon_fp_addsub_d, neon_fp_addsub_d_q, \
+                        neon_fp_addsub_s, neon_fp_addsub_s_q, \
+                        neon_fp_compare_d, neon_fp_compare_d_q, \
+                        neon_fp_compare_s, neon_fp_compare_s_q, \
+                        neon_fp_minmax_d, neon_fp_minmax_d_q, \
+                        neon_fp_minmax_s, neon_fp_minmax_s_q, \
+                        neon_fp_neg_d, neon_fp_neg_d_q, \
+                        neon_fp_neg_s, neon_fp_neg_s_q, \
+                        neon_fp_reduc_add_d, neon_fp_reduc_add_d_q, \
+                        neon_fp_reduc_add_s, neon_fp_reduc_add_s_q"))
+  "neon1_v")
+
+(define_insn_reservation "neoverse_n1_asimd_fp_reduc" 5
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_reduc_minmax_d, neon_fp_reduc_minmax_d_q, \
+                        neon_fp_reduc_minmax_s, neon_fp_reduc_minmax_s_q"))
+  "neon1_v")
+
+;; ASIMD FP convert.
+(define_insn_reservation "neoverse_n1_asimd_cvt" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_cvt_narrow_d_q, \
+                        neon_fp_cvt_widen_s, \
+                        neon_fp_to_int_d, neon_fp_to_int_d_q, \
+                        neon_fp_to_int_s, \
+                        neon_int_to_fp_d, neon_int_to_fp_d_q, \
+                        neon_int_to_fp_s, \
+                        neon_fp_recpe_d, neon_fp_recpe_s, \
+                        neon_fp_recpx_d, neon_fp_recpx_s, \
+                        neon_fp_round_d, neon_fp_round_d_q, \
+                        neon_fp_round_s"))
+  "neon1_v0")
+
+(define_insn_reservation "neoverse_n1_asimd_cvt_q" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_cvt_narrow_s_q, \
+                        neon_fp_cvt_widen_h, \
+                        neon_fp_to_int_s_q, \
+                        neon_int_to_fp_s_q, \
+                        neon_fp_recpe_d_q, neon_fp_recpe_s_q, \
+                        neon_fp_recpx_d_q, neon_fp_recpx_s_q, \
+                        neon_fp_round_s_q"))
+  "neon1_v0 * 2")
+
+;; ASIMD FP divide & square-root.
+;; Divisions are not pipelined.
+(define_insn_reservation "neoverse_n1_asimd_fp_divd_q" 15
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_div_d_q"))
+  "neon1_v0, (neon1_v0_block * 14)")
+
+(define_insn_reservation "neoverse_n1_asimd_fp_divs" 10
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_div_s"))
+  "neon1_v0, (neon1_v0_block * 5)")
+
+(define_insn_reservation "neoverse_n1_asimd_fp_divs_q" 10
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_div_s_q"))
+  "neon1_v0, (neon1_v0_block * 9)")
+
+(define_insn_reservation "neoverse_n1_asimd_fp_sqrtd_q" 17
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_sqrt_d_q"))
+  "neon1_v0, (neon1_v0_block * 16)")
+
+(define_insn_reservation "neoverse_n1_asimd_fp_sqrts" 10
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_sqrt_s"))
+  "neon1_v0, (neon1_v0_block * 5)")
+
+(define_insn_reservation "neoverse_n1_asimd_fp_sqrts_q" 10
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_sqrt_s_q"))
+  "neon1_v0, (neon1_v0_block * 9)")
+
+;; ASIMD FP multiply.
+(define_insn_reservation "neoverse_n1_asimd_fp_mul" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_mul_d, neon_fp_mul_d_q, neon_fp_mul_d_scalar_q, \
+                        neon_fp_mul_s, neon_fp_mul_s_q, neon_fp_mul_s_scalar_q"))
+  "neon1_v")
+
+;; TODO: model the long form.
+(define_insn_reservation "neoverse_n1_asimd_fp_mla" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_fp_mla_d, neon_fp_mla_d_q, neon_fp_mla_d_scalar_q, \
+                        neon_fp_mla_s, neon_fp_mla_s_q, neon_fp_mla_s_scalar_q, \
+                        neon_fp_recps_d, neon_fp_recps_d_q, \
+                        neon_fp_recps_s, neon_fp_recps_s_q"))
+  "neon1_v")
+
+;; ASIMD miscellaneous.
+(define_insn_reservation "neoverse_n1_asimd_gp_fp" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_from_gp, neon_from_gp_q"))
+  "neon1_m")
+
+;; TODO: model "tbx" fully.
+(define_insn_reservation "neoverse_n1_asimd_tbl_3" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_tbl3, neon_tbl3_q"))
+  "neon1_v * 4")
+
+(define_insn_reservation "neoverse_n1_asimd_tbl_4" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_tbl4, neon_tbl4_q"))
+  "neon1_v * 6")
+
+;; ASIMD load.
+(define_insn_reservation "neoverse_n1_asimd_ld_a" 5
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load1_1reg, neon_load1_1reg_q, \
+                        neon_load1_2reg, neon_load1_2reg_q"))
+  "neon1_l")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_b" 6
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load1_3reg, neon_load1_3reg_q"))
+  "neon1_l * 3")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_c" 6
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load1_4reg, neon_load1_4reg_q"))
+  "neon1_l * 4")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_d" 7
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load1_all_lanes, neon_load1_all_lanes_q, \
+                        neon_load1_one_lane, neon_load1_one_lane_q"))
+  "neon1_l + neon1_v")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_e" 7
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load2_2reg, neon_load2_2reg_q, \
+                        neon_load2_all_lanes, neon_load2_all_lanes_q, \
+                        neon_load2_one_lane, neon_load2_one_lane_q"))
+  "(neon1_l * 2) + neon1_v")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_f" 8
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load3_3reg, neon_load3_3reg_q, \
+                        neon_load4_all_lanes, neon_load4_all_lanes_q, \
+                        neon_load4_one_lane, neon_load4_one_lane_q"))
+  "(neon1_l * 4) + neon1_v")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_g" 7
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load3_all_lanes, neon_load3_all_lanes_q, \
+                        neon_load3_one_lane, neon_load3_one_lane_q"))
+  "(neon1_l * 4) + neon1_v")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_h" 8
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load4_4reg"))
+  "(neon1_l * 7) + neon1_v")
+
+(define_insn_reservation "neoverse_n1_asimd_ld_i" 10
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_load4_4reg_q"))
+  "(neon1_l * 10) + neon1_v")
+
+;; ASIMD store.
+(define_insn_reservation "neoverse_n1_asimd_st_a" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store1_1reg, neon_store1_1reg_q, \
+                        neon_store1_2reg"))
+  "neon1_v + neon1_l")
+
+(define_insn_reservation "neoverse_n1_asimd_st_b" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store1_1reg_q, \
+                        neon_store1_2reg"))
+  "neon1_v + (neon1_l * 2)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_c" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store1_2reg_q, \
+                        neon_store1_4reg"))
+  "neon1_v + (neon1_l * 4)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_d" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store1_3reg"))
+  "neon1_v + (neon1_l * 3)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_e" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store1_3reg_q"))
+  "neon1_v + (neon1_l * 6)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_f" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store1_4reg_q"))
+  "neon1_v + (neon1_l * 8)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_g" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store1_one_lane, neon_store1_one_lane_q, \
+                        neon_store2_2reg, \
+                        neon_store2_one_lane, neon_store2_one_lane_q"))
+  "neon1_v + (neon1_l * 2)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_h" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store2_2reg_q, \
+                        neon_store3_3reg, \
+                        neon_store3_one_lane_q"))
+  "neon1_v + (neon1_l * 4)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_i" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store3_3reg_q"))
+  "neon1_v + (neon1_l * 6)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_j" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store3_one_lane"))
+  "neon1_v + (neon1_l * 4)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_k" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store4_4reg"))
+  "neon1_v + (neon1_l * 6)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_l" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store4_4reg_q"))
+  "neon1_v + (neon1_l * 12)")
+
+(define_insn_reservation "neoverse_n1_asimd_st_m" 0
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "neon_store4_one_lane, neon_store4_one_lane_q"))
+  "neon1_v + (neon1_l * 3)")
+
+;; ASIMD crypto.
+;; TODO: model different widths.
+(define_insn_reservation "neoverse_n1_asimd_aese" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "crypto_aese"))
+  "neon1_v0")
+
+(define_insn_reservation "neoverse_n1_asimd_aesmc" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "crypto_aesmc"))
+  "neon1_v0")
+
+;; FIXME: "sha256u1" should be "crypto_sha256_fast".
+(define_insn_reservation "neoverse_n1_asimd_sha_fast" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "crypto_sha1_fast, crypto_sha1_xor,
+                        crypto_sha256_fast"))
+  "neon1_v0")
+
+(define_insn_reservation "neoverse_n1_asimd_sha_slow" 4
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "crypto_sha1_slow, crypto_sha256_slow"))
+  "neon1_v0")
+
+;; FIXME: "pmull" sometimes is also "neon_mul_{b,h,s}(_scalar)?(_(q|long))?"
+(define_insn_reservation "neoverse_n1_asimd_poly" 3
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "crypto_pmull"))
+  "neon1_v0")
+
+;; CRC
+(define_insn_reservation "neoverse_n1_crc" 2
+  (and (eq_attr "tune" "neoversen1")
+       (eq_attr "type" "crc"))
+  "neon1_m")
+
+;; Bypasses.
+
+;; Integer multiply.
+;; TODO: model the X and high forms.
+(define_bypass 1 "neoverse_n1_mul, neoverse_n1_mull"
+                 "neoverse_n1_mul, neoverse_n1_mull")
+
+;; FP multiply.
+(define_bypass 2 "neoverse_n1_fp_mul" "neoverse_n1_fp_mul")
+(define_bypass 2 "neoverse_n1_fp_mac" "neoverse_n1_fp_mac")
+
+;; ASIMD arithmetic.
+(define_bypass 1 "neoverse_n1_asimd_arith_acc" "neoverse_n1_asimd_arith_acc")
+(define_bypass 1 "neoverse_n1_asimd_shift_acc_q" "neoverse_n1_asimd_shift_acc_q")
+
+;; ASIMD multiply.
+(define_bypass 1 "neoverse_n1_asimd_mla" "neoverse_n1_asimd_mla")
+(define_bypass 2 "neoverse_n1_asimd_mla_q" "neoverse_n1_asimd_mla_q")
+
+;; ASIMD FP multiply.
+(define_bypass 2 "neoverse_n1_asimd_fp_mul" "neoverse_n1_asimd_fp_mul")
+(define_bypass 2 "neoverse_n1_asimd_fp_mla" "neoverse_n1_asimd_fp_mla")
+
+;; CRC
+(define_bypass 1 "neoverse_n1_crc" "neoverse_n1_*")
-- 
2.39.2 (Apple Git-143)