From: Andrew Pinski
Date: Tue, 23 Jan 2024 15:50:44 -0800
Subject: Re: [gcc(refs/vendors/vrull/heads/slp-improvements)] aarch64: expand VEC_PERM into ins + uzp[12]
To: Philipp Tomsich, "Andrew Pinski (QUIC)"
Cc: gcc-cvs@gcc.gnu.org
References: <20240123205721.3EB5A385800B@sourceware.org>
In-Reply-To: <20240123205721.3EB5A385800B@sourceware.org>

On Tue, Jan 23, 2024 at 12:57 PM Philipp Tomsich via Gcc-cvs wrote:
>
> https://gcc.gnu.org/g:d61be742513b5b8529ab9ef4022011c471925622
>
> commit d61be742513b5b8529ab9ef4022011c471925622
> Author: Manolis Tsamis
> Date:   Fri Nov 3 14:36:34 2023 +0100
>
>     aarch64: expand VEC_PERM into ins + uzp[12]
>
>     The AArch64 backend has specific strategies that can be used to expand
>     VEC_PERM expression (see aarch64_expand_vec_perm_const_1).
>
>     The last strategy applied if everything else fails is to use a tbl
>     instruction, which is known to have very bad latency and performance
>     (see aarch64_evpc_tbl). There are various improvements and additions
>     that can be done to the reduce the harmful tbl instructions.

Actually, NOT all cores have very bad performance with TBL.
This definitely needs to be tunable.

Thanks,
Andrew

>
>     The existing mechanisms work for cases that the permute can be done
>     with a single existing AArch64 vector instruction, but for x264's
>     first loop we need some patterns that may need two vector
>     instructions.
>
>     On x264, this change results in the following change in instruction
>     distribution:
>             tbl:  8 -> 0
>             ldr: 10 -> 8 (due to the eliminated tbls)
>             ins:  8 -> 16
>             uzp:  8 -> 16
>     A reduction of the newly introduced ins/uzp[12] sequences will be
>     addressed in a follow-on change.
>
>     Ref #344
>
> Diff:
> ---
>  gcc/config/aarch64/aarch64.cc | 76 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e6bd3fd0bb4..0f2423ef7de 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25890,6 +25890,80 @@ aarch64_evpc_ins (struct expand_vec_perm_d *d)
>    return true;
>  }
>
> +/* Recognize patterns suitable for the an INS + UZP.
> +   This addresses limited permute optimizations before a more generic search
> +   algorithm for two operator sequences is implemented. */
> +static bool
> +aarch64_evpc_ins_uzp (struct expand_vec_perm_d *d)
> +{
> +  machine_mode mode = d->vmode;
> +
> +  if (d->vec_flags != VEC_ADVSIMD || BYTES_BIG_ENDIAN)
> +    return false;
> +
> +  unsigned HOST_WIDE_INT nelt = d->perm.length ().to_constant ();
> +
> +  if (nelt != 4
> +      || !d->perm[0].is_constant()
> +      || !d->perm[1].is_constant()
> +      || !d->perm.series_p (0, 2, d->perm[0], 0)
> +      || !d->perm.series_p (1, 2, d->perm[1], 0))
> +    return false;
> +
> +  /* We have a {A, B, A, B} permutation. */
> +  HOST_WIDE_INT A = d->perm[0].to_constant ();
> +  HOST_WIDE_INT B = d->perm[1].to_constant ();
> +
> +  if (A >= nelt || B < nelt || d->op0 == d->op1)
> +    return false;
> +
> +  rtx insv;
> +  rtx extractv;
> +  HOST_WIDE_INT idx, extractindex;
> +
> +  /* If A is the first element or B is the second element of a UZP1/2 then we
> +     can emit this permute as INS + UZP . */
> +  if (A == 0 || A == 1)
> +    {
> +      insv = d->op0;
> +      extractv = d->op1;
> +      idx = A == 0 ? 2 : 3;
> +      extractindex = B;
> +    }
> +  else if (B == nelt + 2 || B == nelt + 3)
> +    {
> +      insv = d->op1;
> +      extractv = d->op0;
> +      idx = B == nelt + 2 ? 0 : 1;
> +      extractindex = A;
> +    }
> +  else
> +    return false;
> +
> +  if (d->testing_p)
> +    return true;
> +
> +  if (extractindex >= nelt)
> +    extractindex -= nelt;
> +  gcc_assert (extractindex < nelt);
> +
> +  /* Emit INS. */
> +  insn_code icode = code_for_aarch64_simd_vec_copy_lane (mode);
> +  expand_operand ops[5];
> +  create_output_operand (&ops[0], d->target, mode);
> +  create_input_operand (&ops[1], insv, mode);
> +  create_integer_operand (&ops[2], 1 << idx);
> +  create_input_operand (&ops[3], extractv, mode);
> +  create_integer_operand (&ops[4], extractindex);
> +  expand_insn (icode, 5, ops);
> +
> +  /* Emit UZP. */
> +  emit_set_insn (d->target, gen_rtx_UNSPEC (mode, gen_rtvec (2, d->target, d->target),
> +                                            idx & 1 ? UNSPEC_UZP2 : UNSPEC_UZP1));
> +
> +  return true;
> +}
> +
>  static bool
>  aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>  {
> @@ -25931,6 +26005,8 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>      return true;
>    else if (aarch64_evpc_ins (d))
>      return true;
> +  else if (aarch64_evpc_ins_uzp (d))
> +    return true;
>    else if (aarch64_evpc_reencode (d))
>      return true;
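
For anyone following the lane arithmetic in the quoted patch, here is a minimal standalone sketch (not part of the patch, and not GCC code) that models a 4-lane {A, B, A, B} permute as one INS followed by a UZP1/UZP2 of the result with itself. The names Vec4, ins, uzp_self, reference_perm, ins_uzp and count_handled_selectors are hypothetical helpers invented for this illustration. It assumes little-endian lane numbering and selectors indexing the concatenation op0:op1, and it does not model the patch's other early exits (mode checks, op0 == op1, big-endian).

```cpp
#include <array>
#include <cassert>

// Hypothetical 4-lane vector type used only for this illustration.
using Vec4 = std::array<int, 4>;

// Model of INS: copy src[src_lane] into lane dst_lane of dst.
Vec4 ins(Vec4 dst, int dst_lane, const Vec4 &src, int src_lane) {
  dst[dst_lane] = src[src_lane];
  return dst;
}

// Model of UZP1/UZP2 with both inputs equal to v:
// even (UZP1) or odd (UZP2) lanes of the concatenation v:v.
Vec4 uzp_self(const Vec4 &v, bool odd) {
  int o = odd ? 1 : 0;
  return {v[o], v[o + 2], v[o], v[o + 2]};
}

// Reference result: generic two-input permute with selector {a, b, a, b}.
Vec4 reference_perm(const Vec4 &op0, const Vec4 &op1, int a, int b) {
  auto pick = [&](int i) { return i < 4 ? op0[i] : op1[i - 4]; };
  return {pick(a), pick(b), pick(a), pick(b)};
}

// Mirrors the lane selection in the quoted patch. Returns false when the
// selector is not covered by the two INS + UZP cases.
bool ins_uzp(const Vec4 &op0, const Vec4 &op1, int a, int b, Vec4 &out) {
  const int nelt = 4;
  if (a < 0 || a >= nelt || b < nelt || b >= 2 * nelt)
    return false;

  Vec4 t;
  int idx;
  if (a == 0 || a == 1) {
    // Keep op0, insert the op1 element two lanes after A.
    idx = (a == 0) ? 2 : 3;
    t = ins(op0, idx, op1, b - nelt);
  } else if (b == nelt + 2 || b == nelt + 3) {
    // Keep op1, insert the op0 element two lanes before B.
    idx = (b == nelt + 2) ? 0 : 1;
    t = ins(op1, idx, op0, a);
  } else {
    return false;
  }

  // Odd insert lane selects UZP2, even selects UZP1.
  out = uzp_self(t, (idx & 1) != 0);
  return true;
}

// Checks every selector with A in op0 and B in op1 against the reference
// and returns how many of the 16 the INS + UZP cases cover.
int count_handled_selectors() {
  const Vec4 op0{10, 11, 12, 13};
  const Vec4 op1{20, 21, 22, 23};
  int handled = 0;
  for (int a = 0; a < 4; ++a)
    for (int b = 4; b < 8; ++b) {
      Vec4 out{};
      if (ins_uzp(op0, op1, a, b, out)) {
        assert(out == reference_perm(op0, op1, a, b));
        ++handled;
      }
    }
  return handled;
}
```

Calling count_handled_selectors() from a small main shows that, under this model, the two cases cover 12 of the 16 selectors with A from op0 and B from op1. The remaining four (A in {2, 3} with B in {4, 5}) are rejected here, just as the patch returns false for them and leaves them to the later strategies.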