From: Max Filippov
Date: Tue, 30 May 2023 21:37:15 -0700
Subject: Re: [PATCH 3/3 v2] xtensa: Optimize 'cstoresi4' insn pattern
To: "Takayuki 'January June' Suwa"
Cc: GCC Patches

Hi Suwa-san,

On Tue, May 30, 2023 at 2:51 AM Takayuki 'January June' Suwa wrote:
>
> Resubmitting the correct one due to a mistake in merging order of fixes.
> ---
> This patch introduces more optimized implementations for the 6 cstoresi4
> insn comparison methods (eq/ne/lt/le/gt/ge, however, required TARGET_NSA
> for eq).
>
> gcc/ChangeLog:
>
>   * config/xtensa/xtensa.cc (xtensa_expand_scc):
>   Add dedicated optimization code for cstoresi4 (eq/ne/gt/ge/lt/le).
>   * config/xtensa/xtensa.md (xtensa_ge_zero):
>   Rename from '*signed_ge_zero', because it had to be called from
>   'xtensa_expand_scc()'.
> ---
>  gcc/config/xtensa/xtensa.cc | 106 ++++++++++++++++++++++++++++++++----
>  gcc/config/xtensa/xtensa.md | 2 +-
>  2 files changed, 96 insertions(+), 12 deletions(-)

This change introduces a bunch of testsuite failures:

+FAIL: gcc.c-torture/execute/20070623-1.c -O0 execution test
+FAIL: gcc.c-torture/execute/20070623-1.c -O1 execution test
+FAIL: gcc.c-torture/execute/20070623-1.c -O2 execution test
+FAIL: gcc.c-torture/execute/20070623-1.c -O3 -g execution test
+FAIL: gcc.c-torture/execute/20070623-1.c -Os execution test
+FAIL: gcc.c-torture/execute/20070623-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: gcc.c-torture/execute/20070623-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
+FAIL: gcc.c-torture/execute/920612-1.c -O0 execution test
+FAIL: gcc.c-torture/execute/920612-1.c -O1 execution test
+FAIL: gcc.c-torture/execute/920612-1.c -O2 execution test
+FAIL: gcc.c-torture/execute/920612-1.c -O3 -g execution test
+FAIL: gcc.c-torture/execute/920612-1.c -Os execution test
+FAIL: gcc.c-torture/execute/920612-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: gcc.c-torture/execute/int-compare.c -O0 execution test
+FAIL: gcc.c-torture/execute/int-compare.c -O1 execution test
+FAIL: gcc.c-torture/execute/int-compare.c -O2 execution test
+FAIL: gcc.c-torture/execute/int-compare.c -O3 -g execution test
+FAIL: gcc.c-torture/execute/int-compare.c -Os execution test
+FAIL: gcc.c-torture/execute/int-compare.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: gcc.c-torture/execute/pr28651.c -O0 execution test
+FAIL: gcc.c-torture/execute/pr28651.c -O1 execution test
+FAIL: gcc.c-torture/execute/pr28651.c -O2 execution test
+FAIL: gcc.c-torture/execute/pr28651.c -O3 -g execution test
+FAIL: gcc.c-torture/execute/pr28651.c -Os execution test
+FAIL: gcc.c-torture/execute/pr28651.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: gcc.c-torture/execute/pr55137.c -O0 execution test
+FAIL: gcc.c-torture/execute/pr55137.c -O1 execution test
+FAIL: gcc.c-torture/execute/pr55137.c -O2 execution test
+FAIL: gcc.c-torture/execute/pr55137.c -O3 -g execution test
+FAIL: gcc.c-torture/execute/pr55137.c -Os execution test
+FAIL: gcc.c-torture/execute/pr55137.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: gcc.dg/pr61045.c execution test
+FAIL: gcc.dg/signbit-6.c execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-12.c -O2 execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-12.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-12.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-13.c -O2 execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-13.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-13.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-14.c -O2 execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-14.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-p-14.c -O2 execution test
+FAIL: c-c++-common/torture/builtin-arith-overflow-p-14.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: gcc.dg/torture/pr49958.c -O0 execution test
+FAIL: gcc.dg/torture/pr49958.c -O1 execution test
+FAIL: gcc.dg/torture/pr49958.c -O2 execution test
+FAIL: gcc.dg/torture/pr49958.c -O3 -g execution test
+FAIL: gcc.dg/torture/pr49958.c -Os execution test
+FAIL: gcc.dg/torture/pr49958.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
+FAIL: gcc.dg/tree-ssa/pr68714.c (internal compiler error: in decompose, at rtl.h:2297)
+FAIL: gcc.dg/tree-ssa/pr68714.c (test for excess errors)
+FAIL: gcc.dg/tree-ssa/pr81346-4.c execution test

> diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
> index 3b5d25b660a..64efd3d7287 100644
> --- a/gcc/config/xtensa/xtensa.cc
> +++ b/gcc/config/xtensa/xtensa.cc
> @@ -991,24 +991,108 @@ xtensa_expand_conditional_move (rtx *operands, int isflt)
>  int
>  xtensa_expand_scc (rtx operands[4], machine_mode cmp_mode)
>  {
> -  rtx dest = operands[0];
> -  rtx cmp;
> -  rtx one_tmp, zero_tmp;
> +  rtx dest = operands[0], op0 = operands[2], op1 = operands[3];
> +  enum rtx_code code = GET_CODE (operands[1]);
> +  rtx cmp, tmp0, tmp1;
>    rtx (*gen_fn) (rtx, rtx, rtx, rtx, rtx);
>
> -  if (!(cmp = gen_conditional_move (GET_CODE (operands[1]), cmp_mode,
> -                                    operands[2], operands[3])))
> -    return 0;
> +  /* Dedicated optimizations for cstoresi4.
> +     a. In a magnitude comparison operator, swapping both sides and
> +        inverting magnitude does not change the result,
> +        eg. '(x >= y) != (y <= x)' is a constant of zero
> +        (GE is changed to LE, not LT).
> +     b. Due to room for further optimization, we use subtraction rather
> +        than XOR (the default for RTL expansion of EQ/NE) as the binary
> +        operation which is zero if both sides are the same and non-zero
> +        otherwise. */
> +  if (cmp_mode == SImode)
> +    switch (code)
> +      {
> +      /* EQ(op0, op1) := clz(op0 - op1) / 32 [requires TARGET_NSA] */
> +      case EQ:
> +        if (!TARGET_NSA)
> +          break;
> +        /* EQ to EQZ conversion by subtracting op1 from op0. */
> +        emit_move_insn (dest,
> +                        expand_binop (SImode, sub_optab, op0, op1,
> +                                      0, 0, OPTAB_LIB_WIDEN));
> +        /* NSAU instruction will return 32 iff the source is zero,
> +           zero through 31 otherwise (See Xtensa ISA Reference Manual,
> +           p. 462) */
> +        emit_insn (gen_clzsi2 (dest, dest));
> +        emit_insn (gen_lshrsi3 (dest, dest, GEN_INT (5)));
> +        return 1;
> +
> +      /* NE(op0, op1) := (op0 - op1) == 0 ? 0 : 1 */
> +      case NE:
> +        /* NE to NEZ conversion by subtracting op1 from op0. */
> +        emit_move_insn (tmp0 = gen_reg_rtx (SImode),
> +                        expand_binop (SImode, sub_optab, op0, op1,
> +                                      0, 0, OPTAB_LIB_WIDEN));
> +        emit_move_insn (dest, const_true_rtx);
> +        emit_move_insn (dest,
> +                        gen_rtx_fmt_eee (IF_THEN_ELSE, SImode,
> +                                         gen_rtx_fmt_ee (EQ, VOIDmode,
> +                                                         tmp0, const0_rtx),
> +                                         tmp0, dest));
> +        return 1;
> +
> +      case LE:
> +        if (REG_P (op1))
> +          {
> +            /* LE to GE conversion by swapping both sides. */
> +            tmp0 = op0, op0 = op1, op1 = tmp0;
> +            goto case_GE_reg;
> +          }
> +        /* LE to LT conversion by adding one to op1. */
> +        op1 = GEN_INT (INTVAL (op1) + 1);
> +        /* fallthru */
> +
> +      /* LT(op0, op1) := (unsigned)(op0 - op1) >> 31 */

This doesn't work (as demonstrated by the gcc.c-torture/execute/20070623-1.c)
when an overflow occurs, e.g. for op0 == INT_MIN, op1 == INT_MAX.

Maybe the dedicated instructions salt / saltu could be used in that pattern?
They don't have their own XCHAL_* macros, but according to the ISA book
they were introduced in RG-2015.0, which I believe could be tested as follows:

#define TARGET_SALT (XTENSA_MARCH_EARLIEST >= 270000)

-- 
Thanks.
-- Max
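
For reference, the overflow case mentioned above can be reproduced in plain C. This is only a minimal sketch, not code from the patch: the helper names lt_via_sub and lt_reference are hypothetical, and the subtraction is done in uint32_t so that the 32-bit wraparound is well defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: LT computed as in the quoted comment, i.e. the
   sign bit of the 32-bit difference.  The operands are converted to
   uint32_t first so that the wraparound is defined behaviour in C. */
static int lt_via_sub(int32_t a, int32_t b)
{
    return (int)(((uint32_t)a - (uint32_t)b) >> 31);
}

/* Hypothetical helper: the plain signed comparison, for reference. */
static int lt_reference(int32_t a, int32_t b)
{
    return a < b;
}
```

For small operands the two agree, e.g. lt_via_sub(1, 2) and lt_reference(1, 2) both give 1. For a = INT32_MIN and b = INT32_MAX the 32-bit difference wraps to 1, so lt_via_sub gives 0 while lt_reference gives 1. The sign bit of the difference only matches the signed comparison when the subtraction does not overflow.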