From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x135.google.com (mail-lf1-x135.google.com [IPv6:2a00:1450:4864:20::135]) by sourceware.org (Postfix) with ESMTPS id 3E3FC3858D20 for ; Mon, 22 Jan 2024 14:07:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3E3FC3858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 3E3FC3858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::135 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705932485; cv=none; b=Xa1wC2sWGwuI9jZIsAlBMheZxr6TL2GM3sD5DIGcWTffz6HkEax4u85O2Jg8XU5NTgfOgJVEM1SJ5yxWwaYdJAqSOlJLmoBzxDIwRfoP9HicQmSvixP9nx/g/6owP40WY31UR0U2d6BgSawZPUwN7qVu1FHQVgMRaHVPsbE5PbY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705932485; c=relaxed/simple; bh=L79mOspXTdGbexvp9kEx8Xxke/zoI4Wg2PPMbLPMgv4=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=gEUVJNYBDMzGF/j0eWlnCjDI1VIlT4VrMgs3iYbki0wnecVFQJSmgBtJuhSIif4TvpVcRxy0y7FCxX29vrMC8VXhFMfmIgpjGEJk+c6qizmE7HwBcuemeM4dT80ilMGpAt/knMC9dEMzdqt694/EjsrzAVZ+uQdev9oyeJuXu4Y= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x135.google.com with SMTP id 2adb3069b0e04-50e759ece35so3032047e87.3 for ; Mon, 22 Jan 2024 06:07:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705932473; x=1706537273; darn=gcc.gnu.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=khVHrOhT/iZueL+UmpsoQ1rJyIV5mW3HPiJ1L0/A5eo=; b=KRetRjGaYHYPp/rL43iPEKJ2m1xGr8nPBgs8bJ7B7Ga4VK8eUcIEL+Zd3xlWniSm7D O8wgFm5HDaf9c9gUzjUep/szzXFsb1YUbFLNT/RdV3ySmMrKS7Y3XgpQRsXZUwMPuJAI 4F6Bu9Jz8OnmfrhOr02o71/zqH/ZDshwlQrvSO62mPzzTzou/hfgBNfJq93Am/oAfEL7 cVlyrpQqh2qR3dT3/qJnftLsPdhWacgl3+TqFnoLWItlmfj5BakDV+H8wWNXVUyiMEKS nDNZ2sQ68tqUlp3rHJd3mjtLwXat9DxaL0JOsI7xIMKrjgjVDCr+7fILeKjXhT4fNnPW J4/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705932473; x=1706537273; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=khVHrOhT/iZueL+UmpsoQ1rJyIV5mW3HPiJ1L0/A5eo=; b=Q7Qa5jInsQ8JzI6TGyq8FAH1HoZ2SWM2Ca5koS403JvOf7+CSsSu17prgJQk3RJQTF oGUHGq+RwEMqn5CzDrhLcdIHINZeQ0ivXbDR7OBAbrkapZAw/olRt85YbbQM3J8bBAfR jH9u64u2L4Js/CuTjhvaDCoEo8VDEBwR9oU8wVoEMtiUar8ArSoer1Al3cCpzb5KYc/R ajPmY8QU3LQ2idbpBEsTwleF8rCrKF6TIbtLXfx7oiKN/sy7jMFSNXCS5yOnF96yObke y/PbDlNBb56t5ADA2af4aXWe8qftJfILscA7fcYxcAUs9q3zmxUmsx71yGkt/BgsCb21 Yuvg== X-Gm-Message-State: AOJu0Yyi9mvKQlMPgS18Hiw88ol06RE/Ou3nkMOmvVzn0o3dQ2y/8BsU VsXYRQM4MUp/y1UIikw0vcCUQ6malGPpn5fNeVbcdMn6i7U4ZpJrWusPeaUhSth8pdtH5HoDGUc UQB20/9rdoyA+lUXyGimLXbwzQ8U= X-Google-Smtp-Source: AGHT+IHpEjswDPUqKVCJYpopmRVvDKNYLmjild1cSJ0re7+uaLBeUepX5ujkVb9JsA1uKBPiPktjNW7rccyXVa/NLXY= X-Received: by 2002:a05:6512:130c:b0:50e:73fe:c4ab with SMTP id x12-20020a056512130c00b0050e73fec4abmr1953793lfu.94.1705932472233; Mon, 22 Jan 2024 06:07:52 -0800 (PST) MIME-Version: 1.0 References: <20240105074330.2309587-1-quic_apinski@quicinc.com> In-Reply-To: From: Richard Biener Date: Mon, 22 Jan 2024 15:06:36 +0100 Message-ID: Subject: Re: Ping: [PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942] To: Andrew Pinski , gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,URIBL_BLACK,URIBL_SBL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Jan 22, 2024 at 2:24=E2=80=AFPM Richard Sandiford wrote: > > Ping for the expr/cfgexpand bits > > Richard Sandiford writes: > > Andrew Pinski writes: > >> Ccmp is not used if the result of the and/ior is used by both > >> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation > >> here by using ccmp in this case. > >> Two changes is required, first we need to allow the outer statement's > >> result be used more than once. > >> The second change is that during the expansion of the gimple, we need > >> to try using ccmp. This is needed because we don't use expand the ssa > >> name of the lhs but rather expand directly from the gimple. > >> > >> A small note on the ccmp_4.c testcase, we should be able to get slight= ly > >> better than with this patch but it is one extra instruction compared t= o > >> before. > >> > >> Diff from v1: > >> * v2: Split out expand_gimple_assign_ssa so the we only need to handle > >> promotion once. Add ccmp_5.c testcase which was suggested. Change comm= ent > >> on ccmp_candidate_p. > > > > I meant more that we should split out the gassign handling in > > expand_expr_real_1, since we're effectively making cfgexpand follow > > it more closely. What do you think about the attached version? > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. > > > > OK for the expr/cfgexpand bits? OK. Richard. > > Thanks, > > Richard > > > > ---- > > > > Ccmp is not used if the result of the and/ior is used by both > > a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation > > here by using ccmp in this case. > > Two changes is required, first we need to allow the outer statement's > > result be used more than once. > > The second change is that during the expansion of the gimple, we need > > to try using ccmp. This is needed because we don't use expand the ssa > > name of the lhs but rather expand directly from the gimple. > > > > A small note on the ccmp_4.c testcase, we should be able to get slightl= y > > better than with this patch but it is one extra instruction compared to > > before. > > > > PR target/100942 > > > > gcc/ChangeLog: > > > > * ccmp.cc (ccmp_candidate_p): Add outer argument. > > Allow if the outer is true and the lhs is used more > > than once. > > (expand_ccmp_expr): Update call to ccmp_candidate_p. > > * expr.h (expand_expr_real_gassign): Declare. > > * expr.cc (expand_expr_real_gassign): New function, split out fro= m... > > (expand_expr_real_1): ...here. > > * cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassi= gn. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/ccmp_3.c: New test. > > * gcc.target/aarch64/ccmp_4.c: New test. > > * gcc.target/aarch64/ccmp_5.c: New test. > > > > Signed-off-by: Andrew Pinski > > Co-authored-by: Richard Sandiford > > --- > > gcc/ccmp.cc | 12 +-- > > gcc/cfgexpand.cc | 31 ++----- > > gcc/expr.cc | 103 ++++++++++++---------- > > gcc/expr.h | 3 + > > gcc/testsuite/gcc.target/aarch64/ccmp_3.c | 20 +++++ > > gcc/testsuite/gcc.target/aarch64/ccmp_4.c | 35 ++++++++ > > gcc/testsuite/gcc.target/aarch64/ccmp_5.c | 20 +++++ > > 7 files changed, 149 insertions(+), 75 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_5.c > > > > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc > > index 09d6b5595a4..7cb525addf4 100644 > > --- a/gcc/ccmp.cc > > +++ b/gcc/ccmp.cc > > @@ -90,9 +90,10 @@ ccmp_tree_comparison_p (tree t, basic_block bb) > > If all checks OK in expand_ccmp_expr, it emits insns in prep_seq, t= hen > > insns in gen_seq. */ > > > > -/* Check whether G is a potential conditional compare candidate. */ > > +/* Check whether G is a potential conditional compare candidate; OUTER= is true if > > + G is the outer most AND/IOR. */ > > static bool > > -ccmp_candidate_p (gimple *g) > > +ccmp_candidate_p (gimple *g, bool outer =3D false) > > { > > tree lhs, op0, op1; > > gimple *gs0, *gs1; > > @@ -109,8 +110,9 @@ ccmp_candidate_p (gimple *g) > > lhs =3D gimple_assign_lhs (g); > > op0 =3D gimple_assign_rhs1 (g); > > op1 =3D gimple_assign_rhs2 (g); > > - if ((TREE_CODE (op0) !=3D SSA_NAME) || (TREE_CODE (op1) !=3D SSA_NAM= E) > > - || !has_single_use (lhs)) > > + if ((TREE_CODE (op0) !=3D SSA_NAME) || (TREE_CODE (op1) !=3D SSA_NAM= E)) > > + return false; > > + if (!outer && !has_single_use (lhs)) > > return false; > > > > bb =3D gimple_bb (g); > > @@ -284,7 +286,7 @@ expand_ccmp_expr (gimple *g, machine_mode mode) > > rtx_insn *last; > > rtx tmp; > > > > - if (!ccmp_candidate_p (g)) > > + if (!ccmp_candidate_p (g, true)) > > return NULL_RTX; > > > > last =3D get_last_insn (); > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > > index 1db22f0a1a3..381ed2c82d7 100644 > > --- a/gcc/cfgexpand.cc > > +++ b/gcc/cfgexpand.cc > > @@ -3971,37 +3971,18 @@ expand_gimple_stmt_1 (gimple *stmt) > > { > > rtx target, temp; > > bool nontemporal =3D gimple_assign_nontemporal_move_p (assign= _stmt); > > - struct separate_ops ops; > > bool promoted =3D false; > > > > target =3D expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE= ); > > if (GET_CODE (target) =3D=3D SUBREG && SUBREG_PROMOTED_VAR_P = (target)) > > promoted =3D true; > > > > - ops.code =3D gimple_assign_rhs_code (assign_stmt); > > - ops.type =3D TREE_TYPE (lhs); > > - switch (get_gimple_rhs_class (ops.code)) > > - { > > - case GIMPLE_TERNARY_RHS: > > - ops.op2 =3D gimple_assign_rhs3 (assign_stmt); > > - /* Fallthru */ > > - case GIMPLE_BINARY_RHS: > > - ops.op1 =3D gimple_assign_rhs2 (assign_stmt); > > - /* Fallthru */ > > - case GIMPLE_UNARY_RHS: > > - ops.op0 =3D gimple_assign_rhs1 (assign_stmt); > > - break; > > - default: > > - gcc_unreachable (); > > - } > > - ops.location =3D gimple_location (stmt); > > - > > - /* If we want to use a nontemporal store, force the value to > > - register first. If we store into a promoted register, > > - don't directly expand to target. */ > > + /* If we want to use a nontemporal store, force the value to > > + register first. If we store into a promoted register, > > + don't directly expand to target. */ > > temp =3D nontemporal || promoted ? NULL_RTX : target; > > - temp =3D expand_expr_real_2 (&ops, temp, GET_MODE (target), > > - EXPAND_NORMAL); > > + temp =3D expand_expr_real_gassign (assign_stmt, temp, > > + GET_MODE (target), EXPAND_NO= RMAL); > > > > if (temp =3D=3D target) > > ; > > @@ -4013,7 +3994,7 @@ expand_gimple_stmt_1 (gimple *stmt) > > if (CONSTANT_P (temp) && GET_MODE (temp) =3D=3D VOIDmode) > > { > > temp =3D convert_modes (GET_MODE (target), > > - TYPE_MODE (ops.type), > > + TYPE_MODE (TREE_TYPE (lhs)), > > temp, unsignedp); > > temp =3D convert_modes (GET_MODE (SUBREG_REG (target)= ), > > GET_MODE (target), temp, unsign= edp); > > diff --git a/gcc/expr.cc b/gcc/expr.cc > > index dc816bc20fa..f9052a6ff5f 100644 > > --- a/gcc/expr.cc > > +++ b/gcc/expr.cc > > @@ -11035,6 +11035,62 @@ stmt_is_replaceable_p (gimple *stmt) > > return false; > > } > > > > +/* A subroutine of expand_expr_real_1. Expand gimple assignment G, > > + which is known to set an SSA_NAME result. The other arguments are > > + as for expand_expr_real_1. */ > > + > > +rtx > > +expand_expr_real_gassign (gassign *g, rtx target, machine_mode tmode, > > + enum expand_modifier modifier, rtx *alt_rtl, > > + bool inner_reference_p) > > +{ > > + separate_ops ops; > > + rtx r; > > + location_t saved_loc =3D curr_insn_location (); > > + auto loc =3D gimple_location (g); > > + if (loc !=3D UNKNOWN_LOCATION) > > + set_curr_insn_location (loc); > > + tree lhs =3D gimple_assign_lhs (g); > > + ops.code =3D gimple_assign_rhs_code (g); > > + ops.type =3D TREE_TYPE (lhs); > > + switch (get_gimple_rhs_class (ops.code)) > > + { > > + case GIMPLE_TERNARY_RHS: > > + ops.op2 =3D gimple_assign_rhs3 (g); > > + /* Fallthru */ > > + case GIMPLE_BINARY_RHS: > > + ops.op1 =3D gimple_assign_rhs2 (g); > > + > > + /* Try to expand conditonal compare. */ > > + if (targetm.gen_ccmp_first) > > + { > > + gcc_checking_assert (targetm.gen_ccmp_next !=3D NULL); > > + r =3D expand_ccmp_expr (g, TYPE_MODE (ops.type)); > > + if (r) > > + break; > > + } > > + /* Fallthru */ > > + case GIMPLE_UNARY_RHS: > > + ops.op0 =3D gimple_assign_rhs1 (g); > > + ops.location =3D loc; > > + r =3D expand_expr_real_2 (&ops, target, tmode, modifier); > > + break; > > + case GIMPLE_SINGLE_RHS: > > + { > > + r =3D expand_expr_real (gimple_assign_rhs1 (g), target, > > + tmode, modifier, alt_rtl, > > + inner_reference_p); > > + break; > > + } > > + default: > > + gcc_unreachable (); > > + } > > + set_curr_insn_location (saved_loc); > > + if (REG_P (r) && !REG_EXPR (r)) > > + set_reg_attrs_for_decl_rtl (lhs, r); > > + return r; > > +} > > + > > rtx > > expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, > > enum expand_modifier modifier, rtx *alt_rtl, > > @@ -11198,51 +11254,8 @@ expand_expr_real_1 (tree exp, rtx target, mach= ine_mode tmode, > > && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp))) > > g =3D SSA_NAME_DEF_STMT (exp); > > if (g) > > - { > > - rtx r; > > - location_t saved_loc =3D curr_insn_location (); > > - loc =3D gimple_location (g); > > - if (loc !=3D UNKNOWN_LOCATION) > > - set_curr_insn_location (loc); > > - ops.code =3D gimple_assign_rhs_code (g); > > - switch (get_gimple_rhs_class (ops.code)) > > - { > > - case GIMPLE_TERNARY_RHS: > > - ops.op2 =3D gimple_assign_rhs3 (g); > > - /* Fallthru */ > > - case GIMPLE_BINARY_RHS: > > - ops.op1 =3D gimple_assign_rhs2 (g); > > - > > - /* Try to expand conditonal compare. */ > > - if (targetm.gen_ccmp_first) > > - { > > - gcc_checking_assert (targetm.gen_ccmp_next !=3D NULL); > > - r =3D expand_ccmp_expr (g, mode); > > - if (r) > > - break; > > - } > > - /* Fallthru */ > > - case GIMPLE_UNARY_RHS: > > - ops.op0 =3D gimple_assign_rhs1 (g); > > - ops.type =3D TREE_TYPE (gimple_assign_lhs (g)); > > - ops.location =3D loc; > > - r =3D expand_expr_real_2 (&ops, target, tmode, modifier); > > - break; > > - case GIMPLE_SINGLE_RHS: > > - { > > - r =3D expand_expr_real (gimple_assign_rhs1 (g), target, > > - tmode, modifier, alt_rtl, > > - inner_reference_p); > > - break; > > - } > > - default: > > - gcc_unreachable (); > > - } > > - set_curr_insn_location (saved_loc); > > - if (REG_P (r) && !REG_EXPR (r)) > > - set_reg_attrs_for_decl_rtl (SSA_NAME_VAR (exp), r); > > - return r; > > - } > > + return expand_expr_real_gassign (as_a (g), target, tmo= de, > > + modifier, alt_rtl, inner_referen= ce_p); > > > > ssa_name =3D exp; > > decl_rtl =3D get_rtx_for_ssa_name (ssa_name); > > diff --git a/gcc/expr.h b/gcc/expr.h > > index f04f40da6ab..64956f63029 100644 > > --- a/gcc/expr.h > > +++ b/gcc/expr.h > > @@ -302,6 +302,9 @@ extern rtx expand_expr_real_1 (tree, rtx, machine_m= ode, > > enum expand_modifier, rtx *, bool); > > extern rtx expand_expr_real_2 (sepops, rtx, machine_mode, > > enum expand_modifier); > > +extern rtx expand_expr_real_gassign (gassign *, rtx, machine_mode, > > + enum expand_modifier modifier, > > + rtx * =3D nullptr, bool =3D false); > > > > /* Generate code for computing expression EXP. > > An rtx for the computed value is returned. The value is never null= . > > diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_3.c b/gcc/testsuite/= gcc.target/aarch64/ccmp_3.c > > new file mode 100644 > > index 00000000000..a2b47fbee14 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_3.c > > @@ -0,0 +1,20 @@ > > +/* { dg-options "-O2" } */ > > +/* PR target/100942 */ > > + > > +void foo(void); > > +int f1(int a, int b) > > +{ > > + int c =3D a =3D=3D 0 || b =3D=3D 0; > > + if (c) foo(); > > + return c; > > +} > > + > > +/* We should get one cmp followed by ccmp and one cset. */ > > +/* { dg-final { scan-assembler "\tccmp\t" } } */ > > +/* { dg-final { scan-assembler-times "\tcset\t" 1 } } */ > > +/* { dg-final { scan-assembler-times "\tcmp\t" 1 } } */ > > +/* And not get 2 cmps and 2 (or more cset) and orr and a cbnz. */ > > +/* { dg-final { scan-assembler-not "\torr\t" } } */ > > +/* { dg-final { scan-assembler-not "\tcbnz\t" } } */ > > +/* { dg-final { scan-assembler-not "\tcbz\t" } } */ > > + > > diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_4.c b/gcc/testsuite/= gcc.target/aarch64/ccmp_4.c > > new file mode 100644 > > index 00000000000..bc0f57a7c59 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_4.c > > @@ -0,0 +1,35 @@ > > +/* { dg-options "-O2" } */ > > +/* PR target/100942 */ > > + > > +void foo(void); > > +int f1(int a, int b, int d) > > +{ > > + int c =3D a < 8 || b < 9; > > + int e =3D d < 11 || c; > > + if (e) foo(); > > + return c; > > +} > > + > > +/* > > + We really should get: > > + cmp w0, 7 > > + ccmp w1, 8, 4, gt > > + cset w0, le > > + ccmp w2, 10, 4, gt > > + ble .L11 > > + > > + But we currently get: > > + cmp w0, 7 > > + ccmp w1, 8, 4, gt > > + cset w0, le > > + cmp w0, 0 > > + ccmp w2, 10, 4, eq > > + ble .L11 > > + The middle cmp is not needed. > > + */ > > + > > +/* We should end up with only one cmp and 2 ccmp and 1 cset but curren= tly we get 2 cmp > > + though. */ > > +/* { dg-final { scan-assembler-times "\tccmp\t" 2 } } */ > > +/* { dg-final { scan-assembler-times "\tcset\t" 1 } } */ > > +/* { dg-final { scan-assembler-times "\tcmp\t" 1 { xfail *-*-* } } } *= / > > diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_5.c b/gcc/testsuite/= gcc.target/aarch64/ccmp_5.c > > new file mode 100644 > > index 00000000000..7e52ae4f322 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_5.c > > @@ -0,0 +1,20 @@ > > + > > +/* { dg-options "-O2" } */ > > +/* PR target/100942 */ > > +void f1(int a, int b, _Bool *x) > > +{ > > + x[0] =3D x[1] =3D a =3D=3D 0 || b =3D=3D 0; > > +} > > + > > +void f2(int a, int b, int *x) > > +{ > > + x[0] =3D x[1] =3D a =3D=3D 0 || b =3D=3D 0; > > +} > > + > > + > > +/* Both functions should be using ccmp rather than 2 cset/orr. */ > > +/* { dg-final { scan-assembler-times "\tccmp\t" 2 } } */ > > +/* { dg-final { scan-assembler-times "\tcset\t" 2 } } */ > > +/* { dg-final { scan-assembler-times "\tcmp\t" 2 } } */ > > +/* { dg-final { scan-assembler-not "\torr\t" } } */ > > +