From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-f170.google.com (mail-oi1-f170.google.com [209.85.167.170]) by sourceware.org (Postfix) with ESMTPS id 588293858D3C for ; Sat, 23 Dec 2023 22:47:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 588293858D3C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 588293858D3C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=209.85.167.170 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1703371632; cv=none; b=yEBKAQmnPIc8tW+uwDYcA1qNzK6SIwFn53v316E7Adzl1hChC7lUXmArPjLozjcxcQaW0zd1WgYnpJ3baIPg34GRmnw+8HJiA2pwd7OSp5eCXDnmrNcEK4eHwxGo+VKAyuY7qZxukBkilQCtjTZL/Z77oWpNSIRsZ1JGVdhks5c= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1703371632; c=relaxed/simple; bh=BAg6ioYy6u02/4bm0h5/Al04uItiAap1a2MzOosBfeM=; h=MIME-Version:From:Date:Message-ID:Subject:To; b=nbRBkcPn24yv37FcJXb9++E53pQZKNYtlecMqMyOefD4/QvqDNrVphtCfQgwGoCjUhCVcLhrO0Y1bkwqK4Bcgifg9aU7d0/+YS7HSNGicegGAbenRwN9cmaAvu6cNltF+zyuUdpLBNn/pOBFZAF5+N1ENPiX7uyrTxNYRobnTcY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oi1-f170.google.com with SMTP id 5614622812f47-3ba2e4ff6e1so2241124b6e.3 for ; Sat, 23 Dec 2023 14:47:10 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703371629; x=1703976429; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TWYdZ05AYeHfytvZSMEF1d4aijpfJfvzbKFPchTsrXw=; b=mZuiMyxou3OnzwAQe3HAwtXrKusfwOdlwqCaeCytr14rpRGqc0DdFpYqvdjdE5+OOH zkvUUbxoOyPR2XSDHM03xDAftfoQlUiqGDiVgoGkjAmBrC3xaNU8DBItzgtAmVGS4SVL 5zbMJ1PVRO/Ul8P2+dcor2zjJkQfR7WBdX7TH6t7T54jxTy5jasWRfzuVKWBAQXpzA5V hwFvCClX5LfR+3aFREZxgvbxFX/R/iCyy4oUBRcrGAPGI8T4OeTUTs68E3/Ol9woVytp kXoESiO6YpbsrazfUMQygcKq0xLyb1+zeVzMJM792S1wkFzDWCLVW+UkAh03pf9V0fRL xx3w== X-Gm-Message-State: AOJu0YxXzp33u1nA/vrJtzOQpgExCjOI7lg5CNUIPn+XsdoA/PD8y9vd 8i7upBDxFKfPmYqH6KozKxhpCH/3e5to2Q== X-Google-Smtp-Source: AGHT+IGckbUk4vd/xiY8qO06lWUsnuH+jkswUkNCXS3HPVGtHrLWbSJqLpGm3NnXcuXwOihCJndpxw== X-Received: by 2002:a05:6808:191c:b0:3bb:7549:8d8c with SMTP id bf28-20020a056808191c00b003bb75498d8cmr4330381oib.82.1703371629256; Sat, 23 Dec 2023 14:47:09 -0800 (PST) Received: from mail-pf1-f174.google.com (mail-pf1-f174.google.com. [209.85.210.174]) by smtp.gmail.com with ESMTPSA id q26-20020a62e11a000000b006d994a0bba7sm2506365pfh.189.2023.12.23.14.47.08 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 23 Dec 2023 14:47:09 -0800 (PST) Received: by mail-pf1-f174.google.com with SMTP id d2e1a72fcca58-6d7750e2265so1504729b3a.3 for ; Sat, 23 Dec 2023 14:47:08 -0800 (PST) X-Received: by 2002:a17:903:2b08:b0:1d4:396b:ec19 with SMTP id mc8-20020a1709032b0800b001d4396bec19mr603627plb.88.1703371628622; Sat, 23 Dec 2023 14:47:08 -0800 (PST) MIME-Version: 1.0 References: <20231223085858.4136369-1-syq@gcc.gnu.org> <04a01582-2bff-496f-95b1-4643b5a2f494@gmail.com> In-Reply-To: <04a01582-2bff-496f-95b1-4643b5a2f494@gmail.com> From: YunQiang Su Date: Sun, 24 Dec 2023 06:46:57 +0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode To: Jeff Law Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com, pinskia@gmail.com, rguenther@suse.de Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.0 required=5.0 tests=BAYES_00,BODY_8BITS,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,GIT_PATCH_0,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPAM_BODY,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Jeff Law =E4=BA=8E2023=E5=B9=B412=E6=9C=8824=E6=97= =A5=E5=91=A8=E6=97=A5 00:51=E5=86=99=E9=81=93=EF=BC=9A > > > > On 12/23/23 01:58, YunQiang Su wrote: > > On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) =3D=3D true platform= s, > > if 31 or above bits is polluted by an bitops, we will need an > > truncate. Let's emit one, and mark let's use the same hardreg > > as in and out, the RTL may like: > > > > (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0) > > (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1 > > (nil)) > > > > We use /s/u flags to mark it as really needed, as in > > combine_simplify_rtx, this insn may be considered as truncated, > > so let's skip this combination. > > > > gcc/ChangeLog: > > PR: 104914. > > * combine.cc (try_combine): Skip combine with truncate if > > dest is subreg and has /u/s flags on platforms > > TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) =3D=3D true. > > * expr.cc (expand_assignment): Emit a truncate insn, if > > 31+ bits is polluted for SImode. > > > > gcc/testsuite/ChangeLog: > > PR: 104914. > > * gcc.target/mips/pr104914.c: New testcase. > I would suggest you show the RTL before/after whatever transformation > has caused problems on your target and explain why you think the > transformation is incorrect. > Before this patch, the RTL is like this (insn 19 18 20 2 (set (zero_extract:DI (reg/v:DI 200 [ val ]) (const_int 8 [0x8]) (const_int 24 [0x18])) (subreg:DI (reg:QI 205) 0)) "../xx.c":7:29 -1 (nil)) (insn 20 19 23 2 (set (reg/v:DI 200 [ val ]) (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) "../xx.c":7:29 -1 (nil)) (jump_insn 23 20 24 2 (set (pc) (if_then_else (lt (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0) (const_int 0 [0])) (label_ref 32) (pc))) "../xx.c":10:5 -1 (int_list:REG_BR_PROB 440234148 (nil)) -> 32) and then, when combine (insn 20 19 23 2 (set (reg/v:DI 200 [ val ]) (sign_extend:DI (subreg:SI (reg/v:DI 200 [ val ]) 0))) "../xx.c":7:29 -1 (nil)) will be convert to (note 20 19 23 2 NOTE_INSN_DELETED) MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) =3D=3D true based on that the hard register is always sign-extended, but here the hard register is polluted by zero_extract. If we just patch combine.cc to make it not eat sign_extend, here, sign_extend will still disappear in the later passes, due to mips define sign_extend as "emit_note (NOTE_INSN_DELETED)". So I tried to insert a new truncate RTX here, (insn 21 20 24 2 (set (reg/v:DI 200 [ val ]) (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1 (nil)) This is the RTL for this C code int32_t fun (int64_t arg) { int32_t a =3D (int32_t) arg; return a; } But, the `reload` pass will get an ICE. I haven't dig the real problem. If the new RTX is (insn 21 20 24 2 (set (subreg/s/u:SI (reg/v:DI 200 [ val ]) 0) (truncate:SI (reg/v:DI 200 [ val ]))) "../xx.c":7:29 -1 (nil)) `reload` pass will happily accept it, and then it is converted to # this instruction will be sure the reg is well sign extended. `sll $rN, $rN, 0` hard instruction. The problem is that simple-rtx (called by combine) will believe that REG 200 has been truncated to SImode, as the dest has an subreg:SI. So, I use /s/u flags to tell combine don't do so. > Focus on the RTL semantics as well as the target specific semantics > because both are critically important here. > > I strongly suspect you're just papering over a problem elsewhere. > Yes. I also guess so. Any new idea? In the previous threads, you suggested that we can just insert an truncate instruction just before the comparison. It still have some problem: 1. There may be no comparison just after the zero_extract, instead some normal calculation, such as add/sub. Then, the calculation will get a malformed register, and in the ISA document, it is claimed UNPREDICTABLE. 2. Insert an RTX before every comparison will cause performance regression, since in the most case, it is not needed. 3. Inserting an RTX before comparison still needs some dirty hack like this. > > > --- > > gcc/combine.cc | 23 +++++++++++++++++++++- > > gcc/expr.cc | 17 ++++++++++++++++ > > gcc/testsuite/gcc.target/mips/pr104914.c | 25 +++++++++++++++++++++++= + > > 3 files changed, 64 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c > > > > diff --git a/gcc/combine.cc b/gcc/combine.cc > > index 1cda4dd57f2..04b9c414053 100644 > > --- a/gcc/combine.cc > > +++ b/gcc/combine.cc > > @@ -3294,6 +3294,28 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_ins= n *i1, rtx_insn *i0, > > n_occurrences =3D 0; /* `subst' counts here */ > > subst_low_luid =3D DF_INSN_LUID (i2); > > > > + /* Don't try to combine a TRUNCATE INSN, if it's DEST is SUBREG = and has > > + FLAG /s/u. We use these 2 flags to mark this INSN as really nee= ded: > > + normally, it means that the bits of 31+ of this variable is poll= uted > > + by a bitops. The reason of existing of case (subreg:SI (reg:DI)= ) is > > + that, the same hardreg may act as src and dest. */ > > + if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode) > > + && INSN_P (i2)) > > + { > > + rtx i2dest_o =3D SET_DEST (PATTERN (i2)); > > + rtx i2src_o =3D SET_SRC (PATTERN (i2)); > > + if (GET_CODE (i2dest_o) =3D=3D SUBREG > > + && GET_MODE (i2dest_o) =3D=3D SImode > > + && GET_MODE (SUBREG_REG (i2dest_o)) =3D=3D DImode > > + && SUBREG_PROMOTED_VAR_P (i2dest_o) > > + && SUBREG_PROMOTED_GET (i2dest_o) =3D=3D SRP_SIGNED > > + && GET_CODE (i2src_o) =3D=3D TRUNCATE > > + && GET_MODE (i2src_o) =3D=3D SImode > > + && rtx_equal_p (SUBREG_REG (i2dest_o), XEXP (i2src_o, 0)) > > + ) > > + return 0; > > + } > So checking SI/DI like this is just wrong. There's nothing special > about SI/DI. Checking for equality between the destination and source With some test, when the case of 8->32, 8->64, and 16->64, 16->64 don't have this problem, as sign_extend to SI/DI from HI/QI is not NOOP, such as this RTX (insn 11 10 12 2 (set (reg/v:DI 198 [ val ]) (sign_extend:DI (subreg:HI (reg/v:DI 198 [ val ]) 0))) "yy.c":4:= 29 -1 (nil)) will be converted to: seh $rN, $rN > also seems wrong -- if the state of the sign bit is wrong, it's wrong > regardless of whether or not the source/destination register is the same. > I guess so, too. In this patch, I try my best to be very careful not to break something else= , so I use very strict tests for this case. The reason, I use this rtx_equal_p, based on that in expr.cc (see bellow), I use emit_unop_insn (icode, to_rtx, SUBREG_REG (to_rtx), TRUNCATE); And in fact, if source/dest are different, the subreg in dest won't be need= ed, In this case, combine won't eat this truncate insn. I try my best to reduce the influence. > > > > > @@ -5326,7 +5348,6 @@ find_split_point (rtx *loc, rtx_insn *insn, bool = set_src) > > > > UNIQUE_COPY is true if each substitution must be unique. We do th= is > > by copying if `n_occurrences' is nonzero. */ > > - > > static rtx > > subst (rtx x, rtx from, rtx to, bool in_dest, bool in_cond, bool uniq= ue_copy) > > { > > diff --git a/gcc/expr.cc b/gcc/expr.cc > > index 9fef2bf6585..f7236040a34 100644 > > --- a/gcc/expr.cc > > +++ b/gcc/expr.cc > > @@ -6284,6 +6284,23 @@ expand_assignment (tree to, tree from, bool nont= emporal) > > nontemporal, reversep); > > convert_move (SUBREG_REG (to_rtx), to_rtx1, > > SUBREG_PROMOTED_SIGN (to_rtx)); > > + > > + rtx last =3D get_last_insn (); > > + if (TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode) > > + && known_ge (bitregion_end, 31) > > + && SUBREG_PROMOTED_VAR_P (to_rtx) > > + && SUBREG_PROMOTED_SIGN (to_rtx) =3D=3D SRP_SIGNED > > + && GET_MODE (to_rtx) =3D=3D SImode > > + && GET_MODE (SUBREG_REG (to_rtx)) =3D=3D DImode > > + && GET_CODE (SET_SRC (PATTERN (last))) =3D=3D SIGN_= EXTEND > > + ) > > + { > > + insn_code icode =3D convert_optab_handler > > + (trunc_optab, SImode, DIm= ode); > > + if (icode !=3D CODE_FOR_nothing) > > + emit_unop_insn (icode, to_rtx, > > + SUBREG_REG (to_rtx), TRUNCATE); > > + } > Similar comments about the modes apply here. > > But again, my sense is there's a higher level problem here and that > these changes are just papering over it. > > Jeff