From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 26761 invoked by alias); 15 Apr 2011 07:54:23 -0000 Received: (qmail 26752 invoked by uid 22791); 15 Apr 2011 07:54:22 -0000 X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 tests=AWL,BAYES_00,MIME_QP_LONG_LINE,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 15 Apr 2011 07:54:17 +0000 Received: (qmail 4298 invoked from network); 15 Apr 2011 07:54:15 -0000 Received: from unknown (HELO ?192.168.0.199?) (maxim@127.0.0.2) by mail.codesourcery.com with ESMTPA; 15 Apr 2011 07:54:15 -0000 From: Maxim Kuvyrkov Content-Type: multipart/mixed; boundary=Apple-Mail-56-276821964 Subject: [PATCH] Improve combining of conditionals Date: Fri, 15 Apr 2011 07:59:00 -0000 Message-Id: <33F4E740-6ED2-4694-B63C-E43ED3B91461@codesourcery.com> To: gcc-patches Mime-Version: 1.0 (Apple Message framework v1084) X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-04/txt/msg01149.txt.bz2 --Apple-Mail-56-276821964 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Content-length: 1320

This patch fixes a problem with substituting expressions into IF_THEN_ELSE during combining. Without this patch, combining of conditionals inside IF_THEN_ELSE is seriously inhibited.

The problem is that combine_simplify_rtx() prefers to return an expression (say, ) even when a comparison is preferred (say, ). Most targets do not recognize a plain expression as a valid condition of an if_then_else, so the combiner misses a potential optimization. This patch makes combine_simplify_rtx() aware of the context it was invoked in: when it is processing the top level of a condition, it no longer rewrites the comparison into a value.
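To illustrate the idea of a context-aware simplifier, here is a minimal C sketch. It is a toy model and not code from combine.c: the toy_* types and functions are hypothetical names introduced only for this example, and it assumes a store-flag value of 1, so that (ne (and x 1) 0) and (and x 1) evaluate to the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes.  These are NOT GCC's real data
   structures; they exist only for this illustration.  */
enum toy_code { TOY_REG, TOY_AND, TOY_NE };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;	/* First operand, NULL for TOY_REG.  */
  long op1;			/* Constant second operand.  */
};

/* Hypothetical simplifier.  As a value, (ne (and X 1) 0) equals
   (and X 1) when the store-flag value is assumed to be 1, so the
   comparison can be dropped.  When IN_COND is nonzero the caller wants
   a comparison, so X is returned unchanged.  */
static const struct toy_rtx *
toy_simplify (const struct toy_rtx *x, int in_cond)
{
  assert (x != NULL);

  if (in_cond)
    return x;

  if (x->code == TOY_NE && x->op1 == 0
      && x->op0 != NULL
      && x->op0->code == TOY_AND && x->op0->op1 == 1)
    return x->op0;

  return x;
}

/* Build (ne (and reg 1) 0), simplify it, and report whether the
   result is still a comparison.  */
static int
toy_keeps_comparison (int in_cond)
{
  struct toy_rtx reg = { TOY_REG, NULL, 0 };
  struct toy_rtx and1 = { TOY_AND, &reg, 1 };
  struct toy_rtx ne0 = { TOY_NE, &and1, 0 };

  return toy_simplify (&ne0, in_cond)->code == TOY_NE;
}
```

In this sketch, toy_keeps_comparison (0) reports that the comparison was folded away, while toy_keeps_comparison (1) reports that it was kept, which mirrors the role of the new in_cond argument.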
The motivating example for this fix was the crcu8() routine from CoreMark. Compiling this routine for MIPS32R2 at -O3 produces several instances of the sequence

	andi	$2,$2,0x1
	xori	$2,$2,0x1
	movn	$3,$5,$2 ; $2 dies here

which can be optimized into

	andi	$2,$2,0x1
	movz	$3,$5,$2 ; $2 dies here

The patch was successfully tested on {i686, arm, mips}-linux, with both GCC testsuites and SPEC2000 runs. For all targets there was no observable code difference in the SPEC2000 benchmarks, so the example does not trigger very often. Still, it speeds up CoreMark by about 1%.

OK for trunk?

--
Maxim Kuvyrkov
Mentor Graphics / CodeSourcery

--Apple-Mail-56-276821964 Content-Disposition: attachment; filename=gcc-combine-if_then_else.ChangeLog Content-Type: application/octet-stream; name="gcc-combine-if_then_else.ChangeLog" Content-Transfer-Encoding: 7bit Content-length: 228

2011-04-15  Maxim Kuvyrkov

	* combine.c (subst, combine_simplify_rtx): Add new argument, use it
	to track processing of conditionals.  Update all callers.
	(try_combine, simplify_if_then_else): Update.

--Apple-Mail-56-276821964 Content-Disposition: attachment; filename=gcc-combine-if_then_else.patch Content-Type: application/octet-stream; name="gcc-combine-if_then_else.patch" Content-Transfer-Encoding: quoted-printable Content-length: 10823 >From 7738ca05728ebc1b5d46eabc9f3dedd73842850d Mon Sep 17 00:00:00 2001=0A= From: Maxim Kuvyrkov =0A= Date: Fri, 8 Apr 2011 07:55:20 -0700=0A= Subject: [PATCH] combine=0A= =0A= ---=0A= gcc/combine.c | 69 ++++++++++++++++++++++++++++++++++-------------------= ---=0A= 1 files changed, 42 insertions(+), 27 deletions(-)=0A= =0A= diff --git a/gcc/combine.c b/gcc/combine.c=0A= index 8771acf..ca54083 100644=0A= --- a/gcc/combine.c=0A= +++ b/gcc/combine.c=0A= @@ -417,8 +417,8 @@ static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx)= ;=0A= static void undo_all (void);=0A= static void undo_commit (void);=0A= static rtx *find_split_point (rtx *, rtx, bool);=0A= -static rtx subst (rtx, rtx, rtx, int, int);=0A= -static rtx combine_simplify_rtx (rtx, enum machine_mode, int);=0A= +static rtx subst (rtx, rtx, rtx, int, int, int);=0A= +static rtx combine_simplify_rtx (rtx, enum machine_mode, int, int);=0A= static rtx simplify_if_then_else (rtx);=0A= static rtx simplify_set (rtx);=0A= static rtx simplify_logical (rtx);=0A= @@ -3117,11 +3117,11 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *n= ew_direct_jump_p,=0A= if (i1)=0A= {=0A= subst_low_luid =3D DF_INSN_LUID (i1);=0A= - i1src =3D subst (i1src, pc_rtx, pc_rtx, 0, 0);=0A= + i1src =3D subst (i1src, pc_rtx, pc_rtx, 0, 0, 0);=0A= }=0A= =20=0A= subst_low_luid =3D DF_INSN_LUID (i2);=0A= - i2src =3D subst (i2src, pc_rtx, pc_rtx, 0, 0);=0A= + i2src =3D subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);=0A= }=0A= =20=0A= n_occurrences =3D 0; /* `subst' counts here */=0A= @@ -3132,7 +3132,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new= _direct_jump_p,=0A= self-referential RTL when we will be substituting I1SRC for I1DEST=0A= later. 
Likewise if I0 feeds into I2, either directly or indirectly=0A= through I1, and I0DEST is in I0SRC. */=0A= - newpat =3D subst (PATTERN (i3), i2dest, i2src, 0,=0A= + newpat =3D subst (PATTERN (i3), i2dest, i2src, 0, 0,=0A= (i1_feeds_i2_n && i1dest_in_i1src)=0A= || ((i0_feeds_i2_n || (i0_feeds_i1_n && i1_feeds_i2_n))=0A= && i0dest_in_i0src));=0A= @@ -3171,7 +3171,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new= _direct_jump_p,=0A= copy of I1SRC each time we substitute it, in order to avoid creating=0A= self-referential RTL when we will be substituting I0SRC for I0DEST=0A= later. */=0A= - newpat =3D subst (newpat, i1dest, i1src, 0,=0A= + newpat =3D subst (newpat, i1dest, i1src, 0, 0,=0A= i0_feeds_i1_n && i0dest_in_i0src);=0A= substed_i1 =3D 1;=0A= =20=0A= @@ -3201,7 +3201,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new= _direct_jump_p,=0A= =20=0A= n_occurrences =3D 0;=0A= subst_low_luid =3D DF_INSN_LUID (i0);=0A= - newpat =3D subst (newpat, i0dest, i0src, 0, 0);=0A= + newpat =3D subst (newpat, i0dest, i0src, 0, 0, 0);=0A= substed_i0 =3D 1;=0A= }=0A= =20=0A= @@ -3263,7 +3263,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new= _direct_jump_p,=0A= {=0A= rtx t =3D i1pat;=0A= if (i0_feeds_i1_n)=0A= - t =3D subst (t, i0dest, i0src, 0, 0);=0A= + t =3D subst (t, i0dest, i0src, 0, 0, 0);=0A= =20=0A= XVECEXP (newpat, 0, --total_sets) =3D t;=0A= }=0A= @@ -3271,10 +3271,10 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *n= ew_direct_jump_p,=0A= {=0A= rtx t =3D i2pat;=0A= if (i1_feeds_i2_n)=0A= - t =3D subst (t, i1dest, i1src_copy ? i1src_copy : i1src, 0,=0A= + t =3D subst (t, i1dest, i1src_copy ? 
i1src_copy : i1src, 0, 0,=0A= i0_feeds_i1_n && i0dest_in_i0src);=0A= if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n)=0A= - t =3D subst (t, i0dest, i0src, 0, 0);=0A= + t =3D subst (t, i0dest, i0src, 0, 0, 0);=0A= =20=0A= XVECEXP (newpat, 0, --total_sets) =3D t;=0A= }=0A= @@ -4938,11 +4938,13 @@ find_split_point (rtx *loc, rtx insn, bool set_src)= =0A= =20=0A= IN_DEST is nonzero if we are processing the SET_DEST of a SET.=0A= =20=0A= + IN_COND is nonzero if we are on top level of the condition.=0A= +=0A= UNIQUE_COPY is nonzero if each substitution must be unique. We do this= =0A= by copying if `n_occurrences' is nonzero. */=0A= =20=0A= static rtx=0A= -subst (rtx x, rtx from, rtx to, int in_dest, int unique_copy)=0A= +subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)= =0A= {=0A= enum rtx_code code =3D GET_CODE (x);=0A= enum machine_mode op0_mode =3D VOIDmode;=0A= @@ -5003,7 +5005,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int uniq= ue_copy)=0A= && GET_CODE (XVECEXP (x, 0, 0)) =3D=3D SET=0A= && GET_CODE (SET_SRC (XVECEXP (x, 0, 0))) =3D=3D ASM_OPERANDS)=0A= {=0A= - new_rtx =3D subst (XVECEXP (x, 0, 0), from, to, 0, unique_copy);=0A= + new_rtx =3D subst (XVECEXP (x, 0, 0), from, to, 0, 0, unique_copy);= =0A= =20=0A= /* If this substitution failed, this whole thing fails. */=0A= if (GET_CODE (new_rtx) =3D=3D CLOBBER=0A= @@ -5020,7 +5022,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int uniq= ue_copy)=0A= && GET_CODE (dest) !=3D CC0=0A= && GET_CODE (dest) !=3D PC)=0A= {=0A= - new_rtx =3D subst (dest, from, to, 0, unique_copy);=0A= + new_rtx =3D subst (dest, from, to, 0, 0, unique_copy);=0A= =20=0A= /* If this substitution failed, this whole thing fails. 
*/=0A= if (GET_CODE (new_rtx) =3D=3D CLOBBER=0A= @@ -5066,8 +5068,8 @@ subst (rtx x, rtx from, rtx to, int in_dest, int uniq= ue_copy)=0A= }=0A= else=0A= {=0A= - new_rtx =3D subst (XVECEXP (x, i, j), from, to, 0,=0A= - unique_copy);=0A= + new_rtx =3D subst (XVECEXP (x, i, j), from, to, 0, 0,=0A= + unique_copy);=0A= =20=0A= /* If this substitution failed, this whole thing=0A= fails. */=0A= @@ -5144,7 +5146,9 @@ subst (rtx x, rtx from, rtx to, int in_dest, int uniq= ue_copy)=0A= && (code =3D=3D SUBREG || code =3D=3D STRICT_LOW_PART=0A= || code =3D=3D ZERO_EXTRACT))=0A= || code =3D=3D SET)=0A= - && i =3D=3D 0), unique_copy);=0A= + && i =3D=3D 0),=0A= + code =3D=3D IF_THEN_ELSE && i =3D=3D 0,=0A= + unique_copy);=0A= =20=0A= /* If we found that we will have to reject this combination,=0A= indicate that by returning the CLOBBER ourselves, rather than=0A= @@ -5201,7 +5205,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int uniq= ue_copy)=0A= /* If X is sufficiently simple, don't bother trying to do anything= =0A= with it. */=0A= if (code !=3D CONST_INT && code !=3D REG && code !=3D CLOBBER)=0A= - x =3D combine_simplify_rtx (x, op0_mode, in_dest);=0A= + x =3D combine_simplify_rtx (x, op0_mode, in_dest, in_cond);=0A= =20=0A= if (GET_CODE (x) =3D=3D code)=0A= break;=0A= @@ -5221,10 +5225,12 @@ subst (rtx x, rtx from, rtx to, int in_dest, int un= ique_copy)=0A= expression.=0A= =20=0A= OP0_MODE is the original mode of XEXP (x, 0). IN_DEST is nonzero=0A= - if we are inside a SET_DEST. */=0A= + if we are inside a SET_DEST. IN_COND is nonzero if we are on the top l= evel=0A= + of a condition. 
*/=0A= =20=0A= static rtx=0A= -combine_simplify_rtx (rtx x, enum machine_mode op0_mode, int in_dest)=0A= +combine_simplify_rtx (rtx x, enum machine_mode op0_mode, int in_dest,=0A= + int in_cond)=0A= {=0A= enum rtx_code code =3D GET_CODE (x);=0A= enum machine_mode mode =3D GET_MODE (x);=0A= @@ -5279,8 +5285,8 @@ combine_simplify_rtx (rtx x, enum machine_mode op0_mo= de, int in_dest)=0A= false arms to store-flag values. Be careful to use copy_rtx=0A= here since true_rtx or false_rtx might share RTL with x as a=0A= result of the if_then_else_cond call above. */=0A= - true_rtx =3D subst (copy_rtx (true_rtx), pc_rtx, pc_rtx, 0, 0);=0A= - false_rtx =3D subst (copy_rtx (false_rtx), pc_rtx, pc_rtx, 0, 0);=0A= + true_rtx =3D subst (copy_rtx (true_rtx), pc_rtx, pc_rtx, 0, 0, 0);=0A= + false_rtx =3D subst (copy_rtx (false_rtx), pc_rtx, pc_rtx, 0, 0, 0);=0A= =20=0A= /* If true_rtx and false_rtx are not general_operands, an if_then_else= =0A= is unlikely to be simpler. */=0A= @@ -5624,7 +5630,7 @@ combine_simplify_rtx (rtx x, enum machine_mode op0_mo= de, int in_dest)=0A= {=0A= /* Try to simplify the expression further. */=0A= rtx tor =3D simplify_gen_binary (IOR, mode, XEXP (x, 0), XEXP (x, 1));= =0A= - temp =3D combine_simplify_rtx (tor, mode, in_dest);=0A= + temp =3D combine_simplify_rtx (tor, mode, in_dest, 0);=0A= =20=0A= /* If we could, great. If not, do not go ahead with the IOR=0A= replacement, since PLUS appears in many special purpose=0A= @@ -5717,7 +5723,16 @@ combine_simplify_rtx (rtx x, enum machine_mode op0_m= ode, int in_dest)=0A= ZERO_EXTRACT is indeed appropriate, it will be placed back by=0A= the call to make_compound_operation in the SET case. */=0A= =20=0A= - if (STORE_FLAG_VALUE =3D=3D 1=0A= + if (in_cond)=0A= + /* Don't apply below optimizations if the caller would=0A= + prefer a comparison rather than a value.=0A= + E.g., for the condition in an IF_THEN_ELSE most targets need=0A= + an explicit comparison. 
*/=0A= + {=0A= + ;=0A= + }=0A= +=0A= + else if (STORE_FLAG_VALUE =3D=3D 1=0A= && new_code =3D=3D NE && GET_MODE_CLASS (mode) =3D=3D MODE_INT=0A= && op1 =3D=3D const0_rtx=0A= && mode =3D=3D GET_MODE (op0)=0A= @@ -5961,11 +5976,11 @@ simplify_if_then_else (rtx x)=0A= if (reg_mentioned_p (from, true_rtx))=0A= true_rtx =3D subst (known_cond (copy_rtx (true_rtx), true_code,=0A= from, true_val),=0A= - pc_rtx, pc_rtx, 0, 0);=0A= + pc_rtx, pc_rtx, 0, 0, 0);=0A= if (reg_mentioned_p (from, false_rtx))=0A= false_rtx =3D subst (known_cond (copy_rtx (false_rtx), false_code,=0A= from, false_val),=0A= - pc_rtx, pc_rtx, 0, 0);=0A= + pc_rtx, pc_rtx, 0, 0, 0);=0A= =20=0A= SUBST (XEXP (x, 1), swapped ? false_rtx : true_rtx);=0A= SUBST (XEXP (x, 2), swapped ? true_rtx : false_rtx);=0A= @@ -6182,11 +6197,11 @@ simplify_if_then_else (rtx x)=0A= {=0A= temp =3D subst (simplify_gen_relational (true_code, m, VOIDmode,=0A= cond_op0, cond_op1),=0A= - pc_rtx, pc_rtx, 0, 0);=0A= + pc_rtx, pc_rtx, 0, 0, 0);=0A= temp =3D simplify_gen_binary (MULT, m, temp,=0A= simplify_gen_binary (MULT, m, c1,=0A= const_true_rtx));=0A= - temp =3D subst (temp, pc_rtx, pc_rtx, 0, 0);=0A= + temp =3D subst (temp, pc_rtx, pc_rtx, 0, 0, 0);=0A= temp =3D simplify_gen_binary (op, m, gen_lowpart (m, z), temp);=0A= =20=0A= if (extend_op !=3D UNKNOWN)=0A= --=20=0A= 1.7.4.1=0A= =0A= --Apple-Mail-56-276821964--