From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x533.google.com (mail-ed1-x533.google.com [IPv6:2a00:1450:4864:20::533]) by sourceware.org (Postfix) with ESMTPS id DDDDB385354A for ; Tue, 6 Sep 2022 12:31:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DDDDB385354A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ed1-x533.google.com with SMTP id q21so7288408edc.9 for ; Tue, 06 Sep 2022 05:31:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=FNSR3+ihejc3631pWQnYfA2m/L/6yU9dpDm1ycPZl0A=; b=hwEznQmUWHAwAaPFScPp6kgxKErAcBRRF53ekoY4lGNEmDNwmHYwCGli9XyXqcbj4a thxZnOeSOMD+bu7M3pdoylSIuuaTdodcCgCa3mc9GMSCxq02fLjfFvquDyW+Pz7EQdpH pDwpDlKDkiozgnBueGCF/8MI8m3MJlBjI4psI5v6bbsn8P+Yx4y8T1cTBrBkVhQ009if M+VZFoRaXOYcV/injPsGk4lpRRMIBoF8KqP5BppDp5mwJ4j6j8NVvkd1d0g4r5jCSHF5 UQhL0aDvQFt0KD2pOscsMv3/Sgi9wYltTGySc5rV7HPK5CKaXKpfIT06WCMkUdNthCiV NK/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=FNSR3+ihejc3631pWQnYfA2m/L/6yU9dpDm1ycPZl0A=; b=RHbsU1WLuuPRqB6ITeNwiG7keIL2Ymnsot3O7ADUvqAGUkblSIOhw8ASuxTOupkTOd iriyMejvcVx+NzvBRVZdgUEoVy7uIirs9Lrfly8HR5FUa82xMZ98oh9y9jWWFdJJB0ND KN1dh5NKozkfF4s4hF2VlH8vxh/mFwpwfcKTNonRHAE+xoV2/BapCBoE/RkCyLtz99Vr XW1MrhjnuCPUVffun/uvvtt439Qok5coQ3ZfIloGkKp8+LqCWdcwo5O4I6BR4lj2n+YZ w3ekpsCLmE/VjSArszrJ5H3sXn98PU7i6hjJfa1sPYmrV5dkM6xV/2xUcdy5oV32wWgU nCsA== X-Gm-Message-State: ACgBeo2PAVrL29vpam9SuQGSGalO3DzEAYj2FTEReVavOB1/TEjnqE6Y 8MIQE+fGVxssH90aJCSAaGR2XvNbVWZCKc84SA4= X-Google-Smtp-Source: AA6agR6kLnijAe9cdaBc/KUa0hK0uGrp6BX0nk4y7SjgLy+wv79SZLoIRzPwRscpNS2xwSPoLsGOFCPiCJLfYMxD7dU= X-Received: by 2002:a05:6402:2b98:b0:43e:107:183d with SMTP id fj24-20020a0564022b9800b0043e0107183dmr47215133edb.366.1662467483489; Tue, 06 Sep 2022 05:31:23 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Tue, 6 Sep 2022 14:31:11 +0200 Message-ID: Subject: Re: [PATCH] expand: Convert cst - x into cst xor x. To: Robin Dapp Cc: GCC Patches , Andrew Pinski Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Sep 6, 2022 at 11:42 AM Robin Dapp wrote: > > Hi, > > posting this separately from PR91213 now. I wrote an s390 test and most > likely it could also be done for x86 which will give it broader coverage. > > Depending on the backend it might be better to convert > cst - x > into > cst xor x > if cst + 1 is a power of two and 0 <= x <= cst. This patch compares > both sequences and emits the less expensive one. > > Does this look like a viable approach? Bootstrapped and regtested on > s390[x], waited with x86 tests until a first round of comments. cost might also depend on the context in case flag setting behavior differs for xor vs sub (on x86 sub looks strictly more powerful here). The same is probably true when looking for a combination with another bitwise operation. Btw, why not perform the optimization in expand_binop? That for example already does if (binoptab == sub_optab && CONST_INT_P (op1)) { op1 = negate_rtx (mode, op1); binoptab = add_optab; } alternatively a targets expander can do the selection as well. Richard. > Regards > Robin > > gcc/ChangeLog: > > PR middle-end/91213 > * expr.cc (expand_expr_real_2): Call new function. > (maybe_optimize_cst_sub): New function. > * expr.h (maybe_optimize_cst_sub): Define. > > gcc/testsuite/ChangeLog: > > * gcc.target/s390/cst-minus-var.c: New test. > > --- > gcc/expr.cc | 79 +++++++++++++++++++ > gcc/expr.h | 2 + > gcc/testsuite/gcc.target/s390/cst-minus-var.c | 55 +++++++++++++ > 3 files changed, 136 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/s390/cst-minus-var.c > > diff --git a/gcc/expr.cc b/gcc/expr.cc > index 80bb1b8a4c5b..80f25720d7b6 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -9397,6 +9397,21 @@ expand_expr_real_2 (sepops ops, rtx target, > machine_mode tmode, > return simplify_gen_binary (MINUS, mode, op0, op1); > } > > + /* Convert const - A to const xor A if integer_pow2p (const + 1) > + and 0 <= A <= const. */ > + if (code == MINUS_EXPR > + && SCALAR_INT_MODE_P (mode) > + && TREE_CODE (treeop0) == INTEGER_CST > + && TREE_CODE (TREE_TYPE (treeop1)) == INTEGER_TYPE > + && wi::exact_log2 (wi::to_widest (treeop0) + 1) != -1) > + { > + rtx res = maybe_optimize_cst_sub (code, treeop0, treeop1, > + mode, unsignedp, type, > + target, subtarget); > + if (res) > + return res; > + } > + > /* No sense saving up arithmetic to be done > if it's all in the wrong mode to form part of an address. > And force_operand won't know whether to sign-extend or > @@ -12692,6 +12707,70 @@ maybe_optimize_mod_cmp (enum tree_code code, > tree *arg0, tree *arg1) > return code == EQ_EXPR ? LE_EXPR : GT_EXPR; > } > > +/* Convert const - A to const xor A if integer_pow2p (const + 1) > + and 0 <= A <= const. */ > + > +rtx > +maybe_optimize_cst_sub (enum tree_code code, tree treeop0, tree treeop1, > + machine_mode mode, int unsignedp, tree type, > + rtx target, rtx subtarget) > +{ > + gcc_checking_assert (code == MINUS_EXPR); > + gcc_checking_assert (SCALAR_INT_MODE_P (mode)); > + gcc_checking_assert (TREE_CODE (treeop0) == INTEGER_CST); > + gcc_checking_assert (TREE_CODE (TREE_TYPE (treeop1)) == INTEGER_TYPE); > + gcc_checking_assert (wi::exact_log2 (wi::to_widest (treeop0) + 1) != -1); > + > + if (!optimize) > + return NULL_RTX; > + > + optab this_optab; > + rtx op0, op1; > + > + if (wi::leu_p (tree_nonzero_bits (treeop1), tree_nonzero_bits (treeop0))) > + { > + expand_operands (treeop0, treeop1, subtarget, &op0, &op1, > + EXPAND_NORMAL); > + bool speed_p = optimize_insn_for_speed_p (); > + do_pending_stack_adjust (); > + start_sequence (); > + this_optab = optab_for_tree_code (MINUS_EXPR, type, > + optab_default); > + rtx subi = expand_binop (mode, this_optab, op0, op1, target, > + unsignedp, OPTAB_LIB_WIDEN); > + > + rtx_insn *sub_insns = get_insns (); > + end_sequence (); > + start_sequence (); > + this_optab = optab_for_tree_code (BIT_XOR_EXPR, type, > + optab_default); > + rtx xori = expand_binop (mode, this_optab, op0, op1, target, > + unsignedp, OPTAB_LIB_WIDEN); > + rtx_insn *xor_insns = get_insns (); > + end_sequence (); > + unsigned sub_cost = seq_cost (sub_insns, speed_p); > + unsigned xor_cost = seq_cost (xor_insns, speed_p); > + /* If costs are the same then use as tie breaker the other other > + factor. */ > + if (sub_cost == xor_cost) > + { > + sub_cost = seq_cost (sub_insns, !speed_p); > + xor_cost = seq_cost (xor_insns, !speed_p); > + } > + > + if (sub_cost <= xor_cost) > + { > + emit_insn (sub_insns); > + return subi; > + } > + > + emit_insn (xor_insns); > + return xori; > + } > + > + return NULL_RTX; > +} > + > /* Optimize x - y < 0 into x < 0 if x - y has undefined overflow. */ > > void > diff --git a/gcc/expr.h b/gcc/expr.h > index 08b59b8d869a..43ea11042d26 100644 > --- a/gcc/expr.h > +++ b/gcc/expr.h > @@ -326,6 +326,8 @@ extern tree string_constant (tree, tree *, tree *, > tree *); > extern tree byte_representation (tree, tree *, tree *, tree *); > > extern enum tree_code maybe_optimize_mod_cmp (enum tree_code, tree *, > tree *); > +extern rtx maybe_optimize_cst_sub (enum tree_code, tree, tree, > + machine_mode, int, tree , rtx, rtx); > extern void maybe_optimize_sub_cmp_0 (enum tree_code, tree *, tree *); > > /* Two different ways of generating switch statements. */ > diff --git a/gcc/testsuite/gcc.target/s390/cst-minus-var.c > b/gcc/testsuite/gcc.target/s390/cst-minus-var.c > new file mode 100644 > index 000000000000..c713624a9784 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/s390/cst-minus-var.c > @@ -0,0 +1,55 @@ > +/* Check that we can convert const - x to const xor x if > + const + 1 is a power of two and 0 <= x <= const. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mzarch" } */ > +/* { dg-final { scan-assembler-times 8 "xr\t" { target { ! 390_z10_hw } > } } } */ > +/* { dg-final { scan-assembler-times 8 "xilf\t" { target { s390_z10_hw > } } } } */ > + > +unsigned long long foo (unsigned long long a) > +{ > + if (a > 1) __builtin_unreachable(); > + return 1 - a; > +} > + > +unsigned long long bar (unsigned long long a) > +{ > + if (a > 65535) __builtin_unreachable(); > + return 65535 - a; > +} > + > +unsigned long long baz (unsigned long long a) > +{ > + if (a > 4294967295) __builtin_unreachable(); > + return 4294967295 - a; > +} > + > +int fooi (int a) > +{ > + if (a > 127 || a < 0) __builtin_unreachable(); > + return 127 - a; > +} > + > +int bari (int a) > +{ > + if (a > 65535 || a < 0) __builtin_unreachable(); > + return 65535 - a; > +} > + > +long bazl (long a) > +{ > + if (a > 4294967295 || a < 0) __builtin_unreachable(); > + return 4294967295 - a; > +} > + > +short foos (short a) > +{ > + if (a > 31 || a < 0) __builtin_unreachable(); > + return 31 - a; > +} > + > +short bars (int a) > +{ > + if (a > 65535 || a < 0) __builtin_unreachable(); > + return 65535 - a; > +} > -- > 2.31.1