From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.15]) by sourceware.org (Postfix) with ESMTPS id D58C23858D1E for ; Mon, 6 May 2024 14:48:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D58C23858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D58C23858D1E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=192.198.163.15 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715006893; cv=none; b=QLxk0ZdfNFMl6qCQTScNqjG+N9yPh4uty67VxUw9Hs/oQ0U96+CO3zMhn3b3FskRedTerqAyTjXjMTrNE3D6jVKGKlkz2WNdWDqKpJzWz1/p93P/zca9Om6q4G+ohgSRlWJ6TvXTJUbyNpL1FuWy/xOXyZruFnBD7dpAHqBNgR4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715006893; c=relaxed/simple; bh=mVWBi/tuibD8znmvBkehggo2msVzFyRoRZjCK3DmJnU=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=pO3ebsT0Ob5/BwkS85X6Xih7PmcvFX5eBdG4gTRugWnysluUjMseO9jBOtmd/Zi7sfXRAgQ/zneZkp8+WvhPqYXRUUhz9G5c/TvetVsWSfEcO1GTKNVV8gr78LsW0Wc4AuNc6A9yiOUWLI6bUe+ktFJU45VvHqnZ0yd9BDicLDA= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715006891; x=1746542891; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mVWBi/tuibD8znmvBkehggo2msVzFyRoRZjCK3DmJnU=; b=itVXcqKXq5ja5iwNnQqr7PjNv4jJRT1AjUSoKEZcZVc9aWUUjiGw1kbd 6SQdMIZJ0DeLo9mlrKswaTecSkLI0taoHhDhcUvAXeRVvjRwLvoSgnvYJ 9LLFWrk+XajQ7cX07nu4+MfhhdzdEV0Z6mAX0e8poZXw7nxuQfIgLIyus moCyOoqisJenVFiWhWzBQjV4evbh1+iW8ZHFBS1Th/1Bq7P/DGTGklKEf WtehcjiEc/j/owCsr85S+u8zWkmA3ZCsyQ4WpEU6YbWPrtVuG2qNkKczL 8Wy0Xwmiz2zCQNtVM7YtMtKx3o6hQCFgQUALEGPVq8s4f+WgO/8BRlVnv g==; X-CSE-ConnectionGUID: Xaw6Uz1fQpyk7YC1P6wSrA== X-CSE-MsgGUID: u9oT2rwQTQmiLPm1DNh/uA== X-IronPort-AV: E=McAfee;i="6600,9927,11065"; a="10916544" X-IronPort-AV: E=Sophos;i="6.07,258,1708416000"; d="scan'208";a="10916544" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by fmvoesa109.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 May 2024 07:48:10 -0700 X-CSE-ConnectionGUID: bza8IrkCRtm/G+5MWEjQhA== X-CSE-MsgGUID: Y0yMbfnnRyG10aGxg3nQ9g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,258,1708416000"; d="scan'208";a="32993124" Received: from shvmail02.sh.intel.com ([10.239.244.9]) by orviesa005.jf.intel.com with ESMTP; 06 May 2024 07:48:08 -0700 Received: from pli-ubuntu.sh.intel.com (pli-ubuntu.sh.intel.com [10.239.159.47]) by shvmail02.sh.intel.com (Postfix) with ESMTP id 546EA1006F20; Mon, 6 May 2024 22:48:06 +0800 (CST) From: pan2.li@intel.com To: gcc-patches@gcc.gnu.org Cc: juzhe.zhong@rivai.ai, kito.cheng@gmail.com, tamar.christina@arm.com, richard.guenther@gmail.com, hongtao.liu@intel.com, Pan Li Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int Date: Mon, 6 May 2024 22:48:05 +0800 Message-Id: <20240506144805.725379-1-pan2.li@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240406120755.2692291-1-pan2.li@intel.com> References: <20240406120755.2692291-1-pan2.li@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: From: Pan Li This patch would like to add the middle-end presentation for the saturation add. Aka set the result of add to the max when overflow. It will take the pattern similar as below. SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)) Take uint8_t as example, we will have: * SAT_ADD (1, 254) => 255. * SAT_ADD (1, 255) => 255. * SAT_ADD (2, 255) => 255. * SAT_ADD (255, 255) => 255. Given below example for the unsigned scalar integer uint64_t: uint64_t sat_add_u64 (uint64_t x, uint64_t y) { return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x)); } Before this patch: uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) { long unsigned int _1; _Bool _2; long unsigned int _3; long unsigned int _4; uint64_t _7; long unsigned int _10; __complex__ long unsigned int _11; ;; basic block 2, loop depth 0 ;; pred: ENTRY _11 = .ADD_OVERFLOW (x_5(D), y_6(D)); _1 = REALPART_EXPR <_11>; _10 = IMAGPART_EXPR <_11>; _2 = _10 != 0; _3 = (long unsigned int) _2; _4 = -_3; _7 = _1 | _4; return _7; ;; succ: EXIT } After this patch: uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) { uint64_t _7; ;; basic block 2, loop depth 0 ;; pred: ENTRY _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call] return _7; ;; succ: EXIT } We perform the tranform during widen_mult because that the sub-expr of SAT_ADD will be optimized to .ADD_OVERFLOW. We need to try the .SAT_ADD pattern first and then .ADD_OVERFLOW, or we may never catch the pattern .SAT_ADD. Meanwhile, the isel pass is after widen_mult and then we cannot perform the .SAT_ADD pattern match as the sub-expr will be optmized to .ADD_OVERFLOW first. The below tests are passed for this patch: 1. The riscv fully regression tests. 2. The aarch64 fully regression tests. 3. The x86 bootstrap tests. 4. The x86 fully regression tests. PR target/51492 PR target/112600 gcc/ChangeLog: * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD to the return true switch case(s). * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD. * match.pd: Add unsigned SAT_ADD match. * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd. * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern func decl generated in match.pd match. (match_saturation_arith): New func impl to match the saturation arith. (math_opts_dom_walker::after_dom_children): Try match saturation arith. Signed-off-by: Pan Li --- gcc/internal-fn.cc | 1 + gcc/internal-fn.def | 2 ++ gcc/match.pd | 28 ++++++++++++++++++++++++ gcc/optabs.def | 4 ++-- gcc/tree-ssa-math-opts.cc | 46 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 79 insertions(+), 2 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 0a7053c2286..73045ca8c8c 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn) case IFN_UBSAN_CHECK_MUL: case IFN_ADD_OVERFLOW: case IFN_MUL_OVERFLOW: + case IFN_SAT_ADD: case IFN_VEC_WIDEN_PLUS: case IFN_VEC_WIDEN_PLUS_LO: case IFN_VEC_WIDEN_PLUS_HI: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 848bb9dbff3..25badbb86e5 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first, DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first, smulhrs, umulhrs, binary) +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary) + DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary) DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary) DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary) diff --git a/gcc/match.pd b/gcc/match.pd index d401e7503e6..7058e4cbe29 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || POINTER_TYPE_P (itype)) && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype)))))) +/* Unsigned Saturation Add */ +(match (usadd_left_part @0 @1) + (plus:c @0 @1) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +(match (usadd_right_part @0 @1) + (negate (convert (lt (plus:c @0 @1) @0))) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +(match (usadd_right_part @0 @1) + (negate (convert (gt @0 (plus:c @0 @1)))) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +/* Unsigned saturation add, case 1 (branchless): + SAT_U_ADD = (X + Y) | - ((X + Y) < X) or + SAT_U_ADD = (X + Y) | - (X > (X + Y)). */ +(match (unsigned_integer_sat_add @0 @1) + (bit_ior:c (usadd_left_part @0 @1) (usadd_right_part @0 @1))) + /* x > y && x != XXX_MIN --> x > y x > y && x == XXX_MIN --> false . */ (for eqne (eq ne) diff --git a/gcc/optabs.def b/gcc/optabs.def index ad14f9328b9..3f2cb46aff8 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3") OPTAB_NX(add_optab, "add$Q$a3") OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc) OPTAB_VX(addv_optab, "add$F$a3") -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc) -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc) +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc) +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc) OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc) OPTAB_NX(sub_optab, "sub$F$a3") OPTAB_NX(sub_optab, "sub$Q$a3") diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index 705f4a4695a..35a46edc9f6 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -4026,6 +4026,44 @@ arith_overflow_check_p (gimple *stmt, gimple *cast_stmt, gimple *&use_stmt, return 0; } +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); + +/* + * Try to match saturation arith pattern(s). + * 1. SAT_ADD (unsigned) + * _7 = _4 + _6; + * _8 = _4 > _7; + * _9 = (long unsigned int) _8; + * _10 = -_9; + * _12 = _7 | _10; + * => + * _12 = .SAT_ADD (_4, _6); */ +static bool +match_saturation_arith (gimple_stmt_iterator *gsi, gimple *stmt, + bool *cfg_changed_p) +{ + gcall *call = NULL; + bool changed_p = false; + + gcc_assert (is_gimple_assign (stmt)); + + tree ops[2]; + tree lhs = gimple_assign_lhs (stmt); + + if (gimple_unsigned_integer_sat_add (lhs, ops, NULL) + && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs), + OPTIMIZE_FOR_SPEED)) + { + call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]); + gimple_call_set_lhs (call, lhs); + gsi_replace (gsi, call, true); + changed_p = true; + *cfg_changed_p = changed_p; + } + + return changed_p; +} + /* Recognize for unsigned x x = y - z; if (x > y) @@ -5886,6 +5924,14 @@ math_opts_dom_walker::after_dom_children (basic_block bb) fma_deferring_state fma_state (param_avoid_fma_max_bits > 0); + for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (is_gimple_assign (stmt)) + match_saturation_arith (&gsi, stmt, m_cfg_changed_p); + } + for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);) { gimple *stmt = gsi_stmt (gsi); -- 2.34.1