From: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com
Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
Date: Wed, 15 May 2024 11:04:27 +0800
Message-Id: <20240515030429.2575440-1-haochen.jiang@intel.com>

Hi all,

Recently, we have encountered several random commit-to-commit performance
regressions in benchmarks. They are caused by tight loops crossing cache
line boundaries.

We are trying to solve the issue with two patches. One adjusts the loop
alignment for generic tuning; the other aligns tight and hot loops more
aggressively.

For SPECINT, we get a 0.85% overall improvement in rates with
-O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
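
As a rough illustration of the cross-cacheline issue described above, the
sketch below models how an alignment setting of the form n:m:n2 (the form
documented for -falign-loops) places a loop, and checks whether the loop body
then straddles a cache line. This is not the logic of the patches themselves.
The 64-byte cache line size, the start address 52, the 12-byte loop size, and
all function names are assumptions chosen for the example.

```python
CACHE_LINE = 64  # assumption: 64-byte cache lines


def pad_to(addr, align):
    """Padding bytes needed to round addr up to a multiple of align."""
    return -addr % align


def place_loop(addr, n, m=None, n2=None):
    """Model of an n[:m[:n2]] alignment setting (hypothetical helper).

    Align to n if fewer than m bytes must be skipped (always, if m is
    omitted); otherwise fall back to the secondary alignment n2, or
    leave the address unchanged if n2 is omitted.
    """
    pad = pad_to(addr, n)
    if m is None or pad < m:
        return addr + pad
    if n2 is None:
        return addr
    return addr + pad_to(addr, n2)


def crosses_cache_line(start, size):
    """True if the bytes [start, start + size) span two cache lines."""
    return start // CACHE_LINE != (start + size - 1) // CACHE_LINE


LOOP_SIZE = 12   # made-up size of a tight loop body, in bytes
UNALIGNED = 52   # made-up address where the loop would otherwise start

old_start = place_loop(UNALIGNED, 16, 11, 8)   # 16:11:8
new_start = place_loop(UNALIGNED, 16)          # 16

print(old_start, crosses_cache_line(old_start, LOOP_SIZE))  # 56 True
print(new_start, crosses_cache_line(new_start, LOOP_SIZE))  # 64 False
```

In this model, reaching the next 16-byte boundary from address 52 needs 12
bytes of padding, which exceeds the limit of 10 allowed by 16:11, so the loop
falls back to 8-byte alignment and lands at 56, straddling the line boundary
at 64. With plain 16-byte alignment, a loop of at most 16 bytes always fits
within a single 64-byte line.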

BenchMarks          EMR Rates
500.perlbench_r     -1.21%
502.gcc_r            0.78%
505.mcf_r            0.00%
520.omnetpp_r        0.41%
523.xalancbmk_r      1.33%
525.x264_r           2.83%
531.deepsjeng_r      1.11%
541.leela_r          0.00%
548.exchange2_r      2.36%
557.xz_r             0.98%
Geomean-int          0.85%

The side effect is a 1.40% increase in code size.

BenchMarks          EMR Codesize
500.perlbench_r      0.70%
502.gcc_r            0.67%
505.mcf_r            3.26%
520.omnetpp_r        0.31%
523.xalancbmk_r      1.15%
525.x264_r           1.11%
531.deepsjeng_r      1.40%
541.leela_r          1.31%
548.exchange2_r      3.06%
557.xz_r             1.04%
Geomean-int          1.40%

Bootstrapped and regtested on x86_64-pc-linux-gnu.

If nothing unexpected happens within a month after this is committed to
trunk, we plan to backport it to GCC 14.2.

Thx,
Haochen

Haochen Jiang (1):
  Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

liuhongt (1):
  Align tight&hot loop without considering max skipping bytes.

 gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
 gcc/config/i386/i386.md          |  10 ++-
 gcc/config/i386/x86-tune-costs.h |   2 +-
 3 files changed, 154 insertions(+), 6 deletions(-)

-- 
2.31.1