From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20240515030429.2575440-1-haochen.jiang@intel.com>
From: Hongtao Liu
Date: Mon, 27 May 2024 09:33:27 +0800
Subject: Re: [PATCH 0/2] Align tight loops to solve cross cacheline issue
To: "Jiang, Haochen"
Cc: "gcc-patches@gcc.gnu.org", "Liu, Hongtao", "ubizjak@gmail.com", Jan Hubicka, Richard Biener
Content-Type: text/plain; charset="UTF-8"
X-Spam-Checker-Version:
 SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org

On Mon, May 20, 2024 at 11:15 AM Hongtao Liu wrote:
>
> On Wed, May 15, 2024 at 11:30 AM Jiang, Haochen wrote:
> >
> > Also cc Honza and Richard since we touched generic tune.
> >
> > Thx,
> > Haochen
> >
> > > -----Original Message-----
> > > From: Haochen Jiang
> > > Sent: Wednesday, May 15, 2024 11:04 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Liu, Hongtao; ubizjak@gmail.com
> > > Subject: [PATCH 0/2] Align tight loops to solve cross cacheline issue
> > >
> > > Hi all,
> > >
> > > Recently, we have encountered several random performance regressions in
> > > benchmarks commit to commit. It is caused by cross cacheline issue for tight
> > > loops.
> > >
> > > We are trying to solve the issue by two patches. One is adjusting the loop
> > > alignment for generic tune, the other is aligning tight and hot loops more
> > > aggressively.
> > >
> > > For SPECINT, we get a 0.85% improvement overall in rates, under option
> > > -O2 -march=x86-64-v3 -mtune=generic on Emerald Rapids.
> > >
> > > BenchMarks        EMR Rates
> > > 500.perlbench_r   -1.21%
> > > 502.gcc_r          0.78%
> > > 505.mcf_r          0.00%
> > > 520.omnetpp_r      0.41%
> > > 523.xalancbmk_r    1.33%
> > > 525.x264_r         2.83%
> > > 531.deepsjeng_r    1.11%
> > > 541.leela_r        0.00%
> > > 548.exchange2_r    2.36%
> > > 557.xz_r           0.98%
> > > Geomean-int        0.85%
> > >
> > > Side effect is that we get a 1.40% increase in codesize.
> > >
> > > BenchMarks        EMR Codesize
> > > 500.perlbench_r    0.70%
> > > 502.gcc_r          0.67%
> > > 505.mcf_r          3.26%
> > > 520.omnetpp_r      0.31%
> > > 523.xalancbmk_r    1.15%
> > > 525.x264_r         1.11%
> > > 531.deepsjeng_r    1.40%
> > > 541.leela_r        1.31%
> > > 548.exchange2_r    3.06%
> > > 557.xz_r           1.04%
> > > Geomean-int        1.40%
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for this if there's no objection in 48 hours.
> > >
> > > After we committed into trunk for a month, if there isn't any unexpected
> > > happen. We planned to backport it to GCC14.2.
> > >
> > > Thx,
> > > Haochen
> > >
> > > Haochen Jiang (1):
> > >   Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
> For this one, current znver{1,2,3,4,5}_cost already set loop align as
> 16, so I think it should be fine set it to generic_cost.
> > >
> > > liuhongt (1):
> > >   Align tight&hot loop without considering max skipping bytes.
> For this one, although we have seen similar growth on AMD's
> processors, it's still nice to have someone from AMD to look at this
> to see if it's what they need.
> > >
> > >  gcc/config/i386/i386.cc          | 148 ++++++++++++++++++++++++++++++-
> > >  gcc/config/i386/i386.md          |  10 ++-
> > >  gcc/config/i386/x86-tune-costs.h |   2 +-
> > >  3 files changed, 154 insertions(+), 6 deletions(-)
> > >
> > > --
> > > 2.31.1
> > >
>
> --
> BR,
> Hongtao

-- 
BR,
Hongtao
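
For readers less familiar with the n:m:n2 notation in the first patch title, here is a rough sketch of how a 16:11:8 loop alignment differs from a plain 16. It follows the rule documented for GCC's -falign-loops option (align to n when that skips at most m-1 bytes, otherwise try n2; m defaults to n and m2 to n2). The function names, the sample address, the 16-byte loop size and the 64-byte cache line are all assumptions for illustration; this is not code from the patches.

```python
def padding_to(addr: int, align: int) -> int:
    """Bytes needed to advance addr to the next multiple of align."""
    return (-addr) % align


def loop_padding(addr: int, spec: str) -> int:
    """Padding implied by an 'n[:m[:n2[:m2]]]' alignment spec.

    Assumes n and n2 are powers of two. Align to n when that skips at
    most m-1 bytes; otherwise try n2 with m2 under the same rule.
    """
    fields = [int(f) for f in spec.split(":")]
    n = fields[0]
    m = fields[1] if len(fields) > 1 else n      # m defaults to n
    pairs = [(n, m)]
    if len(fields) > 2:
        n2 = fields[2]
        m2 = fields[3] if len(fields) > 3 else n2  # m2 defaults to n2
        pairs.append((n2, m2))
    for align, max_skip in pairs:
        pad = padding_to(addr, align)
        if pad <= max_skip - 1:
            return pad
    return 0  # no alignment applied


def crosses_line(start: int, size: int, line: int = 64) -> bool:
    """True if [start, start + size) spans two cache lines of `line` bytes."""
    return start // line != (start + size - 1) // line


if __name__ == "__main__":
    addr, loop_size = 0x1031, 16  # hypothetical loop head and body size
    for spec in ("16:11:8", "16"):
        start = addr + loop_padding(addr, spec)
        print(spec, hex(start), "crosses line:", crosses_line(start, loop_size))
```

In this made-up case the 16:11:8 setting falls back to 8-byte alignment because reaching the 16-byte boundary would skip 15 bytes, and the loop then straddles a 64-byte line; the plain 16 setting pays the 15 bytes of padding and keeps the loop within one line. That padding is also where the codesize growth comes from.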