From: Richard Biener
Subject: Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
Date: Thu, 27 Jul 2023 21:05:25 +0200
References: <001601d9c0ad$93934e00$bab9ea00$@nextmovesoftware.com>
In-Reply-To: <001601d9c0ad$93934e00$bab9ea00$@nextmovesoftware.com>
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org

> On 27.07.2023 at 19:12, Roger Sayle wrote:
>
> Hi Richard,
>
> You're 100% right.  It's possible to significantly clean up this code, replacing
> the body of the conditional with a call to force_reg and simplifying the conditions
> under which it is called.  These improvements are implemented in the patch
> below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and
> make -k check, both with and without -m32, as usual.
>
> Interestingly, the CONCAT clause afterwards is still required (I've learned something
> new), as calling force_reg (or gen_reg_rtx) with HCmode actually returns a CONCAT
> instead of a REG,

Heh, interesting.

> so although the code looks dead, it's required to build libgcc during
> a bootstrap.  But the remaining clean-up is good, reducing the number of source lines
> and making the logic easier to understand.
>
> Ok for mainline?

Ok.

Thanks,
Richard

> 2023-07-27  Roger Sayle
>             Richard Biener
>
> gcc/ChangeLog
>         PR middle-end/28071
>         PR rtl-optimization/110587
>         * expr.cc (emit_group_load_1): Simplify logic for calling
>         force_reg on ORIG_SRC, to avoid making a copy if the source
>         is already in a pseudo register.
>
> Roger
> --
>
>> -----Original Message-----
>> From: Richard Biener
>> Sent: 25 July 2023 12:50
>>
>>> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle
>>> wrote:
>>>
>>> This patch is the third in a series of fixes for PR
>>> rtl-optimization/110587, a compile-time regression with -O0, that
>>> attempts to address the underlying cause.  As noted previously, the
>>> pathological test case pr28071.c contains a large number of useless
>>> register-to-register moves that can produce quadratic behaviour (in
>>> LRA).  These moves are generated during RTL expansion in
>>> emit_group_load_1, where the middle-end attempts to simplify the
>>> source before calling extract_bit_field.  This is reasonable if the
>>> source is a complex expression (from before the tree-ssa optimizers),
>>> or a SUBREG, or a hard register, but it's not particularly useful to
>>> copy a pseudo register into a new pseudo register.  This patch eliminates
>>> that redundancy.
>>>
>>> The -fdump-tree-expand for pr28071.c compiled with -O0 currently
>>> contains 777K lines, with this patch it contains 717K lines, i.e.
>>> saving about 60K lines (admittedly of debugging text output, but it makes
>>> the point).
>>>
>>>
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures.  Ok for mainline?
>>>
>>> As always, I'm happy to revert this change quickly if there's a
>>> problem, and investigate why this additional copy might (still) be
>>> needed on other non-x86 targets.
>>
>> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
>>          be loaded directly into the destination.  */
>>        src = orig_src;
>>        if (!MEM_P (orig_src)
>> +         && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>>           && (!CONSTANT_P (orig_src)
>>               || (GET_MODE (orig_src) != mode
>>                   && GET_MODE (orig_src) != VOIDmode)))
>>
>> so that means the code guarded by the conditional could instead be
>> transformed to
>>
>>   src = force_reg (mode, orig_src);
>>
>> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
>> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the
>> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) ==
>> VOIDmode ? mode : GET_MODE (orig_src), orig_src)
>>
>> Otherwise I think this is OK, as said, using force_reg somehow would improve
>> readability here I think.
>>
>> I also wonder how the
>>
>>   else if (GET_CODE (src) == CONCAT)
>>
>> case will ever trigger with the current code.
>>
>> Richard.
>>
>>>
>>> 2023-07-25  Roger Sayle
>>>
>>> gcc/ChangeLog
>>>         PR middle-end/28071
>>>         PR rtl-optimization/110587
>>>         * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
>>>         a new pseudo register, i.e. only copy hard regs into a new pseudo.
>>>
>>>
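
For readers following the quoted hunk, the effect of the added
(!REG_P (orig_src) || HARD_REGISTER_P (orig_src)) clause can be sketched
with a toy model.  Everything below is an assumption made for
illustration: Kind, Mode, Operand, copies_before and copies_after are
hypothetical names, not GCC's rtx or machine_mode types, and the model
only mirrors the boolean structure of the guard, not what
emit_group_load_1 does with the result.

```cpp
#include <cassert>

// Toy stand-ins for RTL operand classes and machine modes.
// These are hypothetical and are NOT GCC's real rtx or machine_mode types.
enum class Kind { Mem, PseudoReg, HardReg, Constant, Other };
enum class Mode { Void, SI, DI };

struct Operand {
  Kind kind;
  Mode mode;
};

// Guard before the patch: copy unless the source is a MEM, or a constant
// whose mode is either the wanted mode or VOIDmode.
bool copies_before(const Operand &src, Mode mode) {
  bool constant_ok = src.kind != Kind::Constant
                     || (src.mode != mode && src.mode != Mode::Void);
  return src.kind != Kind::Mem && constant_ok;
}

// Guard with the added clause: (!REG_P || HARD_REGISTER_P) is false only
// for a pseudo register, so pseudo registers are no longer copied.
bool copies_after(const Operand &src, Mode mode) {
  bool not_pseudo = src.kind != Kind::PseudoReg;
  return not_pseudo && copies_before(src, mode);
}

// Small self-check of the one case the patch changes, plus a hard register.
void check_model() {
  assert(copies_before({Kind::PseudoReg, Mode::SI}, Mode::SI));
  assert(!copies_after({Kind::PseudoReg, Mode::SI}, Mode::SI));
  assert(copies_after({Kind::HardReg, Mode::SI}, Mode::SI));
}
```

In this model the two guards differ only for Kind::PseudoReg, which
matches the description in the thread: hard registers, SUBREG-like
operands (Kind::Other here) and mode-mismatched constants are still
copied, while a source already in a pseudo is used directly.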