From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102e.google.com (mail-pj1-x102e.google.com [IPv6:2607:f8b0:4864:20::102e]) by sourceware.org (Postfix) with ESMTPS id A5FD53858007 for ; Tue, 6 Jul 2021 04:01:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A5FD53858007 Received: by mail-pj1-x102e.google.com with SMTP id g6-20020a17090adac6b029015d1a9a6f1aso927554pjx.1 for ; Mon, 05 Jul 2021 21:01:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=+6PbbkpNWQJYOBNEWw3A8eJ6SeiruiJgEmBVic+oqQg=; b=ZU+x9w41cWbEUuNq/2tXQuf7bF2pBEqT6iZ/PK7rCKeTkD/+tcS0vN7z9rO+7YiTQT kz4AjYQBmu0kfJmR5p6FRU98bT3gy7zx5uKM169m7jY8VFbib40rXGBNrhjZvJhofKXI cy9ocGnInHZGMBw9vvAVfx8Ns97gtc5xb0qgmlXpDs8bFIqyiVrOOYYXf38jw3F2d09g EFZ1NEHwcwyMyDe7KehqlTroYDkIxDmgTApFj3sQ1TxByrDUxoaGT449KqDMdDsy3W1r AY9yURrxXGKDiXl4ccAu1UcIdha6OBJaCIc47f9OC7xjLKf774yxAwx5Vwnuq4UJsVlm M+AA== X-Gm-Message-State: AOAM531h7TNf4m3YDC315DRgsGEDAQkuvja4ANDJz6o7QKlrkh/iYo0f 7k4rXikWV2AzDtFZ4RW9UwI2B4/uVuE80A== X-Google-Smtp-Source: ABdhPJykoVYq/a9L7AOSnB5CvN7DJVxz9g9pNlyhc9w1+Qy1Nj6G5jPAEaqLXcIywpl8WdVjE9HWaw== X-Received: by 2002:a17:902:bf45:b029:129:8147:3a93 with SMTP id u5-20020a170902bf45b029012981473a93mr8967600pls.84.1625544118305; Mon, 05 Jul 2021 21:01:58 -0700 (PDT) Received: from [192.168.1.17] (c-98-202-48-222.hsd1.ut.comcast.net. [98.202.48.222]) by smtp.gmail.com with ESMTPSA id r188sm15128850pfc.167.2021.07.05.21.01.57 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 05 Jul 2021 21:01:57 -0700 (PDT) Subject: Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack To: Richard Biener , Jeff Law Cc: GCC Patches References: <45e3b746-c762-cd92-6429-78af069bac9c@tachyum.com> From: Jeff Law Message-ID: <63d74723-38dc-3f61-9a3f-fa39d376de2b@gmail.com> Date: Mon, 5 Jul 2021 22:01:56 -0600 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Jul 2021 04:02:01 -0000 On 7/5/2021 5:17 AM, Richard Biener via Gcc-patches wrote: > On Fri, Jul 2, 2021 at 6:13 PM Jeff Law wrote: >> >> This is a minor missed optimization we found with our internal port. >> >> Given this code: >> >> typedef struct {short a; short b;} T; >> >> extern void g1(); >> >> void f(T s) >> { >> if (s.a < 0) >> g1(); >> } >> >> >> "s" is passed in a register, but it's still a BLKmode object because the >> alignment of T is smaller than the alignment that an integer of the same >> size would have (16 bits vs 32 bits). >> >> >> Because "s" is BLKmode function.c is going to store it into a stack slot >> and we'll load it from the that slot for each reference. So on the v850 >> (just to pick a port that likely has the same behavior we're seeing) we >> have this RTL from CSE2: >> >> >> (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp) >> (const_int -4 [0xfffffffffffffffc])) [2 S4 A32]) >> (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal} >> (expr_list:REG_DEAD (reg:SI 6 r6) >> (nil))) >> (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG) >> (insn 8 3 9 2 (set (reg:HI 44 [ s.a ]) >> (mem/c:HI (plus:SI (reg/f:SI 34 .fp) >> (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 A32])) >> "j.c":7:5 3 {*movhi_internal} >> (nil)) >> (insn 9 8 10 2 (parallel [ >> (set (reg:SI 45) >> (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0) >> (const_int 16 [0x10]))) >> (clobber (reg:CC 32 psw)) >> ]) "j.c":7:5 94 {ashlsi3_clobber_flags} >> (expr_list:REG_DEAD (reg:HI 44 [ s.a ]) >> (expr_list:REG_UNUSED (reg:CC 32 psw) >> (nil)))) >> (insn 10 9 11 2 (parallel [ >> (set (reg:SI 43) >> (ashiftrt:SI (reg:SI 45) >> (const_int 16 [0x10]))) >> (clobber (reg:CC 32 psw)) >> ]) "j.c":7:5 104 {ashrsi3_clobber_flags} >> (expr_list:REG_DEAD (reg:SI 45) >> (expr_list:REG_UNUSED (reg:CC 32 psw) >> (nil)))) >> >> >> Insn 2 is the store into the stack. insn 8 is the load for s.a in the >> conditional. DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 6) >> has the value we want. After that the store at insn 2 is dead. Sadly >> DSE never removes the store. >> >> The problem is RTL DSE considers a store with no MEM_EXPR as escaping, >> which keeps the MEM live. The lack of a MEM_EXPR is due to call to >> change_address to twiddle the mode on the MEM for the store at insn 2. >> It should be safe to copy the MEM_EXPR (which should always be a >> PARM_DECL) from the original memory to the memory returned by >> change_address. Doing so results in DSE1 removing the store at insn 2. >> >> It would be nice to remove the stack setup/teardown. I'm not offhand >> aware of mechanisms to remove the setup/teardown after we've already >> allocated a slot, even if the slot is no longer used. >> >> Bootstrapped and regression tested on x86, though I don't think that's a >> particularly useful test. So I also ran it through my tester across >> those pesky embedded targets without regressions as well. >> >> I didn't include a test simply because I didn't want to have an insane >> target selector. I guess if we really wanted a test we could look after >> DSE1 is done and verify there aren't any MEMs left at all. Willing to >> try that if the consensus is we want this tested. >> >> OK for the trunk? > I wonder why the code doesn't use adjust_address instead? That > handles most cases already and the code doesn't change the > address but just the mode (and access size)? No idea.  It should be easy enough to try that approach though. jeff