Date: Tue, 08 Jan 2019 11:43:00 -0000
From: Jakub Jelinek
To: Uros Bizjak
Cc: Jeff Law, "gcc-patches@gcc.gnu.org"
Subject: Re: [PATCH] Optimize away x86 mem stores of what the mem contains already (PR rtl-optimization/79593)
Message-ID: <20190108114338.GZ30353@tucnak>
References: <20190107225116.GU30353@tucnak> <20190108092714.GX30353@tucnak>
X-SW-Source: 2019-01/txt/msg00383.txt.bz2

On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote:
> FLD from memory in SF and DFmode is considered a conversion, and
> converts sNaN to NaN (and emits #IA exception). But sNaN handling is
> already busted in the compiler as RA is free to spill the register in
> non-XFmode. IMO, the peephole2 pattern is no worse than the current
> situation.

Ok.

> At least for x86, there are no SUBREGs after reload, otherwise other
> parts of the compiler would break.

The new patch would really handle even a SUBREG there...

> > I don't see how, that would mean I'd have to write two peephole2s instead of
> > one. It tries to deal with two different cases, one is where the temporary
> > reg is dead, in that case we can optimize away both the load or store, the
> > second case is where the temporary reg isn't dead, in that case we can
> > optimize away the store, but not the load. With the optimizing away of both
> > load and store I was just trying to do a cheap DCE there.
>
> I didn't realize this is an optimization, a comment would be welcome here.

Ugh, except that it doesn't work. peep2_reg_dead_p (1, operands[0]) is not
what I meant, that is always false, as the register must be live in between
the first and second instruction. I meant peep2_reg_dead_p (2, operands[0]),
the register dead at the end of the second instruction, except we don't
really support define_split/define_peephole2 splitting into zero
instructions, DONE; in that case returns NULL like FAIL; does.
So, let's just wait for DCE to finish it up.

Here is what I'll bootstrap/regtest then. Added also
reg_overlap_mentioned_p, in case there is e.g.
	movl (%eax,%edx), %eax
	movl %eax, (%eax,%edx)
or similar and as I said earlier, explicit match_operand so that I can
check MEM_VOLATILE_P on both MEMs.

2019-01-08  Jakub Jelinek

	PR rtl-optimization/79593
	* config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.

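To illustrate why the reg_overlap_mentioned_p check matters, here is a
minimal toy model in C. It is only a sketch, not GCC code: the machine
model, the (base,index) operand form, and every name in it
(store_is_removable, pair_matches_load_only, demo_machine, ...) are
hypothetical and exist just for this illustration. It mimics the peephole2
condition and compares "load; store" against "load only" on a tiny
simulated machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NREGS = 4, MEMSZ = 64 };

/* Toy machine state: a few registers and a small word-addressed memory.  */
struct machine {
  unsigned reg[NREGS];
  unsigned mem[MEMSZ];
};

/* Toy memory operand of the form (base,index), e.g. (%eax,%edx).  */
struct mem_operand {
  int base, index;      /* register numbers */
  bool is_volatile;
};

/* Fixed demo state: reg = {1, 2, 0, 3}, mem[i] = (i + 5) % 32.  */
static struct machine demo_machine (void)
{
  struct machine m = { { 1, 2, 0, 3 }, { 0 } };
  for (unsigned i = 0; i < MEMSZ; i++)
    m.mem[i] = (i + 5) % 32;
  return m;
}

/* Address is computed from the current register contents.  */
static unsigned effective_address (const struct machine *m,
                                   struct mem_operand a)
{
  unsigned ea = m->reg[a.base] + m->reg[a.index];
  assert (ea < MEMSZ);
  return ea;
}

/* Toy analogue of the peephole2 condition: both MEMs non-volatile,
   syntactically identical (the role of rtx_equal_p), and the loaded
   register not used in the store address (the role of
   reg_overlap_mentioned_p).  */
bool store_is_removable (int dst, struct mem_operand src,
                         struct mem_operand st)
{
  bool same_operand = src.base == st.base && src.index == st.index;
  bool overlap = st.base == dst || st.index == dst;
  return !src.is_volatile && !st.is_volatile && same_operand && !overlap;
}

/* Run "dst = mem[src]; mem[st] = dst" and "dst = mem[src]" from the same
   starting state and report whether the final states are identical.  */
bool pair_matches_load_only (int dst, struct mem_operand src,
                             struct mem_operand st)
{
  struct machine full = demo_machine (), opt = demo_machine ();

  full.reg[dst] = full.mem[effective_address (&full, src)];
  full.mem[effective_address (&full, st)] = full.reg[dst];

  opt.reg[dst] = opt.mem[effective_address (&opt, src)];

  return memcmp (&full, &opt, sizeof full) == 0;
}
```

With registers numbered 0..3, loading from (0,3) into register 2 and
storing back to (0,3) leaves the machine in the same state as the load
alone, so the store is removable. Loading into register 0 instead changes
the address that the syntactically identical store operand refers to, so
the two sequences diverge and the toy condition rejects the case.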
--- gcc/config/i386/i386.md.jj	2019-01-07 23:54:54.494800693 +0100
+++ gcc/config/i386/i386.md	2019-01-08 12:34:18.916832780 +0100
@@ -18740,6 +18740,18 @@ (define_peephole2
 						const0_rtx);
 })
 
+;; Attempt to optimize away memory stores of values the memory already
+;; has.  See PR79593.
+(define_peephole2
+  [(set (match_operand 0 "register_operand")
+        (match_operand 1 "memory_operand"))
+   (set (match_operand 2 "memory_operand") (match_dup 0))]
+  "!MEM_VOLATILE_P (operands[1])
+   && !MEM_VOLATILE_P (operands[2])
+   && rtx_equal_p (operands[1], operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 1))])
+
 ;; Attempt to always use XOR for zeroing registers (including FP modes).
 (define_peephole2
   [(set (match_operand 0 "general_reg_operand")

	Jakub