From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 21186 invoked by alias); 28 Aug 2019 13:54:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 21173 invoked by uid 89); 28 Aug 2019 13:54:41 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-5.5 required=5.0 tests=AWL,BAYES_00,KAM_NUMSUBJECT,KAM_SHORT,SPF_PASS autolearn=no version=3.3.1 spammy=sandoe, Sandoe, Iain, H*f:sk:A52FFD3 X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 28 Aug 2019 13:54:39 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 4FFCDABED; Wed, 28 Aug 2019 13:54:37 +0000 (UTC) Date: Wed, 28 Aug 2019 14:00:00 -0000 From: Richard Biener To: Iain Sandoe cc: "gcc-patches@gcc.gnu.org" , "H. J. Lu" , Uros Bizjak Subject: Re: [PATCH] Fix PR91522 In-Reply-To: Message-ID: References: User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: multipart/mixed; BOUNDARY="-1609908220-1722345000-1567000477=:32458" X-SW-Source: 2019-08/txt/msg01894.txt.bz2 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. ---1609908220-1722345000-1567000477=:32458 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Content-length: 7231 On Wed, 28 Aug 2019, Iain Sandoe wrote: > > > > > On 26 Aug 2019, at 11:06, Uros Bizjak wrote: > > > > On Mon, Aug 26, 2019 at 10:40 AM Richard Biener wrote: > >> > >> On Fri, 23 Aug 2019, Richard Biener wrote: > >> > >>> On Fri, 23 Aug 2019, Richard Biener wrote: > >>> > >>>> On Fri, 23 Aug 2019, Uros Bizjak wrote: > >>>> > >>>>> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener wrote: > >>>>>> > >>>>>> > >>>>>> This fixes quadraticness in STV and makes > >>>>>> > >>>>>> machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( > >>>>>> 95%) 54 kB ( 0%) > >>>>>> > >>>>>> drop to zero. Anybody remembers why it is the way it is now? > >>>>>> > >>>>>> Bootstrap / regtest running on x86_64-unknown-linux-gnu. > >>>>>> > >>>>>> OK? > >>>>> > >>>>> Looking at the PR, comment #3 [1], I assume this patch is obsoltete > >>>>> and will be updated? > >>>>> > >>>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 > >>>> > >>>> Yes. I'm still learning how STV operates (and learing DF and RTL...). > >>>> The following is a rewrite of the non-TImode chain conversion > >>>> according to I think how it should operate als allowing the hunk > >>>> that fixes the compile-time and fixing PR91527 on the way > >>>> (which I ran into during more extensive testing of the patch myself). > >>>> > >>>> So compared to the state before which I still do not 100% understand > >>>> we now do the following. Chain detection works as before including > >>>> recording of all defs (both defined by the insns in the chain and > >>>> insns outside) that need copy-in or copy-out operations. > >>>> > >>>> But then the patch changes things as to guarantee that > >>>> after the conversion all uses/defs of a pseudo are > >>>> of the (subreg:Vmode ..) form or of the original scalar form. > >>>> In particular it avoids the need to change any insns that > >>>> are not part of the chain (besides emitting the extra copy > >>>> instructions). For this all defs that were marked as needing > >>>> copies (thus they have uses/defs both in the chain and outside) > >>>> the chain will use a new pseudo that we copy to from scalar sources > >>>> and that we copy from for scalar uses. There's the new defs_map > >>>> which records the mapping of old to new reg. pseudos that are > >>>> only used in the chain already are not remapped. > >>>> > >>>> The conversion itself then happens in two stages, first, > >>>> in make_vector_copies, we emit the copy-in insns and > >>>> allocate all pseudos we need. Then the rest of the > >>>> conversion happens fully inside of convert_insn where > >>>> we generate the copy-outs of the insns def, replace > >>>> defs and uses according to the mapping and replace uses > >>>> and defs with the (subreg:Vmode ..) style. > >>>> > >>>> For PR91527 IRA doesn't like the REG_EQUIV note in > >>>> > >>>> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) > >>>> (subreg:V4SI (reg:SI 100) 0)) > >>>> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 > >>>> 1248 {movv4si_internal} > >>>> (expr_list:REG_DEAD (reg:SI 100) > >>>> (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) > >>>> (const_int 16 [0x10])) [1 c+0 S4 A64]) > >>>> (nil)))) > >>>> > >>>> because the SET_DEST is not a REG_P. I'm not sure if this > >>>> is invalid RTL, docs say SET_DEST might be a strict_low_part > >>>> or a zero_extract but doesn't mention a subreg. So I opted > >>>> to simply remove equal/equiv notes on insns we convert > >>>> and since the above has a REG_DEAD note I took the liberty > >>>> to update that according to the mapping (so that would have > >>>> been not needed before this patch) rather than dropping it. > >>>> > >>>> Bootstrapped with and without --with-march=westmere (to get > >>>> some STV coverage, this included all languages) on > >>>> x88_64-unknown-linux-gnu, testing in progress. > >>>> > >>>> OK if testing succeeds? > >>> > >>> Testing revealed I made an error in general_scalar_chain::convert_insn > >>> failing to move down SET_SRC extraction below replacing with > >>> the defs map. This showed in 4 execute FAILs in 32bit fortran > >>> testing (only). Fixed by moving down the whole def_set/src/dst > >>> extraction. > >>> > >>> Re-testing on x86_64-unknown-linux-gnu. > >> > >> Bootstrapped / tested on x86_64-unknown-linux-gnu. The "no-costmodel" > >> run runs into the latent PR91528 building 32bit libada in stage3 > >> for a few sources, I've manually built those with -mno-stv and > >> bootstrap finishes successfully. I hope HJ can help with this > >> dynamic stack-alignment issue. > >> > >> So - OK for trunk? > >> > >> As followup we can now remove general_remove_non_convertible_regs > >> because we can handle defs that cannot be converted just fine > >> with the patch since we're splitting live-ranges of all defs at > >> the chain boundary. > >> > >> Thanks, > >> Richard. > >> > >>> Updated patch below. I'm feeling adventurous and will run > >>> the "westmere" bootstrap with costing disabled (aka always > >>> convert detected chains...). > >>> > >>> Richard. > >>> > >>> 2019-08-23 Richard Biener > >>> > >>> PR target/91522 > >>> PR target/91527 > >>> * config/i386/i386-features.h (general_scalar_chain::defs_map): > >>> New member. > >>> (general_scalar_chain::replace_with_subreg): Remove. > >>> (general_scalar_chain::replace_with_subreg_in_insn): Likewise. > >>> (general_scalar_chain::convert_reg): Adjust signature. > >>> * config/i386/i386-features.c (scalar_chain::add_insn): Do not > >>> iterate over all defs of a reg. > >>> (general_scalar_chain::replace_with_subreg): Remove. > >>> (general_scalar_chain::replace_with_subreg_in_insn): Likewise. > >>> (general_scalar_chain::make_vector_copies): Populate defs_map, > >>> place copy only after defs that are used as vectors in the chain. > >>> (general_scalar_chain::convert_reg): Emit a copy for a specific > >>> def in a specific instruction. > >>> (general_scalar_chain::convert_op): All reg uses are converted here. > >>> (general_scalar_chain::convert_insn): Emit copies for scalar > >>> uses of defs here. Replace uses with the copies we created. > >>> Replace and convert the def. Adjust REG_DEAD notes, remove > >>> REG_EQUIV/EQUAL notes. > >>> (general_scalar_chain::convert_registers): Only handle copies > >>> into the chain here. > > > > Rubberstamped with LGTM. It looks you are the master of this domain now ;) > > This breaks bootstrap for i686-darwin (and most likely is the cause of bootstrap fail on > i686-linux i686-linux-gnu at https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00398.html > et al) > It gives a bunch of compare errors spread around the tree - so no specific pointer there. > > There’s a secondary fail overlaying it between 274933-or so and 274980 which confused > my initial search. Please file a bugreport. Don't have time today to dig into but will do tomorrow (but now at least try to reproduce). Richard. ---1609908220-1722345000-1567000477=:32458--