From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vs1-xe2e.google.com (mail-vs1-xe2e.google.com [IPv6:2607:f8b0:4864:20::e2e]) by sourceware.org (Postfix) with ESMTPS id B493F38708F8; Wed, 20 Jan 2021 04:37:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org B493F38708F8 Received: by mail-vs1-xe2e.google.com with SMTP id f22so3919284vsk.11; Tue, 19 Jan 2021 20:37:40 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:content-transfer-encoding; bh=OMoFMbRMd8y61JfgaMaTBi2kMiop7empPJDAKQP0FRU=; b=ZUqu9+2TVQYQ0A+V0DzMlDgpmdt76RArlIfk770d6Gd8U8LQGB9e1TGBNYuSnbksAk 0EBduGQPEnkfU47qxfOv1MPV72ddYuinwIa0tzvns1VEr5CA/otQvlmyp96ggkBlID0J nNPeHFGHq7qwOtYtzgjn3zCtEubaNu/OeeoYez50QQSEkWeriUKnwE4ffctekcSuAxAE 3iw5balN1DPKMtLTTSluk4CRXIbtv8QPq7kOMbGd67ShU0VWkFO2QOBa5efQhVM52wHr dL+UgghQv7oOVs9I1jPTswiIB9x9FK7XZQQTldunc92pEn6wBsDq0gNvm3KBxiO5uLkq aj9A== X-Gm-Message-State: AOAM531iId0I4clhu9JtBi9ywx9yNVQL040QMh9OZ6Js8iBCzXbeU0PR XZm1YhIMdc84qasTBfKTzTA6E3uTAqvthEKL3A4ijvNR X-Google-Smtp-Source: ABdhPJwv9ctruVMAgS1YrpUGe9EPNvdg6sBRMcGaNb6+GeO9XTG7IE4L/Owxv457kxpP5PqwxwaAGftTezFCG9bUB/A= X-Received: by 2002:a05:6102:30ac:: with SMTP id y12mr4049743vsd.48.1611117460058; Tue, 19 Jan 2021 20:37:40 -0800 (PST) MIME-Version: 1.0 References: <20210119144514.GA4020736@tucnak> In-Reply-To: From: Hongtao Liu Date: Wed, 20 Jan 2021 12:40:33 +0800 Message-ID: Subject: Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg. To: Jakub Jelinek via Gcc-patches , Hongtao Liu , ebotcazou@libertysurf.fr, steven@gcc.gnu.org, Jakub Jelinek , Richard Sandiford Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Jan 2021 04:37:42 -0000 On Wed, Jan 20, 2021 at 12:35 PM Hongtao Liu wrote: > > On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford > wrote: > > > > Jakub Jelinek via Gcc-patches writes: > > > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-p= atches wrote: > > >> > actually only the lower 16bits are needed, the original insn is li= ke > > >> > > > >> > .294.r.ira > > >> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ]) > > >> > (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76 > > >> > {*movhi_internal} > > >> > (nil)) > > >> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ]) > > >> > (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _= 52 > > >> > ]) 0)))) 1412 {*vec_dupv4hi} > > >> > (nil)) > > >> > > > >> > .295r.reload > > >> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96]) > > >> > (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 7= 6 > > >> > {*movhi_internal} > > >> > (nil)) > > >> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297]) > > >> > (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal} > > >> > (nil)) > > >> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140]) > > >> > (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) > > >> > 1412 {*vec_dupv4hi} > > >> > (nil)) > > >> > > > >> > and insn 489 is created by lra/reload which seems ok for the seque= nce, > > >> > but problemistic with considering the logic of hardreg_cprop. > > >> > > >> It looks OK even with the regcprop behaviour though: > > >> > > >> - insn 69 defines only the low 16 bits of di, > > >> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-= 31 > > >> too (with unknown contents) > > >> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents > > >> introduced by insn 489 are truncated away) > > >> > > >> So where do bits 16-31 become significant? What goes wrong if they'= re > > >> not zero? > > > > > > The k0 register is initialized I believe with > > > (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82]) > > > (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40] ) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_inte= rnal} > > > (nil)) > > > and so it contains all 64-bits, and then the code sometimes uses all = the > > > bits, sometimes just the low 16-bits and sometimes low 32-bits of tha= t > > > value. > > > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96]) > > > (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 7= 6 {*movhi_internal} > > > (nil)) > > > (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149]) > > > (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 14= 4 {*zero_extendhisi2} > > > (nil)) > > > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297]) > > > (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal} > > > (nil)) > > > (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140]) > > > (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 14= 12 {*vec_dupv4hi} > > > (expr_list:REG_DEAD (reg:SI 22 xmm2 [297]) > > > (nil))) > > > are examples when it uses only the low 16 bits from that, and > > > (insn 487 72 73 12 (set (reg:SI 1 dx [148]) > > > (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal= } > > > (nil)) > > > > > > (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86]) > > > (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 7= 5 {*movsi_internal} > > > (nil)) > > > > > > (insn 491 85 88 13 (set (reg:SI 3 bx [299]) > > > (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal= } > > > (nil)) > > > (insn 88 491 89 13 (set (reg:CCNO 17 flags) > > > (compare:CCNO (reg:SI 3 bx [299]) > > > (const_int 0 [0]))) 7 {*cmpsi_ccno_1} > > > (expr_list:REG_DEAD (reg:SI 3 bx [299]) > > > (nil))) > > > > > > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86]) > > > (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*mo= vsi_internal} > > > (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86]) > > > (nil))) > > > are examples where it uses low 32-bits from k0. > > > So the > > > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86]) > > > - (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*m= ovsi_internal} > > > - (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86]) > > > + (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {= *movsi_internal} > > > + (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86]) > > > (nil))) > > > cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it ho= lds > > > only the low 16-bits of the value and has the upper bits undefined, w= hile r9 > > > it is replacing had all of the low 32-bits well defined. > > > > Ah, ok, thanks for the extra context. > > > > So AIUI the problem when recording xmm2<-di isn't just: > > > > [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src)) > > > > but also that: > > > > [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mo= de) > > > > For example, all registers in this sequence can be part of the same cha= in: > > > > (set (reg:HI R1) (reg:HI R0)) > > (set (reg:SI R2) (reg:SI R1)) // [A] > > (set (reg:DI R3) (reg:DI R2)) // [A] > > (set (reg:SI R4) (reg:SI R[0-3])) > > (set (reg:HI R5) (reg:HI R[0-4])) > > > > But: > > > > (set (reg:SI R1) (reg:SI R0)) > > (set (reg:HI R2) (reg:HI R1)) > > (set (reg:SI R3) (reg:SI R2)) // [A] && [B] > > > > is problematic because it dips below the precision of the oldest regno > > and then increases again. > > > > When this happens, I guess we have two choices: > > > > (1) what the patch does: treat R3 as the start of a new chain. > > (2) pretend that the copy occured in vd->e[sr].mode instead > > (i.e. copy vd->e[sr].mode to vd->e[dr].mode) > > > > I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P. > > Maybe the optimisation provided by (2) compared to (1) isn't common > > enough to be worth the complication. > > > > I think we should test [B] as well as [A] though. The pass is set > > up to do some quite elaborate mode changes and I think rejecting > > [A] on its own would make some of the other code redundant. > > It also feels like it should be a seperate =E2=80=9Cif=E2=80=9D or =E2= =80=9Celse if=E2=80=9D, > > with its own comment. > > > Update patch. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > Thanks, > > Richard > > > > -- > BR, > Hongtao --=20 BR, Hongtao