From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x1132.google.com (mail-yw1-x1132.google.com [IPv6:2607:f8b0:4864:20::1132]) by sourceware.org (Postfix) with ESMTPS id 5F7FC3858C52 for ; Thu, 1 Dec 2022 08:23:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5F7FC3858C52 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-x1132.google.com with SMTP id 00721157ae682-3b5d9050e48so9866037b3.2 for ; Thu, 01 Dec 2022 00:23:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=CLZcgOycqwQ89dgoYDkgAMXgKiuTA1zkoQmReCkNsN0=; b=Y/CJXubxzPWUqlhk47rh76N/W/7CPZOCj88yz7BnKWAbEx2qOXF1b/ZcEC+yX5r+sQ JHMLYAOilKIJJp8hPdxDMEEi/wJs5sIw+SZfqQ/HkgoqaTfy7oDH0gJFYj0RZ1aq9/4h KgECJlvnD72zbHHJ+NbF8YIRuiTy3Mfrr5j2rr1XSlaKLqHcRcHtppzSXfFgmNm2LNWs wFvoby/8Ts2aQcAvNrC367GYswoqqaOWaTGBcYnlRROnpkb0djufX1FmJ2UFfwxpnDJk uM1jzuFak5PPvQ6pHuZwMbD0BGv9KrkKanMnut1QVii1hWI/qBUSGGVGpuTwEZLball0 Z4fQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=CLZcgOycqwQ89dgoYDkgAMXgKiuTA1zkoQmReCkNsN0=; b=7988Yrf7NwvIYmhBc2RNx9Pu2ZvMrBHqS/drMF7AD3E0t8EeXbpmAnaDZgIamZFi+T 5iA8xGpQyqPciCHb18PqsUhuuQkcUo6D+OFA3nOF7sXWRTQao12/TkMacy+f54rzE4qP j9Ix7z9PWo6rDgl7sTNqvnTJCEKRliDqoMxoqiub27oDBl6zp2eJ3VDhdu/4+M8OUiuX 72XLzZmULAFPbstp4vdnuZxRlevJ1DrUAYo6veeC8xvi13KVjWcO09I0XqvMkpFI1Z0l k1CUs9MmE2mJ1Qh5+CcGTWnOw9vi7UtTqBgF4ZrAKBM7x9O65OyuRUhwE+ri/UbiPVZE w9Gg== X-Gm-Message-State: ANoB5pn4K5NCDM9sAe9+p4I1C3Gud/J7KiyXX4GTlmRPRv3Qe9ZgDgXq eZP98gKec8k9YwIs6tID4LkRdEZvdflUVk+1t8A= X-Google-Smtp-Source: AA0mqf7B7gwf7tdUSI83z2e0XuRwFRyeG9540lQrhUQ42IG8+UfcBcPPoeWarRoclkEfA9rAqS+PrsxPc3SB5E+ar3s= X-Received: by 2002:a81:4642:0:b0:36a:ca92:d207 with SMTP id t63-20020a814642000000b0036aca92d207mr47219853ywa.429.1669883029403; Thu, 01 Dec 2022 00:23:49 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Uros Bizjak Date: Thu, 1 Dec 2022 09:23:38 +0100 Message-ID: Subject: Re: [PATCH] i386: Improve *concat3_{1,2,3,4} patterns [PR107627] To: Jakub Jelinek Cc: Roger Sayle , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Dec 1, 2022 at 9:10 AM Jakub Jelinek wrote: > > Hi! > > On the first testcase we've regressed since 12 at -O2: > - movq 8(%rsi), %rax > - movq %rdi, %r8 > - movq (%rsi), %rdi > + movq (%rsi), %rax > + movq 8(%rsi), %r8 > movl %edx, %ecx > - shrdq %rdi, %rax > - movq %rax, (%r8) > + xorl %r9d, %r9d > + movq %rax, %rdx > + xorl %eax, %eax > + orq %r8, %rax > + orq %r9, %rdx > + shrdq %rdx, %rax > + movq %rax, (%rdi) > On the second testcase we've emitted such terrible code > with the useless xors and ors for a long time. > For PR91681 the *concat3_{1,2,3,4} patterns have been added > but they allow just register inputs and register or memory offsettable > output. > The following patch fixes this by allowing also memory inputs on those > patterns, because the pattern is then split to 0-2 emit_move_insns or > one xchg and those can handle loads from memory too just fine. > So that we don't narrow memory loads (source has 128-bit (or for ia32 > 64-bit) load and we would make 64-bit (or for ia32 32-bit) load out of it), > register_operand -> nonmemory_operand change is done only for operands > in zero_extend arguments. o <- m, m or o <- m, r or o <- r, m alternatives > aren't used, we'd lack registers to perform the moves. But what is > in addition to the current ro <- r, r supported are r <- m, r and r <- r, m > (in that case we just need to be careful about corner cases, see what > emit_move_insn we'd call and if we wouldn't clobber registers used in m's > address before loading - split_double_concat handles that now) and > &r <- m, m (in that case I think the early clobber is the easiest solution). > > The first testcase then on 12 -> patched trunk at -O2 changes: > - movq 8(%rsi), %rax > - movq %rdi, %r8 > - movq (%rsi), %rdi > + movq 8(%rsi), %r9 > + movq (%rsi), %r10 > movl %edx, %ecx > - shrdq %rdi, %rax > - movq %rax, (%r8) > + movq %r9, %rax > + shrdq %r10, %rax > + movq %rax, (%rdi) > so same amount of instructions and second testcase 12 -> patched trunk > at -O2 -m32: > - pushl %edi > - xorl %edi, %edi > pushl %esi > - movl 16(%esp), %esi > + pushl %ebx > + movl 16(%esp), %eax > movl 20(%esp), %ecx > - movl (%esi), %eax > - movl 4(%esi), %esi > - movl %eax, %edx > - movl $0, %eax > - orl %edi, %edx > - orl %esi, %eax > - shrdl %edx, %eax > movl 12(%esp), %edx > + movl 4(%eax), %ebx > + movl (%eax), %esi > + movl %ebx, %eax > + shrdl %esi, %eax > movl %eax, (%edx) > + popl %ebx > popl %esi > - popl %edi > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > BTW, I wonder if we couldn't add additional patterns which would catch > the case where one of the operands is constant and how does this interact > with the stv pass in 32-bit mode where I think stv is right after combine, > so if we match these patterns, perhaps it would be nice to handle them > in stv (unless they are handled there already). IIRC, STV handles only loads of 0 and -1, everything else has to be loaded from memory. So, it leaves e.g.: long long a; long long test (void) { return a + 1; } unconverted. > > 2022-12-01 Jakub Jelinek > > PR target/107627 > * config/i386/i386.md (*concat3_1, *concat3_2): > For operands which are zero_extend arguments allow memory if > output operand is a register. > (*concat3_3, *concat3_4): Likewise. If > both input operands are memory, use early clobber on output operand. > * config/i386/i386-expand.cc (split_double_concat): Deal with corner > cases where one input is memory and the other is not and the address > of the memory input uses a register we'd overwrite before loading > the memory into a register. > > * gcc.target/i386/pr107627-1.c: New test. > * gcc.target/i386/pr107627-2.c: New test. LGTM. Thanks, Uros. > > --- gcc/config/i386/i386.md.jj 2022-11-28 10:13:17.758656933 +0100 > +++ gcc/config/i386/i386.md 2022-11-30 12:11:55.724474793 +0100 > @@ -11396,11 +11396,12 @@ (define_insn "*xorqi_ext_1_cc" > ;; Split DST = (HI<<32)|LO early to minimize register usage. > (define_code_iterator any_or_plus [plus ior xor]) > (define_insn_and_split "*concat3_1" > - [(set (match_operand: 0 "nonimmediate_operand" "=ro") > + [(set (match_operand: 0 "nonimmediate_operand" "=ro,r") > (any_or_plus: > - (ashift: (match_operand: 1 "register_operand" "r") > + (ashift: (match_operand: 1 "register_operand" "r,r") > (match_operand: 2 "const_int_operand")) > - (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))] > + (zero_extend: > + (match_operand:DWIH 3 "nonimmediate_operand" "r,m"))))] > "INTVAL (operands[2]) == * BITS_PER_UNIT" > "#" > "&& reload_completed" > @@ -11412,10 +11413,11 @@ (define_insn_and_split "*concat }) > > (define_insn_and_split "*concat3_2" > - [(set (match_operand: 0 "nonimmediate_operand" "=ro") > + [(set (match_operand: 0 "nonimmediate_operand" "=ro,r") > (any_or_plus: > - (zero_extend: (match_operand:DWIH 1 "register_operand" "r")) > - (ashift: (match_operand: 2 "register_operand" "r") > + (zero_extend: > + (match_operand:DWIH 1 "nonimmediate_operand" "r,m")) > + (ashift: (match_operand: 2 "register_operand" "r,r") > (match_operand: 3 "const_int_operand"))))] > "INTVAL (operands[3]) == * BITS_PER_UNIT" > "#" > @@ -11428,12 +11430,14 @@ (define_insn_and_split "*concat }) > > (define_insn_and_split "*concat3_3" > - [(set (match_operand: 0 "nonimmediate_operand" "=ro") > + [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,&r") > (any_or_plus: > (ashift: > - (zero_extend: (match_operand:DWIH 1 "register_operand" "r")) > + (zero_extend: > + (match_operand:DWIH 1 "nonimmediate_operand" "r,m,r,m")) > (match_operand: 2 "const_int_operand")) > - (zero_extend: (match_operand:DWIH 3 "register_operand" "r"))))] > + (zero_extend: > + (match_operand:DWIH 3 "nonimmediate_operand" "r,r,m,m"))))] > "INTVAL (operands[2]) == * BITS_PER_UNIT" > "#" > "&& reload_completed" > @@ -11444,11 +11448,13 @@ (define_insn_and_split "*concat }) > > (define_insn_and_split "*concat3_4" > - [(set (match_operand: 0 "nonimmediate_operand" "=ro") > + [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,&r") > (any_or_plus: > - (zero_extend: (match_operand:DWIH 1 "register_operand" "r")) > + (zero_extend: > + (match_operand:DWIH 1 "nonimmediate_operand" "r,m,r,m")) > (ashift: > - (zero_extend: (match_operand:DWIH 2 "register_operand" "r")) > + (zero_extend: > + (match_operand:DWIH 2 "nonimmediate_operand" "r,r,m,m")) > (match_operand: 3 "const_int_operand"))))] > "INTVAL (operands[3]) == * BITS_PER_UNIT" > "#" > --- gcc/config/i386/i386-expand.cc.jj 2022-11-28 10:13:17.703657740 +0100 > +++ gcc/config/i386/i386-expand.cc 2022-11-30 13:27:44.851737861 +0100 > @@ -173,6 +173,33 @@ split_double_concat (machine_mode mode, > rtx dlo, dhi; > int deleted_move_count = 0; > split_double_mode (mode, &dst, 1, &dlo, &dhi); > + /* Constraints ensure that if both lo and hi are MEMs, then > + dst has early-clobber and thus addresses of MEMs don't use > + dlo/dhi registers. Otherwise if at least one of li and hi are MEMs, > + dlo/dhi are registers. */ > + if (MEM_P (lo) > + && rtx_equal_p (dlo, hi) > + && reg_overlap_mentioned_p (dhi, lo)) > + { > + /* If dlo is same as hi and lo's address uses dhi register, > + code below would first emit_move_insn (dhi, hi) > + and then emit_move_insn (dlo, lo). But the former > + would invalidate lo's address. Load into dhi first, > + then swap. */ > + emit_move_insn (dhi, lo); > + lo = dhi; > + } > + else if (MEM_P (hi) > + && !MEM_P (lo) > + && !rtx_equal_p (dlo, lo) > + && reg_overlap_mentioned_p (dlo, hi)) > + { > + /* In this case, code below would first emit_move_insn (dlo, lo) > + and then emit_move_insn (dhi, hi). But the former would > + invalidate hi's address. Load into dhi first. */ > + emit_move_insn (dhi, hi); > + hi = dhi; > + } > if (!rtx_equal_p (dlo, hi)) > { > if (!rtx_equal_p (dlo, lo)) > --- gcc/testsuite/gcc.target/i386/pr107627-1.c.jj 2022-11-30 13:52:11.654818924 +0100 > +++ gcc/testsuite/gcc.target/i386/pr107627-1.c 2022-11-30 13:53:40.288496872 +0100 > @@ -0,0 +1,22 @@ > +/* PR target/107627 */ > +/* { dg-do compile { target int128 } } */ > +/* { dg-options "-O2 -masm=att" } */ > +/* { dg-final { scan-assembler-not "\torq\t" } } */ > + > +static inline unsigned __int128 > +foo (unsigned long long x, unsigned long long y) > +{ > + return ((unsigned __int128) x << 64) | y; > +} > + > +static inline unsigned long long > +bar (unsigned long long x, unsigned long long y, unsigned z) > +{ > + return foo (x, y) >> (z % 64); > +} > + > +void > +baz (unsigned long long *x, const unsigned long long *y, unsigned z) > +{ > + x[0] = bar (y[0], y[1], z); > +} > --- gcc/testsuite/gcc.target/i386/pr107627-2.c.jj 2022-11-30 13:52:14.890770658 +0100 > +++ gcc/testsuite/gcc.target/i386/pr107627-2.c 2022-11-30 13:53:32.863607618 +0100 > @@ -0,0 +1,22 @@ > +/* PR target/107627 */ > +/* { dg-do compile { target ia32 } } */ > +/* { dg-options "-O2 -masm=att" } */ > +/* { dg-final { scan-assembler-not "\torl\t" } } */ > + > +static inline unsigned long long > +qux (unsigned int x, unsigned int y) > +{ > + return ((unsigned long long) x << 32) | y; > +} > + > +static inline unsigned int > +corge (unsigned int x, unsigned int y, unsigned z) > +{ > + return qux (x, y) >> (z % 32); > +} > + > +void > +garply (unsigned int *x, const unsigned int *y, unsigned z) > +{ > + x[0] = corge (y[0], y[1], z); > +} > > Jakub >