public inbox for gcc-bugs@sourceware.org
* [Bug target/104438] New: Combine optimization exposed after pro_and_epilogue
@ 2022-02-08  4:29 crazylht at gmail dot com
  2022-02-08  9:15 ` [Bug rtl-optimization/104438] Combine optimization opportunity " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-02-08  4:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

            Bug ID: 104438
           Summary: Combine optimization exposed after pro_and_epilogue
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-*-* i?86-*-*

#include<stdint.h>
#include<immintrin.h>

static __m256i __attribute__((always_inline)) load8bit_4x4_avx2(
    const uint8_t *const src,
    const uint32_t stride)
{
    __m128i src01, src23;
    src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
    src01 = _mm_insert_epi32(src01, *(int32_t *)(src + 1 * stride), 1);
    src23 = _mm_cvtsi32_si128(*(int32_t*)(src + 2 * stride));
    src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
    return _mm256_setr_m128i(src01, src23);
}

uint32_t  compute4x_m_sad_avx2_intrin(
    uint8_t  *src,         // input parameter, source samples Ptr
    uint32_t  src_stride,  // input parameter, source stride
    uint8_t  *ref,         // input parameter, reference samples Ptr
    uint32_t  ref_stride,  // input parameter, reference stride
    uint32_t  height,      // input parameter, block height (M)
    uint32_t  width)       // input parameter, block width (N)
{
    __m128i xmm0;
    __m256i ymm = _mm256_setzero_si256();
    uint32_t y;
    (void)width;

    for (y = 0; y < height; y += 4) {
        const __m256i src0123 = load8bit_4x4_avx2(src, src_stride);
        const __m256i ref0123 = load8bit_4x4_avx2(ref, ref_stride);
        ymm = _mm256_add_epi32(ymm, _mm256_sad_epu8(src0123, ref0123));
        src += src_stride << 2;
        ref += ref_stride << 2;
    }

    xmm0 = _mm_add_epi32(_mm256_castsi256_si128(ymm),
        _mm256_extracti128_si256(ymm, 1));

    return (uint32_t)_mm_cvtsi128_si32(xmm0);
}

gcc -O2 -mavx2 -S



suboptimal asm

.L4:
        vpxor   xmm3, xmm3, xmm3      # 12        [c=4 l=4]  movv4di_internal/0
        vpxor   xmm0, xmm0, xmm0      # 11        [c=4 l=4]  movv8si_internal/0
        vextracti128    xmm3, ymm3, 0x1 # 409     [c=4 l=6]  vec_extract_hi_v4di
        vpaddd  xmm0, xmm0, xmm3    # 429   [c=4 l=4]  *addv4si3/1
        vmovd   eax, xmm0     # 430     [c=4 l=4]  *movsi_internal/12
        ret       # 437       [c=0 l=1]  simple_return_internal

It can be optimized to just

        xor eax, eax
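
To see why a single xor is enough on this path, the tail of the function can be modelled in scalar C. This is only a sketch under stated assumptions: vec256_model, reduce_lane0 and zero_path_result are hypothetical names that do not appear in the testcase, and the 256-bit accumulator is modelled as eight 32-bit lanes.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 256-bit register as eight 32-bit lanes. */
typedef struct {
    uint32_t lane[8];
} vec256_model;

/* Models the tail of the function: add the high 128-bit half to the low
   half lane by lane (vextracti128 + vpaddd) and return lane 0 (vmovd). */
static uint32_t reduce_lane0(vec256_model v)
{
    return v.lane[0] + v.lane[4];
}

/* Models the path on which the accumulator has just been zeroed by vpxor,
   so the reduction only ever sees zeros. */
static uint32_t zero_path_result(void)
{
    vec256_model zero = { { 0 } };
    return reduce_lane0(zero);
}
```

On the zeroed path the reduction adds 0 to 0, so the value moved into eax is the constant 0 regardless of the inputs.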

Before pro_and_epilogue, the CFG looks like

.L2:
...asm...
jmp .L4

.L3:
        vpxor   xmm3, xmm3, xmm3      # 12        [c=4 l=4]  movv4di_internal/0
        vpxor   xmm0, xmm0, xmm0      # 11        [c=4 l=4]  movv8si_internal/0

.L4:

        vextracti128    xmm3, ymm3, 0x1 # 409     [c=4 l=6]  vec_extract_hi_v4di
        vpaddd  xmm0, xmm0, xmm3    # 429   [c=4 l=4]  *addv4si3/1
        vmovd   eax, xmm0     # 430     [c=4 l=4]  *movsi_internal/12
        ret       # 437       [c=0 l=1]  simple_return_internal


Since .L4 has two predecessor basic blocks, the sequence can't be optimized
away there. After pro_and_epilogue, however, GCC copies .L4 into .L2 and merges
.L4 with .L3, which exposes the opportunity.
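
The role of the predecessor count can be illustrated with a toy model. This is not how combine or any other GCC pass is implemented; incoming_value and can_fold_at_join are hypothetical names, and the model only captures the idea that a use in a join block can be folded when every incoming edge supplies the same known constant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what one predecessor block leaves in a register. */
typedef struct {
    bool     is_constant;  /* true if the register holds a known constant */
    uint32_t value;        /* meaningful only when is_constant is true */
} incoming_value;

/* Returns true and stores the constant in *out when every predecessor
   supplies the same known constant, so a use in the join block can fold.
   Leaves *out untouched otherwise. */
static bool can_fold_at_join(const incoming_value *preds, size_t n,
                             uint32_t *out)
{
    if (n == 0 || !preds[0].is_constant)
        return false;
    for (size_t i = 1; i < n; i++) {
        if (!preds[i].is_constant || preds[i].value != preds[0].value)
            return false;
    }
    *out = preds[0].value;
    return true;
}
```

With two predecessors, one carrying the loop result (unknown) and one carrying zero, the function returns false. Once the block is duplicated, the copy reached only from the zeroing block has a single constant predecessor and the fold succeeds.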

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: rguenth at gcc dot gnu.org @ 2022-02-08  9:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org
          Component|target                      |rtl-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm not sure if passes like combine or fwprop would work well after RA but I'm
quite sure that moving prologue/epilogue generation and shrink-wrapping before
RA will not ;)

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: rsandifo at gcc dot gnu.org @ 2022-02-08  9:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
fwprop should at least stand a chance of working that late.
Was wondering whether reinitialising the loop info would be
a problem, but we already do that later (when computing
alignments).

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: segher at gcc dot gnu.org @ 2022-02-08 14:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

--- Comment #3 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Also combine could work that late in principle: it can deal with hard
registers, after all.  But it would be a terrible idea.  A single combine
pass is expensive enough, we don't want to run it N times.  Also, if you
run combine more than once, you get odd effects, mostly because the results
of splitters are combined back again.

After *logue insertion all the simpler (more local!) optimisations are still
run (DSE, DCE, if conversion, const prop, peep).

It is unclear why the CFG wasn't straightened out here.  Is the bb commented
as "asm" actually asm?  Then GCC will not see it is very cheap/small, yeah.

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: crazylht at gmail dot com @ 2022-02-09  2:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
> It is unclear why the CFG wasn't straightened out here.  Is the bb commented
> as "asm" actually asm?  Then GCC will not see it is very cheap/small, yeah.

"asm" is not inline assembly; it stands for BB 6 below:

(note 89 88 126 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
(call_insn 126 89 121 6 (parallel [
            (call (mem:QI (const_int 0 [0]) [0  S1 A8])
                (const_int 0 [0]))
            (unspec [
                    (const_int 1 [0x1])
                ] UNSPEC_CALLEE_ABI)
        ]) -1
     (expr_list:REG_EH_REGION (const_int -2147483648 [0xffffffff80000000])
        (nil))
    (nil))
(jump_insn 121 126 122 6 (set (pc)
        (label_ref 91)) 892 {jump}
     (nil)
 -> 91)
(barrier 122 121 112)
(code_label 112 122 111 7 4 (nil) [1 uses])
(note 111 112 11 7 [bb 7] NOTE_INSN_BASIC_BLOCK)
(insn 11 111 12 7 (set (reg:V8SI 20 xmm0 [orig:131 _168 ] [131])
        (const_vector:V8SI [
                (const_int 0 [0]) repeated x8
            ])) "test.c":28:19 1696 {movv8si_internal}
     (nil))
(insn 12 11 91 7 (set (reg/v:V4DI 23 xmm3 [orig:87 ymm ] [87])
        (const_vector:V4DI [
                (const_int 0 [0]) repeated x4
            ])) "test.c":24:19 1699 {movv4di_internal}
     (nil))
(code_label 91 12 92 8 2 (nil) [1 uses])
(note 92 91 94 8 [bb 8] NOTE_INSN_BASIC_BLOCK)
(insn 94 92 99 8 (set (reg:V2DI 23 xmm3 [176])
        (vec_select:V2DI (reg/v:V4DI 23 xmm3 [orig:87 ymm ] [87])
            (parallel [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                ]))) "{vec_extract_hi_v4di}
     (nil))
(insn 99 94 127 8 (set (reg:V4SI 20 xmm0 [180])
        (plus:V4SI (reg:V4SI 20 xmm0 [178])
            (reg:V4SI 23 xmm3 [176]))) " {*addv4si3}
     (nil))
(insn 127 99 107 8 (set (reg:SI 0 ax [181])
        (reg:SI 20 xmm0 [180])) " {*movsi_internal}
     (nil))
(insn 107 127 123 8 (use (reg/i:SI 0 ax)) "test.c":40:1 -1
     (nil))
(note 123 107 0 NOTE_INSN_DELETED)

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: crazylht at gmail dot com @ 2022-02-09  4:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #4)
> > It is unclear why the CFG wasn't straightened out here.  Is the bb commented
> > as "asm" actually asm?  Then GCC will not see it is very cheap/small, yeah.
> 
> "asm" is not inline assembly. it's BB 6 below
> 
> (note 89 88 126 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
> (call_insn 126 89 121 6 (parallel [
>             (call (mem:QI (const_int 0 [0]) [0  S1 A8])
>                 (const_int 0 [0]))
>             (unspec [
>                     (const_int 1 [0x1])
>                 ] UNSPEC_CALLEE_ABI)
>         ]) -1
>      (expr_list:REG_EH_REGION (const_int -2147483648 [0xffffffff80000000])
>         (nil))
>     (nil))

According to PR104441, vzeroupper shouldn't be inserted here. After the fix for
PR104441, the CFG wasn't straightened.

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: crazylht at gmail dot com @ 2022-02-10  3:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
The opportunity disappears after r12-7125.

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: crazylht at gmail dot com @ 2022-02-10  3:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |INVALID

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: marxin at gcc dot gnu.org @ 2022-02-24 10:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #6)
> The opportunity disappears after r12-7125.

Can you please install the latest contrib/gcc-git-customization.sh? Doing that,
you will see:

$ git gcc-descr 
r12-7369-ga046033ea0ba97

So the hash is added to the revision.
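
For readers unfamiliar with the two spellings, here is a small sketch that accepts both the short form (r12-7125) and the form with the hash appended (r12-7369-ga046033ea0ba97). The names gcc_revision and parse_gcc_revision are hypothetical, and reading the two numbers as a release series and a commit count is an assumption about the naming scheme rather than something stated in this thread.

```c
#include <stdio.h>

/* Hypothetical holder for the pieces of a revision string. */
typedef struct {
    unsigned release;   /* assumed: release series, e.g. 12 */
    unsigned commits;   /* assumed: commit count within that series */
    char     hash[41];  /* abbreviated commit hash, empty if absent */
    int      has_hash;  /* nonzero when a "-g<hash>" suffix was present */
} gcc_revision;

/* Parses "rN-M" or "rN-M-g<hex>"; returns 1 on success, 0 otherwise. */
static int parse_gcc_revision(const char *s, gcc_revision *out)
{
    int n;

    out->hash[0] = '\0';
    n = sscanf(s, "r%u-%u-g%40[0123456789abcdef]",
               &out->release, &out->commits, out->hash);
    if (n < 2)
        return 0;
    out->has_hash = (n == 3);
    return 1;
}
```

Both spellings name the same kind of revision; the second one additionally carries the commit hash, which is what the updated gcc-descr alias prints.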

* [Bug rtl-optimization/104438] Combine optimization opportunity exposed after pro_and_epilogue
From: crazylht at gmail dot com @ 2022-02-25  1:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104438

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Martin Liška from comment #7)
> (In reply to Hongtao.liu from comment #6)
> > The opportunity disappears after r12-7125.
> 
> Can you please install the latest contrib/gcc-git-customization.sh? Doing
> that, you will see:
> 
> $ git gcc-descr 
> r12-7369-ga046033ea0ba97
> 
> So the hash is added to the revision.

Thanks for the reminder, will install.
