Hi Uros,
Happy New Year.

As requested here's a revised version of my patch to introduce a
pattern for extendditi2, but implementing your suggestion to re-use the
existing extendsidi2_1 splitters and peephole2 optimizations by using
DWI/DWIH mode iterators.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-01-01  Roger Sayle
            Uroš Bizjak

gcc/ChangeLog
        * config/i386/i386.md (extendditi2): New define_insn.
        (define_split): Use DWIH mode iterator to treat new extendditi2
        identically to existing extendsidi2_1.
        (define_peephole2): Likewise.
        (define_peephole2): Likewise.
        (define_split): Likewise.

gcc/testsuite/ChangeLog
        * gcc.target/i386/extendditi2-1.c: New test case.
        * gcc.target/i386/extendditi2-2.c: Likewise.


Thanks in advance,
Roger
--

> -----Original Message-----
> From: Uros Bizjak
> Sent: 28 December 2022 09:28
> To: Roger Sayle
> Cc: GCC Patches
> Subject: Re: [x86_64 PATCH] Add post-reload splitter for extendditi2.
>
> On Wed, Dec 28, 2022 at 1:32 AM Roger Sayle wrote:
> >
> >
> > This is another step towards a possible solution for PR 105137.
> > This patch introduces a define_insn_and_split for extendditi2, that
> > allows DImode to TImode sign-extension to be represented in the early
> > RTL optimizers, before being split post-reload into the exact same
> > idiom as currently produced by RTL expansion.
>
> Please see extendsidi2_1 insn pattern and follow-up splitters and
> peephole2 patterns that do exactly what you want to achieve, but they are
> currently handling only SImode to DImode on 32-bit targets. OTOH, these
> patterns handle several more cases (e.g. split to the memory
> output) and just have to be macroized with DWIH mode iterator to also handle
> DImode to TImode on 64-bit targets. Probably, an extendsidi expander will have
> to be slightly adjusted when macroized to signal middle end the availability of
> extendditi pattern.
>
> Following macroization, any possible follow-up optimizations and improvements
> will then be automatically applied also to 32-bit targets.
>
> Uros.
>
> >
> > Typically this produces the identical code, so the first new test
> > case:
> > __int128 foo(long long x) { return (__int128)x; }
> >
> > continues to generate:
> > foo:    movq    %rdi, %rax
> >         cqto
> >         ret
> >
> > The "magic" is that this representation allows combine and the other
> > RTL optimizers to do a better job.  Hence, the second test case:
> >
> > __int128 foo(__int128 a, long long b) {
> >   a += ((__int128)b) << 70;
> >   return a;
> > }
> >
> > which mainline with -O2 currently generates as:
> >
> > foo:    movq    %rsi, %rax
> >         movq    %rdx, %rcx
> >         movq    %rdi, %rsi
> >         salq    $6, %rcx
> >         movq    %rax, %rdi
> >         xorl    %eax, %eax
> >         movq    %rcx, %rdx
> >         addq    %rsi, %rax
> >         adcq    %rdi, %rdx
> >         ret
> >
> > with this patch now becomes:
> > foo:    movl    $0, %eax
> >         salq    $6, %rdx
> >         addq    %rdi, %rax
> >         adcq    %rsi, %rdx
> >         ret
> >
> > i.e. the same code for the signed and unsigned extension variants.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> > 2022-12-28  Roger Sayle
> >
> > gcc/ChangeLog
> >         * config/i386/i386.md (extendditi2): New define_insn_and_split
> >         to split DImode to TImode sign-extension after reload.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/extendditi2-1.c: New test case.
> >         * gcc.target/i386/extendditi2-2.c: Likewise.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >
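
For anyone checking the two quoted examples by hand, the arithmetic behind
them can be sketched in plain C.  This is only an illustration, not part of
the patch or its testsuite: the function names (sext_high,
add_shifted_by_halves, add_shifted_reference, self_check) are made up for
this sketch, and it assumes a 64-bit target with a compiler that provides
__int128 and converts out-of-range values to signed types modulo 2^N, as
GCC does.  It shows that the high half of a sign-extended DImode value is
just copies of the sign bit, and that in a + ((__int128)b << 70) those
copies are shifted out entirely, which would explain why the signed and
unsigned variants can share code.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of (__int128)x: every bit is a copy of the sign bit.
   This is the value cqto leaves in %rdx.  */
static int64_t sext_high(int64_t x)
{
  return x < 0 ? -1 : 0;
}

/* Hypothetical helper: a + ((__int128)b << 70) done on 64-bit halves.
   The low 64 bits of (b << 70) are zero, so the low half of a is
   unchanged and no carry can occur; only (b << 6) reaches the high half.  */
static __int128 add_shifted_by_halves(__int128 a, int64_t b)
{
  uint64_t lo = (uint64_t)a;
  uint64_t hi = (uint64_t)((unsigned __int128)a >> 64);
  hi += (uint64_t)b << 6;   /* the sign-extension bits fall off the top */
  return (__int128)(((unsigned __int128)hi << 64) | lo);
}

/* Reference: the same computation in full 128-bit arithmetic, using
   unsigned types so that shifting a negative value is well defined.  */
static __int128 add_shifted_reference(__int128 a, int64_t b)
{
  unsigned __int128 wide_b = (unsigned __int128)(__int128)b;
  return (__int128)((unsigned __int128)a + (wide_b << 70));
}

/* Compare the two versions on a few sample inputs.  */
static void self_check(void)
{
  const int64_t samples[] = { 0, 1, -1, 42, -42, INT64_MAX, INT64_MIN };
  const int n = (int)(sizeof samples / sizeof samples[0]);

  for (int i = 0; i < n; i++)
    {
      int64_t x = samples[i];
      unsigned __int128 wide = (unsigned __int128)(__int128)x;

      /* Sign extension splits into (low = x, high = sign mask).  */
      assert((int64_t)(wide >> 64) == sext_high(x));
      assert((int64_t)(uint64_t)wide == x);

      for (int j = 0; j < n; j++)
        {
          __int128 a = ((__int128)samples[j] << 3) + x;
          assert(add_shifted_by_halves(a, x) == add_shifted_reference(a, x));
        }
    }
}
```

Calling self_check() from a small main() should complete without tripping
an assertion on a target that meets the assumptions above.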