Hi Uros, Many thanks for your speedy review. This revised patch implements all three of your recommended improvements; the use of ix86_binary_operator_ok with code UNKNOWN, the removal of "n" constraints from const_int_operand predicates, and the use of force_reg (for input operands, and REG_P for output operands) to ensure that it's always safe to call gen_lowpart/gen_highpart. [If we proceed with the recent proposal to split double word addition, subtraction and other operations before reload, then these new add/sub variants should be updated at the same time, but for now this patch keeps double word patterns consistent]. This revised patch has been retested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2022-06-05 Roger Sayle Uroš Bizjak gcc/ChangeLog PR target/91681 * config/i386/i386.md (zero_extendditi2): New define_insn_and_split. (*add3_doubleword_zext): New define_insn_and_split. (*sub3_doubleword_zext): New define_insn_and_split. (*concat3_1): New define_insn_and_split replacing previous define_split for implementing DST = (HI<<32)|LO as pair of move instructions, setting lopart and hipart. (*concat3_2): Likewise. (*concat3_3): Likewise, where HI is zero_extended. (*concat3_4): Likewise, where HI is zero_extended. * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec. (kunpcksi): Likewise, add UNSPEC_MASKOP unspec. (kunpckdi): Likewise, add UNSPEC_MASKOP unspec. (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP unspec. (vec_pack_trunc_): Likewise. gcc/testsuite/ChangeLog PR target/91681 * g++.target/i386/pr91681.C: New test case (from the PR). * gcc.target/i386/pr91681-1.c: New int128 test case. * gcc.target/i386/pr91681-2.c: Likewise. * gcc.target/i386/pr91681-3.c: Likewise, but for ia32. Thanks again, Roger -- > -----Original Message----- > From: Uros Bizjak > Sent: 03 June 2022 11:08 > To: Roger Sayle > Cc: GCC Patches > Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more > optimizations. > > On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle > wrote: > > > > > > Technically, PR target/91681 has already been resolved; we now > > recognize the highpart multiplication at the tree-level, we no longer > > use the stack, and we currently generate the same number of > > instructions as LLVM. However, it is still possible to do better, the > > current x86_64 code to generate a double word addition of a zero extended > operand, looks like: > > > > xorl %r11d, %r11d > > addq %r10, %rax > > adcq %r11, %rdx > > > > when it's possible (as LLVM does) to use an immediate constant: > > > > addq %r10, %rax > > adcq $0, %rdx > > > > To do this, the backend required one or two simple changes, that then > > themselves required one or two more obscure tweaks. > > > > The simple starting point is to define a zero_extendditi2 pattern, for > > zero extension from DImode to TImode on TARGET_64BIT that is split > > after reload. Double word (TImode) addition/subtraction is split > > after reload, so that constrains when things should happen. > > > > With zero extension now visible to combine, we add two new > > define_insn_and_split that add/subtract a zero extended operand in > > double word mode. These apply to both 32-bit and 64-bit code > > generation, to produce adc $0 and sbb $0. > > > > The first strange tweak is that these new patterns interfere with the > > optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair of > > register moves, or more accurately the combine splitter no longer > > triggers as we're now converting two instructions into two > > instructions (not three instructions into two instructions). This is > > easily repaired (and extended to handle TImode) by changing from a > > pair of define_split (that handle operand commutativity) to a set of > > four define_insn_and_split (again to handle operand commutativity). > > > > The other/final strange tweak that the above splitters now interfere > > with AVX512's kunpckdq instruction which is defined as identical RTL, > > DW:DI = (HI:SI<<32)|zero_extend(LO:SI). To distinguish this, and also > > avoid AVX512 mask registers being used by reload to perform SImode > > scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP) to the > > unpack mask operations, which matches what sse.md does for the other > > mask specific (logic) operations. > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32}, > > with no new failures. Ok for mainline? > > > > > > 2022-06-03 Roger Sayle > > > > gcc/ChangeLog > > PR target/91681 > > * config/i386/i386.md (zero_extendditi2): New define_insn_and_split. > > (*add3_doubleword_zext): New define_insn_and_split. > > (*sub3_doubleword_zext): New define_insn_and_split. > > (*concat3_1): New define_insn_and_split replacing > > previous define_split for implementing DST = (HI<<32)|LO as > > pair of move instructions, setting lopart and hipart. > > (*concat3_2): Likewise. > > (*concat3_3): Likewise, where HI is zero_extended. > > (*concat3_4): Likewise, where HI is zero_extended. > > * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec. > > (kunpcksi): Likewise, add UNSPEC_MASKOP unspec. > > (kunpckdi): Likewise, add UNSPEC_MASKOP unspec. > > (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP > > unspec. > > (vec_pack_trunc_): Likewise. > > > > gcc/testsuite/ChangeLog > > PR target/91681 > > * g++.target/i386/pr91681.C: New test case (from the PR). > > * gcc.target/i386/pr91681-1.c: New int128 test case. > > * gcc.target/i386/pr91681-2.c: Likewise. > > * gcc.target/i386/pr91681-3.c: Likewise, but for ia32. > > + "MEM_P (operands[0]) ? rtx_equal_p (operands[0], operands[1]) && > + !MEM_P (operands[2]) > + : !MEM_P (operands[1])" > > Can we use ix86_binary_operator_ok (UNKNOWN, ...mode..., operands) here > instead? > > (UNKNOWN RTX code is used to prevent unwanted optimization with > commutative operands). > > Uros.