This patch upgrades x86_64's scalar-to-vector (STV) pass to more aggressively transform 128-bit scalar TImode operations into vector V1TImode operations performed on SSE registers. TImode functionality already exists in STV, but only for move operations, this changes brings support for logical operations (AND, IOR, XOR, NOT and ANDN) and comparisons. The effect of these changes are conveniently demonstrated by the new sse4_1-stv-5.c test case: __int128 a[16]; __int128 b[16]; __int128 c[16]; void foo() { for (unsigned int i=0; i<16; i++) a[i] = b[i] & ~c[i]; } which when currently compiled on mainline wtih -O2 -msse4 produces: foo: xorl %eax, %eax .L2: movq c(%rax), %rsi movq c+8(%rax), %rdi addq $16, %rax notq %rsi notq %rdi andq b-16(%rax), %rsi andq b-8(%rax), %rdi movq %rsi, a-16(%rax) movq %rdi, a-8(%rax) cmpq $256, %rax jne .L2 ret but with this patch now produces: foo: xorl %eax, %eax .L2: movdqa c(%rax), %xmm0 pandn b(%rax), %xmm0 addq $16, %rax movaps %xmm0, a-16(%rax) cmpq $256, %rax jne .L2 ret Technically, the STV pass is implemented by three C++ classes, a common abstract base class "scalar_chain" that contains common functionality, and two derived classes: general_scalar_chain (which handles SI and DI modes) and timode_scalar_chain (which handles TI modes). As mentioned previously, because only TI mode moves were handled the two worker classes behaved significantly differently. These changes bring the functionality of these two classes closer together, which is reflected by refactoring more shared code from general_scalar_chain to the parent scalar_chain and reusing it from timode. There still remain significant differences (and simplifications) so the existing division of classes (as specializations) continues to make sense. Obviously, there are more changes to come (shifts and rotates), and compute_convert_gain doesn't yet have its final (tuned) form, but is already an improvement over the "return 1;" used previously. This patch has been tested on x86_64-pc-linux-gnu with make boostrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2022-07-09 Roger Sayle gcc/ChangeLog * config/i386/i386-features.h (scalar_chain): Add fields insns_conv, n_sse_to_integer and n_integer_to_sse to this parent class, moved from general_scalar_chain. (scalar_chain::convert_compare): Protected method moved from general_scalar_chain. (mark_dual_mode_def): Make protected, not private virtual. (scalar_chain:convert_op): New private virtual method. (general_scalar_chain::general_scalar_chain): Simplify constructor. (general_scalar_chain::~general_scalar_chain): Delete destructor. (general_scalar_chain): Move insns_conv, n_sse_to_integer and n_integer_to_sse fields to parent class, scalar_chain. (general_scalar_chain::mark_dual_mode_def): Delete prototype. (general_scalar_chain::convert_compare): Delete prototype. (timode_scalar_chain::compute_convert_gain): Remove simplistic implementation, convert to a method prototype. (timode_scalar_chain::mark_dual_mode_def): Delete prototype. (timode_scalar_chain::convert_op): Prototype new virtual method. * config/i386/i386-features.cc (scalar_chain::scalar_chain): Allocate insns_conv and initialize n_sse_to_integer and n_integer_to_sse fields in constructor. (scalar_chain::scalar_chain): Free insns_conv in destructor. (general_scalar_chain::general_scalar_chain): Delete constructor, now defined in the class declaration. (general_scalar_chain::~general_scalar_chain): Delete destructor. (scalar_chain::mark_dual_mode_def): Renamed from general_scalar_chain::mark_dual_mode_def. (timode_scalar_chain::mark_dual_mode_def): Delete. (scalar_chain::convert_compare): Renamed from general_scalar_chain::convert_compare. (timode_scalar_chain::compute_convert_gain): New method to determine the gain from converting a TImode chain to V1TImode. (timode_scalar_chain::convert_op): New method to convert an operand from TImode to V1TImode. (timode_scalar_chain::convert_insn) : Only PUT_MODE on REG_EQUAL notes that were originally TImode (not CONST_INT). Handle AND, ANDN, XOR, IOR, NOT and COMPARE. (timode_mem_p): Helper predicate to check where operand is memory reference with sufficient alignment for TImode STV. (timode_scalar_to_vector_candidate_p): Use convertible_comparison_p to check whether COMPARE is convertible. Handle SET_DESTs that that are REG_P or MEM_P and SET_SRCs that are REG, CONST_INT, CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT. gcc/testsuite/ChangeLog * gcc.target/i386/sse4_1-stv-2.c: New test case, pand. * gcc.target/i386/sse4_1-stv-3.c: New test case, por. * gcc.target/i386/sse4_1-stv-4.c: New test case, pxor. * gcc.target/i386/sse4_1-stv-5.c: New test case, pandn. * gcc.target/i386/sse4_1-stv-6.c: New test case, ptest. Roger --