Many thanks to Uros for reviewing/approving all of the previous pieces.
This patch adds support for converting 128-bit TImode shifts and rotates
to SSE equivalents using V1TImode during the TImode STV pass.  Previously,
only logical shifts by multiples of 8 were handled (from my patch earlier
this month).

As an example of the benefits, the following rotate by 32 bits:

unsigned __int128 a, b;
void rot32() { a = (b >> 32) | (b << 96); }

when compiled on x86_64 with -O2 previously generated:

        movq    b(%rip), %rax
        movq    b+8(%rip), %rdx
        movq    %rax, %rcx
        shrdq   $32, %rdx, %rax
        shrdq   $32, %rcx, %rdx
        movq    %rax, a(%rip)
        movq    %rdx, a+8(%rip)
        ret

and with this patch now generates:

        movdqa  b(%rip), %xmm0
        pshufd  $57, %xmm0, %xmm0
        movaps  %xmm0, a(%rip)
        ret

[which uses a V4SI permutation, for those that don't read SSE].  This
should help 128-bit cryptography code that interleaves XORs with
rotations (but doesn't use additions or subtractions).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-15  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-features.cc
        (timode_scalar_chain::compute_convert_gain): Provide costs for
        shifts and rotates.  Provide gains for comparisons against 0/-1.
        (timode_scalar_chain::convert_insn): Handle ASHIFTRT, ROTATERT
        and ROTATE just like the existing ASHIFT and LSHIFTRT cases.
        (timode_scalar_to_vector_candidate_p): Handle all shifts and
        rotates by integer constants between 0 and 127.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse4_1-stv-9.c: New test case.


Thanks in advance,
Roger
--