On 2023-05-26 14:46, Stefan Kanthak wrote:
> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
> (... ...)
> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
> right side?

Please stop yelling like that. It makes you look like a naughty pupil.

> 14 instructions in 33 bytes           # 11 instructions in 32 bytes
>
> OUCH: why does GCC abuse EBX (and ECX too) and performs a superfluous
> memory write?

Apart from the SSE question: you are performing 64-bit arithmetic on a 32-bit machine, which GCC isn't good at. The preferred way to check whether a 64-bit integer is a power of two is to convert it to a float, then examine whether its 23-bit mantissa is all zeroes. Like yours, this also mistakes zero for a 'power of two', which it isn't:

```
sub esp, 0x0C                           ; 83 EC 0C
fild qword ptr [esp + 0x10]             ; DF 6C 24 10
xor eax, eax                            ; 33 C0
fstp dword ptr [esp]                    ; D9 1C 24
shl dword ptr [esp], 9                  ; C1 24 24 09
setz al                                 ; 0F 94 C0
add esp, 0x0C                           ; 83 C4 0C
ret                                     ; C3
```

That's 8 instructions and 23 bytes in total.

In 64-bit mode, 64-bit integers can be converted to floats directly:

```
cvtsi2ss xmm0, qword ptr [rsp + 0x08]   ; F3 48 0F 2A 44 24 08
xor eax, eax                            ; 33 C0
movd ecx, xmm0                          ; 66 0F 7E C1
shl ecx, 9                              ; C1 E1 09
setz al                                 ; 0F 94 C0
ret                                     ; C3
```

That's 6 instructions and 20 bytes in total.

GCC has its limitations, so if you would like aggressive optimization like this, you must do it yourself.

-- 
Best regards,
LIU Hao
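
For anyone who wants to experiment with the float-conversion check described above, here is a minimal C sketch of it. It is not part of the original message: the names `mantissa_is_zero` and `is_pow2_exact` are hypothetical, and the bit-trick version is included only for comparison. One caveat: the conversion to float rounds to 24 significant bits and the mantissa test ignores the sign, so inputs such as 2^24 + 1 or -1 also pass, in addition to zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the assembly above: convert to float,
 * then shift out the sign bit and the 8 exponent bits so that only the
 * 23 mantissa bits remain. Returns 1 when the mantissa is all zeroes. */
static int mantissa_is_zero(int64_t x)
{
    float f = (float)x;              /* rounds to 24 significant bits */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's bit pattern */
    return (uint32_t)(bits << 9) == 0;
}

/* For comparison only (an assumption, not from the message): the usual
 * exact test, which rejects zero and does no rounding. */
static int is_pow2_exact(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

The two functions agree on inputs such as 8 and 12, and differ on 0, -1, and 16777217 (2^24 + 1), which is where the shorter float-based sequence trades exactness for size.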