I removed the unnecessary expands/builtins, and the tests are now compiled
with -O2. Is this version OK?

2011/8/22 Uros Bizjak:
> On Mon, Aug 22, 2011 at 6:25 PM, Ilya Tocar wrote:
>
>>> You don't need to add "negated" versions, one FMA builtin per mode is
>>> enough, please see existing FMA4 descriptions. Just put unary minus
>>> sign in the intrinsics header for "negated" operand and let GCC do its
>>> job. Please see existing FMA4 intrinsics header.
>>>
>> Actually, I tried that. But in that case, when I compile (FMA4 example)
>> #include
>> extern __m128 a,b,c;
>> void foo(){
>>   a = _mm_nmsub_ps(a,b,c);
>> }
>> with -S -O0 -mfma4
>> the asm has
>>
>>        vxorps  %xmm1, %xmm0, %xmm0
>>        vmovaps -16(%rbp), %xmm1
>>        vmovaps .LC0(%rip), %xmm2
>>        vxorps  %xmm2, %xmm1, %xmm1
>>        vfmaddps        %xmm0, -32(%rbp), %xmm1, %xmm0
>> So vfmaddps of negated values is generated instead of vfnmsubps.
>> I think it is bad that an intrinsic for an instruction can generate code
>> without this instruction.
>> So, to make sure that the exact instruction is always generated, I
>> introduced additional expands and builtins.
>> Is it wrong?
>
> This is an artificial limitation. The user requested the functionality of
> the intrinsic, and should not bother with how the compiler realizes it.
> With -O2, the negation would propagate into the insn during the combine
> pass, and the optimal instruction would be generated.
>
> So, to answer your question - it is wrong to expect an exact instruction
> from builtins. Maybe from using -O0, but this should not be used
> anyway in the testsuite.
>
> Uros.
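
To illustrate the suggestion quoted above (one FMA builtin per mode, with a unary minus in the header for each "negated" operand), here is a minimal scalar sketch in C. It is not the GCC intrinsics header: `sketch_fmadd` and the three wrappers are hypothetical names, and plain `a * b + c` on `float` stands in for the vector builtin. It only shows that the negated forms need no builtins of their own.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for the single FMA builtin: a * b + c.
   The real builtin works on vector modes; this is an assumption made
   purely for illustration. */
static inline float sketch_fmadd(float a, float b, float c)
{
    return a * b + c;
}

/* "Negated" forms written as thin wrappers: the unary minus sits on the
   operand, and the compiler is left to pick the instruction. */
static inline float sketch_fmsub(float a, float b, float c)
{
    return sketch_fmadd(a, b, -c);      /*  (a * b) - c */
}

static inline float sketch_fnmadd(float a, float b, float c)
{
    return sketch_fmadd(-a, b, c);      /* -(a * b) + c */
}

static inline float sketch_fnmsub(float a, float b, float c)
{
    return sketch_fmadd(-a, b, -c);     /* -(a * b) - c */
}

/* Sanity check with exactly representable values, so rounding and
   contraction into a fused operation cannot change the results. */
static void check_negated_forms(void)
{
    assert(sketch_fmadd(2.0f, 3.0f, 1.0f) == 7.0f);
    assert(sketch_fmsub(2.0f, 3.0f, 1.0f) == 5.0f);
    assert(sketch_fnmadd(2.0f, 3.0f, 1.0f) == -5.0f);
    assert(sketch_fnmsub(2.0f, 3.0f, 1.0f) == -7.0f);
}
```

The last wrapper has the same shape as the -O0 output quoted in the thread: two operands are negated first, then a plain multiply-add is issued. According to the reply, with -O2 the combine pass would fold those negations into the instruction itself.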