------- Comment #17 from vvv at ru dot ru 2009-05-12 16:40 -------
(In reply to comment #16)
> Created an attachment (id=17783)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17783&action=view)
> gcc45-pr39942.patch
>
> Patch that attempts to take into account .p2align directives that are emitted
> for (some) CODE_LABELs and also the gen_align insns that the pass itself
> inserts. For a CODE_LABEL, say .p2align 16,,10 means either that the .p2align
> directive starts a new 16 byte page (then insns before it are never
> interesting), or nothing was skipped because more than 10 bytes would need to
> be skipped. But that means the current group could contain only 5 or less
> bytes of instructions before the label, so again, we don't have to look at
> instructions not in the last 5 bytes.
>
> Another fix is that for MAX_SKIP < 7, ASM_OUTPUT_MAX_SKIP_ALIGN shouldn't emit
> the second .p2align 3, which might (and often does) skip more than MAX_SKIP
> bytes (up to 7).

Nice patch. The code looks better. I checked it on Linux kernel 2.6.29.2.
But two notes:

1. There is no guarantee that .p2align will be translated to NOPs.
Example:

# cat test.c
void f(int i)
{
        if (i == 1) F(1);
        if (i == 2) F(2);
        if (i == 3) F(3);
        if (i == 4) F(4);
        if (i == 5) F(5);
}
# gcc -o test.s test.c -O2 -S
# cat test.s
        .file   "test.c"
        .text
        .p2align 4,,15
.globl f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
        cmpl    $1, %edi
        je      .L7
        cmpl    $2, %edi
        je      .L7
        cmpl    $3, %edi
        je      .L7
        cmpl    $4, %edi
        .p2align 4,,5           <------- attempted padding
        je      .L7
        cmpl    $5, %edi
        je      .L7
        rep
        ret
        .p2align 4,,10
        .p2align 3
.L7:
        xorl    %eax, %eax
        jmp     F
        .cfi_endproc
.LFE0:
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20090512 (experimental)"
        .section        .note.GNU-stack,"",@progbits
# gcc -o test.out test.s -O2 -c
# objdump -d test.out

0000000000000000 <f>:
   0:   83 ff 01                cmp    $0x1,%edi
   3:   74 1b                   je     20 <f+0x20>
   5:   83 ff 02                cmp    $0x2,%edi
   8:   74 16                   je     20 <f+0x20>
   a:   83 ff 03                cmp    $0x3,%edi
   d:   74 11                   je     20 <f+0x20>
   f:   83 ff 04                cmp    $0x4,%edi
  12:   74 0c                   je     20 <f+0x20>     <---- no NOP here
  14:   83 ff 05                cmp    $0x5,%edi
  17:   74 07                   je     20 <f+0x20>
  19:   f3 c3                   repz retq

IMHO, it is better to insert NOPs directly rather than .p2align.
(I mean the line
        emit_insn_before (gen_align (GEN_INT (padsize)), insn);
)

2. IMHO, it's a bad idea to insert something between a CMP and the conditional
jump that follows it. Quote from the Intel 64 and IA-32 Architectures
Optimization Reference Manual:

>> 3.4.2.2 Optimizing for Macro-fusion
>> Macro-fusion merges two instructions to a single μop. Intel Core Microarchitecture
>> performs this hardware optimization under limited circumstances.
>> The first instruction of the macro-fused pair must be a CMP or TEST instruction. This
>> instruction can be REG-REG, REG-IMM, or a micro-fused REG-MEM comparison. The
>> second instruction (adjacent in the instruction stream) should be a conditional
>> branch.

So if we need to insert NOPs, it is better to do it _before_ the CMP.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942
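
To illustrate note 1, here is a minimal sketch in C of the rule GNU as applies
to ".p2align n,,max": pad up to the next 2^n boundary, but emit nothing at all
if that would take more than max bytes. The function name p2align_pad is
hypothetical and is not part of GCC or binutils; it only models the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper (not a GCC or binutils function).
   Returns the number of padding bytes that ".p2align log2_align,,max_skip"
   would emit at offset addr: the distance to the next 2^log2_align
   boundary, or 0 if that distance exceeds max_skip. */
unsigned p2align_pad(uint64_t addr, unsigned log2_align, unsigned max_skip)
{
    uint64_t mask = ((uint64_t)1 << log2_align) - 1;
    uint64_t need = (0 - addr) & mask;   /* bytes to the next boundary */

    return need > max_skip ? 0 : (unsigned)need;
}
```

With the offsets from the objdump listing, p2align_pad(0x12, 4, 5) gives 0,
because 14 bytes would be needed and only 5 are allowed, which is why no NOP
appears before the je at 0x12. By contrast, p2align_pad(0x1b, 4, 10) gives 5,
which matches the branch target landing at 0x20.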