From mboxrd@z Thu Jan  1 00:00:00 1970
From: Charles Wilson
To: DJ Delorie
Cc: binutils@sources.redhat.com, cygwin@cygwin.com, Paul Sokolovsky
Subject: Re: [aida_s@mx12.freecom.ne.jp: A serious bug of "ld --enable-auto-import"]
Date: Sun, 26 Aug 2001 15:35:00 -0000
Message-id: <3B8979A4.5060605@ece.gatech.edu>
References: <3B8884F6.80708@ece.gatech.edu> <200108260530.BAA28221@envy.delorie.com>
	<3B888D76.6090102@ece.gatech.edu> <200108260613.CAA28557@envy.delorie.com>
	<3B891172.9000207@ece.gatech.edu> <200108261543.LAA06415@envy.delorie.com>
	<3B891E23.9090407@ece.gatech.edu> <200108261643.MAA06855@envy.delorie.com>
X-SW-Source: 2001-08/msg00616.html

DJ Delorie wrote:

>>Well, that's interesting. Since arrays ARE pointers(*), then perhaps
>>it's enough to change gcc's behavior from
>>
>
> Not from gcc's perspective. From C's perspective, array symbols and
> pointer symbols are mostly interchangeable, but they are not the same.
> For example, these two declarations:
>
>     extern char *foo;
>     extern char foo[];
>
> are *not* the same, and using the wrong one results in a broken
> program.

Thanks for not ridiculing my thinko. I knew this.

> For our purposes, a pointer is a symbol referencing a four-byte range
> of memory that holds the address of a range of memory that holds a
> sequence of characters, and an array is a symbol referencing a range
> of memory that holds a sequence of characters. Because a pointer
> requires an extra indirection, gcc is limited in the optimizations it
> can do on it, but dealing with imports becomes simpler because the
> address occurs in exactly one place.
>
> Since a symbol is always a constant (regardless of what it refers to),
> offsetting it by a constant results in a sum that can always be
> computed at compile time (well, link time) and gcc will always do it
> that way. This is a fairly fundamental concept in gcc, and I doubt it
> would be practical to tell gcc to do it otherwise.
>

AHA!
But the auto-import code replaces the extra indirection (for DATA
access into a DLL) with the actual address in the loaded DLL (see the
docs pasted below).

Perhaps the auto-import needs to create additional pseudo-symbols for
index-array access. E.g.

    hwstr
    hwstr[1]
    hwstr[2]
    hwstr[12]

could each be mapped to *different* "fake" symbols. Then the runtime
loader would just replace them as before -- but this time, with the
correct (offset) address in the DLL.

Downside: could lead to an explosion of symbols, if there's a lot of
constant-offset indexing into arrays exported by the DLL. (Variable
offsets are computed at runtime, of course. No problem there. And it
seems that ONLY arrays are subject to this problem...if I understand
correctly.)

Oh shoot. I just realized that the above is garbage. How will the DLL
know *which* fake symbols to export? It can't know how an external
client will access an array variable, so the DLL has to export fake
symbols for every conceivable constant index. This is *possible* --
since we're talking about arrays (i.e. with fixed length; these are
*not* pointers) -- but not really practical. A simple array foo[4096]
leads to 4097 exported symbols.

No, that's just silly. I'm going back to square one on this problem.
I'm out of ideas on this one. Paul? Paaauuulll?

FWIW, this is what a disassembly of hello.exe looks like (no declspec
decorators, using the auto-import stuff.
Notice the "fixup" labels __fuN__symbol):

00401044 <_main>:
  401044:  55                      push   %ebp
  401045:  89 e5                   mov    %esp,%ebp
  401047:  83 ec 18                sub    $0x18,%esp
  40104a:  e8 8d 00 00 00          call   4010dc <___main>
  40104f:  c6 05 04 41 40 00 21    movb   $0x21,0x404104
                                                ^^^^^^^^ this is off by 12

00401051 <__fu0__hwstr1>:
  401051:  04 41                   add    $0x41,%al
  401053:  40                      inc    %eax
  401054:  00 21                   add    %ah,(%ecx)
  401056:  c7 45 fc fc 40 40 00    movl   $0x4040fc,0xfffffffc(%ebp)

00401059 <__fu2__hwstr2>:
  401059:  fc                      cld
  40105a:  40                      inc    %eax
  40105b:  40                      inc    %eax
  40105c:  00 8b 45 fc 83 c0       add    %cl,0xc083fc45(%ebx)
  401062:  0c c6                   or     $0xc6,%al
  401064:  00 21                   add    %ah,(%ecx)
  401066:  83 c4 f4                add    $0xfffffff4,%esp
  401069:  68 f8 40 40 00          push   $0x4040f8

0040106a <__fu1__hwstr1>:
  40106a:  f8                      clc
  40106b:  40                      inc    %eax
  40106c:  40                      inc    %eax
  40106d:  00 e8                   add    %ch,%al
  40106f:  71 00                   jno    401071 <__fu1__hwstr1+0x7>
  401071:  00 00                   add    %al,(%eax)
  401073:  83 c4 10                add    $0x10,%esp
  401076:  83 c4 f4                add    $0xfffffff4,%esp
  401079:  68 fc 40 40 00          push   $0x4040fc

0040107a <__fu3__hwstr2>:
  40107a:  fc                      cld
  40107b:  40                      inc    %eax
  40107c:  40                      inc    %eax
  40107d:  00 e8                   add    %ch,%al
  40107f:  61                      popa
  401080:  00 00                   add    %al,(%eax)
  401082:  00 83 c4 10 31 c0       add    %al,0xc03110c4(%ebx)
  401088:  eb 02                   jmp    40108c <__fu3__hwstr2+0x12>
  40108a:  89 f6                   mov    %esi,%esi
  40108c:  89 ec                   mov    %ebp,%esp
  40108e:  5d                      pop    %ebp
  40108f:  c3                      ret

Funky, huh?

--Chuck

Quoting from the pe-dll.c:
------------------------------------
Auto-import feature by Paul Sokolovsky

Quick facts:

1. With this feature on, DLL clients can import variables from DLL
   without any concern from their side (for example, without any source
   code modifications).

2. This is done completely in bounds of the PE specification (to be
   fair, there's a place where it pokes nose out of, but in practise it
   works). So, resulting module can be used with any other PE
   compiler/linker.

3. Auto-import is fully compatible with standard import method and they
   can be mixed together.

4.
   Overheads: space: 8 bytes per imported symbol, plus 20 for each
   reference to it; load time: negligible; virtual/physical memory:
   should be less than effect of DLL relocation, and I sincerely hope
   it doesn't affect DLL sharability (too much).

Idea

The obvious and only way to get rid of dllimport insanity is to make
client access variable directly in the DLL, bypassing extra
dereference. I.e., whenever client contains something like

    mov dll_var,%eax,

address of dll_var in the command should be relocated to point into
loaded DLL. The aim is to make OS loader do so, and then make ld help
with that. The import section of PE is made the following way: there's
a vector of structures each describing imports from particular DLL.
Each such structure points to two other parallel vectors: one holding
imported names, and one which will hold address of corresponding
imported name. So, the solution is de-vectorize these structures,
making import locations be sparse and pointing directly into code.
Before continuing, it is worth a note that, while the author strives to
make PE act ELF-like, there are some other people who make ELF act
PE-like: elfvector, ;-) .

Implementation

For each reference of data symbol to be imported from DLL (to set of
which belong symbols with name <sym>, if __imp_<sym> is found in
implib), the import fixup entry is generated. That entry is of type
IMAGE_IMPORT_DESCRIPTOR and stored in .idata$3 subsection. Each fixup
entry contains pointer to symbol's address within .text section (marked
with __fuN_<sym> symbol, where N is integer), pointer to DLL name (so,
DLL name is referenced by multiple entries), and pointer to symbol name
thunk. Symbol name thunk is singleton vector (__nm_th_<sym>) pointing
to IMAGE_IMPORT_BY_NAME structure (__nm_<sym>) directly containing
imported name. Here comes that "on the edge" problem mentioned above:
PE specification rambles that name vector (OriginalFirstThunk) should
run in parallel with addresses vector (FirstThunk), i.e.
that they should have the same number of elements and be terminated
with zero. We violate this, since FirstThunk points directly into
machine code. But in practise, the OS loader is implemented the sane
way: it goes thru OriginalFirstThunk and puts addresses to FirstThunk,
not something else. It once again should be noted that dll and symbol
name structures are reused across fixup entries and should be there
anyway to support standard import stuff, so sustained overhead is 20
bytes per reference. Other question is whether having several
IMAGE_IMPORT_DESCRIPTORS for the same DLL is possible. Answer is yes,
it is done even by native compiler/linker (libth32's functions in fact
reside in windows9x kernel32.dll, so if you use it, you have two
IMAGE_IMPORT_DESCRIPTORS for kernel32.dll). Yet other question is
whether referencing the same PE structures several times is valid. The
answer is why not, prohibiting that (detecting violation) would require
more work on behalf of loader than not doing it.
--------------------------------------------
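
P.S. To make the "off by 12" concrete, here is a minimal C sketch of
what seems to be going on. It is a toy model, not ld or the real OS
loader. It assumes the address field of the instruction holds only the
constant index that gcc folded in (in the real image it also holds a
placeholder address), and that the loader *stores* the resolved
address into that field, as the pe-dll.c text says it does for
FirstThunk. All names here (movb_insn, link_reference,
loader_overwrite, loader_add, target_of) and the address 0x1000 are
made up for illustration; loader_add is purely hypothetical and shown
only for contrast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for "movb $0x21,<addr32>": opcode c6 05, a 32-bit
   address field at bytes 2..5, then the 8-bit immediate. */
typedef struct {
    uint8_t bytes[7];
} movb_insn;

/* Read the 32-bit address field back out of the instruction. */
uint32_t target_of(const movb_insn *insn)
{
    uint32_t field;
    assert(insn != NULL);
    memcpy(&field, &insn->bytes[2], sizeof field);
    return field;
}

/* Link time (simplified assumption): the constant index has already
   been folded into the address field, so the field starts out as the
   bare offset. */
movb_insn link_reference(uint32_t const_index)
{
    movb_insn insn = { { 0xc6, 0x05, 0, 0, 0, 0, 0x21 } };
    memcpy(&insn.bytes[2], &const_index, sizeof const_index);
    return insn;
}

/* Load time, modelled on the quoted description: the resolved symbol
   address is stored into the slot, replacing whatever was there.
   The folded-in offset is lost. */
void loader_overwrite(movb_insn *insn, uint32_t symbol_addr)
{
    assert(insn != NULL);
    memcpy(&insn->bytes[2], &symbol_addr, sizeof symbol_addr);
}

/* Hypothetical alternative, for contrast only: add the symbol address
   to the existing field so the offset survives.  Nothing in the quoted
   text suggests the OS loader can do this. */
void loader_add(movb_insn *insn, uint32_t symbol_addr)
{
    uint32_t field = target_of(insn) + symbol_addr;
    memcpy(&insn->bytes[2], &field, sizeof field);
}
```

With hwstr pretended to load at 0x1000, link_reference(12) followed by
loader_overwrite leaves the field at 0x1000, i.e. hwstr[0] instead of
hwstr[12]. The hypothetical loader_add would give 0x100c.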