public inbox for gcc-help@gcc.gnu.org
* Functions and Global Variables defined in Libraries
@ 2023-03-07 12:13 Frederick Virchanza Gotham
  2023-03-07 13:05 ` Xi Ruoyao
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2023-03-07 12:13 UTC (permalink / raw)
  To: gcc-help

Let's say we have a program that links with a library that exports a
global variable and a function. So the library looks like this:

int lib_global_variable = 0;

void LibFunc(void) { }

The main program has the following declarations:

extern int lib_global_variable;

extern void LibFunc(void);

The program links fine and runs fine if we give the linker "-L.
-lname_of_library".

If we use the program "nm" on the main executable and grep for
"lib_global_variable" and "LibFunc", we see that both are listed as
undefined symbols:

U lib_global_variable
U _Z7LibFuncv

If we use 'readelf' on the main executable and grep for the same two
symbols, we see:

000000003fc8 000700000006 R_X86_64_GLOB_DAT 0000000000000000 lib_global_variable + 0
000000004038 000900000007 R_X86_64_JUMP_SLO 0000000000000000 _Z7LibFuncv + 0
7: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND lib_global_variable
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _Z7LibFuncv
39: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND lib_global_variable
41: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _Z7LibFuncv

I've been doing some testing and tinkering, and I've found that the
strategy of using 'dlopen' at runtime to load a library works fine so
long as the undefined symbol is listed under R_X86_64_JUMP_SLOT (which
readelf abbreviates to R_X86_64_JUMP_SLO). It doesn't work if the
symbol is listed under R_X86_64_GLOB_DAT.

Typically all undefined functions get listed under R_X86_64_JUMP_SLOT,
and all global variables get listed under R_X86_64_GLOB_DAT. However,
it is possible to get functions listed under R_X86_64_GLOB_DAT, and my
strategy of using 'dlopen' doesn't work if the function is under
R_X86_64_GLOB_DAT.

It seems that GNU g++ by default puts the undefined function under
R_X86_64_JUMP_SLOT. However, if you use the address of the function
at all, for example:

cout << (std::uintptr_t)(void*)LibFunc << endl;

then the function gets moved to R_X86_64_GLOB_DAT, and then my
strategy no longer works as 'dlopen' doesn't resolve the unresolved
symbol.

So I'd like to ask two questions:

(1) Is the R_X86_64_JUMP_SLOT category just for functions, or can we
put global variables in there too? Is it possible to get 'dlopen' to
resolve global variables?

(2) Is there any way to stop the GNU linker from putting an undefined
function in R_X86_64_GLOB_DAT?


* Re: Functions and Global Variables defined in Libraries
  2023-03-07 12:13 Functions and Global Variables defined in Libraries Frederick Virchanza Gotham
@ 2023-03-07 13:05 ` Xi Ruoyao
  2023-03-07 13:25   ` Alexander Monakov
  0 siblings, 1 reply; 9+ messages in thread
From: Xi Ruoyao @ 2023-03-07 13:05 UTC (permalink / raw)
  To: cauldwell.thomas, gcc-help

Neither R_X86_64_JUMP_SLOT nor R_X86_64_GLOB_DAT is emitted by GCC, so
the question is off-topic.  I'll explain them briefly though.

On Tue, 2023-03-07 at 12:13 +0000, Frederick Virchanza Gotham via Gcc-help wrote:
> (1) Is the R_X86_64_JUMP_SLOT category just for functions, or can we
> put global variables in there too? Is it possible to get 'dlopen' to
> resolve global variables?

No.

> (2) Is there any way to stop the GNU linker from putting an undefined
> function in R_X86_64_GLOB_DAT?

No, unless you avoid extracting its address.

You need to understand how R_X86_64_JUMP_SLOT works.  When a program or
library is loaded, the dynamic linker does nothing for it.

When you call a function foo in a shared library, it's implemented by
calling a function called foo@plt first.  foo@plt is in the main
executable.  It attempts to load the address of foo from the GOT and
jump to the address.

As R_X86_64_JUMP_SLOT is not handled by the dynamic linker, on the first
call the GOT does not contain the address of foo.  Instead it contains
the address of a "resolver function".  The resolver calculates the real
address of foo, fills it into the GOT, then jumps to the address.  In
the subsequent calls foo@plt will directly jump to foo as the GOT
already contains the address of foo.
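
The lazy-binding sequence described above can be mimicked in plain C.
This is only a rough simulation, not what the dynamic linker really
executes: the names real_foo, foo_resolver, foo_got_slot and foo_plt are
made up for illustration, and a real PLT stub and resolver are produced
by the linker and ld.so rather than written by hand.

```c
/* Illustrative simulation only: every name below is hypothetical. */

/* Stands in for the real foo that lives inside the shared library. */
int real_foo(int x) { return x + 1; }

int foo_resolver(int x);

/* Stands in for the GOT entry: it starts out pointing at the resolver. */
int (*foo_got_slot)(int) = foo_resolver;

/* Counts how often the slow path runs, so the effect is observable. */
int resolver_calls = 0;

/* Stands in for the resolver: find foo, patch the slot, then call foo. */
int foo_resolver(int x)
{
    resolver_calls++;
    foo_got_slot = real_foo;   /* later calls skip the resolver */
    return foo_got_slot(x);
}

/* Stands in for foo@plt: always jump through the slot. */
int foo_plt(int x) { return foo_got_slot(x); }
```

Calling foo_plt twice runs foo_resolver only once; the second call goes
straight to real_foo because the slot has been overwritten.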

This obviously won't work for a global variable because you can't call
it.  This won't work for a function pointer either: the value of the
function pointer must be the address of foo itself, not foo@plt.  Or the
result of &foo in the shared library and the main executable will be
different, violating the C or C++ standard.  Then we must use
R_X86_64_JUMP_SLOT which is handled by the dynamic linker.
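
One way around the limitation, not proposed in this thread and so only
an assumption about what the original poster wants, is to drop the
link-time reference entirely and ask for each symbol by name with
dlsym, which works for variables as well as functions. The helper name
lookup_symbol is hypothetical; dlopen and dlsym are the standard POSIX
calls.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the address of `symbol` in `library`,
 * or NULL if the library or the symbol cannot be found.
 * Passing NULL as `library` searches the running program and the
 * libraries it has already loaded.
 */
void *lookup_symbol(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_NOW);
    if (handle == NULL)
        return NULL;

    /* The handle is deliberately kept open so the address stays valid. */
    return dlsym(handle, symbol);
}
```

A caller could then write
int *v = lookup_symbol("./libname_of_library.so", "lib_global_variable");
and reach the variable through *v. The file name here is a guess, and
the function from the first message would have to be requested under
its mangled name _Z7LibFuncv. glibc versions before 2.34 also need -ldl
on the link line.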

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: Functions and Global Variables defined in Libraries
  2023-03-07 13:05 ` Xi Ruoyao
@ 2023-03-07 13:25   ` Alexander Monakov
  2023-03-08 23:08     ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Monakov @ 2023-03-07 13:25 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: cauldwell.thomas, gcc-help


On Tue, 7 Mar 2023, Xi Ruoyao via Gcc-help wrote:

> This won't work for a function pointer either: the value of the
> function pointer must be the address of foo itself, not foo@plt.  Or the
> result of &foo in the shared library and the main executable will be
> different, violating the C or C++ standard.  Then we must use
> R_X86_64_JUMP_SLOT which is handled by the dynamic linker.

(you probably meant GLOB_DAT in the last statement)

This paragraph is inaccurate: traditional non-PIC, non-PIE codegen uses
direct symbol references. So, when you have a direct reference to foo
in non-PIC main executable, the reference is resolved to its PLT slot,
and the address of that PLT slot becomes the canonical address of 'foo'
for the whole program.

When the main executable is PIE, it may or may not have a PLT slot for
'foo', and if it doesn't, the canonical address of 'foo' is its actual
implementation.
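
The idea of a single canonical address can be checked with a small
experiment. The sketch below assumes a dynamically linked program on a
glibc-style system and uses puts purely as a convenient example; the
function name address_matches_dlsym is hypothetical. It compares the
address the executable itself sees for puts with the one dlsym reports.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical check: 1 if the address of puts as seen by this code
 * equals the address reported by dlsym, 0 if they differ, -1 if the
 * lookup itself failed.
 */
int address_matches_dlsym(void)
{
    void *handle = dlopen(NULL, RTLD_LAZY);   /* the running program */
    if (handle == NULL)
        return -1;

    void *via_dlsym = dlsym(handle, "puts");
    if (via_dlsym == NULL)
        return -1;

    /* Taking the address forces a reference that lazy binding alone
       cannot satisfy. The cast is a POSIX convention, not ISO C. */
    void *via_code = (void *)puts;

    return via_dlsym == via_code;
}
```

Whether the shared address turns out to be a PLT slot in the executable
or the implementation inside libc depends on how the program was built,
but both views are expected to agree.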

Alexander


* Re: Functions and Global Variables defined in Libraries
  2023-03-07 13:25   ` Alexander Monakov
@ 2023-03-08 23:08     ` Frederick Virchanza Gotham
  2023-03-09  2:47       ` Xi Ruoyao
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2023-03-08 23:08 UTC (permalink / raw)
  To: gcc-help

On Tue, Mar 7, 2023 at 1:05 PM Xi Ruoyao wrote:
>
>
> > (2) Is there any way to stop the GNU linker from putting an undefined
> > function in R_X86_64_GLOB_DAT?
>
> No, unless you avoid extracting its address.


I've been doing some major tinkering today.

The entry point of an executable is '_start'. So, first I wrote a new
entry point in x64 assembler that could differentiate between GUI mode
and console mode depending upon the value of 'argc':

; This file contains x86_64 assembler for NASM, also known as x64.
; This file contains two functions:
; static void print8bytes(uint64_t eight_chars,uint64_t new_line);
; extern void pre_start(int argc);

section .text

print8bytes: ; This is a function that returns void
; Two parameters:
; r9: The 8-byte string to print
; r8: If true, prints a trailing new line

; save all the register values we're going to use
push rax
push rsi
push rdi
push rdx

;zero out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx

;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
mov rsi, r9
push rsi
mov rsi, rsp
mov dl, 8 ; Print 8 bytes at a time
syscall
pop rsi

cmp r8, 1 ; check if r8 is true or false
jl no_new_line
;zero out the registers we are going to need
xor rax, rax
xor rsi, rsi
xor rdi, rdi
xor rdx, rdx
;write(int fd, char *msg, unsigned int len)
mov al, 1
add di, 1
mov rsi, 0x000000000000000a ; new line
push rsi
mov rsi, rsp
mov dl, 1 ; Print just one byte
syscall
pop rsi
no_new_line: ; just a jump label - not a function name
pop rdx
pop rdi
pop rsi
pop rax
ret

global pre_start:function
pre_start:
; The 'argc' argument to 'main' is on the top of the stack so
; we will use the frame pointer 'rbp' to keep track of it.
push rbp
mov rbp, rsp

push r9 ; save because we'll use it - pop it back later
push r8 ; save because we'll use it - pop it back later

mov r8, 0 ; false = don't put trailing new line
mov r9, 0x3d3d3d3d3d3d3d3d ; "========"
call print8bytes
call print8bytes
call print8bytes

mov r9, 0x6174735f65727020 ; " pre_sta"
call print8bytes

cmp qword[rbp+8], 2 ; check if argc < 2
jl .gui_mode ; if argc < 2 then we want GUI mode
mov r9, 0x646d63202d207472 ; "rt - cmd"
jmp .print_mode ; skip the GUI string
.gui_mode: ; local label, belongs to pre_start
mov r9, 0x495547202d207472 ; "rt - GUI"
.print_mode: ; local label, belongs to pre_start
call print8bytes

mov r9, 0x3d3d3d3d3d3d3d3d ; "========"
call print8bytes
call print8bytes
mov r8, 1 ; true = put trailing new line
call print8bytes

pop r8
pop r9

mov rsp, rbp
pop rbp

extern _start
jmp _start


If you see the last line there, I jump straight into _start. So then I
build my program with a new entry point as follows:

    g++ -o prog prog.cpp object_file_from_assembler.o -e pre_start

When I run it at the command line, the first thing I get is:

    ======================== pre_start - GUI========================

and then it continues execution as normal. No problems.
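
The 64-bit immediates in the assembler listing are just eight ASCII
characters packed in little-endian order, so the first character ends
up in the lowest byte. A small C sketch of that packing follows; the
function name pack8 is made up for illustration and is not part of the
program above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack up to eight characters of `s` into a
 * 64-bit value, first character in the least significant byte, which
 * is the layout x86_64 stores in memory.
 */
uint64_t pack8(const char *s)
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8 && s[i] != '\0'; ++i)
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return value;
}
```

For example, pack8(" pre_sta") gives 0x6174735f65727020 and
pack8("rt - GUI") gives 0x495547202d207472, matching the constants
loaded into r9.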

So then the next thing I did was I used 'patchelf' to remove the
NEEDED for the graphical user interface library:

    patchelf --remove-needed libgtk-3.so.0 ./prog

And then I tried to run it again, but this time around I got back:

    ./ssh: symbol lookup error: ./ssh: undefined symbol: gtk_true

This means that the program falls over ***before*** the entry point is reached.

So the part of the Linux operating system that loads executable files
is not even going into the entry point for my program, it's falling
over before then. I need to stop this happening somehow. Perhaps I
can put dummy values in the GOT so that the loader doesn't think
they're null?



* Re: Functions and Global Variables defined in Libraries
  2023-03-08 23:08     ` Frederick Virchanza Gotham
@ 2023-03-09  2:47       ` Xi Ruoyao
  2023-03-09  8:01         ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Xi Ruoyao @ 2023-03-09  2:47 UTC (permalink / raw)
  To: cauldwell.thomas, gcc-help

On Wed, 2023-03-08 at 23:08 +0000, Frederick Virchanza Gotham via Gcc-help wrote:
> When I run it at the command line, the first thing I get is:
> 
>     ======================== pre_start - GUI========================
> 
> and then it continues execution as normal. No problems.
> 
> So then the next thing I did was I used 'patchelf' to remove the
> NEEDED for the graphical user interface library:
> 
>     patchelf --remove-needed libgtk-3.so.0 ./prog
> 
> And then I tried to run it again, but this time around I got back:
> 
>     ./ssh: symbol lookup error: ./ssh: undefined symbol: gtk_true
> 
> This means that the program falls over ***before*** the entry point is reached.

It does not happen to me:

$ cat t.c
#include <gtk/gtk.h>

int main()
{
	volatile int flag = 0;
	if (flag) {
		volatile int r = gtk_true ();
	}
}
$ cc t.c -I /usr/include/gtk-3.0 -I /usr/include/glib-2.0 -I /usr/lib/glib-2.0/include -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/cairo -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/atk-1.0 -lgtk-3
$ patchelf --remove-needed libgtk-3.so.0 ./a.out
$ objdump -d a.out | grep 'call.*gtk'
    117f:	e8 ac fe ff ff       	call   1030 <gtk_true@plt>
$ objdump -T a.out | grep gtk
0000000000000000      DF *UND*	0000000000000000  Base        gtk_true
$ { readelf -d a.out | grep gtk; } || echo "nothing"
nothing
$ ./a.out && echo "fine"
fine

I guess it does not work for you because your distro has enabled
BIND_NOW (-Wl,-z,now) by default.
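
Whether a given binary was linked with eager binding can be checked
from inside the program by walking its dynamic section. The sketch
below assumes glibc's <link.h> (which declares _DYNAMIC) and a
dynamically linked executable; the function name linked_with_bind_now
is hypothetical. It does not detect eager binding requested at run time
through the LD_BIND_NOW environment variable.

```c
#include <link.h>   /* glibc: ElfW(), _DYNAMIC, and the DT_* and DF_* constants */

/*
 * Hypothetical check: 1 if this executable's dynamic section asks for
 * eager binding (as -Wl,-z,now does), 0 otherwise.
 */
int linked_with_bind_now(void)
{
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; ++d) {
        if (d->d_tag == DT_BIND_NOW)
            return 1;
        if (d->d_tag == DT_FLAGS && (d->d_un.d_val & DF_BIND_NOW))
            return 1;
        if (d->d_tag == DT_FLAGS_1 && (d->d_un.d_val & DF_1_NOW))
            return 1;
    }
    return 0;
}
```

The same information is visible from the shell with readelf -d on the
executable, where it shows up as BIND_NOW in the FLAGS entry or NOW in
the FLAGS_1 entry.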

And anyway your question is not related to GCC.  Try to find a more
appropriate channel to discuss it.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: Functions and Global Variables defined in Libraries
  2023-03-09  2:47       ` Xi Ruoyao
@ 2023-03-09  8:01         ` Frederick Virchanza Gotham
  2023-03-09  8:05           ` Xi Ruoyao
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2023-03-09  8:01 UTC (permalink / raw)
  To: gcc-help

On Thu, Mar 9, 2023 at 2:48 AM Xi Ruoyao <xry111@xry111.site> wrote:
>
> I guess it does not work for you because your distro has enabled
> BIND_NOW (-Wl,-z,now) by default.


This can be gotten around as the GNU linker allows us to build the
executable with -Wl,-z,lazy.


> int main()
> {
>         volatile int flag = 0;
>         if (flag) {
>                 volatile int r = gtk_true ();
>         }
> }


Please add one more line to 'main':

       void *volatile p = (void*)gtk_true;

and test it again.


> And anyway your question has nothing related to GCC.  Try to find a more
> proper channel to discuss it.


It is related to the GNU compiler suite, specifically the linker 'ld'
and how it generates the tables (GLOB_DAT, JUMP_SLOT, GOT, PLT).


* Re: Functions and Global Variables defined in Libraries
  2023-03-09  8:01         ` Frederick Virchanza Gotham
@ 2023-03-09  8:05           ` Xi Ruoyao
  2023-03-09  9:27             ` Frederick Virchanza Gotham
  0 siblings, 1 reply; 9+ messages in thread
From: Xi Ruoyao @ 2023-03-09  8:05 UTC (permalink / raw)
  To: cauldwell.thomas, gcc-help

On Thu, 2023-03-09 at 08:01 +0000, Frederick Virchanza Gotham via Gcc-help wrote:
> Please add one more line to 'main':
> 
>        void *volatile p = (void*)gtk_true;
> 
> and test it again.

As I've explained you can't do that.

> > And anyway your question has nothing related to GCC.  Try to find a more
> > proper channel to discuss it.
> 
> 
> It is related to the GNU compiler suite, specifically the linker 'ld'
> and how it generates the tables (GLOBAL_DATA,JUMP_SLOT,got,plt).

The GNU linker is not part of GCC; it belongs to GNU Binutils.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: Functions and Global Variables defined in Libraries
  2023-03-09  8:05           ` Xi Ruoyao
@ 2023-03-09  9:27             ` Frederick Virchanza Gotham
  2023-03-09  9:53               ` Andrew Haley
  0 siblings, 1 reply; 9+ messages in thread
From: Frederick Virchanza Gotham @ 2023-03-09  9:27 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: gcc-help

On Thu, Mar 9, 2023 at 8:06 AM Xi Ruoyao <xry111@xry111.site> wrote:
>
>
> GNU linker is not a part of GCC.


I thought the name 'gcc' was used to lump everything in together, gcc
+ g++ + ld + cc1 and a few others.

I went looking for a forum / mailing list specifically for 'ld' but I
couldn't find one. Is there one?


* Re: Functions and Global Variables defined in Libraries
  2023-03-09  9:27             ` Frederick Virchanza Gotham
@ 2023-03-09  9:53               ` Andrew Haley
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Haley @ 2023-03-09  9:53 UTC (permalink / raw)
  To: gcc-help

On 3/9/23 09:27, Frederick Virchanza Gotham via Gcc-help wrote:
> I went looking for a forum / mailing list specifically for 'ld' but I
> couldn't find one. Is there one?

https://sourceware.org/mailman/listinfo/binutils

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

