From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb29.google.com (mail-yb1-xb29.google.com [IPv6:2607:f8b0:4864:20::b29]) by sourceware.org (Postfix) with ESMTPS id B47043858414 for ; Wed, 8 Sep 2021 17:41:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B47043858414 Received: by mail-yb1-xb29.google.com with SMTP id v10so5188500ybq.7 for ; Wed, 08 Sep 2021 10:41:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=zrLU+qv+mM9Z23n+kBRfpb11/5sWIhtHPlp76yvWDGc=; b=fnhbBxITzxB4FWug9FpFGiopmivggDQaz4KwrBznEi5pd9eAsvquYi/Zmf+nHV+eQA wQYtHIdtvwH9+VhkbtckDGN7s32YDJLXeV1qFy5wxfZIqtxVnMrF0EQt1bnmrZv9AKsg p0VlDwKOeHEgDRWAHc0OM7xj4inewv2H21Ch5pT/UerPx/OoJceOWrQgFKWB/CHuXkBB 4hF3PYMZ5/R8LMXQtMfT95CjnCZcfhnMo5kK10073Pzn6cVzCS7Hjwf8obPTVoHdbpbD N7B5QpT1fHYMvNtq+0fusnqtUyf+/VKwEK7mwxieFurQVifdUkhvzJ6gXrPosaJuk7ve VyvA== X-Gm-Message-State: AOAM5302QiPIFONmwI0WwJN9ZeI5uF9pX/+QvzFA/kYQhiJkR9182NHJ VdcOZoqx9UBCvk8003ORph1lZM3ZgvrpkncUeWEFCA== X-Google-Smtp-Source: ABdhPJxIUKKYAWcOMOFXht+kXivf321YiOdwpMw9sUBQZNBv97FT1TiSDtBmN/k+OMs/+9fnoX6HpVx10a6Iynt+KD4= X-Received: by 2002:a05:6902:1201:: with SMTP id s1mr6569157ybu.432.1631122872081; Wed, 08 Sep 2021 10:41:12 -0700 (PDT) MIME-Version: 1.0 References: <20210907131616.23472-1-lukma@denx.de> <20210907164906.yt6nonvfyhvbrx6p@google.com> <20210907193227.6047f9cc@ktm> <20210907174417.sctsswphsyae4mpc@google.com> <20210908170524.4be44bf0@ktm> In-Reply-To: <20210908170524.4be44bf0@ktm> From: =?UTF-8?B?RsSBbmctcnXDrCBTw7JuZw==?= Date: Wed, 8 Sep 2021 10:41:01 -0700 Message-ID: Subject: Re: [PATCH] dl: Use "adr" assembler command to get proper load address To: Lukasz Majewski Cc: Adhemerval Zanella , Florian Weimer , Joseph Myers , "Carlos O'Donell" , libc-alpha Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-22.5 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Sep 2021 17:41:14 -0000 On Wed, Sep 8, 2021 at 8:05 AM Lukasz Majewski wrote: > > Hi Fangrui, > > > On 2021-09-07, Lukasz Majewski wrote: > > >Hi Fangrui, > > > > > >> On 2021-09-07, Lukasz Majewski wrote: > > >> >This change is a partial revert of commit > > >> >bca0f5cbc9257c13322b99e55235c4f21ba0bd82 which imposed usage of > > >> >__ehdr_start linker variable to get the address of loaded program. > > >> > > > >> >The elf_machine_load_address() function is declared in the > > >> >sysdeps/arm/dl-machine.h header. It is called from _dl_start() > > >> >entry point for the program. It shall return the load address of > > >> >the dynamic linker program. > > >> > > >> Yes. > > >> > > >> >With this revert the 'adr' assembler instruction is used instead > > >> >of a place holder: > > >> > > > >> >arm-poky-linux-gnueabi-objdump -t ld-linux-armhf.so.3 | grep ehdr > > >> >00000000 l .note.gnu.build-id 00000000 __ehdr_start > > >> > > > >> >which shall be pre-set by binutils. > > >> > > >> I don't understand this. The sh_addr field of the binutils defined > > >> __ehdr_start is 0. > > >> > > >> Declararing __ehdr_start as hidden and accessing it with > > >> PC-relative addressing generates computes the runtime address of > > >> __ehdr_start, which equals the load base. > > > > > >Maybe I don't get it - but shouldn't this be filled in by binutils > > >(linker ?) when the program is assembled before run? > > > > The linker just sets its link-time address (sh_addr): 0. > > The address is computed at runtime. > > > > >Or is the __ehdr_start used just as a relative offset to get the > > >proper position of ld-linux-armh.so when called? > > > > It's a relative offset. > > > > % cat a.c > > __attribute__((visibility("hidden"))) > > extern char __ehdr_start[]; > > > > char *load_address() { return __ehdr_start; } > > > > % arm-linux-gnueabi-gcc -O1 -fpic a.c -nostdlib > > % arm-linux-gnueabi-objdump -dr a.out > > > > a.out: file format elf32-littlearm > > > > > > Disassembly of section .text: > > > > 00000198 : > > 198: e59f0004 ldr r0, [pc, #4] ; 1a4 > > 19c: e08f0000 add r0, pc, r0 > > 1a0: e12fff1e bx lr > > 1a4: fffffe5c .word 0xfffffe5c > > > > At runtime, `load_address` will compute the load address. > > Note that there is no runtime relocation. > > I think that the problem may be with having the negative value > calculated. Unfamiliar with arm, but I don't see why a negative value can be a problem. The address computation just wraps around as expected. > The relevant snipet: > > 116c: bf00 nop > 116e: bf00 nop > 1170: bf00 nop > 1172: f8df 3508 ldr.w r3, [pc, #1288] ; 167c <_dl_start= +0x520> > > 1176: f8df 1508 ldr.w r1, [pc, #1288] ; 1680 <_dl_start= +0x524> > > 117a: 447b add r3, pc > 117c: 4479 add r1, pc > 117e: f8c3 1598 str.w r1, [r3, #1432] ; 0x598 > 1182: bf00 nop > 1184: bf00 nop > 1186: bf00 nop > 1188: bf00 nop > 118a: bf00 nop > 118c: bf00 nop > 118e: f8df 24f4 ldr.w r2, [pc, #1268] ; 1684 <_dl_start= +0x528> > 1192: f8d3 5598 ldr.w r5, [r3, #1432] ; 0x598 > 1196: 447a add r2, pc > 1198: 442a add r2, r5 > 119a: 1a52 subs r2, r2, r1 > 119c: f8c3 25a0 str.w r2, [r3, #1440] ; 0x5a0 > 11a0: 6813 ldr r3, [r2, #0] > > > 167c: 0002be92 .word 0x0002be92 > 1680: ffffee80 .word 0xffffee80 > > The r1 gets the 0xffffee80 (negative offset) value. It is then added to p= c > and used to calculate r2. > > For working code (with this patch applied) - there are NO such large > values (like aforementioned 0xffffee80). The arithmetic is done on > > 1690: 00000020 .word 0x00000020 > 1694: 0002be7e .word 0x0002be7e > > which seems to work. > > This shouldn't be a problem as with U2 the arithmetic shall work. > However, I've noticed (with passing LD_DEBUG=3Dall) that the > ld-linux-armhf.so.3 (and init) are relocated twice before execution. > > Why do we need to relocate it? No idea... > Another question is why on this particular case the large (i.e. > negative) offset matters? No idea... > > > > > > >> > > >> What's wrong with it? The objdump -t line doesn't tell why this > > >> change is good. > > > > > >I'm also puzzled why this causes the QEMU versalite board to break. > > >(/sbin/init dies with SIGSEGFAULT =3D 0xb [*]) > > > > > >It looks like the &__ehdr_start is not equal in the result to 'adr' > > >assembler code in QEMU. > > > > > >I also thought that the issue is caused by fs/binfmt_elf.c changes in > > >the Linux kernel (as mine is 5.1), so I've tested it with 5.10 and > > >5.14. No change. > > > > The kernel change must be unrelated. > > The kernel doesn't even know __ehdr_start as a symbol exists. > > > > >The QEMU is pretty recent - it is 6.0.0 version for ARM. > > > > > >> Can you compare objdump -dr output and show the incorrect > > >> code sequences with C assessing __ehdr_start? > > > > > >Ok. I will explicitly compare _dl_start function with and without > > >this patch. > > > > In case it helps, perhaps replace always_inline with noinline > > and check why the function is incorrect under your qemu setup. > > > > >> > > >> >This is crucial in the QEMU ARM environment for which (when > > >> >/sbin/init is executed) values set in __ehdr_start symbol are > > >> >wrong. This causes the program to crash very early - when the > > >> >/lib/ld-linux-armhf.so.3 is executed as a prerequisite to > > >> >/sbin/init execution. > > >> > > >> >The kernel's fs/binfmt_elf.c is though responsible for setting > > >> >up execution environment, not binutils. > > >> > > >> Whether the symbol entry __ehdr_start exists doesn't matter. > > >> The hidden definition does not have any associated relocation. > > > > > >I'm also wondering if similar issue is visible with "normal" - i.e. > > >not QEMU ARM run init. > > > > > >Maybe the "rough" environment in which /sbin/init is run is the > > >culprit? > > > > > > > I had tested running ld.so and elf/ldconfig which covered the usage. > > > > I don't know. It needs some debugging from you:) > > > > >> > > >> >It looks like the only robust way to obtain the _dl_start offset > > >> >is to use assembler instruction - not rely on values provided by > > >> >binutils. > > >> > > > >> >HW: > > >> >Hardware name: ARM-Versatile Express (Run with QEMU) > > >> >Tested (affected) kernels v5.1.12, v5.10.62 and v5.14.1 > > >> > > > >> >When the /sbin/init is setup for run from Linux kernel's very > > >> >small environment with LD_DEBUG=3Dall the __ehdr_start is not shown > > >> >at all. > > >> > > > >> >Fixes: BZ #28293 > > >> >--- > > >> > sysdeps/arm/dl-machine.h | 28 +++++++++++++++++++++++++--- > > >> > 1 file changed, 25 insertions(+), 3 deletions(-) > > >> > > > >> >diff --git a/sysdeps/arm/dl-machine.h b/sysdeps/arm/dl-machine.h > > >> >index eb13cb8b57..58ebef6ecd 100644 > > >> >--- a/sysdeps/arm/dl-machine.h > > >> >+++ b/sysdeps/arm/dl-machine.h > > >> >@@ -38,11 +38,33 @@ elf_machine_matches_host (const Elf32_Ehdr > > >> >*ehdr) > > >> > } > > >> > > > >> > /* Return the run-time load address of the shared object. */ > > >> >-static inline ElfW(Addr) __attribute__ ((unused)) > > >> >+static inline Elf32_Addr __attribute__ ((unused)) > > >> > elf_machine_load_address (void) > > >> > { > > >> >- extern const ElfW(Ehdr) __ehdr_start attribute_hidden; > > >> >- return (ElfW(Addr)) &__ehdr_start; > > >> >+ Elf32_Addr pcrel_addr; > > >> >+#ifdef SHARED > > >> >+ extern Elf32_Addr __dl_start (void *) asm ("_dl_start"); > > >> >+ Elf32_Addr got_addr =3D (Elf32_Addr) &__dl_start; > > >> >+ asm ("adr %0, _dl_start" : "=3Dr" (pcrel_addr)); > > >> >+#else > > >> >+ extern Elf32_Addr __dl_relocate_static_pie (void *) > > >> >+ asm ("_dl_relocate_static_pie") attribute_hidden; > > >> >+ Elf32_Addr got_addr =3D (Elf32_Addr) &__dl_relocate_static_pie; > > >> >+ asm ("adr %0, _dl_relocate_static_pie" : "=3Dr" (pcrel_addr)); > > >> >+#endif > > >> >+#ifdef __thumb__ > > >> >+ /* Clear the low bit of the function address. > > >> >+ > > >> >+ NOTE: got_addr is from GOT table whose lsb is always set by > > >> >linker if it's > > >> >+ Thumb function address. PCREL_ADDR comes from PC-relative > > >> >calculation > > >> >+ which will finish during assembling. GAS assembler before > > >> >the fix for > > >> >+ PR gas/21458 was not setting the lsb but does after that. > > >> >Always do the > > >> >+ strip for both, so the code works with various combinations > > >> >of glibc and > > >> >+ Binutils. */ > > >> >+ got_addr &=3D ~(Elf32_Addr) 1; > > >> >+ pcrel_addr &=3D ~(Elf32_Addr) 1; > > >> >+#endif > > >> >+ return pcrel_addr - got_addr; > > >> > } > > >> > > > >> > /* Return the link-time address of _DYNAMIC. */ > > >> >-- > > >> >2.20.1 > > >> > > > > > > >Note: > > > > > >[*] - the QEMU debugging is pretty difficult with /sbin/init. Qemu > > >has good support for kernel debugging (-s -S switches). Then for > > >user space program one could use the "on-board" gdb or gdbserver. > > > > > >However, /sbin/init needs to be debugged from linux and scheduler > > >context switches needs to be taken into account to review how the > > >code is executed. And such debugging scenario has issue with > > >QEMU-arm. > > > > > > > > >Best regards, > > > > > >Lukasz Majewski > > > > > >-- > > > > > >DENX Software Engineering GmbH, Managing Director: Wolfgang Denk > > >HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > > >Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: > > >lukma@denx.de > > > > > > > > > Best regards, > > Lukasz Majewski > > -- > > DENX Software Engineering GmbH, Managing Director: Wolfgang Denk > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de --=20 =E5=AE=8B=E6=96=B9=E7=9D=BF