public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: mengqinggang <mengqinggang@loongson.cn>
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xuchenghua@loongson.cn,
	caiyinyu@loongson.cn, chenglulu@loongson.cn, cailulu@loongson.cn,
	xry111@xry111.site, i.swmail@xen0n.name, maskray@google.com,
	luweining@loongson.cn, wanglei@loongson.cn,
	hejinyang@loongson.cn
Subject: Re: [PATCH v2] LoongArch: Add cfi instructions for _dl_tlsdesc_dynamic
Date: Mon, 1 Jul 2024 17:27:20 +0800	[thread overview]
Message-ID: <f357d86b-0747-d06e-aff8-271fe893b8ca@loongson.cn> (raw)
In-Reply-To: <20240626063439.4121365-1-mengqinggang@loongson.cn>

Ping.


The reason of changing to three _dl_tlsdesc_dynamic:
In one _dl_tlsdesc_dynamic,  there are three cfi_adjust_cfa_offset for 
Float/LSX/LASX path.
Three cfi_adjust_cfa_offset are always executed in stack unwinding, but 
only once stack down
instruction is executed. It resulting in incorrect CFA address.


With three _dl_tlsdesc_dynamic functions, three cfi_adjust_cfa_offset 
can be distributed to three functions.
So cfi instructions can correspond to stack down instructions.


在 2024/6/26 下午2:34, mengqinggang 写道:
> Change _dl_tlsdesc_dynamic to _dl_tlsdesc_dynamic,
> _dl_tlsdesc_dynamic_lsx and _dl_tlsdesc_dynamic_lasx.
> Conflicting cfi instructions can be distributed to the
> three functions.
> ---
> Changes v1 -> v2:
> - Change _dl_tlsdesc_dynamic to _dl_tlsdesc_dynamic,
>    _dl_tlsdesc_dynamic_lsx and _dl_tlsdesc_dynamic_lasx.
>
> v1 link: https://sourceware.org/pipermail/libc-alpha/2024-June/157270.html
>
>   sysdeps/loongarch/dl-machine.h         |   7 +
>   sysdeps/loongarch/dl-tlsdesc-dynamic.h | 403 +++++++++++++++++++++++++
>   sysdeps/loongarch/dl-tlsdesc.S         | 386 ++---------------------
>   sysdeps/loongarch/dl-tlsdesc.h         |   4 +
>   4 files changed, 436 insertions(+), 364 deletions(-)
>   create mode 100644 sysdeps/loongarch/dl-tlsdesc-dynamic.h
>
> diff --git a/sysdeps/loongarch/dl-machine.h b/sysdeps/loongarch/dl-machine.h
> index ab6f1da7c0..04fabbf598 100644
> --- a/sysdeps/loongarch/dl-machine.h
> +++ b/sysdeps/loongarch/dl-machine.h
> @@ -223,6 +223,13 @@ elf_machine_rela (struct link_map *map, struct r_scope_elem *scope[],
>   	      {
>   		td->arg = _dl_make_tlsdesc_dynamic (sym_map,
>   			      sym->st_value + reloc->r_addend);
> +# ifndef __loongarch_soft_float
> +		if (SUPPORT_LASX)
> +		  td->entry = _dl_tlsdesc_dynamic_lasx;
> +		else if (SUPPORT_LSX)
> +		  td->entry = _dl_tlsdesc_dynamic_lsx;
> +		else
> +# endif
>   		td->entry = _dl_tlsdesc_dynamic;
>   	      }
>   	    else
> diff --git a/sysdeps/loongarch/dl-tlsdesc-dynamic.h b/sysdeps/loongarch/dl-tlsdesc-dynamic.h
> new file mode 100644
> index 0000000000..5b1f43aaf4
> --- /dev/null
> +++ b/sysdeps/loongarch/dl-tlsdesc-dynamic.h
> @@ -0,0 +1,403 @@
> +/* Thread-local storage handling in the ELF dynamic linker.
> +   LoongArch version.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define FRAME_SIZE	  (-((-14 * SZREG) & ALMASK))
> +#define FRAME_SIZE_LSX	  (-((-32 * SZVREG) & ALMASK))
> +#define FRAME_SIZE_LASX	  (-((-32 * SZXREG) & ALMASK))
> +#define FRAME_SIZE_FLOAT  (-((-24 * SZFREG) & ALMASK))
> +
> +	/* Handler for dynamic TLS symbols.
> +	   Prototype:
> +	   _dl_tlsdesc_dynamic (tlsdesc *) ;
> +
> +	   The second word of the descriptor points to a
> +	   tlsdesc_dynamic_arg structure.
> +
> +	   Returns the offset between the thread pointer and the
> +	   object referenced by the argument.
> +
> +	   ptrdiff_t
> +	   _dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> +	   {
> +	     struct tlsdesc_dynamic_arg *td = tdp->arg;
> +	     dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer - SIZE_OF_TCB);
> +	     if (__glibc_likely (td->gen_count <= dtv[0].counter
> +		&& (dtv[td->tlsinfo.ti_module].pointer.val
> +		    != TLS_DTV_UNALLOCATED),
> +		1))
> +	       return dtv[td->tlsinfo.ti_module].pointer.val
> +		+ td->tlsinfo.ti_offset
> +		- __thread_pointer;
> +
> +	     return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> +	   }  */
> +	.hidden _dl_tlsdesc_dynamic
> +	.global	_dl_tlsdesc_dynamic
> +	.type	_dl_tlsdesc_dynamic,%function
> +	cfi_startproc
> +	.align 2
> +_dl_tlsdesc_dynamic:
> +	/* Save just enough registers to support fast path, if we fall
> +	   into slow path we will save additional registers.  */
> +	ADDI	sp, sp, -32
> +	cfi_adjust_cfa_offset (32)
> +	REG_S	t0, sp, 0
> +	REG_S	t1, sp, 8
> +	REG_S	t2, sp, 16
> +	cfi_rel_offset (12, 0)
> +	cfi_rel_offset (13, 8)
> +	cfi_rel_offset (14, 16)
> +
> +/* Runtime Storage Layout of Thread-Local Storage
> +   TP point to the start of TLS block.
> +
> +				      dtv
> +Low address	TCB ----------------> dtv0(counter)
> +	 TP -->	static_block0  <----- dtv1
> +		static_block1  <----- dtv2
> +		static_block2  <----- dtv3
> +		dynamic_block0 <----- dtv4
> +Hign address	dynamic_block1 <----- dtv5  */
> +
> +	REG_L	t0, tp, -SIZE_OF_TCB	  /* t0 = dtv */
> +	REG_L	a0, a0, TLSDESC_ARG	  /* a0(td) = tdp->arg */
> +	REG_L	t1, a0, TLSDESC_GEN_COUNT /* t1 = td->gen_count */
> +	REG_L	t2, t0, DTV_COUNTER	  /* t2 = dtv[0].counter */
> +	/* If dtv[0].counter < td->gen_count, goto slow path.  */
> +	bltu	t2, t1, .Lslow
> +
> +	REG_L	t1, a0, TLSDESC_MODID /* t1 = td->tlsinfo.ti_module */
> +	/* t1 = t1 * sizeof(dtv_t) = t1 * (2 * sizeof(void*)) */
> +	slli.d	t1, t1, 4
> +	add.d	t1, t1, t0  /* t1 = dtv[td->tlsinfo.ti_module] */
> +	REG_L	t1, t1, 0   /* t1 = dtv[td->tlsinfo.ti_module].pointer.val */
> +	li.d	t2, TLS_DTV_UNALLOCATED
> +	/* If dtv[td->tlsinfo.ti_module].pointer.val is TLS_DTV_UNALLOCATED,
> +	   goto slow path.  */
> +	beq	t1, t2, .Lslow
> +
> +	cfi_remember_state
> +	REG_L	t2, a0, TLSDESC_MODOFF	/* t2 = td->tlsinfo.ti_offset */
> +	/* dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset */
> +	add.d	a0, t1, t2
> +.Lret:
> +	sub.d	a0, a0, tp
> +	REG_L	t0, sp, 0
> +	REG_L	t1, sp, 8
> +	REG_L	t2, sp, 16
> +	ADDI	sp, sp, 32
> +	cfi_adjust_cfa_offset (-32)
> +	RET
> +
> +.Lslow:
> +	/* This is the slow path.  We need to call __tls_get_addr() which
> +	   means we need to save and restore all the register that the
> +	   callee will trash.  */
> +
> +	/* Save the remaining registers that we must treat as caller save.  */
> +	cfi_restore_state
> +	ADDI	sp, sp, -FRAME_SIZE
> +	cfi_adjust_cfa_offset (FRAME_SIZE)
> +	REG_S	ra, sp, 0 * SZREG
> +	REG_S	a1, sp, 1 * SZREG
> +	REG_S	a2, sp, 2 * SZREG
> +	REG_S	a3, sp, 3 * SZREG
> +	REG_S	a4, sp, 4 * SZREG
> +	REG_S	a5, sp, 5 * SZREG
> +	REG_S	a6, sp, 6 * SZREG
> +	REG_S	a7, sp, 7 * SZREG
> +	REG_S	t3, sp, 8 * SZREG
> +	REG_S	t4, sp, 9 * SZREG
> +	REG_S	t5, sp, 10 * SZREG
> +	REG_S	t6, sp, 11 * SZREG
> +	REG_S	t7, sp, 12 * SZREG
> +	REG_S	t8, sp, 13 * SZREG
> +	cfi_rel_offset (1, 0 * SZREG)
> +	cfi_rel_offset (5, 1 * SZREG)
> +	cfi_rel_offset (6, 2 * SZREG)
> +	cfi_rel_offset (7, 3 * SZREG)
> +	cfi_rel_offset (8, 4 * SZREG)
> +	cfi_rel_offset (9, 5 * SZREG)
> +	cfi_rel_offset (10, 6 * SZREG)
> +	cfi_rel_offset (11, 7 * SZREG)
> +	cfi_rel_offset (15, 8 * SZREG)
> +	cfi_rel_offset (16, 9 * SZREG)
> +	cfi_rel_offset (17, 10 * SZREG)
> +	cfi_rel_offset (18, 11 * SZREG)
> +	cfi_rel_offset (19, 12 * SZREG)
> +	cfi_rel_offset (20, 13 * SZREG)
> +
> +#ifndef __loongarch_soft_float
> +
> +	/* Save fcsr0 register.
> +	   Only one physical fcsr0 register, fcsr1-fcsr3 are aliases
> +	   of some fields in fcsr0.  */
> +	movfcsr2gr  t0, fcsr0
> +	st.w	t0, sp, FRAME_SIZE + 24 /* Use the spare slot above t2.  */
> +
> +#ifdef USE_LASX
> +
> +	/* Save 256-bit vector registers.
> +	   FIXME: Without vector ABI, save all vector registers.  */
> +	ADDI	sp, sp, -FRAME_SIZE_LASX
> +	cfi_adjust_cfa_offset (FRAME_SIZE_LASX)
> +	xvst	xr0, sp, 0*SZXREG
> +	xvst	xr1, sp, 1*SZXREG
> +	xvst	xr2, sp, 2*SZXREG
> +	xvst	xr3, sp, 3*SZXREG
> +	xvst	xr4, sp, 4*SZXREG
> +	xvst	xr5, sp, 5*SZXREG
> +	xvst	xr6, sp, 6*SZXREG
> +	xvst	xr7, sp, 7*SZXREG
> +	xvst	xr8, sp, 8*SZXREG
> +	xvst	xr9, sp, 9*SZXREG
> +	xvst	xr10, sp, 10*SZXREG
> +	xvst	xr11, sp, 11*SZXREG
> +	xvst	xr12, sp, 12*SZXREG
> +	xvst	xr13, sp, 13*SZXREG
> +	xvst	xr14, sp, 14*SZXREG
> +	xvst	xr15, sp, 15*SZXREG
> +	xvst	xr16, sp, 16*SZXREG
> +	xvst	xr17, sp, 17*SZXREG
> +	xvst	xr18, sp, 18*SZXREG
> +	xvst	xr19, sp, 19*SZXREG
> +	xvst	xr20, sp, 20*SZXREG
> +	xvst	xr21, sp, 21*SZXREG
> +	xvst	xr22, sp, 22*SZXREG
> +	xvst	xr23, sp, 23*SZXREG
> +	xvst	xr24, sp, 24*SZXREG
> +	xvst	xr25, sp, 25*SZXREG
> +	xvst	xr26, sp, 26*SZXREG
> +	xvst	xr27, sp, 27*SZXREG
> +	xvst	xr28, sp, 28*SZXREG
> +	xvst	xr29, sp, 29*SZXREG
> +	xvst	xr30, sp, 30*SZXREG
> +	xvst	xr31, sp, 31*SZXREG
> +
> +#elif defined USE_LSX
> +
> +	/* Save 128-bit vector registers.  */
> +	ADDI	sp, sp, -FRAME_SIZE_LSX
> +	cfi_adjust_cfa_offset (FRAME_SIZE_LSX)
> +	vst	vr0, sp, 0*SZVREG
> +	vst	vr1, sp, 1*SZVREG
> +	vst	vr2, sp, 2*SZVREG
> +	vst	vr3, sp, 3*SZVREG
> +	vst	vr4, sp, 4*SZVREG
> +	vst	vr5, sp, 5*SZVREG
> +	vst	vr6, sp, 6*SZVREG
> +	vst	vr7, sp, 7*SZVREG
> +	vst	vr8, sp, 8*SZVREG
> +	vst	vr9, sp, 9*SZVREG
> +	vst	vr10, sp, 10*SZVREG
> +	vst	vr11, sp, 11*SZVREG
> +	vst	vr12, sp, 12*SZVREG
> +	vst	vr13, sp, 13*SZVREG
> +	vst	vr14, sp, 14*SZVREG
> +	vst	vr15, sp, 15*SZVREG
> +	vst	vr16, sp, 16*SZVREG
> +	vst	vr17, sp, 17*SZVREG
> +	vst	vr18, sp, 18*SZVREG
> +	vst	vr19, sp, 19*SZVREG
> +	vst	vr20, sp, 20*SZVREG
> +	vst	vr21, sp, 21*SZVREG
> +	vst	vr22, sp, 22*SZVREG
> +	vst	vr23, sp, 23*SZVREG
> +	vst	vr24, sp, 24*SZVREG
> +	vst	vr25, sp, 25*SZVREG
> +	vst	vr26, sp, 26*SZVREG
> +	vst	vr27, sp, 27*SZVREG
> +	vst	vr28, sp, 28*SZVREG
> +	vst	vr29, sp, 29*SZVREG
> +	vst	vr30, sp, 30*SZVREG
> +	vst	vr31, sp, 31*SZVREG
> +
> +# else
> +
> +	/* Save float registers.  */
> +	ADDI	sp, sp, -FRAME_SIZE_FLOAT
> +	cfi_adjust_cfa_offset (FRAME_SIZE_FLOAT)
> +	FREG_S	fa0, sp, 0*SZFREG
> +	FREG_S	fa1, sp, 1*SZFREG
> +	FREG_S	fa2, sp, 2*SZFREG
> +	FREG_S	fa3, sp, 3*SZFREG
> +	FREG_S	fa4, sp, 4*SZFREG
> +	FREG_S	fa5, sp, 5*SZFREG
> +	FREG_S	fa6, sp, 6*SZFREG
> +	FREG_S	fa7, sp, 7*SZFREG
> +	FREG_S	ft0, sp, 8*SZFREG
> +	FREG_S	ft1, sp, 9*SZFREG
> +	FREG_S	ft2, sp, 10*SZFREG
> +	FREG_S	ft3, sp, 11*SZFREG
> +	FREG_S	ft4, sp, 12*SZFREG
> +	FREG_S	ft5, sp, 13*SZFREG
> +	FREG_S	ft6, sp, 14*SZFREG
> +	FREG_S	ft7, sp, 15*SZFREG
> +	FREG_S	ft8, sp, 16*SZFREG
> +	FREG_S	ft9, sp, 17*SZFREG
> +	FREG_S	ft10, sp, 18*SZFREG
> +	FREG_S	ft11, sp, 19*SZFREG
> +	FREG_S	ft12, sp, 20*SZFREG
> +	FREG_S	ft13, sp, 21*SZFREG
> +	FREG_S	ft14, sp, 22*SZFREG
> +	FREG_S	ft15, sp, 23*SZFREG
> +
> +#endif /* #ifdef USE_LASX */
> +#endif /* #ifndef __loongarch_soft_float */
> +
> +	bl	HIDDEN_JUMPTARGET(__tls_get_addr)
> +	ADDI	a0, a0, -TLS_DTV_OFFSET
> +
> +#ifndef __loongarch_soft_float
> +#ifdef USE_LASX
> +
> +	/* Restore 256-bit vector registers.  */
> +	xvld	xr0, sp, 0*SZXREG
> +	xvld	xr1, sp, 1*SZXREG
> +	xvld	xr2, sp, 2*SZXREG
> +	xvld	xr3, sp, 3*SZXREG
> +	xvld	xr4, sp, 4*SZXREG
> +	xvld	xr5, sp, 5*SZXREG
> +	xvld	xr6, sp, 6*SZXREG
> +	xvld	xr7, sp, 7*SZXREG
> +	xvld	xr8, sp, 8*SZXREG
> +	xvld	xr9, sp, 9*SZXREG
> +	xvld	xr10, sp, 10*SZXREG
> +	xvld	xr11, sp, 11*SZXREG
> +	xvld	xr12, sp, 12*SZXREG
> +	xvld	xr13, sp, 13*SZXREG
> +	xvld	xr14, sp, 14*SZXREG
> +	xvld	xr15, sp, 15*SZXREG
> +	xvld	xr16, sp, 16*SZXREG
> +	xvld	xr17, sp, 17*SZXREG
> +	xvld	xr18, sp, 18*SZXREG
> +	xvld	xr19, sp, 19*SZXREG
> +	xvld	xr20, sp, 20*SZXREG
> +	xvld	xr21, sp, 21*SZXREG
> +	xvld	xr22, sp, 22*SZXREG
> +	xvld	xr23, sp, 23*SZXREG
> +	xvld	xr24, sp, 24*SZXREG
> +	xvld	xr25, sp, 25*SZXREG
> +	xvld	xr26, sp, 26*SZXREG
> +	xvld	xr27, sp, 27*SZXREG
> +	xvld	xr28, sp, 28*SZXREG
> +	xvld	xr29, sp, 29*SZXREG
> +	xvld	xr30, sp, 30*SZXREG
> +	xvld	xr31, sp, 31*SZXREG
> +	ADDI	sp, sp, FRAME_SIZE_LASX
> +	cfi_adjust_cfa_offset (-FRAME_SIZE_LASX)
> +
> +#elif defined USE_LSX
> +
> +	/* Restore 128-bit vector registers.  */
> +	vld	vr0, sp, 0*SZVREG
> +	vld	vr1, sp, 1*SZVREG
> +	vld	vr2, sp, 2*SZVREG
> +	vld	vr3, sp, 3*SZVREG
> +	vld	vr4, sp, 4*SZVREG
> +	vld	vr5, sp, 5*SZVREG
> +	vld	vr6, sp, 6*SZVREG
> +	vld	vr7, sp, 7*SZVREG
> +	vld	vr8, sp, 8*SZVREG
> +	vld	vr9, sp, 9*SZVREG
> +	vld	vr10, sp, 10*SZVREG
> +	vld	vr11, sp, 11*SZVREG
> +	vld	vr12, sp, 12*SZVREG
> +	vld	vr13, sp, 13*SZVREG
> +	vld	vr14, sp, 14*SZVREG
> +	vld	vr15, sp, 15*SZVREG
> +	vld	vr16, sp, 16*SZVREG
> +	vld	vr17, sp, 17*SZVREG
> +	vld	vr18, sp, 18*SZVREG
> +	vld	vr19, sp, 19*SZVREG
> +	vld	vr20, sp, 20*SZVREG
> +	vld	vr21, sp, 21*SZVREG
> +	vld	vr22, sp, 22*SZVREG
> +	vld	vr23, sp, 23*SZVREG
> +	vld	vr24, sp, 24*SZVREG
> +	vld	vr25, sp, 25*SZVREG
> +	vld	vr26, sp, 26*SZVREG
> +	vld	vr27, sp, 27*SZVREG
> +	vld	vr28, sp, 28*SZVREG
> +	vld	vr29, sp, 29*SZVREG
> +	vld	vr30, sp, 30*SZVREG
> +	vld	vr31, sp, 31*SZVREG
> +	ADDI	sp, sp, FRAME_SIZE_LSX
> +	cfi_adjust_cfa_offset (-FRAME_SIZE_LSX)
> +
> +#else
> +
> +	/* Restore float registers.  */
> +	FREG_L	fa0, sp, 0*SZFREG
> +	FREG_L	fa1, sp, 1*SZFREG
> +	FREG_L	fa2, sp, 2*SZFREG
> +	FREG_L	fa3, sp, 3*SZFREG
> +	FREG_L	fa4, sp, 4*SZFREG
> +	FREG_L	fa5, sp, 5*SZFREG
> +	FREG_L	fa6, sp, 6*SZFREG
> +	FREG_L	fa7, sp, 7*SZFREG
> +	FREG_L	ft0, sp, 8*SZFREG
> +	FREG_L	ft1, sp, 9*SZFREG
> +	FREG_L	ft2, sp, 10*SZFREG
> +	FREG_L	ft3, sp, 11*SZFREG
> +	FREG_L	ft4, sp, 12*SZFREG
> +	FREG_L	ft5, sp, 13*SZFREG
> +	FREG_L	ft6, sp, 14*SZFREG
> +	FREG_L	ft7, sp, 15*SZFREG
> +	FREG_L	ft8, sp, 16*SZFREG
> +	FREG_L	ft9, sp, 17*SZFREG
> +	FREG_L	ft10, sp, 18*SZFREG
> +	FREG_L	ft11, sp, 19*SZFREG
> +	FREG_L	ft12, sp, 20*SZFREG
> +	FREG_L	ft13, sp, 21*SZFREG
> +	FREG_L	ft14, sp, 22*SZFREG
> +	FREG_L	ft15, sp, 23*SZFREG
> +	ADDI	sp, sp, FRAME_SIZE_FLOAT
> +	cfi_adjust_cfa_offset (-FRAME_SIZE_FLOAT)
> +
> +#endif /* #ifdef USE_LASX */
> +
> +	/* Restore fcsr0 register.  */
> +	ld.w	t0, sp, FRAME_SIZE + 24
> +	movgr2fcsr  fcsr0, t0
> +
> +#endif /* #ifndef __loongarch_soft_float */
> +
> +	REG_L	ra, sp, 0 * SZREG
> +	REG_L	a1, sp, 1 * SZREG
> +	REG_L	a2, sp, 2 * SZREG
> +	REG_L	a3, sp, 3 * SZREG
> +	REG_L	a4, sp, 4 * SZREG
> +	REG_L	a5, sp, 5 * SZREG
> +	REG_L	a6, sp, 6 * SZREG
> +	REG_L	a7, sp, 7 * SZREG
> +	REG_L	t3, sp, 8 * SZREG
> +	REG_L	t4, sp, 9 * SZREG
> +	REG_L	t5, sp, 10 * SZREG
> +	REG_L	t6, sp, 11 * SZREG
> +	REG_L	t7, sp, 12 * SZREG
> +	REG_L	t8, sp, 13 * SZREG
> +	ADDI	sp, sp, FRAME_SIZE
> +	cfi_adjust_cfa_offset (-FRAME_SIZE)
> +
> +	b	.Lret
> +	cfi_endproc
> +	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> +	.hidden HIDDEN_JUMPTARGET(__tls_get_addr)
> diff --git a/sysdeps/loongarch/dl-tlsdesc.S b/sysdeps/loongarch/dl-tlsdesc.S
> index a6627cc754..b6cfd6121d 100644
> --- a/sysdeps/loongarch/dl-tlsdesc.S
> +++ b/sysdeps/loongarch/dl-tlsdesc.S
> @@ -59,376 +59,34 @@ _dl_tlsdesc_undefweak:
>   	cfi_endproc
>   	.size	_dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
>   
> -
>   #ifdef SHARED
>   
> -#define FRAME_SIZE	  (-((-14 * SZREG) & ALMASK))
> -#define FRAME_SIZE_LSX	  (-((-32 * SZVREG) & ALMASK))
> -#define FRAME_SIZE_LASX	  (-((-32 * SZXREG) & ALMASK))
> -#define FRAME_SIZE_FLOAT  (-((-24 * SZFREG) & ALMASK))
> -
> -	/* Handler for dynamic TLS symbols.
> -	   Prototype:
> -	   _dl_tlsdesc_dynamic (tlsdesc *) ;
> -
> -	   The second word of the descriptor points to a
> -	   tlsdesc_dynamic_arg structure.
> -
> -	   Returns the offset between the thread pointer and the
> -	   object referenced by the argument.
> -
> -	   ptrdiff_t
> -	   _dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> -	   {
> -	     struct tlsdesc_dynamic_arg *td = tdp->arg;
> -	     dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer - SIZE_OF_TCB);
> -	     if (__glibc_likely (td->gen_count <= dtv[0].counter
> -		&& (dtv[td->tlsinfo.ti_module].pointer.val
> -		    != TLS_DTV_UNALLOCATED),
> -		1))
> -	       return dtv[td->tlsinfo.ti_module].pointer.val
> -		+ td->tlsinfo.ti_offset
> -		- __thread_pointer;
> -
> -	     return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> -	   }  */
> -	.hidden _dl_tlsdesc_dynamic
> -	.global	_dl_tlsdesc_dynamic
> -	.type	_dl_tlsdesc_dynamic,%function
> -	cfi_startproc
> -	.align 2
> -_dl_tlsdesc_dynamic:
> -	/* Save just enough registers to support fast path, if we fall
> -	   into slow path we will save additional registers.  */
> -	ADDI	sp, sp, -32
> -	REG_S	t0, sp, 0
> -	REG_S	t1, sp, 8
> -	REG_S	t2, sp, 16
> -
> -/* Runtime Storage Layout of Thread-Local Storage
> -   TP point to the start of TLS block.
> -
> -				      dtv
> -Low address	TCB ----------------> dtv0(counter)
> -	 TP -->	static_block0  <----- dtv1
> -		static_block1  <----- dtv2
> -		static_block2  <----- dtv3
> -		dynamic_block0 <----- dtv4
> -Hign address	dynamic_block1 <----- dtv5  */
> -
> -	REG_L	t0, tp, -SIZE_OF_TCB	  /* t0 = dtv */
> -	REG_L	a0, a0, TLSDESC_ARG	  /* a0(td) = tdp->arg */
> -	REG_L	t1, a0, TLSDESC_GEN_COUNT /* t1 = td->gen_count */
> -	REG_L	t2, t0, DTV_COUNTER	  /* t2 = dtv[0].counter */
> -	/* If dtv[0].counter < td->gen_count, goto slow path.  */
> -	bltu	t2, t1, .Lslow
> -
> -	REG_L	t1, a0, TLSDESC_MODID /* t1 = td->tlsinfo.ti_module */
> -	/* t1 = t1 * sizeof(dtv_t) = t1 * (2 * sizeof(void*)) */
> -	slli.d	t1, t1, 4
> -	add.d	t1, t1, t0  /* t1 = dtv[td->tlsinfo.ti_module] */
> -	REG_L	t1, t1, 0   /* t1 = dtv[td->tlsinfo.ti_module].pointer.val */
> -	li.d	t2, TLS_DTV_UNALLOCATED
> -	/* If dtv[td->tlsinfo.ti_module].pointer.val is TLS_DTV_UNALLOCATED,
> -	   goto slow path.  */
> -	beq	t1, t2, .Lslow
> -
> -	REG_L	t2, a0, TLSDESC_MODOFF	/* t2 = td->tlsinfo.ti_offset */
> -	/* dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset */
> -	add.d	a0, t1, t2
> -.Lret:
> -	sub.d	a0, a0, tp
> -	REG_L	t0, sp, 0
> -	REG_L	t1, sp, 8
> -	REG_L	t2, sp, 16
> -	ADDI	sp, sp, 32
> -	RET
> -
> -.Lslow:
> -	/* This is the slow path. We need to call __tls_get_addr() which
> -	   means we need to save and restore all the register that the
> -	   callee will trash.  */
> -
> -	/* Save the remaining registers that we must treat as caller save.  */
> -	ADDI	sp, sp, -FRAME_SIZE
> -	REG_S	ra, sp, 0 * SZREG
> -	REG_S	a1, sp, 1 * SZREG
> -	REG_S	a2, sp, 2 * SZREG
> -	REG_S	a3, sp, 3 * SZREG
> -	REG_S	a4, sp, 4 * SZREG
> -	REG_S	a5, sp, 5 * SZREG
> -	REG_S	a6, sp, 6 * SZREG
> -	REG_S	a7, sp, 7 * SZREG
> -	REG_S	t3, sp, 8 * SZREG
> -	REG_S	t4, sp, 9 * SZREG
> -	REG_S	t5, sp, 10 * SZREG
> -	REG_S	t6, sp, 11 * SZREG
> -	REG_S	t7, sp, 12 * SZREG
> -	REG_S	t8, sp, 13 * SZREG
> -
>   #ifndef __loongarch_soft_float
>   
> -	/* Save fcsr0 register.
> -	   Only one physical fcsr0 register, fcsr1-fcsr3 are aliases
> -	   of some fields in fcsr0.  */
> -	movfcsr2gr  t0, fcsr0
> -	st.w	t0, sp, FRAME_SIZE + 24 /* Use the spare slot above t2 */
> -
> -	/* Whether support LASX.  */
> -	la.global   t0, _rtld_global_ro
> -	REG_L	t0, t0, GLRO_DL_HWCAP_OFFSET
> -	andi	t1, t0, HWCAP_LOONGARCH_LASX
> -	beqz	t1, .Llsx
> -
> -	/* Save 256-bit vector registers.
> -	   FIXME: Without vector ABI, save all vector registers.  */
> -	ADDI	sp, sp, -FRAME_SIZE_LASX
> -	xvst	xr0, sp, 0*SZXREG
> -	xvst	xr1, sp, 1*SZXREG
> -	xvst	xr2, sp, 2*SZXREG
> -	xvst	xr3, sp, 3*SZXREG
> -	xvst	xr4, sp, 4*SZXREG
> -	xvst	xr5, sp, 5*SZXREG
> -	xvst	xr6, sp, 6*SZXREG
> -	xvst	xr7, sp, 7*SZXREG
> -	xvst	xr8, sp, 8*SZXREG
> -	xvst	xr9, sp, 9*SZXREG
> -	xvst	xr10, sp, 10*SZXREG
> -	xvst	xr11, sp, 11*SZXREG
> -	xvst	xr12, sp, 12*SZXREG
> -	xvst	xr13, sp, 13*SZXREG
> -	xvst	xr14, sp, 14*SZXREG
> -	xvst	xr15, sp, 15*SZXREG
> -	xvst	xr16, sp, 16*SZXREG
> -	xvst	xr17, sp, 17*SZXREG
> -	xvst	xr18, sp, 18*SZXREG
> -	xvst	xr19, sp, 19*SZXREG
> -	xvst	xr20, sp, 20*SZXREG
> -	xvst	xr21, sp, 21*SZXREG
> -	xvst	xr22, sp, 22*SZXREG
> -	xvst	xr23, sp, 23*SZXREG
> -	xvst	xr24, sp, 24*SZXREG
> -	xvst	xr25, sp, 25*SZXREG
> -	xvst	xr26, sp, 26*SZXREG
> -	xvst	xr27, sp, 27*SZXREG
> -	xvst	xr28, sp, 28*SZXREG
> -	xvst	xr29, sp, 29*SZXREG
> -	xvst	xr30, sp, 30*SZXREG
> -	xvst	xr31, sp, 31*SZXREG
> -	b	    .Ltga
> -
> -.Llsx:
> -	/* Whether support LSX.  */
> -	andi	t1, t0, HWCAP_LOONGARCH_LSX
> -	beqz	t1, .Lfloat
> -
> -	/* Save 128-bit vector registers.  */
> -	ADDI	sp, sp, -FRAME_SIZE_LSX
> -	vst	vr0, sp, 0*SZVREG
> -	vst	vr1, sp, 1*SZVREG
> -	vst	vr2, sp, 2*SZVREG
> -	vst	vr3, sp, 3*SZVREG
> -	vst	vr4, sp, 4*SZVREG
> -	vst	vr5, sp, 5*SZVREG
> -	vst	vr6, sp, 6*SZVREG
> -	vst	vr7, sp, 7*SZVREG
> -	vst	vr8, sp, 8*SZVREG
> -	vst	vr9, sp, 9*SZVREG
> -	vst	vr10, sp, 10*SZVREG
> -	vst	vr11, sp, 11*SZVREG
> -	vst	vr12, sp, 12*SZVREG
> -	vst	vr13, sp, 13*SZVREG
> -	vst	vr14, sp, 14*SZVREG
> -	vst	vr15, sp, 15*SZVREG
> -	vst	vr16, sp, 16*SZVREG
> -	vst	vr17, sp, 17*SZVREG
> -	vst	vr18, sp, 18*SZVREG
> -	vst	vr19, sp, 19*SZVREG
> -	vst	vr20, sp, 20*SZVREG
> -	vst	vr21, sp, 21*SZVREG
> -	vst	vr22, sp, 22*SZVREG
> -	vst	vr23, sp, 23*SZVREG
> -	vst	vr24, sp, 24*SZVREG
> -	vst	vr25, sp, 25*SZVREG
> -	vst	vr26, sp, 26*SZVREG
> -	vst	vr27, sp, 27*SZVREG
> -	vst	vr28, sp, 28*SZVREG
> -	vst	vr29, sp, 29*SZVREG
> -	vst	vr30, sp, 30*SZVREG
> -	vst	vr31, sp, 31*SZVREG
> -	b	    .Ltga
> -
> -.Lfloat:
> -	/* Save float registers.  */
> -	ADDI	sp, sp, -FRAME_SIZE_FLOAT
> -	FREG_S	fa0, sp, 0*SZFREG
> -	FREG_S	fa1, sp, 1*SZFREG
> -	FREG_S	fa2, sp, 2*SZFREG
> -	FREG_S	fa3, sp, 3*SZFREG
> -	FREG_S	fa4, sp, 4*SZFREG
> -	FREG_S	fa5, sp, 5*SZFREG
> -	FREG_S	fa6, sp, 6*SZFREG
> -	FREG_S	fa7, sp, 7*SZFREG
> -	FREG_S	ft0, sp, 8*SZFREG
> -	FREG_S	ft1, sp, 9*SZFREG
> -	FREG_S	ft2, sp, 10*SZFREG
> -	FREG_S	ft3, sp, 11*SZFREG
> -	FREG_S	ft4, sp, 12*SZFREG
> -	FREG_S	ft5, sp, 13*SZFREG
> -	FREG_S	ft6, sp, 14*SZFREG
> -	FREG_S	ft7, sp, 15*SZFREG
> -	FREG_S	ft8, sp, 16*SZFREG
> -	FREG_S	ft9, sp, 17*SZFREG
> -	FREG_S	ft10, sp, 18*SZFREG
> -	FREG_S	ft11, sp, 19*SZFREG
> -	FREG_S	ft12, sp, 20*SZFREG
> -	FREG_S	ft13, sp, 21*SZFREG
> -	FREG_S	ft14, sp, 22*SZFREG
> -	FREG_S	ft15, sp, 23*SZFREG
> -
> -#endif /* #ifndef __loongarch_soft_float */
> -
> -.Ltga:
> -	bl	HIDDEN_JUMPTARGET(__tls_get_addr)
> -	ADDI	a0, a0, -TLS_DTV_OFFSET
> -
> -#ifndef __loongarch_soft_float
> -
> -	la.global   t0, _rtld_global_ro
> -	REG_L	t0, t0, GLRO_DL_HWCAP_OFFSET
> -	andi	t1, t0, HWCAP_LOONGARCH_LASX
> -	beqz	t1, .Llsx1
> -
> -	/* Restore 256-bit vector registers.  */
> -	xvld	xr0, sp, 0*SZXREG
> -	xvld	xr1, sp, 1*SZXREG
> -	xvld	xr2, sp, 2*SZXREG
> -	xvld	xr3, sp, 3*SZXREG
> -	xvld	xr4, sp, 4*SZXREG
> -	xvld	xr5, sp, 5*SZXREG
> -	xvld	xr6, sp, 6*SZXREG
> -	xvld	xr7, sp, 7*SZXREG
> -	xvld	xr8, sp, 8*SZXREG
> -	xvld	xr9, sp, 9*SZXREG
> -	xvld	xr10, sp, 10*SZXREG
> -	xvld	xr11, sp, 11*SZXREG
> -	xvld	xr12, sp, 12*SZXREG
> -	xvld	xr13, sp, 13*SZXREG
> -	xvld	xr14, sp, 14*SZXREG
> -	xvld	xr15, sp, 15*SZXREG
> -	xvld	xr16, sp, 16*SZXREG
> -	xvld	xr17, sp, 17*SZXREG
> -	xvld	xr18, sp, 18*SZXREG
> -	xvld	xr19, sp, 19*SZXREG
> -	xvld	xr20, sp, 20*SZXREG
> -	xvld	xr21, sp, 21*SZXREG
> -	xvld	xr22, sp, 22*SZXREG
> -	xvld	xr23, sp, 23*SZXREG
> -	xvld	xr24, sp, 24*SZXREG
> -	xvld	xr25, sp, 25*SZXREG
> -	xvld	xr26, sp, 26*SZXREG
> -	xvld	xr27, sp, 27*SZXREG
> -	xvld	xr28, sp, 28*SZXREG
> -	xvld	xr29, sp, 29*SZXREG
> -	xvld	xr30, sp, 30*SZXREG
> -	xvld	xr31, sp, 31*SZXREG
> -	ADDI	sp, sp, FRAME_SIZE_LASX
> -	b .Lfcsr
> -
> -.Llsx1:
> -	andi	t1, t0, HWCAP_LOONGARCH_LSX
> -	beqz	t1, .Lfloat1
> -
> -	/* Restore 128-bit vector registers.  */
> -	vld	vr0, sp, 0*SZVREG
> -	vld	vr1, sp, 1*SZVREG
> -	vld	vr2, sp, 2*SZVREG
> -	vld	vr3, sp, 3*SZVREG
> -	vld	vr4, sp, 4*SZVREG
> -	vld	vr5, sp, 5*SZVREG
> -	vld	vr6, sp, 6*SZVREG
> -	vld	vr7, sp, 7*SZVREG
> -	vld	vr8, sp, 8*SZVREG
> -	vld	vr9, sp, 9*SZVREG
> -	vld	vr10, sp, 10*SZVREG
> -	vld	vr11, sp, 11*SZVREG
> -	vld	vr12, sp, 12*SZVREG
> -	vld	vr13, sp, 13*SZVREG
> -	vld	vr14, sp, 14*SZVREG
> -	vld	vr15, sp, 15*SZVREG
> -	vld	vr16, sp, 16*SZVREG
> -	vld	vr17, sp, 17*SZVREG
> -	vld	vr18, sp, 18*SZVREG
> -	vld	vr19, sp, 19*SZVREG
> -	vld	vr20, sp, 20*SZVREG
> -	vld	vr21, sp, 21*SZVREG
> -	vld	vr22, sp, 22*SZVREG
> -	vld	vr23, sp, 23*SZVREG
> -	vld	vr24, sp, 24*SZVREG
> -	vld	vr25, sp, 25*SZVREG
> -	vld	vr26, sp, 26*SZVREG
> -	vld	vr27, sp, 27*SZVREG
> -	vld	vr28, sp, 28*SZVREG
> -	vld	vr29, sp, 29*SZVREG
> -	vld	vr30, sp, 30*SZVREG
> -	vld	vr31, sp, 31*SZVREG
> -	ADDI	sp, sp, FRAME_SIZE_LSX
> -	b	    .Lfcsr
> -
> -.Lfloat1:
> -	/* Restore float registers.  */
> -	FREG_L	fa0, sp, 0*SZFREG
> -	FREG_L	fa1, sp, 1*SZFREG
> -	FREG_L	fa2, sp, 2*SZFREG
> -	FREG_L	fa3, sp, 3*SZFREG
> -	FREG_L	fa4, sp, 4*SZFREG
> -	FREG_L	fa5, sp, 5*SZFREG
> -	FREG_L	fa6, sp, 6*SZFREG
> -	FREG_L	fa7, sp, 7*SZFREG
> -	FREG_L	ft0, sp, 8*SZFREG
> -	FREG_L	ft1, sp, 9*SZFREG
> -	FREG_L	ft2, sp, 10*SZFREG
> -	FREG_L	ft3, sp, 11*SZFREG
> -	FREG_L	ft4, sp, 12*SZFREG
> -	FREG_L	ft5, sp, 13*SZFREG
> -	FREG_L	ft6, sp, 14*SZFREG
> -	FREG_L	ft7, sp, 15*SZFREG
> -	FREG_L	ft8, sp, 16*SZFREG
> -	FREG_L	ft9, sp, 17*SZFREG
> -	FREG_L	ft10, sp, 18*SZFREG
> -	FREG_L	ft11, sp, 19*SZFREG
> -	FREG_L	ft12, sp, 20*SZFREG
> -	FREG_L	ft13, sp, 21*SZFREG
> -	FREG_L	ft14, sp, 22*SZFREG
> -	FREG_L	ft15, sp, 23*SZFREG
> -	ADDI	sp, sp, FRAME_SIZE_FLOAT
> -
> -.Lfcsr:
> -	/* Restore fcsr0 register.  */
> -	ld.w	t0, sp, FRAME_SIZE + 24
> -	movgr2fcsr  fcsr0, t0
> +#define USE_LASX
> +#define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_lasx
> +#define Lret Lret_lasx
> +#define Lslow Lslow_lasx
> +#include "dl-tlsdesc-dynamic.h"
> +#undef FRAME_SIZE
> +#undef USE_LASX
> +#undef _dl_tlsdesc_dynamic
> +#undef Lret
> +#undef Lslow
> +
> +#define USE_LSX
> +#define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_lsx
> +#define Lret Lret_lsx
> +#define Lslow Lslow_lsx
> +#include "dl-tlsdesc-dynamic.h"
> +#undef FRAME_SIZE
> +#undef USE_LSX
> +#undef _dl_tlsdesc_dynamic
> +#undef Lret
> +#undef Lslow
>   
>   #endif /* #ifndef __loongarch_soft_float */
>   
> -	REG_L	ra, sp, 0 * SZREG
> -	REG_L	a1, sp, 1 * SZREG
> -	REG_L	a2, sp, 2 * SZREG
> -	REG_L	a3, sp, 3 * SZREG
> -	REG_L	a4, sp, 4 * SZREG
> -	REG_L	a5, sp, 5 * SZREG
> -	REG_L	a6, sp, 6 * SZREG
> -	REG_L	a7, sp, 7 * SZREG
> -	REG_L	t3, sp, 8 * SZREG
> -	REG_L	t4, sp, 9 * SZREG
> -	REG_L	t5, sp, 10 * SZREG
> -	REG_L	t6, sp, 11 * SZREG
> -	REG_L	t7, sp, 12 * SZREG
> -	REG_L	t8, sp, 13 * SZREG
> -	ADDI	sp, sp, FRAME_SIZE
> -
> -	b	.Lret
> -	cfi_endproc
> -	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> -	.hidden HIDDEN_JUMPTARGET(__tls_get_addr)
> +#include "dl-tlsdesc-dynamic.h"
>   
>   #endif /* #ifdef SHARED */
> diff --git a/sysdeps/loongarch/dl-tlsdesc.h b/sysdeps/loongarch/dl-tlsdesc.h
> index ff8c69cb93..45c43a5b52 100644
> --- a/sysdeps/loongarch/dl-tlsdesc.h
> +++ b/sysdeps/loongarch/dl-tlsdesc.h
> @@ -43,6 +43,10 @@ extern ptrdiff_t attribute_hidden _dl_tlsdesc_undefweak (struct tlsdesc *);
>   
>   #ifdef SHARED
>   extern void *_dl_make_tlsdesc_dynamic (struct link_map *, size_t);
> +#ifndef __loongarch_soft_float
> +extern ptrdiff_t attribute_hidden _dl_tlsdesc_dynamic_lasx (struct tlsdesc *);
> +extern ptrdiff_t attribute_hidden _dl_tlsdesc_dynamic_lsx (struct tlsdesc *);
> +#endif
>   extern ptrdiff_t attribute_hidden _dl_tlsdesc_dynamic (struct tlsdesc *);
>   #endif
>   


  reply	other threads:[~2024-07-01  9:27 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-26  6:34 mengqinggang
2024-07-01  9:27 ` mengqinggang [this message]
2024-07-02 10:44   ` Jinyang He
2024-07-02 11:48     ` mengqinggang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f357d86b-0747-d06e-aff8-271fe893b8ca@loongson.cn \
    --to=mengqinggang@loongson.cn \
    --cc=adhemerval.zanella@linaro.org \
    --cc=cailulu@loongson.cn \
    --cc=caiyinyu@loongson.cn \
    --cc=chenglulu@loongson.cn \
    --cc=hejinyang@loongson.cn \
    --cc=i.swmail@xen0n.name \
    --cc=libc-alpha@sourceware.org \
    --cc=luweining@loongson.cn \
    --cc=maskray@google.com \
    --cc=wanglei@loongson.cn \
    --cc=xry111@xry111.site \
    --cc=xuchenghua@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).