From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <woodard@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [63.128.21.124])
 by sourceware.org (Postfix) with ESMTP id AE0EC386F433
 for <libc-alpha@sourceware.org>; Wed, 23 Sep 2020 15:10:40 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AE0EC386F433
Received: from mail-ot1-f72.google.com (mail-ot1-f72.google.com
 [209.85.210.72]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-204-b9QTirk0NYKqxRN5O6xPIw-1; Wed, 23 Sep 2020 11:10:19 -0400
X-MC-Unique: b9QTirk0NYKqxRN5O6xPIw-1
Received: by mail-ot1-f72.google.com with SMTP id s13so4845790otq.15
 for <libc-alpha@sourceware.org>; Wed, 23 Sep 2020 08:10:19 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:from:subject:to:cc:message-id:in-reply-to
 :references:mime-version;
 bh=uOdiBmsFR/gC6L1w5zp13/z7/wHnJS68YjELp1LQTeg=;
 b=QlVVuo4CCmY3oeVasjhqsQJRLQ7ePM7cRkQcN3/VSyU7X0/1yRZwYeHzft4EJ07f7c
 yvUPAiF7zza9eIZUFluNHPyJXFvD3xwCckmVelZwK6Nd9iUSnK8DvtJFfzJEXF42kUCe
 jqljC33tN6X/Pt9I72Ymg0p9GQducRNojDMBRoB78HZ4PfB6V0pO7sDCT7TznnkGs2ho
 cwJoOwHxp315RhpS7kOv91Od32A9wj6IERLiF9+MYOZtnFmfehcTiqNbOpbCxgKBgs6L
 vb9aMSbpks0lJn1U3/POLxiHEpyctNX5b9jWr37/9aGtfAnA4eUAdGgT6jb50LzPf62F
 Ivyw==
X-Gm-Message-State: AOAM530gMJxJzu6CgRBS23k/FO7QgSuJwrJzRGVMxSl/o/bCRqod2tXD
 yMdO6IAh7ZWiG0juvPojE22xIv8IaoUv2jkki9I8uhynYBnqINgFDhfMPGdQumAqqoDzL7XwpJW
 lHUOFr9a1Bz6B9ALzBU8V
X-Received: by 2002:a05:6830:1e90:: with SMTP id
 n16mr93756otr.304.1600873818378; 
 Wed, 23 Sep 2020 08:10:18 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJwCDiAbHhT7hbKdqkvN+wuho77B5dcf4PJMoU9M8N7Qb/LqrNrrtgjtFup99SW0MY3Ce3PpOQ==
X-Received: by 2002:a05:6830:1e90:: with SMTP id
 n16mr93717otr.304.1600873817806; 
 Wed, 23 Sep 2020 08:10:17 -0700 (PDT)
Received: from [192.168.1.234]
 (47-208-193-143.trckcmtc01.res.dyn.suddenlink.net. [47.208.193.143])
 by smtp.gmail.com with ESMTPSA id j24sm62027otn.64.2020.09.23.08.10.16
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 23 Sep 2020 08:10:17 -0700 (PDT)
Date: Wed, 23 Sep 2020 08:10:09 -0700
From: Ben Coyote Woodard <woodard@redhat.com>
Subject: Re: [PATCH] Fix runtime linker auditing on aarch64
To: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: libc-alpha@sourceware.org
Message-Id: <X4A4HQ.1CQGG0D4I4DC3@redhat.com>
In-Reply-To: <20200923123426.GD16385@arm.com>
References: <20200923011613.2243151-1-woodard@redhat.com>
 <20200923123426.GD16385@arm.com>
X-Mailer: geary/3.36.3.1
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 HTML_MESSAGE, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5,
 RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
Content-Type: text/plain; charset=us-ascii; format=flowed
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Sep 2020 15:10:44 -0000


On Wed, Sep 23, 2020 at 13:34, Szabolcs Nagy <szabolcs.nagy@arm.com> 
wrote:
> The 09/22/2020 18:16, Ben Woodard via Libc-alpha wrote:
>>  The dynamic linker's auditing was not working on aarch64. See 
>> PR#26643
>>  <https://sourceware.org/bugzilla/show_bug.cgi?id=26643>
>> 
>>  There were two distinct problems:
>>    * _dl_runtime_resolve was not preserving x8 the indirect result 
>> location
>>      register.
>>    * The NEON Q registers pushed onto the stack by 
>> _dl_runtime_resolve
>>      were twice the size of D registers extracted from the stack 
>> frame by
>>      _dl_runtime_profile.
>> 
>>  To fix this
>>    * The La_aarch64_regs structure was expanded to include x8 and 
>> the full
>>      sized NEON V registers that are required to be preserved by the 
>> ABI.
>>    * _dl_runtime_profile needed to extract registers saved by
>>      _dl_runtime_resolve and put them into the new correctly sized
>>      La_aarch64_regs structure.
>>    * The return value structure La_aarch64_retval also didn't have 
>> the correctly
>>      sized NEON V registers.
>> 
>>  As a couple of additional cleanups
>>    * The names of the NEON registers saved within the 
>> La_aarch64_regs and the
>>      La_aarch_retval structures referred to the old D registers 
>> which were
>>      doubles. Now the registers are quads and are called V for 
>> vector registers.
>>      So the name of the field in the structure and the names of the 
>> offsets
>>      within that structure were named to use the more modern names.
>>    * The ABI specification says that r0-r7 + r8 the indirect result 
>> location
>>      register as well as the NEON v0-v7 registers can be used to 
>> return values
>>      from a function. Therefore, I addded those to the 
>> La_aarch64_retval
>>      structure so that it also correctly matches the ABI.
>> 
>>  An additional problem not addressed by this patch is what to do 
>> about the
>>  changes to the aarch64 ABI needed to support SVE. A discussion 
>> about what to
>>  do about that was begun on libc-alpha here:
>>  
>> <https://sourceware.org/pipermail/libc-alpha/2020-September/117797.html>
> 
> SVE calls are special (marked as STO_AARCH64_VARIANT_PCS
> in the dynamic symbol table) and currently load time
> bound (like BIND_NOW) so i think they don't go through
> the PLT0 sequence that calls the profile entry in ld.so
> and thus audit hooks are not called for them.
> 
> this is probably not what LD_AUDIT users would want
> (do they care about hooking into sve calls?), but
> VARIANT_PCS essentially allows any call convention,
> so all registers have to be saved and restored if
> such call enters the dynamic linker which is a problem
> if register state may be extended in the future
> (although probably ldaudit is special enough that its
> users can update glibc if they care about new regs?).
> 
> (one way to expose variant pcs calls to audit hooks
> is to detect the symbol type in the asm entry point
> and then call a different hook, but this sounds
> sufficiently ugly that i think we would then prefer
> to update the elf abi with a second plt entry sequence
> for variant pcs calls that can just use a different
> entry into ld.so and new linkers would generate that
> for dsos with variant pcs symbols.)

I'll deal with the SVE stuff in a different email.

> 
>>  ---
>>   sysdeps/aarch64/bits/link.h     | 17 ++++----
>>   sysdeps/aarch64/dl-link.sym     |  4 +-
>>   sysdeps/aarch64/dl-trampoline.S | 75 
>> +++++++++++++++++++++------------
>>   3 files changed, 59 insertions(+), 37 deletions(-)
>> 
>>  diff --git a/sysdeps/aarch64/bits/link.h 
>> b/sysdeps/aarch64/bits/link.h
>>  index 0c54e6ea7b..2b43ace57c 100644
>>  --- a/sysdeps/aarch64/bits/link.h
>>  +++ b/sysdeps/aarch64/bits/link.h
>>  @@ -23,19 +23,20 @@
>>   /* Registers for entry into PLT on AArch64.  */
>>   typedef struct La_aarch64_regs
>>   {
>>  -  uint64_t lr_xreg[8];
>>  -  uint64_t lr_dreg[8];
>>  -  uint64_t lr_sp;
>>  -  uint64_t lr_lr;
>>  +  uint64_t    lr_xreg[9];
>>  +  __uint128_t lr_vreg[8];
>>  +  uint64_t    lr_sp;
>>  +  uint64_t    lr_lr;
>>   } La_aarch64_regs;
> 
> ok.
> 
> changing abi is fine with me: old abi was
> unusably broken, if audit modules use some
> versioning that's even better.
> 
>> 
>>   /* Return values for calls from PLT on AArch64.  */
>>   typedef struct La_aarch64_retval
>>   {
>>  -  /* Up to two integer registers can be used for a return value.  
>> */
>>  -  uint64_t lrv_xreg[2];
>>  -  /* Up to four D registers can be used for a return value.  */
>>  -  uint64_t lrv_dreg[4];
>>  +  /* Up to eight integer registers and the indirect result 
>> location register
>>  +     can be used for a return value.  */
>>  +  uint64_t    lrv_xreg[9];
> 
> x8 is not preserved so recording it at function exit
> is not useful. (on entry it points to where results
> are stored but on exit it can be clobbered)

OK that was not clear to me reading the AAPCS. Do you want to ping 
you're colleagues the tech writers over at arm and see if they can 
tighten up the language a bit.

I dropped it back down to 8.

> 
>>  +  /* Up to eight V registers can be used for a return value.  */
>>  +  __uint128_t lrv_vreg[8];
>> 
>>   } La_aarch64_retval;
>>   __BEGIN_DECLS
> 
> note: i don't like to use non-standard types in
> public apis (like __uint128_t), but we already
> made this mistake in the linux sigcontext, so this
> is probably ok.
> 
> (my preference normally is to use a standard type
> e.g. long double or char[] with alignment attr,
> but in practice __uint128_t is probably easier to
> deal with)
> 

I kind of prefer "long double" here as well. It is after all what it 
likely is. I'm not really attached to __uint128_t; the previous version 
of the interface had uint64_t and so when making the registers the 
correct size I changed it to uint128_t but that didn't compile and then 
when I grepped the source I found __uint128_t. It wasn't like I put a 
lot of thought into that decision.

> 
>>  diff --git a/sysdeps/aarch64/dl-link.sym 
>> b/sysdeps/aarch64/dl-link.sym
>>  index d67d28b40c..70d153a1d5 100644
>>  --- a/sysdeps/aarch64/dl-link.sym
>>  +++ b/sysdeps/aarch64/dl-link.sym
>>  @@ -7,9 +7,9 @@ DL_SIZEOF_RG		sizeof(struct La_aarch64_regs)
>>   DL_SIZEOF_RV		sizeof(struct La_aarch64_retval)
>> 
>>   DL_OFFSET_RG_X0		offsetof(struct La_aarch64_regs, lr_xreg)
>>  -DL_OFFSET_RG_D0		offsetof(struct La_aarch64_regs, lr_dreg)
>>  +DL_OFFSET_RG_V0		offsetof(struct La_aarch64_regs, lr_vreg)
>>   DL_OFFSET_RG_SP		offsetof(struct La_aarch64_regs, lr_sp)
>>   DL_OFFSET_RG_LR		offsetof(struct La_aarch64_regs, lr_lr)
>> 
>>   DL_OFFSET_RV_X0		offsetof(struct La_aarch64_retval, lrv_xreg)
>>  -DL_OFFSET_RV_D0		offsetof(struct La_aarch64_retval, lrv_dreg)
>>  +DL_OFFSET_RV_V0		offsetof(struct La_aarch64_retval, lrv_vreg)
> 
> ok.
> 
>>  diff --git a/sysdeps/aarch64/dl-trampoline.S 
>> b/sysdeps/aarch64/dl-trampoline.S
>>  index 794876fffa..c91341e8fc 100644
>>  --- a/sysdeps/aarch64/dl-trampoline.S
>>  +++ b/sysdeps/aarch64/dl-trampoline.S
>>  @@ -46,6 +46,8 @@ _dl_runtime_resolve:
>>   	cfi_rel_offset (lr, 8)
>> 
>>   	/* Save arguments.  */
>>  +	/* Note: Saving x9 is not required by the ABI but the assember 
>> requires
>>  +	   the immediate values of operand 3 to be a multiple of 16 */
>>   	stp	x8, x9, [sp, #-(80+8*16)]!
>>   	cfi_adjust_cfa_offset (80+8*16)
>>   	cfi_rel_offset (x8, 0)
> 
> ok.
> 
>>  @@ -183,19 +185,23 @@ _dl_runtime_profile:
> 
> there is a comment at the entry point with offsets
> which i think is no longer valid (i think it's
> ok to remove the offsets just document the order
> of things on the stack)

oops missed that part. Fixed.
In an earlier version of the patch forgot to update the retval 
structure and I remember looking at that comment and thinking, "I 
didn't change retval and so everything should be the same" but then I 
changed the retval structure.

> 
>>   	stp	x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>>   	cfi_rel_offset (x6, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 0)
>>   	cfi_rel_offset (x7, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 8)
>>  -
>>  -	stp	d0, d1, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>>  -	cfi_rel_offset (d0, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0)
>>  -	cfi_rel_offset (d1, OFFSET_RG + DL_OFFSET_RG_D0 + 16*0 + 8)
>>  -	stp	d2, d3, [X29, #OFFSET_RG+ DL_OFFSET_RG_D0 + 16*1]
>>  -	cfi_rel_offset (d2, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 0)
>>  -	cfi_rel_offset (d3, OFFSET_RG + DL_OFFSET_RG_D0 + 16*1 + 8)
>>  -	stp	d4, d5, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>>  -	cfi_rel_offset (d4, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 0)
>>  -	cfi_rel_offset (d5, OFFSET_RG + DL_OFFSET_RG_D0 + 16*2 + 8)
>>  -	stp	d6, d7, [X29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>>  -	cfi_rel_offset (d6, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 0)
>>  -	cfi_rel_offset (d7, OFFSET_RG + DL_OFFSET_RG_D0 + 16*3 + 8)
>>  +	str	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0]
>>  +	cfi_rel_offset (x8, OFFSET_RG + DL_OFFSET_RG_X0 + 16*4 + 0)
>>  +	/* Note X9 is in the stack frame for alignment but it is not
>>  +	   required to be saved by the ABI */
>>  +
> 
> i dont see x9 here. you can just note that there is padding.

fixed
> 
>>  +	stp	q0, q1, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
>>  +	cfi_rel_offset (q0, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0)
>>  +	cfi_rel_offset (q1, OFFSET_RG + DL_OFFSET_RG_V0 + 32*0 + 16)
>>  +	stp	q2, q3, [X29, #OFFSET_RG+ DL_OFFSET_RG_V0 + 32*1]
>>  +	cfi_rel_offset (q2, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 0)
>>  +	cfi_rel_offset (q3, OFFSET_RG + DL_OFFSET_RG_V0 + 32*1 + 16)
>>  +	stp	q4, q5, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>>  +	cfi_rel_offset (q4, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 0)
>>  +	cfi_rel_offset (q5, OFFSET_RG + DL_OFFSET_RG_V0 + 32*2 + 16)
>>  +	stp	q6, q7, [X29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>>  +	cfi_rel_offset (q6, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 0)
>>  +	cfi_rel_offset (q7, OFFSET_RG + DL_OFFSET_RG_V0 + 32*3 + 16)
>> 
>>   	add     x0, x29, #SF_SIZE + 16
>>   	ldr	x1, [x29, #OFFSET_LR]
>>  @@ -234,10 +240,11 @@ _dl_runtime_profile:
>>   	ldp	x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
>>   	ldp	x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
>>   	ldp	x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>>  -	ldp	d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>>  -	ldp	d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
>>  -	ldp	d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>>  -	ldp	d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>>  +	ldr	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>>  +	ldp	q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 16*0]
>>  +	ldp	q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
>>  +	ldp	q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>>  +	ldp	q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>> 
>>   	cfi_def_cfa_register (sp)
>>   	ldp	x29, x30, [x29, #0]
>>  @@ -280,14 +287,21 @@ _dl_runtime_profile:
>>   	ldp	x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
>>   	ldp	x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
>>   	ldp	x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
>>  -	ldp	d0, d1, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*0]
>>  -	ldp	d2, d3, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*1]
>>  -	ldp	d4, d5, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*2]
>>  -	ldp	d6, d7, [x29, #OFFSET_RG + DL_OFFSET_RG_D0 + 16*3]
>>  +	ldr	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
>>  +	ldp	q0, q1, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*0]
>>  +	ldp	q2, q3, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*1]
>>  +	ldp	q4, q5, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*2]
>>  +	ldp	q6, q7, [x29, #OFFSET_RG + DL_OFFSET_RG_V0 + 32*3]
>>   	blr	ip0
> 
> ok.
> 
>>  -	stp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
>>  -	stp	d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
>>  -	stp	d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
>>  +	stp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
>>  +	stp	x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
>>  +	stp	x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
>>  +	stp	x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
>>  +	str	x8, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*4]
> 
> i think storing x8 is not useful.

fixed

> 
>>  +	stp	q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
>>  +	stp	q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
>>  +	stp	q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
>>  +	stp	q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
>> 
>>   	/* Setup call to pltexit  */
>>   	ldp	x0, x1, [x29, #OFFSET_SAVED_CALL_X0]
>>  @@ -295,9 +309,16 @@ _dl_runtime_profile:
>>   	add	x3, x29, #OFFSET_RV
>>   	bl	_dl_call_pltexit
>> 
>>  -	ldp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0]
>>  -	ldp	d0, d1, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*0]
>>  -	ldp	d2, d3, [x29, #OFFSET_RV + DL_OFFSET_RV_D0 + 16*1]
>>  +	ldp	x0, x1, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*0]
>>  +	ldp	x2, x3, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*1]
>>  +	ldp	x4, x5, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*2]
>>  +	ldp	x6, x7, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*3]
>>  +	ldr	x8, [x29, #OFFSET_RV + DL_OFFSET_RV_X0 + 16*4]
>>  +	ldp	q0, q1, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*0]
>>  +	ldp	q2, q3, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*1]
>>  +	ldp	q4, q5, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*2]
>>  +	ldp	q6, q7, [x29, #OFFSET_RV + DL_OFFSET_RV_V0 + 32*3]
>>  +
>>   	/* LR from within La_aarch64_reg */
>>   	ldr	lr, [x29, #OFFSET_RG + DL_OFFSET_RG_LR]
>>   	cfi_restore(lr)
> 
> thanks for the patch.

You're welcome. Standby for V2 of the patch.
>