On 12 October 2015 at 15:30, James Greenhalgh wrote:
> On Fri, Oct 09, 2015 at 05:16:05PM +0100, Christophe Lyon wrote:
>> On 8 October 2015 at 11:12, James Greenhalgh wrote:
>> > On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
>> >> On 7 October 2015 at 17:09, James Greenhalgh wrote:
>> >> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
>> >> >
>> >> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
>> >> > directly (as in the inline asm versions you replace)?
>> >> >
>> >> I just followed the pattern used for vtbx3.
>> >>
>> >> > This sequence does make sense for vtbx3.
>> >> In fact, I don't see why vtbx3 and vtbx4 should be different?
>> >
>> > The difference between TBL and TBX is in their handling of a request to
>> > select an out-of-range value. For TBL this returns zero, for TBX this
>> > returns the value which was already in the destination register.
>> >
>> > Because the byte-vectors used by the TBX instruction in aarch64 are 128-bit
>> > (so two of them together allow selecting elements in the range 0-31), and
>> > vtbx3 needs to emulate the AArch32 behaviour of picking elements from 3x64-bit
>> > vectors (allowing elements in the range 0-23), we need to manually check for
>> > values which would have been out-of-range on AArch32, but are not out
>> > of range for AArch64 and handle them appropriately. For vtbx4 on the other
>> > hand, 2x128-bit registers give the range 0..31 and 4x64-bit registers give
>> > the range 0..31, so we don't need the special masked handling.
>> >
>> > You can find the suggested instruction sequences for the Neon intrinsics
>> > in this document:
>> >
>> > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>> >
>>
>> Hi James,
>>
>> Please find attached an updated version which hopefully addresses your comments.
>> Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation Model.
>>
>> OK?
>
> Looks good to me,
>
> Thanks,
> James
>

I committed this as r228716, and later noticed that
gcc.target/aarch64/table-intrinsics.c fails because of this patch.

The testcase scans the assembly for 'tbl v' or 'tbx v'. Because I replaced
some asm statements, the separator after the mnemonic is now a tab instead
of a space.

I plan to commit this (probably obvious?):