Working on the builtins, I realized that I mixed up (again) bits and byes. While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a size of 128 bytes. Thus, there is sufficient space for 16 pointer-size/uin64_t values but I only need 6. This patch now makes use of the available space, avoiding one device-to-host memory copy; additionally, it avoids a 32bit vs 64bit alignment issue which I somehow missed :-( Tested with libgomp on gfx908 offloading and getting only the known fails: (libgomp.c-c++-common/teams-2.c, libgomp.fortran/async_io_*.f90, libgomp.oacc-c-c++-common/{deep-copy-10.c,static-variable-1.c,vprop.c}) OK for mainline? Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955