Hi, and happy new year,

I have investigated this issue further. Sorry to be so persistent, but I still think something is not quite right here.

To show my point, here is a comparison between x86_64 (which I assume is the most-tested platform for libffi) and s390x. In short, the closure helper on x86_64 processes the return value differently.

Here are some gdb steps in the x86_64 Linux implementation: https://gist.github.com/planrich/fd25c31213ba565116a9

In a nutshell: I set a breakpoint at the assembly position https://github.com/python/cpython/blob/master/Modules/_ctypes/libffi/src/x86/unix64.S#L269 while running my small ctypes sample program (https://gist.github.com/planrich/3fd72767812754d9104d).

As far as I can tell, on x86_64 ffi_closure_unix64_inner is the equivalent of ffi_closure_helper_SYSV on s390x. If that is correct, then movzx eax,WORD PTR [rsp-0x18] zero-extends the 16-bit value to a full 64-bit value. That is what my initial patch is about: s390x does not do this zero/sign extension right after invoking the user closure.

> The point is that if the user-callback were to fill in a full ffi_arg,
> then ret_buffer would be completely filled. If ret_buffer isn't fully
> written, then that's a bug in the callback PyPy provides to libffi.

The closure return value (which on s390x is written to the stack location of ret_buffer) is filled in by ctypes here: https://github.com/python/cpython/blob/master/Modules/_ctypes/cfield.c#L551

Doesn't that mean ctypes writes only 16 bits into ret_buffer? I have debugged this as well, and it does store only 16 bits. If I'm wrong, could someone please point out the issue with my sample program?

Cheers,
Richard

P.S. I have looked at the PPC asm implementation as well. There, too, the value is zero/sign-extended to the machine register size.
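To make the difference concrete, here is a small standalone C model of the two behaviours discussed above. It is a sketch, not libffi or ctypes code. The 8-byte ret_slot type, the 0xFF "stale stack" fill and all function names are hypothetical. It shows what ends up in the caller's 64-bit register when the callback stores only 16 bits and the trampoline either zero-extends from the narrow value (like the movzx in unix64.S) or loads the whole slot unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the ret_buffer stack slot: 8 bytes, i.e. the size
   of an ffi_arg on a 64-bit target. This is not libffi code. */
typedef struct {
    unsigned char bytes[8];
} ret_slot;

/* Pre-fill the slot with 0xFF to stand in for stale stack contents. */
static void fill_garbage(ret_slot *s) {
    memset(s->bytes, 0xFF, sizeof s->bytes);
}

/* Callback variant A: store only the 16-bit value at the start of the slot,
   which is what the debugger shows ctypes doing. */
static void store_u16_only(ret_slot *s, uint16_t v) {
    memcpy(s->bytes, &v, sizeof v);
}

/* Callback variant B: zero-extend to 64 bits and store a full ffi_arg. */
static void store_widened(ret_slot *s, uint16_t v) {
    uint64_t wide = v;
    memcpy(s->bytes, &wide, sizeof wide);
}

/* Trampoline variant 1: load 16 bits and zero-extend, analogous to
   movzx eax,WORD PTR [rsp-0x18] in unix64.S. Reading at offset 0 mirrors
   the little-endian x86_64 layout; on a big-endian target such as s390x
   the low halfword of a widened 64-bit value sits at offset 6 instead. */
static uint64_t load_zero_extended(const ret_slot *s) {
    uint16_t narrow;
    memcpy(&narrow, s->bytes, sizeof narrow);
    return (uint64_t)narrow;
}

/* Trampoline variant 2: load the whole 64-bit slot with no extension. */
static uint64_t load_full(const ret_slot *s) {
    uint64_t full;
    memcpy(&full, s->bytes, sizeof full);
    return full;
}

/* Run one combination and return what the caller's register would hold.
   Pairing callback_widens=1 with trampoline_extends=1 is only meaningful
   on a little-endian host (see load_zero_extended). */
uint64_t simulate(int callback_widens, int trampoline_extends, uint16_t v) {
    ret_slot s;
    fill_garbage(&s);
    if (callback_widens)
        store_widened(&s, v);
    else
        store_u16_only(&s, v);
    return trampoline_extends ? load_zero_extended(&s) : load_full(&s);
}
```

With a 16-bit-only store, simulate(0, 1, 0x1234) yields 0x1234, because the narrow load ignores the stale bytes. simulate(0, 0, 0x1234) does not: the upper bytes of the register keep the 0xFF garbage on either byte order. simulate(1, 0, 0x1234) is correct again, which is the maintainer's point. Either the callback fills a full ffi_arg, or the trampoline has to extend the narrow value.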