Date: Mon, 08 Apr 2024 06:19:14 -0300
Message-ID: <2d2f1e405361d2b36dd513e3fabd1fe0@gmail.com>
From: Matheus Afonso Martins Moreira
To: gcc@gcc.gnu.org
Subject: [RFC] Linux system call builtins

Hello!

I'm a beginner when it comes to GCC development. I want to learn how it works and start contributing.
I decided to start by implementing something relatively simple but which would still be very useful for me: Linux builtins. I sought help in the OFTC IRC channel and it was suggested that I discuss it here first and obtain consensus before spending more time on it, since it might not be acceptable.

I'd like to add GCC builtins for generating Linux system call code for all architectures supported by Linux. They would look like this:

    __builtin_linux_system_call(long n, ...)
    __builtin_linux_system_call_1(long n, long _1)
    __builtin_linux_system_call_2(long n, long _1, long _2)
    /* More definitions, all the way up to 6 arguments */

Calling these builtins will make GCC place all the parameters in the correct registers for the system call, emit the appropriate instruction for the target architecture and return the result. In other words, they would implement the calling convention[1] of the Linux system calls.

I'm often asked why anyone should care about this system call stuff, and I've been asked why I want this added to GCC in particular. My rationale is as follows:

+ It's stable

This is one of the things which makes Linux unique in the operating system landscape: applications can target the kernel directly. Unlike in virtually every other operating system out there, the Linux kernel to user space binary interface is documented[2] as stable. Breaking it is considered a regression in the kernel. Therefore it makes sense for a compiler to target it. The same is not true for any other operating system.

+ It's a calling convention

GCC already supports many calling conventions via function attributes. On x86 alone[3] there's cdecl, fastcall, thiscall, stdcall, ms_abi, sysv_abi and Win32-specific hot patching hooks. So I believe this would not at all be a strange addition to the compiler.

+ It's becoming common

Despite being specific to the Linux kernel, support for it is showing up in other systems. FreeBSD implements limited support[4] for Linux ABIs.
Windows Subsystem for Linux started out[5] similarly, as an implementation of this system call ABI. Apparently it's becoming something of a lingua franca. Maybe one day Linux programs will actually become portable by virtue of this stable binary interface.

+ It doesn't make sense for libraries to support it

There are libraries out there that provide system call functionality. The various libcs do. However, they usually don't support the full set of Linux system calls. Using certain system calls could invalidate global state in these libraries, which is why they are not supported. Clone is the quintessential example. So I think libraries are not the proper place for this functionality.

+ It allows freestanding software to easily target Linux

Freestanding code usually refers to bare metal targets but Linux is also a viable target. This will make it much easier for developers to create freestanding, nolibc, dependency-free software targeting Linux without having to write any assembly code at all, making GCC ever more useful.

+ It centralizes functionality in the compiler

Currently every programmer who wants to use these system calls must rely on libraries with incomplete support or recreate the system call machinery via inline assembly. Even the Linux kernel ended up doing it[6]. It would be so much nicer if the compiler simply had support for it. I'm a huge fan of builtins like __builtin_frame_address: they make it very easy to solve difficult problems which would otherwise require tons of target-specific assembly code. Getting the compiler to do that for Linux system calls is what this proposal is for.

+ It allows other languages to easily target Linux

GCC is a compiler collection and has support for numerous languages. These builtins should allow all of them to target Linux directly in one fell swoop.

+ Compilers seem like the proper place for it

The compiler knows everything about registers and instructions and calling conventions.
It just seems like the right place for it. A just-in-time compiler could also generate this code instead of calling native functions. I really have no idea why they don't do that. Maybe this will prove that it's viable.

Implementation-wise, I have managed to define the above builtins in my GCC branch and compile it successfully. I have not yet figured out how or even where to implement the code generation. I was hoping to show up here with patches ready for review but it really is a complex project. That's why I would like to see what the community thinks before proceeding.

A related proposal: hard register operand constraints[7] for inline assembly code. Essentially, allowing the programmer to specify the exact registers that must be used in the inline assembly expression itself. This gets rid of numerous temporary variables whose only purpose is to get GCC to put them in the correct registers, as many as 7 local variables for system calls. I've been told that implementing it would make this proposal redundant.

There is no doubt that this would make code much simpler, easier to write and understand. It would be a valuable enhancement to the compiler and I would certainly use it. However, even with better inline assembly, I still believe there's value in a simple system call builtin function. The API is much nicer if nothing else.

Thanks for your attention,
Matheus

[1]: https://www.man7.org/linux/man-pages/man2/syscall.2.html
[2]: https://www.kernel.org/doc/html/latest/admin-guide/abi-stable.html#the-kernel-syscall-interface
[3]: https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
[4]: https://man.freebsd.org/cgi/man.cgi?linux
[5]: https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux
[6]: https://lwn.net/Articles/920158/
[7]: https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html
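
For illustration, the register-pinning pattern that the hard register constraints paragraph refers to can be sketched roughly as follows. This is only a sketch of what programmers write by hand today, not the proposed builtin and not code from the branch mentioned above. The names linux_system_call_3 and linux_system_call_demo are hypothetical, only three arguments are handled, and only x86-64 and AArch64 are covered. It assumes a Linux host with GCC or a compatible compiler, and uses <sys/syscall.h> solely for the SYS_* numbers.

```c
#include <assert.h>
#include <errno.h>        /* EBADF, used by the demo only */
#include <sys/syscall.h>  /* SYS_* system call numbers only */

/* Hypothetical hand-written wrapper: one local register variable per
 * value, whose only job is to pin that value to the register the
 * kernel expects. With six arguments there would be seven of them. */
static long linux_system_call_3(long n, long a1, long a2, long a3)
{
#if defined(__x86_64__)
    register long rax __asm__("rax") = n;   /* number in, result out */
    register long rdi __asm__("rdi") = a1;
    register long rsi __asm__("rsi") = a2;
    register long rdx __asm__("rdx") = a3;
    __asm__ volatile ("syscall"
                      : "+r"(rax)
                      : "r"(rdi), "r"(rsi), "r"(rdx)
                      : "rcx", "r11", "memory"); /* clobbered by syscall */
    return rax;
#elif defined(__aarch64__)
    register long x8 __asm__("x8") = n;     /* system call number */
    register long x0 __asm__("x0") = a1;    /* first argument in, result out */
    register long x1 __asm__("x1") = a2;
    register long x2 __asm__("x2") = a3;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
#else
#error "this sketch covers only x86-64 and AArch64"
#endif
}

/* Hypothetical usage: the raw interface returns the result directly,
 * and reports failures as a negated errno value rather than via errno. */
void linux_system_call_demo(void)
{
    /* getpid takes no arguments; the extra zeros are ignored. */
    assert(linux_system_call_3(SYS_getpid, 0, 0, 0) > 0);
    /* Writing to an invalid file descriptor fails with -EBADF. */
    assert(linux_system_call_3(SYS_write, -1, (long)"x", 1) == -EBADF);
}
```

With a builtin, the body of linux_system_call_3 would presumably collapse to a single call along the lines of __builtin_linux_system_call_3(n, a1, a2, a3), leaving the register assignment and the choice of instruction to the compiler for every supported architecture.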