Subject: Re: Kernel prctl feature for syscall interception and emulation
From: Paul Gofman
To: David Laight, 'Rich Felker', Gabriel Krisman Bertazi
Cc: "libc-alpha@sourceware.org", Florian Weimer, "linux-kernel@vger.kernel.org"
Date: Fri, 20 Nov 2020 00:19:13 +0300

On 11/19/20 23:54, Paul Gofman wrote:
> On 11/19/20 20:57, David Laight wrote:
>>>> The Windows code is not completely loaded at initialization time. It
>>>> also has dynamic libraries loaded later. yes, wine knows the memory
>>>> regions, but there is no guarantee there is a small number of segments
>>>> or that the full picture is known at any given moment.
>>> Yes, I didn't mean it was known statically at init time (although
>>> maybe it can be; see below) just that all the code doing the loading
>>> is under Wine's control (vs having system dynamic linker doing stuff
>>> it can't reliably see, which is the case with host libraries).
>> Since wine must itself make the mmap() system calls that make memory
>> executable can't it arrange for windows code and linux code to be
>> above/below some critical address?
>>
>> IIRC 32bit windows has the user/kernel split at 2G, so all the
>> linux code could be shoe-horned into the top 1GB.
>>
>> A similar boundary could be picked for 64bit code.
>>
>> This would probably require flags to mmap() to map above/below
>> the specified address (is there a flag for the 2G boundary
>> these days - wine used to do very horrid things).
>> It might also need a special elf interpreter to load the
>> wine code itself high.
>>
> Wine does not control the loading of native libraries (which are subject
> to ASLR and thus do not necessarily follow mmap's top-down order
> exactly). Wine is also not free to choose where to load the Windows
> libraries. Some Windows libraries are relocatable, some are not. Even
> the relocatable ones are often assumed to be loaded at the base address
> specified in the PE header, an assumption made either by the library
> itself or by DRM, sandboxing, hotpatching, or interception code around it.
>
> Also, it is very common for DRM to unpack encrypted code into a newly
> allocated segment (which gives no clue at allocation time whether it
> will become executable later) and then make it executable. There are
> many tricks of this kind, and such code sometimes assumes very specific
> (and Windows-implementation-dependent) things, in particular about the
> memory layout. Windows VirtualAlloc[Ex] provides a way to request
> top-down or bottom-up allocation order, as well as a specific
> allocation address.
> The latter is, of course, not guaranteed to succeed, just as on Linux,
> but if specific (high) address ranges always have some space available
> on Windows, then, in our experience, there are apps in the wild that
> depend on that.
>
> If we were given an mmap flag for specifying a memory allocation
> boundary, and also some sort of process-wide dlopen() config option for
> applying that boundary to every host shared library load, the address
> space separation could probably work... until we hit a tricky case where
> the app wants memory in a specific high address range. I don't think we
> can do that cleanly: both Windows and Linux currently have the same
> 128TB limit for user address space on x64, so there is no spare space
> to safely put native code without potential interference with Windows
> code.
>
It may also be worth mentioning that the initial version of Gabriel's
patches introduced the emulation trigger as a flag set on a memory region
through mprotect(), so we could mark the regions from which syscalls
should be trapped. That would probably be the easiest possible solution
to use in Wine (as no memory allocated by Wine itself is supposed to
contain native host syscalls), but the idea was not accepted, mainly
because, as I understand it, such functionality does not belong in VM
management.
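As a rough illustration of the mprotect() region-flag idea, here is a plain C model of the decision such a flag would drive when a syscall is issued. It is a userspace sketch under stated assumptions, not the actual patch and not any kernel interface: struct code_region, its trap field, and classify_syscall() are hypothetical names. In a real implementation the check would presumably happen in the kernel on syscall entry rather than in a lookup loop like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only: not a kernel or Wine API. It mimics the idea
 * of a per-mapping flag (set via mprotect() in the early patch version)
 * that marks regions whose syscalls should be trapped for emulation. */

enum syscall_action {
    SYSCALL_RUN_NATIVE, /* execute the syscall on the host kernel */
    SYSCALL_EMULATE     /* trap to the emulator (e.g. Wine's handler) */
};

struct code_region {
    uintptr_t start; /* first address of the mapping */
    uintptr_t end;   /* one past the last address of the mapping */
    bool trap;       /* hypothetical "trap syscalls issued from here" flag */
};

/* Decide what happens to a syscall issued from instruction pointer ip.
 * Unflagged regions, and addresses outside every known region (e.g. host
 * libraries mapped by the system dynamic linker), run natively, so the
 * emulator never has to locate or enumerate native code. */
static enum syscall_action
classify_syscall(const struct code_region *regions, size_t n, uintptr_t ip)
{
    for (size_t i = 0; i < n; i++) {
        assert(regions[i].start <= regions[i].end);
        if (ip >= regions[i].start && ip < regions[i].end)
            return regions[i].trap ? SYSCALL_EMULATE : SYSCALL_RUN_NATIVE;
    }
    return SYSCALL_RUN_NATIVE;
}
```

In this model Wine would presumably set the flag on every mapping it creates for Windows code, including memory that DRM only later makes executable, while host libraries never carry the flag and keep running natively wherever ASLR happens to place them.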