From: Cupertino Miranda
To: Adhemerval Zanella Netto
Cc: libc-alpha@sourceware.org, "Jose E. Marchesi", Elena Zannoni,
    Cupertino Miranda
Subject: Re: [RFC] Stack allocation, hugepages and RSS implications
Date: Thu, 09 Mar 2023 18:11:37 +0000
Message-ID: <87y1o53icm.fsf@oracle.com>
In-reply-to: <8f22594a-145a-a358-7ae0-dbbe16d709e8@linaro.org>
References: <87pm9j4azf.fsf@oracle.com> <87mt4n49ak.fsf@oracle.com>
    <06a84799-3a73-2bff-e157-281eed68febf@linaro.org>
    <87edpy464g.fsf@oracle.com>
    <8f22594a-145a-a358-7ae0-dbbe16d709e8@linaro.org>

Adhemerval Zanella Netto writes:

> On 09/03/23 06:38, Cupertino Miranda wrote:
>>
>> Adhemerval Zanella Netto writes:
>>
>>> On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> For performance purposes, one of our in-house applications requires
>>>> the TRANSPARENT_HUGEPAGE_ALWAYS option to be enabled in the Linux
>>>> kernel, which makes the kernel back every sufficiently large and
>>>> aligned memory allocation with hugepages. I believe the reason
>>>> behind this decision is to have more control over data location.
>>>>
>>> We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting
>>> it to 1 enables MADV_HUGEPAGE madvise for mmap-allocated pages if the
>>> mode is set to 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).
>>> One option would be to use that instead of 'always', together with
>>> glibc.malloc.hugetlb=1.
>>>
>>> The main drawback of that strategy is that this is a system-wide
>>> setting, so it might affect other users/programs as well.
>>>
>>>> For stack allocation, it seems that hugepages make the resident set
>>>> size (RSS) increase significantly, and without any apparent benefit,
>>>> since the huge page will be split into small pages even before
>>>> leaving glibc's stack allocation code.
>>>>
>>>> As an example, this is what happens in the case of a pthread_create
>>>> with a 2MB stack size:
>>>> 1. mmap requests the 2MB allocation with PROT_NONE;
>>>>    a huge page is "registered" by the kernel.
>>>> 2. The thread descriptor is written at the end of the stack.
>>>>    This triggers a page fault in the kernel, which performs the
>>>>    actual memory allocation of the 2MB.
>>>> 3. An mprotect changes protection on the guard (one of the small
>>>>    pages of the allocated space):
>>>>    at this point the kernel needs to break the 2MB page into many
>>>>    small pages in order to change the protection on that memory
>>>>    region. This eliminates any benefit of having hugepages for
>>>>    stack allocation, but it also makes RSS grow by 2MB even though
>>>>    nothing was written to most of the small pages.
>>>>
>>>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
>>>> the __mmap in nptl/allocatestack.c. As expected, RSS was
>>>> significantly reduced for the application.
>>>>
>>>> At this point I am quite confident that, in our particular use case,
>>>> there is a real benefit in ensuring stacks never use hugepages.
>>>>
>>>> This RFC is to understand whether I have missed some option in glibc
>>>> that would allow better control over stack allocation.
>>>> If not, I am tempted to propose/submit a change, in the form of a
>>>> tunable, to enforce NOHUGEPAGE for stacks.
>>>>
>>>> In any case, I wonder if there is an actual use case where a
>>>> hugepage would survive glibc stack allocation and bring an actual
>>>> benefit.
>>>>
>>>> Looking forward to your comments.
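(As a side note, the exercise described above is easy to reproduce
outside glibc. The small standalone program below only illustrates the
mmap/madvise sequence; it is not the actual nptl/allocatestack.c
change, and the PROT_NONE/MAP_STACK reservation merely mimics what
allocate_stack does for the initial mapping:)

  /* Reserve a 2MB stack-like mapping, then opt it out of THP before
     any page of it is touched.  */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/mman.h>

  int
  main (void)
  {
    size_t size = 2 * 1024 * 1024;    /* One x86-64 hugepage.  */

    void *mem = mmap (NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (mem == MAP_FAILED)
      {
        perror ("mmap");
        return 1;
      }

    /* The experimental addition: with this advice in place, the later
       mprotect of the guard page never has to split a materialized
       hugepage, so RSS stays proportional to what is actually
       written.  */
    if (madvise (mem, size, MADV_NOHUGEPAGE) != 0)
      perror ("madvise");

    return 0;
  }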
>>> Maybe also a similar strategy on pthread stack allocation, where if
>>> transparent hugepages is 'always' and glibc.malloc.hugetlb is 3 we
>>> set MADV_NOHUGEPAGE on internal mmaps. So a value of '3' would mean
>>> disable THP, which might be confusing, but currently we have '0' as
>>> 'use system default'. It could also be another tunable, like
>>> glibc.hugetlb, to decouple it from the malloc code.
>>>
>> The intent would not be to disable hugepages on all internal mmaps,
>> as I think you said, but rather to do it just for stack allocations.
>> Although it is more work, I would say that if we add this to a
>> tunable then maybe we should move it out of the malloc namespace.
>
> I was thinking of mmap allocations where internal usage might trigger
> this behavior. If I understood what is happening, since the initial
> stack is aligned to the hugepage size (assuming x86 2MB hugepages and
> the 8MB default stack size) and 'always' is set as the policy, the
> stack will always be backed by hugepages. And then, when the guard
> page is set at setup_stack_prot, it will force the kernel to split
> and move the stack to default pages.

Yes, for the most part I think so. Actually I think the kernel makes
the split at the first write. At setup_stack_prot it could in
principle conclude that the pages will need to be split, but it does
not do it. Only when the write and its page fault occur does it
realize that it needs to split, and then it materializes all of the
small pages as if the hugepage was already dirty. In my madvise
experiments, RSS only bloats when I madvise after the write.

> It seems to be a pthread-specific problem, since I think
> alloc_new_heap already calls mprotect if hugepages are used.
>
> And I agree with Florian that backing thread stacks with hugepages
> might indeed reduce TLB misses. However, if you want to optimize for
> RSS, maybe you can force the total thread stack size to not be a
> multiple of the hugepage size:

Considering the default 8MB stack size, there is nothing to think
about; it definitely is a requirement.

>
> $ cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
> $ grep -w STACK_SIZE_TOTAL tststackalloc.c
> #define STACK_SIZE_TOTAL (3 * (HUGE_PAGE_SIZE)) / 4
>   size_t stack_size = STACK_SIZE_TOTAL;
> $ ./testrun.sh ./tststackalloc 1
> Page size: 4 kB, 2 MB huge pages
> Will attempt to align allocations to make stacks eligible for huge pages
> pid: 342503 (/proc/342503/smaps)
> Creating 128 threads...
> RSS: 537 pages (2199552 bytes = 2 MB)
> Press enter to exit...
>
> $ ./testrun.sh ./tststackalloc 0
> Page size: 4 kB, 2 MB huge pages
> pid: 342641 (/proc/342641/smaps)
> Creating 128 threads...
> RSS: 536 pages (2195456 bytes = 2 MB)
> Press enter to exit...
>
> But I think a tunable to force it for all stack sizes might indeed
> be useful.
>
>> If moving it out of malloc is not OK for backward-compatibility
>> reasons, then I would say create a new tunable specific to the
>> purpose, like glibc.stack_nohugetlb?
>
> We don't enforce tunable compatibility, but we already have the
> glibc.pthread namespace. Maybe we can use glibc.pthread.stack_hugetlb,
> with 0 to use the default and 1 to avoid it by calling mprotect (we
> might change these semantics).

Will work on the patch right away. I would swap the 0 and the 1,
though; otherwise the logic reads reversed, with 0 meaning enabled and
1 meaning disabled.
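Roughly what I am picturing, with the swapped semantics (1 to allow
THP, the default; 0 to disable it for stacks). Treat this as a sketch:
the tunable name is just the one proposed above, and the exact
placement inside allocate_stack is still to be decided:

  /* Right after the initial __mmap of the stack succeeds: honor the
     proposed glibc.pthread.stack_hugetlb tunable and keep THP away
     from this mapping when it is set to 0.  */
  if (TUNABLE_GET (stack_hugetlb, int32_t, NULL) == 0)
    __madvise (mem, size, MADV_NOHUGEPAGE);

That way a single process could opt out with
GLIBC_TUNABLES=glibc.pthread.stack_hugetlb=0, without touching the
system-wide THP setting.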
>
>> The more I think about this, the less I feel we will ever be able to
>> practically use hugepages in stacks. We can declare them as such, but
>> soon enough the kernel will split them into small pages.
>>
>>> Ideally it would require caching __malloc_thp_mode, so we avoid the
>>> unneeded mprotect calls, similar to what we need to do in malloc's
>>> do_set_hugetlb (it also assumes that once the program performs its
>>> initial malloc, any system-wide change to THP won't take effect).
>>
>> Very good point. Did not think about this before.
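Something like this, I suppose (assuming __malloc_thp_mode keeps the
interface from sysdeps/generic/malloc-hugepages.h; the caching
variable and its placement are only illustrative):

  /* Read /sys/kernel/mm/transparent_hugepage/enabled only once, as
     malloc's do_set_hugetlb effectively does; the advice is only
     needed when the system-wide policy is 'always'.  The race here is
     benign: at worst two threads read the mode twice.  */
  static int thp_always = -1;
  if (thp_always < 0)
    thp_always = (__malloc_thp_mode () == malloc_thp_mode_always);

  if (thp_always && TUNABLE_GET (stack_hugetlb, int32_t, NULL) == 0)
    __madvise (mem, size, MADV_NOHUGEPAGE);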