From: Cupertino Miranda
To: Florian Weimer
Cc: Cupertino Miranda via Libc-alpha, "Jose E. Marchesi", Elena Zannoni, Cupertino Miranda
Subject: Re: [RFC] Stack allocation, hugepages and RSS implications
In-reply-to: <87bkl2b3f1.fsf@oldenburg.str.redhat.com>
References: <87pm9j4azf.fsf@oracle.com> <87mt4n49ak.fsf@oracle.com> <87bkl2b3f1.fsf@oldenburg.str.redhat.com>
Date: Thu, 09 Mar 2023 14:29:56 +0000
Message-ID: <875yba3sm3.fsf@oracle.com>
User-agent: mu4e 1.4.15; emacs 28.1
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
--=-=-=
Content-Type: text/plain

Hi Florian,

>> Hi everyone,
>>
>> For performance purposes, one of our in-house applications requires the
>> TRANSPARENT_HUGEPAGES_ALWAYS option to be enabled in the Linux kernel,
>> which makes the kernel force all sufficiently large and aligned memory
>> allocations to reside in hugepages. I believe the reason behind this
>> decision is to have more control over data location.
>>
>> For stack allocation, it seems that hugepages make the resident set size
>> (RSS) increase significantly, and without any apparent benefit, as the
>> huge page will be split into small pages even before leaving glibc's
>> stack allocation code.
>>
>> As an example, this is what happens in the case of a pthread_create with
>> a 2MB stack size:
>> 1. mmap request for the 2MB allocation with PROT_NONE;
>>    a huge page is "registered" by the kernel.
>> 2. The thread descriptor is written at the end of the stack.
>>    This triggers a page fault in the kernel, which performs the actual
>>    2MB memory allocation.
>> 3. An mprotect changes the protection on the guard (one of the small
>>    pages of the allocated space): at this point the kernel needs to
>>    break the 2MB page into many small pages in order to change the
>>    protection on that memory region. This eliminates any benefit of
>>    having huge pages for stack allocation, but it also makes RSS
>>    increase by 2MB even though nothing was written to most of the
>>    small pages.
>>
>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after the
>> __mmap in nptl/allocatestack.c. As expected, RSS was significantly
>> reduced for the application.
>
> Interesting. I did not expect to get hugepages right out of mmap. I
> would have expected subsequent coalescing by khugepaged, taking actual
> stack usage into account. But over-allocating memory might be
> beneficial, see below.

It is probably not getting the hugepages on mmap. Still, the RSS is
growing as if it did.

>
> (Something must be happening between step 1 & 2 to make the writes
> possible.)

Totally right; I could have explained it better. There is a call to
setup_stack_prot that, I believe, changes the protection for the single
small page that holds the stack-related values. The write happens right
after, when the stack-related values start being written. This is the
critical point where RSS grows by the hugepage size.

>
>> In any case, I wonder if there is an actual use case where a hugepage
>> would survive glibc stack allocation and bring an actual benefit.
>
> It can reduce TLB misses. The first-level TLB might only have 64
> entries for 4K pages, for example. If the working set on the stack
> (including the TCB) needs more than a couple of pages, it might be
> beneficial to use a 2M page and use just one TLB entry.

Indeed. It might only not make sense if (guardsize > 0), as is the case
in the example. I think that in this case you can never get a hugepage,
since the guard pages will be write-protected and thus have different
protection from the rest of the stack pages, at least if you don't plan
to allocate more than 2 hugepages. I believe allocating 2M+4k was
considered, but it made it hard to control data location.
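To make the sequence above easier to reproduce outside glibc, here is a
minimal standalone sketch (not from the original thread, and only an
approximation of what nptl actually does). The 2 MiB alignment trick, the
constants and the page-touching pattern are assumptions; whether and at
which step RSS actually jumps by the hugepage size depends on the kernel's
THP configuration, which is exactly what is being discussed here.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SZ (4UL * 1024)
#define HUGE_SZ (2UL * 1024 * 1024)

/* RSS in pages, read from /proc/self/statm (the same source the attached
   test program uses).  */
static long rss_pages (void)
{
  long rss = -1;
  FILE *f = fopen ("/proc/self/statm", "r");
  if (f != NULL)
    {
      if (fscanf (f, "%*d %ld", &rss) != 1)
        rss = -1;
      fclose (f);
    }
  return rss;
}

int main (void)
{
  /* Over-allocate so that a 2 MiB-aligned slice exists inside the mapping.
     glibc itself does not do this; the attached test program arranges the
     alignment externally via align_next_on().  */
  char *raw = mmap (NULL, 2 * HUGE_SZ, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (raw == MAP_FAILED)
    return 1;
  char *stack = (char *) (((uintptr_t) raw + HUGE_SZ - 1) & ~(HUGE_SZ - 1));
  printf ("after mmap (step 1):             RSS = %ld pages\n", rss_pages ());

  /* Roughly what setup_stack_prot does: make only the page that will hold
     the thread descriptor accessible, then write to it (step 2).  */
  mprotect (stack + HUGE_SZ - PAGE_SZ, PAGE_SZ, PROT_READ | PROT_WRITE);
  memset (stack + HUGE_SZ - PAGE_SZ, 0, PAGE_SZ);
  printf ("after descriptor write (step 2): RSS = %ld pages\n", rss_pages ());

  /* Step 3: the usable part of the stack gets a different protection from
     the guard page at the bottom, so a single huge page cannot cover the
     whole range any more.  */
  mprotect (stack + PAGE_SZ, HUGE_SZ - PAGE_SZ, PROT_READ | PROT_WRITE);
  printf ("after guard split (step 3):      RSS = %ld pages\n", rss_pages ());
  return 0;
}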
> In your case, if your stacks are quite small, maybe you can just
> allocate slightly less than 2 MiB?
>
> The other question is whether the reported RSS is real, or if the kernel
> will recover zero stack pages on memory pressure.

It's a good point. I have no idea if the kernel is capable of recovering
the zero stack pages in this particular case. Is there any way to trigger
such a recovery?

In our example (attached), there is a significant difference in reported
RSS when we madvise the kernel. Reported RSS is collected from
/proc/self/statm.

# LD_LIBRARY_PATH=${HOME}/glibc_example/lib ./tststackalloc 1
Page size: 4 kB, 2 MB huge pages
Will attempt to align allocations to make stacks eligible for huge pages
pid: 2458323 (/proc/2458323/smaps)
Creating 128 threads...
RSS: 65888 pages (269877248 bytes = 257 MB)

After the madvise is added right before the writes to the stack-related
values (patch below):

# LD_LIBRARY_PATH=${HOME}/glibc_example/lib ./tststackalloc 1
Page size: 4 kB, 2 MB huge pages
Will attempt to align allocations to make stacks eligible for huge pages
pid: 2463199 (/proc/2463199/smaps)
Creating 128 threads...
RSS: 448 pages (1835008 bytes = 1 MB)

Thanks,
Cupertino

>
> Thanks,
> Florian

@@ -397,6 +397,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
        }
     }

+  __madvise(mem, size, MADV_NOHUGEPAGE);
   /* Remember the stack-related values.  */
   pd->stackblock = mem;
   pd->stackblock_size = size;
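As a related illustration (not something proposed in the thread), an
application that cannot patch glibc could get a similar effect for
individual threads by providing its own stack, marking it MADV_NOHUGEPAGE,
and handing it to pthread_create via pthread_attr_setstack. This is only a
sketch under those assumptions; with a caller-provided stack, glibc no
longer manages the guard page or the lifetime of the mapping.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#define STACK_SZ (2UL * 1024 * 1024)

static void *worker (void *arg)
{
  (void) arg;
  return NULL;
}

int main (void)
{
  /* Caller-provided stack: glibc will not mmap one itself, so we can apply
     MADV_NOHUGEPAGE before any thread ever touches it.  */
  void *stk = mmap (NULL, STACK_SZ, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  if (stk == MAP_FAILED)
    return 1;
  madvise (stk, STACK_SZ, MADV_NOHUGEPAGE);

  pthread_attr_t attr;
  pthread_attr_init (&attr);
  pthread_attr_setstack (&attr, stk, STACK_SZ);

  pthread_t t;
  if (pthread_create (&t, &attr, worker, NULL) != 0)
    return 1;
  pthread_join (t, NULL);
  pthread_attr_destroy (&attr);

  fprintf (stderr, "thread ran on a caller-provided MADV_NOHUGEPAGE stack\n");
  /* The caller also owns guard-page setup and the mapping's lifetime.  */
  munmap (stk, STACK_SZ);
  return 0;
}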
--=-=-=
Content-Type: text/x-csrc
Content-Disposition: attachment; filename=tststackalloc.c

// Compile & run:
//   gcc -Wall -g -o tststackalloc tststackalloc.c -lpthread
//   ./tststackalloc 1   # Attempt to use huge pages for stacks -> RSS bloat
//   ./tststackalloc 0   # Do not attempt to use huge pages -> No RSS bloat

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/mman.h>

// Number of threads to create
#define NOOF_THREADS (128)

// Size of a small page (hard-coded)
#define SMALL_PAGE_SIZE (4*1024)

// Size of a huge page (hard-coded)
#define HUGE_PAGE_SIZE (2*1024*1024)

// Total size of the thread stack, including the guard page(s)
#define STACK_SIZE_TOTAL (HUGE_PAGE_SIZE)

// Size of the guard page(s)
#define GUARD_SIZE (SMALL_PAGE_SIZE)

//#define PRINT_STACK_RANGES
//#define PRINT_PROC_SMAPS

// When enabled (set to non-zero), tries to align thread stacks on
// huge page boundaries, making them eligible for huge pages
static int huge_page_align_stacks;

static volatile int exit_thread = 0;

#if defined(PRINT_STACK_RANGES)
static void print_stack_range(void) {
  pthread_attr_t attr;
  void* bottom;
  size_t size;
  int err;

  err = pthread_getattr_np(pthread_self(), &attr);
  if (err != 0) {
    fprintf(stderr, "Error looking up attr\n");
    exit(1);
  }

  err = pthread_attr_getstack(&attr, &bottom, &size);
  if (err != 0) {
    fprintf(stderr, "Cannot locate current stack attributes!\n");
    exit(1);
  }

  pthread_attr_destroy(&attr);

  fprintf(stderr, "Stack: %p-%p (0x%zx/%zd)\n", bottom, bottom + size, size, size);
}
#endif

static void* start(void* arg) {
#if defined(PRINT_STACK_RANGES)
  print_stack_range();
#endif

  while (!exit_thread) {
    sleep(1);
  }
  return NULL;
}

#if defined(PRINT_PROC_SMAPS)
static void print_proc_file(const char* file) {
  char path[128];
  snprintf(path, sizeof(path), "/proc/self/%s", file);

  int smap = open(path, O_RDONLY);
  char buf[4096];
  int x = 0;
  while ((x = read(smap, buf, sizeof(buf))) > 0) {
    write(1, buf, x);
  }
  close(smap);
}
#endif

static size_t get_rss(void) {
  FILE* stat = fopen("/proc/self/statm", "r");
  long rss;
  fscanf(stat, "%*d %ld", &rss);
  fclose(stat);
  return rss;
}

uintptr_t align_down(uintptr_t value, uintptr_t alignment) {
  return value & ~(alignment - 1);
}

// Do a series of small, single page mmap calls to attempt to set
// everything up so that the next mmap call (glibc allocating the
// stack) returns a 2MB aligned range. The kernel "expands" vmas from
// higher to lower addresses (subsequent calls return ranges starting
// at lower addresses), so this function keeps calling mmap until a
// huge page aligned address is returned. The next range (the stack)
// will then end on that same address.
static void align_next_on(uintptr_t alignment) {
  uintptr_t p;
  do {
    p = (uintptr_t)mmap(NULL, SMALL_PAGE_SIZE, PROT_NONE,
                        MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE, -1, 0);
  } while (p != align_down(p, HUGE_PAGE_SIZE));
}

int main(int argc, char* argv[]) {
  pthread_t t[NOOF_THREADS];
  pthread_attr_t attr;
  int i;

  if (argc != 2) {
    printf("Usage: %s <huge page stacks>\n", argv[0]);
    printf("  huge page stacks = 1 - attempt to use huge pages for stacks\n");
    exit(1);
  }
  huge_page_align_stacks = atoi(argv[1]);

  void* dummy = malloc(1024);
  free(dummy);

  fprintf(stderr, "Page size: %d kB, %d MB huge pages\n",
          SMALL_PAGE_SIZE / 1024, HUGE_PAGE_SIZE / (1024 * 1024));
  if (huge_page_align_stacks) {
    fprintf(stderr, "Will attempt to align allocations to make stacks eligible for huge pages\n");
  }

  pid_t pid = getpid();
  fprintf(stderr, "pid: %d (/proc/%d/smaps)\n", pid, pid);

  size_t guard_size = GUARD_SIZE;
  size_t stack_size = STACK_SIZE_TOTAL;

  pthread_attr_init(&attr);
  pthread_attr_setstacksize(&attr, stack_size);
  pthread_attr_setguardsize(&attr, guard_size);

  fprintf(stderr, "Creating %d threads...\n", NOOF_THREADS);
  for (i = 0; i < NOOF_THREADS; i++) {
    if (huge_page_align_stacks) {
      // align (next) allocation on huge page boundary
      align_next_on(HUGE_PAGE_SIZE);
    }
    pthread_create(&t[i], &attr, start, NULL);
  }
  sleep(1);

#if defined(PRINT_PROC_SMAPS)
  print_proc_file("smaps");
#endif

  size_t rss = get_rss();
  fprintf(stderr, "RSS: %zd pages (%zd bytes = %zd MB)\n",
          rss, rss * SMALL_PAGE_SIZE, rss * SMALL_PAGE_SIZE / 1024 / 1024);

  fprintf(stderr, "Press enter to exit...\n");
  getchar();

  exit_thread = 1;
  for (i = 0; i < NOOF_THREADS; i++) {
    pthread_join(t[i], NULL);
  }
  return 0;
}

--=-=-=--