From: Cupertino Miranda
To: Adhemerval Zanella Netto
Cc: libc-alpha@sourceware.org, "Jose E. Marchesi", Elena Zannoni, Cupertino Miranda
Subject: Re: [RFC] Stack allocation, hugepages and RSS implications
Date: Thu, 09 Mar 2023 09:38:07 +0000
Message-ID: <87edpy464g.fsf@oracle.com>
In-Reply-To: <06a84799-3a73-2bff-e157-281eed68febf@linaro.org>
References: <87pm9j4azf.fsf@oracle.com> <87mt4n49ak.fsf@oracle.com> <06a84799-3a73-2bff-e157-281eed68febf@linaro.org>

Adhemerval Zanella Netto writes:

> On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>>
>> Hi everyone,
>>
>> For performance purposes, one of our in-house applications requires
>> enabling the TRANSPARENT_HUGEPAGE_ALWAYS option in the Linux kernel,
>> which makes the kernel force all large enough and aligned memory
>> allocations to reside in hugepages. I believe the reason behind this
>> decision is to have more control over data location.
>
> We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
> enables MADV_HUGEPAGE madvise for mmap allocated pages if the mode is set to
> 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).
> One option would be to use it instead of 'always' and use
> glibc.malloc.hugetlb=1.
>
> The main drawback of this strategy is that it is a system-wide setting, so
> it might affect other users/programs as well.
>
>>
>> For stack allocation, it seems that hugepages make the resident set size
>> (RSS) increase significantly, and without any apparent benefit, as the
>> huge page will be split into small pages even before leaving the glibc
>> stack allocation code.
>>
>> As an example, this is what happens in the case of a pthread_create with
>> a 2MB stack size:
>> 1. mmap request for the 2MB allocation with PROT_NONE;
>>    a huge page is "registered" by the kernel.
>> 2. the thread descriptor is written at the end of the stack.
>>    This triggers a page fault in the kernel, which performs the actual
>>    memory allocation of the 2MB.
>> 3. an mprotect changes protection on the guard (one of the small pages of
>>    the allocated space):
>>    at this point the kernel needs to break the 2MB page into many small
>>    pages in order to change the protection on that memory region.
>>    This eliminates any benefit of having huge pages for stack allocation,
>>    and also makes RSS increase by 2MB even though nothing was written to
>>    most of the small pages.
>>
>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
>> the __mmap in nptl/allocatestack.c. As expected, RSS was significantly
>> reduced for the application.
>>
>> At this point I am quite confident that there is a real benefit in our
>> particular use case in enforcing that stacks never use hugepages.
>>
>> This RFC is to understand whether I have missed some option in glibc that
>> would allow better control of stack allocation.
>> If not, I am tempted to propose/submit a change, in the form of a tunable,
>> to enforce MADV_NOHUGEPAGE for stacks.
>>
>> In any case, I wonder if there is an actual use case where a hugepage
>> would survive glibc stack allocation and bring an actual benefit.
>>
>> Looking forward to your comments.
>
> Maybe also a similar strategy on pthread stack allocation, where if
> transparent hugepages is 'always' and glibc.malloc.hugetlb is 3 we set
> MADV_NOHUGEPAGE on internal mmaps. So a value of '3' means disable THP,
> which might be confusing, but currently we have '0' as 'use system default'.
> It can also be another tunable, like glibc.hugetlb, to decouple it from the
> malloc code.
>

The intent would not be to disable hugepages on all internal mmaps, as I
think you said, but rather to do it just for stack allocations.
Although it is more work, I would say that if we add this to a tunable, then
maybe we should move it out of the malloc namespace.

If moving it out of malloc is not OK for backward-compatibility reasons, then
I would say create a new tunable specific to the purpose, like
glibc.stack_nohugetlb?

The more I think about this, the less I feel we will ever be able to
practically use hugepages in stacks. We can declare them as such, but
soon enough the kernel would split them into small pages.

> Ideally it will require caching the __malloc_thp_mode, so we avoid the
> non-required mprotect calls, similar to what we need on malloc
> do_set_hugetlb (it also assumes that once the program calls the initial
> malloc, any system-wide change to THP won't take effect).

Very good point. I had not thought about this before.
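
To make the idea concrete, here is a minimal standalone sketch of a stack allocation that opts out of THP, with the system mode read once and cached. This is not the glibc code: the helper names (parse_thp_mode, cached_thp_mode, alloc_stack_nohuge) are made up for illustration, the caching is not thread-safe, and the guard is assumed to be a single page at the low end of the mapping.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical enum for this sketch; glibc has its own internal one.  */
enum thp_mode { THP_UNKNOWN, THP_ALWAYS, THP_MADVISE, THP_NEVER };

/* The sysfs file marks the active mode with brackets, for example
   "always [madvise] never".  */
static enum thp_mode
parse_thp_mode (const char *s)
{
  if (strstr (s, "[always]") != NULL)
    return THP_ALWAYS;
  if (strstr (s, "[madvise]") != NULL)
    return THP_MADVISE;
  if (strstr (s, "[never]") != NULL)
    return THP_NEVER;
  return THP_UNKNOWN;
}

/* Read the system-wide mode once and cache it, so later allocations do
   not pay for the file access.  Not thread-safe; illustration only.  */
static enum thp_mode
cached_thp_mode (void)
{
  static int cached = 0;
  static enum thp_mode mode = THP_UNKNOWN;
  if (!cached)
    {
      char buf[64] = "";
      FILE *f = fopen ("/sys/kernel/mm/transparent_hugepage/enabled", "r");
      if (f != NULL)
        {
          if (fgets (buf, sizeof buf, f) != NULL)
            mode = parse_thp_mode (buf);
          fclose (f);
        }
      cached = 1;
    }
  return mode;
}

/* Map SIZE bytes with PROT_NONE, ask the kernel not to back the range
   with huge pages when THP is 'always', then make everything above a
   one-page guard readable and writable.  Returns NULL on failure.  */
static void *
alloc_stack_nohuge (size_t size)
{
  size_t guard = (size_t) sysconf (_SC_PAGESIZE);
  if (size <= guard)
    return NULL;

  void *mem = mmap (NULL, size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;

  /* Best effort: a failure here only means the hint was not applied.  */
  if (cached_thp_mode () == THP_ALWAYS)
    (void) madvise (mem, size, MADV_NOHUGEPAGE);

  if (mprotect ((char *) mem + guard, size - guard,
                PROT_READ | PROT_WRITE) != 0)
    {
      munmap (mem, size);
      return NULL;
    }
  return mem;
}
```

A caller would use alloc_stack_nohuge (2 * 1024 * 1024) in place of a plain mmap and release the range with munmap. Comparing the Rss field in /proc/self/smaps with and without the madvise call should show whether the whole 2MB becomes resident after the first write.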