From mboxrd@z Thu Jan 1 00:00:00 1970
References: <87mt3bda4s.fsf@oracle.com> <7fd2f7b1-73d3-1bc3-cabf-c67d1930cefc@linaro.org>
User-agent: mu4e 1.4.15; emacs 28.1
From: Cupertino Miranda
To: Adhemerval Zanella Netto
Cc: Wilco Dijkstra, 'GNU C Library'
Subject: Re: [PATCH v5 1/1] Created tunable to force small pages on stack allocation.
In-reply-to: <7fd2f7b1-73d3-1bc3-cabf-c67d1930cefc@linaro.org>
Date: Fri, 14 Apr 2023 12:28:53 +0100
Message-ID: <878reud7nu.fsf@oracle.com>
Content-Type: multipart/mixed; boundary="=-=-="
MIME-Version: 1.0

--=-=-=
Content-Type: text/plain

Hi Adhemerval,

If my memory does not betray me, I believe the reason why the huge pages
would not be relevant was that the mprotect of the guard region would
enforce different protection on part of the huge page.  In that sense you
could test whether the guard region is part of the first huge page: if it
is, that page would be broken into smaller pieces, so we should advise
the kernel to use small pages.

While debugging I also realized you were checking the alignment of "mem";
however, for the common case (_STACK_GROWS_DOWN) the relevant alignment
to check is the start of the stack, i.e. the end of the allocated memory.
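
To make this concrete, here is a rough standalone sketch of the check I
have in mind (an illustration only, not the patch itself; the helper name,
the 2 MiB THP size and the example values are made up, and it assumes
_STACK_GROWS_DOWN):

/* Sketch only: would the guard-page mprotect split a THP-backed stack?  */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_THP_SIZE (2UL * 1024 * 1024)   /* assume 2 MiB THP */

static bool
thp_would_be_split (uintptr_t mem, size_t size, size_t guardsize,
                    unsigned long int thpsize)
{
  /* For a downward-growing stack the usable stack starts at the end of
     the mapping, so that is the address whose alignment matters.  */
  uintptr_t stack_start = mem + size;
  /* The guard region sits at the low end of the mapping.  */
  uintptr_t guard_end = mem + guardsize;

  if (stack_start % thpsize != 0)
    return false;   /* Not THP-eligible in the first place.  */

  /* If the guard region ends inside the same THP-sized block it starts
     in, mprotecting it forces the kernel to break that block up.  */
  return (mem / thpsize) == (guard_end / thpsize);
}

int
main (void)
{
  uintptr_t mem = 0x70000000UL;       /* hypothetical mmap result */
  size_t size = 8UL * 1024 * 1024;    /* 8 MiB stack */
  size_t guardsize = 64 * 1024;       /* 64 KiB guard */

  printf ("advise MADV_NOHUGEPAGE: %s\n",
          thp_would_be_split (mem, size, guardsize, EXAMPLE_THP_SIZE)
          ? "yes" : "no");
  return 0;
}

When this returns true the mapping is THP-eligible, but the guard-page
mprotect would force the kernel to split it, so advising MADV_NOHUGEPAGE
up front is the better option, which is roughly what the attached change
does.
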
I have included your patch with some changes which I think make a
reasonable estimation.

In any case, I would like to keep the tunable, as I think it is harmless
and makes sense.

BTW, you mentioned in a previous email that you would install the tunable
patch.  Maybe you forgot.  I am waiting for it to do some internal
backporting. ;-)

Regards,
Cupertino

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=adhemerval_1.patch

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index c7adbccd6f..ad718cfb4a 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 
 /* Default alignment of stack.  */
 #ifndef STACK_ALIGN
@@ -206,6 +207,38 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
 #endif
 }
 
+/* If the Transparent Huge Page (THP) is set as 'always', the resulting
+   address and the stack size are multiple of THP size, kernel may use THP for
+   the thread stack.  However, if the guard page size is not multiple of THP,
+   once it is mprotect the allocate range could no longer be served with THP
+   and then kernel will revert back using default page sizes.
+
+   However, the kernel might also not keep track of the offsets within the THP
+   that has been touched and need to reside on the memory.  It will then keep
+   all the small pages, thus using much more memory than required.  In this
+   scenario, it is better to just madvise that not use huge pages and avoid
+   the memory bloat.  */
+static __always_inline int
+advise_thp (void *mem, size_t size, char *guardpos)
+{
+  enum malloc_thp_mode_t thpmode = __malloc_thp_mode ();
+  unsigned long int thpsize = __malloc_default_thp_pagesize ();
+
+#if _STACK_GROWS_DOWN
+  char *stack_start_addr = mem + size;
+#elif _STACK_GROWS_UP
+  char *stack_start_addr = mem;
+#endif
+
+  if (thpmode != malloc_thp_mode_always)
+    return 0;
+  if ((((uintptr_t) stack_start_addr) % thpsize) != 0
+      || (((uintptr_t) mem / thpsize) != ((uintptr_t) guardpos / thpsize)))
+    return 0;
+
+  return __madvise (mem, size, MADV_NOHUGEPAGE);
+}
+
 /* Returns a usable stack for a new thread either by allocating a new stack
    or reusing a cached stack of sufficient size.  ATTR must be non-NULL and
    point to a valid pthread_attr.
@@ -385,6 +418,13 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
                                                   - TLS_PRE_TCB_SIZE);
 #endif
 
+          char *guardpos = guard_position (mem, size, guardsize,
+                                           pd, pagesize_m1);
+          int r = advise_thp (mem, size, guardpos);
+          if (r != 0)
+            return r;
+
+          /* Now mprotect the required region excluding the guard area.  */
           if (__glibc_likely (guardsize > 0))
             {
diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
index d68b85630c..21d4844bc4 100644
--- a/sysdeps/generic/malloc-hugepages.h
+++ b/sysdeps/generic/malloc-hugepages.h
@@ -26,6 +26,7 @@ unsigned long int __malloc_default_thp_pagesize (void) attribute_hidden;
 
 enum malloc_thp_mode_t
 {
+  malloc_thp_mode_unknown,
   malloc_thp_mode_always,
   malloc_thp_mode_madvise,
   malloc_thp_mode_never,
diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
index 740027ebfb..15b862b0bf 100644
--- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
+++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
@@ -22,19 +22,33 @@
 #include
 #include
 
+/* The __malloc_thp_mode is called only in single-thread mode, either in
+   malloc initialization or pthread creation.  */
+static unsigned long int thp_pagesize = -1;
+
 unsigned long int
 __malloc_default_thp_pagesize (void)
 {
+  unsigned long int size = atomic_load_relaxed (&thp_pagesize);
+  if (size != -1)
+    return size;
+
   int fd = __open64_nocancel (
     "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
   if (fd == -1)
-    return 0;
+    {
+      atomic_store_relaxed (&thp_pagesize, 0);
+      return 0;
+    }
 
   char str[INT_BUFSIZE_BOUND (unsigned long int)];
   ssize_t s = __read_nocancel (fd, str, sizeof (str));
   __close_nocancel (fd);
 
   if (s < 0)
-    return 0;
+    {
+      atomic_store_relaxed (&thp_pagesize, 0);
+      return 0;
+    }
 
   unsigned long int r = 0;
   for (ssize_t i = 0; i < s; i++)
@@ -44,16 +58,28 @@ __malloc_default_thp_pagesize (void)
       r *= 10;
       r += str[i] - '0';
     }
+  atomic_store_relaxed (&thp_pagesize, r);
   return r;
 }
 
+/* The __malloc_thp_mode is called only in single-thread mode, either in
+   malloc initialization or pthread creation.  */
+static enum malloc_thp_mode_t thp_mode = malloc_thp_mode_unknown;
+
 enum malloc_thp_mode_t
 __malloc_thp_mode (void)
 {
+  enum malloc_thp_mode_t mode = atomic_load_relaxed (&thp_mode);
+  if (mode != malloc_thp_mode_unknown)
+    return mode;
+
   int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
                               O_RDONLY);
   if (fd == -1)
-    return malloc_thp_mode_not_supported;
+    {
+      atomic_store_relaxed (&thp_mode, malloc_thp_mode_not_supported);
+      return malloc_thp_mode_not_supported;
+    }
 
   static const char mode_always[]  = "[always] madvise never\n";
   static const char mode_madvise[] = "always [madvise] never\n";
@@ -67,13 +93,19 @@ __malloc_thp_mode (void)
   if (s == sizeof (mode_always) - 1)
     {
       if (strcmp (str, mode_always) == 0)
-        return malloc_thp_mode_always;
+        mode = malloc_thp_mode_always;
       else if (strcmp (str, mode_madvise) == 0)
-        return malloc_thp_mode_madvise;
+        mode = malloc_thp_mode_madvise;
       else if (strcmp (str, mode_never) == 0)
-        return malloc_thp_mode_never;
+        mode = malloc_thp_mode_never;
+      else
+        mode = malloc_thp_mode_not_supported;
     }
-  return malloc_thp_mode_not_supported;
+  else
+    mode = malloc_thp_mode_not_supported;
+
+  atomic_store_relaxed (&thp_mode, mode);
+  return mode;
 }
 
 static size_t

--=-=-=
Content-Type: text/plain

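For reference, one way to check whether a mapping actually ended up backed
by transparent huge pages is to look at the AnonHugePages counter in
/proc/<pid>/smaps for the stack VMA.  A rough standalone sketch of such a
check (an illustration only, not part of the patch):

/* Illustration only: list mappings of the current process whose
   AnonHugePages counter is non-zero, i.e. mappings currently backed by
   transparent huge pages.  */
#include <stdio.h>
#include <string.h>

int
main (void)
{
  FILE *f = fopen ("/proc/self/smaps", "r");
  if (f == NULL)
    return 1;

  char line[512];
  char vma[512] = "";
  while (fgets (line, sizeof line, f) != NULL)
    {
      char first[128];
      /* VMA header lines start with an address range ("start-end ...");
         field lines start with a name ending in ':'.  */
      if (sscanf (line, "%127s", first) == 1 && strchr (first, '-') != NULL)
        strcpy (vma, line);
      else if (strncmp (line, "AnonHugePages:", 14) == 0)
        {
          unsigned long int kb = 0;
          if (sscanf (line + 14, "%lu", &kb) == 1 && kb > 0)
            printf ("%s%s", vma, line);
        }
    }
  fclose (f);
  return 0;
}

A non-zero AnonHugePages value for the stack mapping means THP is backing
at least part of it.
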
Adhemerval Zanella Netto writes:

> On 13/04/23 13:23, Cupertino Miranda wrote:
>>
>> Hi Wilco,
>>
>> Exactly my remark on the patch. ;)
>>
>> I think the tunable is beneficial when we care to allocate huge pages
>> for malloc, etc., but still want to be able to force small pages for
>> the stack.
>>
>> Imagine a scenario where you create lots of threads.  Most threads
>> barely use any stack, however there is one that somehow requires a lot
>> of it to do some crazy recursion. :)
>>
>> Most likely the heuristic would detect that huge pages would be useful
>> based on the stack size requirement, but it would never predict that
>> they only bring any benefit to 1% of the threads created.
>
> The problem is not finding when huge pages are beneficial, but rather
> when using them will incur falling back to default pages.  And
> re-reading the THP kernel docs and after some experiments, I am not
> sure it is really possible to come up with good heuristics to do so
> (not without poking in khugepaged stats).
>
> For instance, if the guard size is 0, THP will still back the thread
> stack.  However, if you force stack alignment by issuing multiple
> mmaps, khugepaged won't have an available VMA and thus won't use THP
> (using your example to force the mmap alignment in thread creation).
>
> I think my proposal will end up with a very limited and complicated
> heuristic (especially because khugepaged has various tunables itself),
> so I agree that the tunable is a better strategy.
>
>>
>> Regards,
>> Cupertino
>>
>>
>> Wilco Dijkstra writes:
>>
>>> Hi Adhemerval,
>>>
>>> I agree doing this automatically sounds like a better solution.
>>> However:
>>>
>>> +static __always_inline int
>>> +advise_thp (void *mem, size_t size, size_t guardsize)
>>> +{
>>> +  enum malloc_thp_mode_t thpmode = __malloc_thp_mode ();
>>> +  if (thpmode != malloc_thp_mode_always)
>>> +    return 0;
>>> +
>>> +  unsigned long int thpsize = __malloc_default_thp_pagesize ();
>>> +  if ((uintptr_t) mem % thpsize != 0
>>> +      || size % thpsize != 0
>>> +      || (size - guardsize) % thpsize != 0)
>>> +    return 0;
>>>
>>> Isn't the last part always true currently, given that the guard page
>>> size is based on the standard page size?  IIRC the issue was that the
>>> mmap succeeds but the guard page is taken from the original mmap,
>>> which then causes the decomposition.
>>>
>>> So you'd need something like:
>>>
>>>       || guardsize % thpsize == 0)
>>>
>>> I.e. we return without the madvise if the size and alignment is wrong
>>> for a huge page, or it is correct and the guardsize is a multiple of
>>> a huge page (in which case it shouldn't decompose).
>>>
>>> +  return __madvise (mem, size, MADV_NOHUGEPAGE);
>>> +}
>>>
>>> Cheers,
>>> Wilco

--=-=-=--