From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) by sourceware.org (Postfix) with ESMTPS id 802D43858D28 for ; Fri, 11 Aug 2023 17:47:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 802D43858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=columbia.edu Received: from pps.filterd (m0167074.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 37BHjekD018735 for ; Fri, 11 Aug 2023 13:47:45 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=pps01; bh=3jo7lpINzreHyp5Vlosat2favjAzb1+DpzakpmxSbRU=; b=rBsG0LV3RwpVM2ssvlrIdDoXe8Hy5L5vKW8oOubqJ7QNh/hlENSQcrGkTowvVtp/Unk8 X6ZArlAg/DnB6KTJlO6lFCretJaNYWQLej2+jJuLkjKAyPqSoNOXwo9n56jrVN5+nCKV F1631pcN3QftTw5gXqHPR7gb8vmAdX7oYDAq2HD7DwY5x+MXN19G9kOL5wPZFHcE18s1 URFQLur0cXhSF7ePZGooKSlmsW1DjzMpZDhG6fM3d9u8WHMY9XnV+97rBFmo9Og/TEVE PhS6P2CwXO00abSaPWvxtdMjceG5FF1tEzOUJhxdAbtvXBWfPmdK6xOLIrfsS27kDSDO Bw== Received: from mail-vs1-f70.google.com (mail-vs1-f70.google.com [209.85.217.70]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 3sd9j1mych-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 11 Aug 2023 13:47:44 -0400 Received: by mail-vs1-f70.google.com with SMTP id ada2fe7eead31-44748fe0108so905806137.1 for ; Fri, 11 Aug 2023 10:47:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691776063; x=1692380863; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3jo7lpINzreHyp5Vlosat2favjAzb1+DpzakpmxSbRU=; b=iO+nkd5jnetoVyN7fdhcgmuogEK/TS81Vdmy+UogqtTCE2Cz4EhT13uASiI3fpH6V+ 8KnNM/zJxa36bm312DD4EuvrBwMhi72whcmuETEP5C9GaEmX0QvrLCn5/MM7zFbXeX4A 8B0I5ErmdO/GGkfpBVovczshVRELxTLT3SqRozs4HoYcrOPLVUx8ZEuvLvMleCahruB2 nBZtyoI6AFFuYqiHIefFZPy8RiWw5iqG8egP5hBRxpdX1rDHnd8GmblGrgrEfqg4PUeb SKAQQ0QSTXO28dQRGskTnXkHz5nyRZNEkc9E1BhnxFpKeOzLZz1PL1HFP3ydMom4LF4h GgaA== X-Gm-Message-State: AOJu0YyvZB10+wnUMXqsU2tvhMeQux1ogzq8wUbAVhP9SB8dZpE1Povb hXJHac614ngQ0SymXtEWAB8lrvWjOadTuVz5qVpr+DWVOBSJw+ze+jZdtjPr6Or53CPCYzLyZDo IsA6tNL3dcV/jsTs= X-Received: by 2002:a05:6102:301b:b0:443:66b9:b8f0 with SMTP id s27-20020a056102301b00b0044366b9b8f0mr2510692vsa.10.1691776063176; Fri, 11 Aug 2023 10:47:43 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEohBMaq/NLoqndbGYE5EaOUC2v062cKLtcab3LgXMpz4lx68N4TOrkAkhoujpbsT+kEOc6hQ== X-Received: by 2002:a05:6102:301b:b0:443:66b9:b8f0 with SMTP id s27-20020a056102301b00b0044366b9b8f0mr2510648vsa.10.1691776062554; Fri, 11 Aug 2023 10:47:42 -0700 (PDT) Received: from Ericcs-MBP.cable.rcn.com (207-38-164-63.s9254.c3-0.43d-cbr1.qens-43d.ny.cable.rcncustomer.com. [207.38.164.63]) by smtp.gmail.com with ESMTPSA id u8-20020a0cdd08000000b006360778f314sm1348180qvk.105.2023.08.11.10.47.41 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 11 Aug 2023 10:47:42 -0700 (PDT) From: Eric Feng To: dmalcom@redhat.com Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng Subject: [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646] Date: Fri, 11 Aug 2023 13:47:24 -0400 Message-Id: <20230811174724.12604-1-ef2648@columbia.edu> X-Mailer: git-send-email 2.32.0 (Apple Git-132) In-Reply-To: <48d2a18cd58bad40757ede4a8f6e46eeeff1133d.camel@redhat.com> References: <48d2a18cd58bad40757ede4a8f6e46eeeff1133d.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Proofpoint-GUID: v7KocMdbOp-2fdu6JIZm4PUILhiVPQwk X-Proofpoint-ORIG-GUID: v7KocMdbOp-2fdu6JIZm4PUILhiVPQwk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.957,Hydra:6.0.591,FMLib:17.11.176.26 definitions=2023-08-11_09,2023-08-10_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=10 malwarescore=0 priorityscore=1501 mlxlogscore=999 clxscore=1015 bulkscore=10 adultscore=0 phishscore=0 spamscore=0 lowpriorityscore=10 mlxscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2306200000 definitions=main-2308110163 X-Spam-Status: No, score=-13.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Thanks for the feedback! I've incorporated the changes (aside from expanding test coverage, which I plan on releasing in a follow-up), rebased, and performed a bootstrap and regtest on aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk with nits fixed and no problems after rebase, the patch has now been pushed. Best, Eric --- This patch adds known function subclasses for Python/C API functions PyList_New, PyLong_FromLong, and PyList_Append. It also adds new optional parameters for region_model::get_or_create_region_for_heap_alloc, allowing for the newly allocated region to immediately transition from the start state to the assumed non-null state in the malloc state machine if desired. Finally, it adds a new procedure, dg-require-python-h, intended as a directive in Python-related analyzer tests, to append necessary Python flags during the tests' build process. The main warnings we gain in this patch with respect to the known function subclasses mentioned are leak related. For example: rc3.c: In function ‘create_py_object’: │ rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak] │ 21 | return list; │ | ^~~~ │ ‘create_py_object’: events 1-4 │ | │ | 4 | PyObject* item = PyLong_FromLong(10); │ | | ^~~~~~~~~~~~~~~~~~~ │ | | | │ | | (1) allocated here │ | | (2) when ‘PyLong_FromLong’ succeeds │ | 5 | PyObject* list = PyList_New(2); │ | | ~~~~~~~~~~~~~ │ | | | │ | | (3) when ‘PyList_New’ fails │ |...... │ | 21 | return list; │ | | ~~~~ │ | | | │ | | (4) ‘item’ leaks here; was allocated at (1) │ Some concessions were made to simplify the analysis process when comparing kf_PyList_Append with the real implementation. In particular, PyList_Append performs some optimization internally to try and avoid calls to realloc if possible. For simplicity, we assume that realloc is called every time. Also, we grow the size by just 1 (to ensure enough space for adding a new element) rather than abide by the heuristics that the actual implementation follows. gcc/analyzer/ChangeLog: PR analyzer/107646 * call-details.h: New function. * region-model.cc (region_model::get_or_create_region_for_heap_alloc): New optional parameters. * region-model.h (class region_model): New optional parameters. * sm-malloc.cc (on_realloc_with_move): New function. (region_model::transition_ptr_sval_non_null): New function. gcc/testsuite/ChangeLog: PR analyzer/107646 * gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for PyList_New, PyList_Append, PyLong_FromLong * gcc.dg/plugin/plugin.exp: New test. * lib/target-supports.exp: New procedure. * gcc.dg/plugin/cpython-plugin-test-2.c: New test. Signed-off-by: Eric Feng --- gcc/analyzer/call-details.h | 4 + gcc/analyzer/region-model.cc | 17 +- gcc/analyzer/region-model.h | 14 +- gcc/analyzer/sm-malloc.cc | 42 + .../gcc.dg/plugin/analyzer_cpython_plugin.c | 722 ++++++++++++++++++ .../gcc.dg/plugin/cpython-plugin-test-2.c | 78 ++ gcc/testsuite/gcc.dg/plugin/plugin.exp | 3 +- gcc/testsuite/lib/target-supports.exp | 25 + 8 files changed, 899 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h index 24be2247e63..bf2601151ea 100644 --- a/gcc/analyzer/call-details.h +++ b/gcc/analyzer/call-details.h @@ -49,6 +49,10 @@ public: return POINTER_TYPE_P (get_arg_type (idx)); } bool arg_is_size_p (unsigned idx) const; + bool arg_is_integral_p (unsigned idx) const + { + return INTEGRAL_TYPE_P (get_arg_type (idx)); + } const gcall *get_call_stmt () const { return m_call; } location_t get_location () const; diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index 094b7af3dbc..aa9fe008b9d 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -4991,11 +4991,16 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes, Use CTXT to complain about tainted sizes. Reuse an existing heap_allocated_region if it's not being referenced by - this region_model; otherwise create a new one. */ + this region_model; otherwise create a new one. + + Optionally (update_state_machine) transitions the pointer pointing to the + heap_allocated_region from start to assumed non-null. */ const region * region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes, - region_model_context *ctxt) + region_model_context *ctxt, + bool update_state_machine, + const call_details *cd) { /* Determine which regions are referenced in this region_model, so that we can reuse an existing heap_allocated_region if it's not in use on @@ -5017,6 +5022,14 @@ region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes, if (size_in_bytes) if (compat_types_p (size_in_bytes->get_type (), size_type_node)) set_dynamic_extents (reg, size_in_bytes, ctxt); + + if (update_state_machine && cd) + { + const svalue *ptr_sval + = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg); + transition_ptr_sval_non_null (ctxt, ptr_sval); + } + return reg; } diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h index 0cf38714c96..a8acad8b7b2 100644 --- a/gcc/analyzer/region-model.h +++ b/gcc/analyzer/region-model.h @@ -387,9 +387,12 @@ class region_model region_model_context *ctxt, rejected_constraint **out); - const region * - get_or_create_region_for_heap_alloc (const svalue *size_in_bytes, - region_model_context *ctxt); + const region * + get_or_create_region_for_heap_alloc (const svalue *size_in_bytes, + region_model_context *ctxt, + bool update_state_machine = false, + const call_details *cd = nullptr); + const region *create_region_for_alloca (const svalue *size_in_bytes, region_model_context *ctxt); void get_referenced_base_regions (auto_bitmap &out_ids) const; @@ -476,6 +479,11 @@ class region_model const svalue *old_ptr_sval, const svalue *new_ptr_sval); + /* Implemented in sm-malloc.cc. */ + void + transition_ptr_sval_non_null (region_model_context *ctxt, + const svalue *new_ptr_sval); + /* Implemented in sm-taint.cc. */ void mark_as_tainted (const svalue *sval, region_model_context *ctxt); diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc index a8c63eb1ce8..ec763254b29 100644 --- a/gcc/analyzer/sm-malloc.cc +++ b/gcc/analyzer/sm-malloc.cc @@ -434,6 +434,11 @@ public: const svalue *new_ptr_sval, const extrinsic_state &ext_state) const; + void transition_ptr_sval_non_null (region_model *model, + sm_state_map *smap, + const svalue *new_ptr_sval, + const extrinsic_state &ext_state) const; + standard_deallocator_set m_free; standard_deallocator_set m_scalar_delete; standard_deallocator_set m_vector_delete; @@ -2504,6 +2509,17 @@ on_realloc_with_move (region_model *model, NULL, ext_state); } +/* Hook for get_or_create_region_for_heap_alloc for the case when we want + ptr_sval to mark a newly created region as assumed non null on malloc SM. */ +void +malloc_state_machine::transition_ptr_sval_non_null (region_model *model, + sm_state_map *smap, + const svalue *new_ptr_sval, + const extrinsic_state &ext_state) const +{ + smap->set_state (model, new_ptr_sval, m_free.m_nonnull, NULL, ext_state); +} + } // anonymous namespace /* Internal interface to this file. */ @@ -2548,6 +2564,32 @@ region_model::on_realloc_with_move (const call_details &cd, *ext_state); } +/* Moves ptr_sval from start to assumed non-null, for use by + region_model::get_or_create_region_for_heap_alloc. */ +void +region_model::transition_ptr_sval_non_null (region_model_context *ctxt, +const svalue *ptr_sval) +{ + if (!ctxt) + return; + const extrinsic_state *ext_state = ctxt->get_ext_state (); + if (!ext_state) + return; + + sm_state_map *smap; + const state_machine *sm; + unsigned sm_idx; + if (!ctxt->get_malloc_map (&smap, &sm, &sm_idx)) + return; + + gcc_assert (smap); + gcc_assert (sm); + + const malloc_state_machine &malloc_sm = (const malloc_state_machine &)*sm; + + malloc_sm.transition_ptr_sval_non_null (this, smap, ptr_sval, *ext_state); +} + } // namespace ana #endif /* #if ENABLE_ANALYZER */ diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c index 9ecc42d4465..7cd72e8a886 100644 --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c @@ -55,6 +55,8 @@ static GTY (()) hash_map *analyzer_stashed_globals; namespace ana { static tree pyobj_record = NULL_TREE; +static tree pyobj_ptr_tree = NULL_TREE; +static tree pyobj_ptr_ptr = NULL_TREE; static tree varobj_record = NULL_TREE; static tree pylistobj_record = NULL_TREE; static tree pylongobj_record = NULL_TREE; @@ -76,6 +78,714 @@ get_field_by_name (tree type, const char *name) return NULL_TREE; } +static const svalue * +get_sizeof_pyobjptr (region_model_manager *mgr) +{ + tree size_tree = TYPE_SIZE_UNIT (pyobj_ptr_tree); + const svalue *sizeof_sval = mgr->get_or_create_constant_svalue (size_tree); + return sizeof_sval; +} + +/* Update MODEL to set OB_BASE_REGION's ob_refcnt to 1. */ +static void +init_ob_refcnt_field (region_model_manager *mgr, region_model *model, + const region *ob_base_region, tree pyobj_record, + const call_details &cd) +{ + tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt"); + const region *ob_refcnt_region + = mgr->get_field_region (ob_base_region, ob_refcnt_tree); + const svalue *refcnt_one_sval + = mgr->get_or_create_int_cst (size_type_node, 1); + model->set_value (ob_refcnt_region, refcnt_one_sval, cd.get_ctxt ()); +} + +/* Update MODEL to set OB_BASE_REGION's ob_type to point to + PYTYPE_VAR_DECL_PTR. */ +static void +set_ob_type_field (region_model_manager *mgr, region_model *model, + const region *ob_base_region, tree pyobj_record, + tree pytype_var_decl_ptr, const call_details &cd) +{ + const region *pylist_type_region + = mgr->get_region_for_global (pytype_var_decl_ptr); + tree pytype_var_decl_ptr_type + = build_pointer_type (TREE_TYPE (pytype_var_decl_ptr)); + const svalue *pylist_type_ptr_sval + = mgr->get_ptr_svalue (pytype_var_decl_ptr_type, pylist_type_region); + tree ob_type_field = get_field_by_name (pyobj_record, "ob_type"); + const region *ob_type_region + = mgr->get_field_region (ob_base_region, ob_type_field); + model->set_value (ob_type_region, pylist_type_ptr_sval, cd.get_ctxt ()); +} + +/* Retrieve the "ob_base" field's region from OBJECT_RECORD within + NEW_OBJECT_REGION and set its value in the MODEL to PYOBJ_SVALUE. */ +static const region * +get_ob_base_region (region_model_manager *mgr, region_model *model, + const region *new_object_region, tree object_record, + const svalue *pyobj_svalue, const call_details &cd) +{ + tree ob_base_tree = get_field_by_name (object_record, "ob_base"); + const region *ob_base_region + = mgr->get_field_region (new_object_region, ob_base_tree); + model->set_value (ob_base_region, pyobj_svalue, cd.get_ctxt ()); + return ob_base_region; +} + +/* Initialize and retrieve a region within the MODEL for a PyObject + and set its value to OBJECT_SVALUE. */ +static const region * +init_pyobject_region (region_model_manager *mgr, region_model *model, + const svalue *object_svalue, const call_details &cd) +{ + const region *pyobject_region = model->get_or_create_region_for_heap_alloc ( + NULL, cd.get_ctxt (), true, &cd); + model->set_value (pyobject_region, object_svalue, cd.get_ctxt ()); + return pyobject_region; +} + +/* Increment the value of FIELD_REGION in the MODEL by 1. Optionally + capture the old and new svalues if OLD_SVAL and NEW_SVAL pointers are + provided. */ +static void +inc_field_val (region_model_manager *mgr, region_model *model, + const region *field_region, const tree type_node, + const call_details &cd, const svalue **old_sval = nullptr, + const svalue **new_sval = nullptr) +{ + const svalue *tmp_old_sval + = model->get_store_value (field_region, cd.get_ctxt ()); + const svalue *one_sval = mgr->get_or_create_int_cst (type_node, 1); + const svalue *tmp_new_sval = mgr->get_or_create_binop ( + type_node, PLUS_EXPR, tmp_old_sval, one_sval); + + model->set_value (field_region, tmp_new_sval, cd.get_ctxt ()); + + if (old_sval) + *old_sval = tmp_old_sval; + + if (new_sval) + *new_sval = tmp_new_sval; +} + +class pyobj_init_fail : public failed_call_info +{ +public: + pyobj_init_fail (const call_details &cd) : failed_call_info (cd) {} + + bool + update_model (region_model *model, const exploded_edge *, + region_model_context *ctxt) const final override + { + /* Return NULL; everything else is unchanged. */ + const call_details cd (get_call_details (model, ctxt)); + region_model_manager *mgr = cd.get_manager (); + if (cd.get_lhs_type ()) + { + const svalue *zero + = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0); + model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ()); + } + return true; + } +}; + +/* Some concessions were made to +simplify the analysis process when comparing kf_PyList_Append with the +real implementation. In particular, PyList_Append performs some +optimization internally to try and avoid calls to realloc if +possible. For simplicity, we assume that realloc is called every time. +Also, we grow the size by just 1 (to ensure enough space for adding a +new element) rather than abide by the heuristics that the actual implementation +follows. */ +class kf_PyList_Append : public known_function +{ +public: + bool + matches_call_types_p (const call_details &cd) const final override + { + return (cd.num_args () == 2 && cd.arg_is_pointer_p (0) + && cd.arg_is_pointer_p (1)); + } + void impl_call_pre (const call_details &cd) const final override; + void impl_call_post (const call_details &cd) const final override; +}; + +void +kf_PyList_Append::impl_call_pre (const call_details &cd) const +{ + region_model_manager *mgr = cd.get_manager (); + region_model *model = cd.get_model (); + + const svalue *pylist_sval = cd.get_arg_svalue (0); + const region *pylist_reg + = model->deref_rvalue (pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ()); + + const svalue *newitem_sval = cd.get_arg_svalue (1); + const region *newitem_reg + = model->deref_rvalue (pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ()); + + // Skip checks if unknown etc + if (pylist_sval->get_kind () != SK_REGION + && pylist_sval->get_kind () != SK_CONSTANT) + return; + + // PyList_Check + tree ob_type_field = get_field_by_name (pyobj_record, "ob_type"); + const region *ob_type_region + = mgr->get_field_region (pylist_reg, ob_type_field); + const svalue *stored_sval + = model->get_store_value (ob_type_region, cd.get_ctxt ()); + const region *pylist_type_region + = mgr->get_region_for_global (pylisttype_vardecl); + tree pylisttype_vardecl_ptr + = build_pointer_type (TREE_TYPE (pylisttype_vardecl)); + const svalue *pylist_type_ptr + = mgr->get_ptr_svalue (pylisttype_vardecl_ptr, pylist_type_region); + + if (stored_sval != pylist_type_ptr) + { + // TODO: emit diagnostic -Wanalyzer-type-error + cd.get_ctxt ()->terminate_path (); + return; + } + + // Check that new_item is not null. + { + const svalue *null_ptr + = mgr->get_or_create_int_cst (newitem_sval->get_type (), 0); + if (!model->add_constraint (newitem_sval, NE_EXPR, null_ptr, + cd.get_ctxt ())) + { + // TODO: emit diagnostic here + cd.get_ctxt ()->terminate_path (); + return; + } + } +} + +void +kf_PyList_Append::impl_call_post (const call_details &cd) const +{ + /* Three custom subclasses of custom_edge_info, for handling the various + outcomes of "realloc". */ + + /* Concrete custom_edge_info: a realloc call that fails, returning NULL. + */ + class realloc_failure : public failed_call_info + { + public: + realloc_failure (const call_details &cd) : failed_call_info (cd) {} + + bool + update_model (region_model *model, const exploded_edge *, + region_model_context *ctxt) const final override + { + const call_details cd (get_call_details (model, ctxt)); + region_model_manager *mgr = cd.get_manager (); + + const svalue *pylist_sval = cd.get_arg_svalue (0); + const region *pylist_reg = model->deref_rvalue ( + pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ()); + + /* Identify ob_item field and set it to NULL. */ + tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item"); + const region *ob_item_reg + = mgr->get_field_region (pylist_reg, ob_item_field); + const svalue *old_ptr_sval + = model->get_store_value (ob_item_reg, cd.get_ctxt ()); + + if (const region_svalue *old_reg + = old_ptr_sval->dyn_cast_region_svalue ()) + { + const region *freed_reg = old_reg->get_pointee (); + model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED); + model->unset_dynamic_extents (freed_reg); + } + + const svalue *null_sval = mgr->get_or_create_null_ptr (pyobj_ptr_ptr); + model->set_value (ob_item_reg, null_sval, cd.get_ctxt ()); + + if (cd.get_lhs_type ()) + { + const svalue *neg_one + = mgr->get_or_create_int_cst (cd.get_lhs_type (), -1); + cd.maybe_set_lhs(neg_one); + } + return true; + } + }; + + class realloc_success_no_move : public call_info + { + public: + realloc_success_no_move (const call_details &cd) : call_info (cd) {} + + label_text + get_desc (bool can_colorize) const final override + { + return make_label_text ( + can_colorize, "when %qE succeeds, without moving underlying buffer", + get_fndecl ()); + } + + bool + update_model (region_model *model, const exploded_edge *, + region_model_context *ctxt) const final override + { + const call_details cd (get_call_details (model, ctxt)); + region_model_manager *mgr = cd.get_manager (); + + const svalue *pylist_sval = cd.get_arg_svalue (0); + const region *pylist_reg = model->deref_rvalue ( + pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ()); + + const svalue *newitem_sval = cd.get_arg_svalue (1); + const region *newitem_reg = model->deref_rvalue ( + newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ()); + + tree ob_size_field = get_field_by_name (varobj_record, "ob_size"); + const region *ob_size_region + = mgr->get_field_region (pylist_reg, ob_size_field); + const svalue *ob_size_sval = nullptr; + const svalue *new_size_sval = nullptr; + inc_field_val (mgr, model, ob_size_region, integer_type_node, cd, + &ob_size_sval, &new_size_sval); + + const svalue *sizeof_sval = mgr->get_or_create_cast ( + ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr)); + const svalue *num_allocated_bytes = mgr->get_or_create_binop ( + size_type_node, MULT_EXPR, sizeof_sval, new_size_sval); + + tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item"); + const region *ob_item_region + = mgr->get_field_region (pylist_reg, ob_item_field); + const svalue *ob_item_ptr_sval + = model->get_store_value (ob_item_region, cd.get_ctxt ()); + + /* We can only grow in place with a non-NULL pointer and no unknown + */ + { + const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr); + if (!model->add_constraint (ob_item_ptr_sval, NE_EXPR, null_ptr, + cd.get_ctxt ())) + { + return false; + } + } + + const unmergeable_svalue *underlying_svalue + = ob_item_ptr_sval->dyn_cast_unmergeable_svalue (); + const svalue *target_svalue = nullptr; + const region_svalue *target_region_svalue = nullptr; + + if (underlying_svalue) + { + target_svalue = underlying_svalue->get_arg (); + if (target_svalue->get_kind () != SK_REGION) + { + return false; + } + } + else + { + if (ob_item_ptr_sval->get_kind () != SK_REGION) + { + return false; + } + target_svalue = ob_item_ptr_sval; + } + + target_region_svalue = target_svalue->dyn_cast_region_svalue (); + const region *curr_reg = target_region_svalue->get_pointee (); + + if (compat_types_p (num_allocated_bytes->get_type (), size_type_node)) + model->set_dynamic_extents (curr_reg, num_allocated_bytes, ctxt); + + model->set_value (ob_size_region, new_size_sval, ctxt); + + const svalue *offset_sval = mgr->get_or_create_binop ( + size_type_node, MULT_EXPR, sizeof_sval, ob_size_sval); + const region *element_region + = mgr->get_offset_region (curr_reg, pyobj_ptr_ptr, offset_sval); + model->set_value (element_region, newitem_sval, cd.get_ctxt ()); + + // Increment ob_refcnt of appended item. + tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt"); + const region *ob_refcnt_region + = mgr->get_field_region (newitem_reg, ob_refcnt_tree); + inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd); + + if (cd.get_lhs_type ()) + { + const svalue *zero + = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0); + cd.maybe_set_lhs(zero); + } + return true; + } + }; + + class realloc_success_move : public call_info + { + public: + realloc_success_move (const call_details &cd) : call_info (cd) {} + + label_text + get_desc (bool can_colorize) const final override + { + return make_label_text (can_colorize, "when %qE succeeds, moving buffer", + get_fndecl ()); + } + + bool + update_model (region_model *model, const exploded_edge *, + region_model_context *ctxt) const final override + { + const call_details cd (get_call_details (model, ctxt)); + region_model_manager *mgr = cd.get_manager (); + const svalue *pylist_sval = cd.get_arg_svalue (0); + const region *pylist_reg = model->deref_rvalue ( + pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ()); + + const svalue *newitem_sval = cd.get_arg_svalue (1); + const region *newitem_reg = model->deref_rvalue ( + newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ()); + + tree ob_size_field = get_field_by_name (varobj_record, "ob_size"); + const region *ob_size_region + = mgr->get_field_region (pylist_reg, ob_size_field); + const svalue *old_ob_size_sval = nullptr; + const svalue *new_ob_size_sval = nullptr; + inc_field_val (mgr, model, ob_size_region, integer_type_node, cd, + &old_ob_size_sval, &new_ob_size_sval); + + const svalue *sizeof_sval = mgr->get_or_create_cast ( + old_ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr)); + const svalue *new_size_sval = mgr->get_or_create_binop ( + size_type_node, MULT_EXPR, sizeof_sval, new_ob_size_sval); + + tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item"); + const region *ob_item_reg + = mgr->get_field_region (pylist_reg, ob_item_field); + const svalue *old_ptr_sval + = model->get_store_value (ob_item_reg, cd.get_ctxt ()); + + /* Create the new region. */ + const region *new_reg = model->get_or_create_region_for_heap_alloc ( + new_size_sval, cd.get_ctxt ()); + const svalue *new_ptr_sval + = mgr->get_ptr_svalue (pyobj_ptr_ptr, new_reg); + if (!model->add_constraint (new_ptr_sval, NE_EXPR, old_ptr_sval, + cd.get_ctxt ())) + return false; + + if (const region_svalue *old_reg + = old_ptr_sval->dyn_cast_region_svalue ()) + { + const region *freed_reg = old_reg->get_pointee (); + const svalue *old_size_sval = model->get_dynamic_extents (freed_reg); + if (old_size_sval) + { + const svalue *copied_size_sval + = get_copied_size (model, old_size_sval, new_size_sval); + const region *copied_old_reg = mgr->get_sized_region ( + freed_reg, pyobj_ptr_ptr, copied_size_sval); + const svalue *buffer_content_sval + = model->get_store_value (copied_old_reg, cd.get_ctxt ()); + const region *copied_new_reg = mgr->get_sized_region ( + new_reg, pyobj_ptr_ptr, copied_size_sval); + model->set_value (copied_new_reg, buffer_content_sval, + cd.get_ctxt ()); + } + else + { + model->mark_region_as_unknown (freed_reg, cd.get_uncertainty ()); + } + + model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED); + model->unset_dynamic_extents (freed_reg); + } + + const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr); + if (!model->add_constraint (new_ptr_sval, NE_EXPR, null_ptr, + cd.get_ctxt ())) + return false; + + model->set_value (ob_size_region, new_ob_size_sval, ctxt); + model->set_value (ob_item_reg, new_ptr_sval, cd.get_ctxt ()); + + const svalue *offset_sval = mgr->get_or_create_binop ( + size_type_node, MULT_EXPR, sizeof_sval, old_ob_size_sval); + const region *element_region + = mgr->get_offset_region (new_reg, pyobj_ptr_ptr, offset_sval); + model->set_value (element_region, newitem_sval, cd.get_ctxt ()); + + // Increment ob_refcnt of appended item. + tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt"); + const region *ob_refcnt_region + = mgr->get_field_region (newitem_reg, ob_refcnt_tree); + inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd); + + if (cd.get_lhs_type ()) + { + const svalue *zero + = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0); + cd.maybe_set_lhs(zero); + } + return true; + } + + private: + /* Return the lesser of OLD_SIZE_SVAL and NEW_SIZE_SVAL. + If unknown, OLD_SIZE_SVAL is returned. */ + const svalue * + get_copied_size (region_model *model, const svalue *old_size_sval, + const svalue *new_size_sval) const + { + tristate res + = model->eval_condition (old_size_sval, GT_EXPR, new_size_sval); + switch (res.get_value ()) + { + case tristate::TS_TRUE: + return new_size_sval; + case tristate::TS_FALSE: + case tristate::TS_UNKNOWN: + return old_size_sval; + default: + gcc_unreachable (); + } + } + }; + + /* Body of kf_PyList_Append::impl_call_post. */ + if (cd.get_ctxt ()) + { + cd.get_ctxt ()->bifurcate (make_unique (cd)); + cd.get_ctxt ()->bifurcate (make_unique (cd)); + cd.get_ctxt ()->bifurcate (make_unique (cd)); + cd.get_ctxt ()->terminate_path (); + } +} + +class kf_PyList_New : public known_function +{ +public: + bool + matches_call_types_p (const call_details &cd) const final override + { + return (cd.num_args () == 1 && cd.arg_is_integral_p (0)); + } + void impl_call_post (const call_details &cd) const final override; +}; + +void +kf_PyList_New::impl_call_post (const call_details &cd) const +{ + class success : public call_info + { + public: + success (const call_details &cd) : call_info (cd) {} + + label_text + get_desc (bool can_colorize) const final override + { + return make_label_text (can_colorize, "when %qE succeeds", + get_fndecl ()); + } + + bool + update_model (region_model *model, const exploded_edge *, + region_model_context *ctxt) const final override + { + const call_details cd (get_call_details (model, ctxt)); + region_model_manager *mgr = cd.get_manager (); + + const svalue *pyobj_svalue + = mgr->get_or_create_unknown_svalue (pyobj_record); + const svalue *varobj_svalue + = mgr->get_or_create_unknown_svalue (varobj_record); + const svalue *pylist_svalue + = mgr->get_or_create_unknown_svalue (pylistobj_record); + + const svalue *size_sval = cd.get_arg_svalue (0); + + const region *pylist_region + = init_pyobject_region (mgr, model, pylist_svalue, cd); + + /* + typedef struct + { + PyObject_VAR_HEAD + PyObject **ob_item; + Py_ssize_t allocated; + } PyListObject; + */ + tree varobj_field = get_field_by_name (pylistobj_record, "ob_base"); + const region *varobj_region + = mgr->get_field_region (pylist_region, varobj_field); + model->set_value (varobj_region, varobj_svalue, cd.get_ctxt ()); + + tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item"); + const region *ob_item_region + = mgr->get_field_region (pylist_region, ob_item_field); + + const svalue *zero_sval = mgr->get_or_create_int_cst (size_type_node, 0); + const svalue *casted_size_sval + = mgr->get_or_create_cast (size_type_node, size_sval); + const svalue *size_cond_sval = mgr->get_or_create_binop ( + size_type_node, LE_EXPR, casted_size_sval, zero_sval); + + // if size <= 0, ob_item = NULL + + if (tree_int_cst_equal (size_cond_sval->maybe_get_constant (), + integer_one_node)) + { + const svalue *null_sval + = mgr->get_or_create_null_ptr (pyobj_ptr_ptr); + model->set_value (ob_item_region, null_sval, cd.get_ctxt ()); + } + else // calloc + { + const svalue *sizeof_sval = mgr->get_or_create_cast ( + size_sval->get_type (), get_sizeof_pyobjptr (mgr)); + const svalue *prod_sval = mgr->get_or_create_binop ( + size_type_node, MULT_EXPR, sizeof_sval, size_sval); + const region *ob_item_sized_region + = model->get_or_create_region_for_heap_alloc (prod_sval, + cd.get_ctxt ()); + model->zero_fill_region (ob_item_sized_region); + const svalue *ob_item_ptr_sval + = mgr->get_ptr_svalue (pyobj_ptr_ptr, ob_item_sized_region); + const svalue *ob_item_unmergeable + = mgr->get_or_create_unmergeable (ob_item_ptr_sval); + model->set_value (ob_item_region, ob_item_unmergeable, + cd.get_ctxt ()); + } + + /* + typedef struct { + PyObject ob_base; + Py_ssize_t ob_size; // Number of items in variable part + } PyVarObject; + */ + const region *ob_base_region = get_ob_base_region ( + mgr, model, varobj_region, varobj_record, pyobj_svalue, cd); + + tree ob_size_tree = get_field_by_name (varobj_record, "ob_size"); + const region *ob_size_region + = mgr->get_field_region (varobj_region, ob_size_tree); + model->set_value (ob_size_region, size_sval, cd.get_ctxt ()); + + /* + typedef struct _object { + _PyObject_HEAD_EXTRA + Py_ssize_t ob_refcnt; + PyTypeObject *ob_type; + } PyObject; + */ + + // Initialize ob_refcnt field to 1. + init_ob_refcnt_field(mgr, model, ob_base_region, pyobj_record, cd); + + // Get pointer svalue for PyList_Type then assign it to ob_type field. + set_ob_type_field(mgr, model, ob_base_region, pyobj_record, pylisttype_vardecl, cd); + + if (cd.get_lhs_type ()) + { + const svalue *ptr_sval + = mgr->get_ptr_svalue (cd.get_lhs_type (), pylist_region); + cd.maybe_set_lhs (ptr_sval); + } + return true; + } + }; + + if (cd.get_ctxt ()) + { + cd.get_ctxt ()->bifurcate (make_unique (cd)); + cd.get_ctxt ()->bifurcate (make_unique (cd)); + cd.get_ctxt ()->terminate_path (); + } +} + +class kf_PyLong_FromLong : public known_function +{ +public: + bool + matches_call_types_p (const call_details &cd) const final override + { + return (cd.num_args () == 1 && cd.arg_is_integral_p (0)); + } + void impl_call_post (const call_details &cd) const final override; +}; + +void +kf_PyLong_FromLong::impl_call_post (const call_details &cd) const +{ + class success : public call_info + { + public: + success (const call_details &cd) : call_info (cd) {} + + label_text + get_desc (bool can_colorize) const final override + { + return make_label_text (can_colorize, "when %qE succeeds", + get_fndecl ()); + } + + bool + update_model (region_model *model, const exploded_edge *, + region_model_context *ctxt) const final override + { + const call_details cd (get_call_details (model, ctxt)); + region_model_manager *mgr = cd.get_manager (); + + const svalue *pyobj_svalue + = mgr->get_or_create_unknown_svalue (pyobj_record); + const svalue *pylongobj_sval + = mgr->get_or_create_unknown_svalue (pylongobj_record); + + const region *pylong_region + = init_pyobject_region (mgr, model, pylongobj_sval, cd); + + // Create a region for the base PyObject within the PyLongObject. + const region *ob_base_region = get_ob_base_region ( + mgr, model, pylong_region, pylongobj_record, pyobj_svalue, cd); + + // Initialize ob_refcnt field to 1. + init_ob_refcnt_field(mgr, model, ob_base_region, pyobj_record, cd); + + // Get pointer svalue for PyLong_Type then assign it to ob_type field. + set_ob_type_field(mgr, model, ob_base_region, pyobj_record, pylongtype_vardecl, cd); + + // Set the PyLongObject value. + tree ob_digit_field = get_field_by_name (pylongobj_record, "ob_digit"); + const region *ob_digit_region + = mgr->get_field_region (pylong_region, ob_digit_field); + const svalue *ob_digit_sval = cd.get_arg_svalue (0); + model->set_value (ob_digit_region, ob_digit_sval, cd.get_ctxt ()); + + if (cd.get_lhs_type ()) + { + const svalue *ptr_sval + = mgr->get_ptr_svalue (cd.get_lhs_type (), pylong_region); + cd.maybe_set_lhs (ptr_sval); + } + return true; + } + }; + + if (cd.get_ctxt ()) + { + cd.get_ctxt ()->bifurcate (make_unique (cd)); + cd.get_ctxt ()->bifurcate (make_unique (cd)); + cd.get_ctxt ()->terminate_path (); + } +} + static void maybe_stash_named_type (logger *logger, const translation_unit &tu, const char *name) @@ -179,6 +889,12 @@ init_py_structs () pylongobj_record = get_stashed_type_by_name ("PyLongObject"); pylongtype_vardecl = get_stashed_global_var_by_name ("PyLong_Type"); pylisttype_vardecl = get_stashed_global_var_by_name ("PyList_Type"); + + if (pyobj_record) + { + pyobj_ptr_tree = build_pointer_type (pyobj_record); + pyobj_ptr_ptr = build_pointer_type (pyobj_ptr_tree); + } } void @@ -205,6 +921,12 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */) sorry_no_cpython_plugin (); return; } + + iface->register_known_function ("PyList_Append", + make_unique ()); + iface->register_known_function ("PyList_New", make_unique ()); + iface->register_known_function ("PyLong_FromLong", + make_unique ()); } } // namespace ana diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c new file mode 100644 index 00000000000..19b5c17428a --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c @@ -0,0 +1,78 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target analyzer } */ +/* { dg-options "-fanalyzer" } */ +/* { dg-require-python-h "" } */ + + +#define PY_SSIZE_T_CLEAN +#include +#include "../analyzer/analyzer-decls.h" + +PyObject * +test_PyList_New (Py_ssize_t len) +{ + PyObject *obj = PyList_New (len); + if (obj) + { + __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */ + } + else + __analyzer_dump_path (); /* { dg-message "path" } */ + return obj; +} + +PyObject * +test_PyLong_New (long n) +{ + PyObject *obj = PyLong_FromLong (n); + if (obj) + { + __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */ + } + else + __analyzer_dump_path (); /* { dg-message "path" } */ + return obj; +} + +PyObject * +test_PyListAppend (long n) +{ + PyObject *item = PyLong_FromLong (n); + PyObject *list = PyList_New (0); + PyList_Append(list, item); + return list; /* { dg-warning "leak of 'item'" } */ +} + +PyObject * +test_PyListAppend_2 (long n) +{ + PyObject *item = PyLong_FromLong (n); + if (!item) + return NULL; + + __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + PyObject *list = PyList_New (n); + if (!list) + { + Py_DECREF(item); + return NULL; + } + + __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + + if (PyList_Append (list, item) < 0) + __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */ + else + __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */ + return list; /* { dg-warning "leak of 'item'" } */ +} + + +PyObject * +test_PyListAppend_3 (PyObject *item, PyObject *list) +{ + PyList_Append (list, item); + return list; +} \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp index 09c45394b1f..e1ed2d2589e 100644 --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp @@ -161,7 +161,8 @@ set plugin_test_list [list \ taint-CVE-2011-0521-6.c \ taint-antipatterns-1.c } \ { analyzer_cpython_plugin.c \ - cpython-plugin-test-1.c } \ + cpython-plugin-test-1.c \ + cpython-plugin-test-2.c } \ ] foreach plugin_test $plugin_test_list { diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 7004711b384..eda53ff3a09 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -12559,3 +12559,28 @@ proc check_effective_target_const_volatile_readonly_section { } { } return 1 } + +# Appends necessary Python flags to extra-tool-flags if Python.h is supported. +# Otherwise, modifies dg-do-what. +proc dg-require-python-h { args } { + upvar dg-extra-tool-flags extra-tool-flags + + verbose "ENTER dg-require-python-h" 2 + + set result [remote_exec host "python3-config --includes"] + set status [lindex $result 0] + if { $status == 0 } { + set python_flags [lindex $result 1] + } else { + verbose "Python.h not supported" 2 + upvar dg-do-what dg-do-what + set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"] + return + } + + verbose "Python flags are: $python_flags" 2 + + verbose "Before appending, extra-tool-flags: ${extra-tool-flags}" 3 + eval lappend extra-tool-flags $python_flags + verbose "After appending, extra-tool-flags: ${extra-tool-flags}" 3 +} -- 2.30.2