From: Dennis Putnam <dap1@bellsouth.net>
To: cygwin <cygwin@cygwin.com>
Subject: pyppeteer error in Python3
Date: Thu, 23 Sep 2021 13:32:48 -0400
Message-ID: <d341cef6-cd51-67ec-0fec-7efdf19d4b13@bellsouth.net>
In-Reply-To: <d341cef6-cd51-67ec-0fec-7efdf19d4b13.ref@bellsouth.net>
I'm not sure this is really a Cygwin problem, but I don't know where
else to ask. I'm running a python3 script to extract a web page:

#!/usr/bin/python3
# This script auto-submits Do Not Call complaints
from bs4 import BeautifulSoup
from requests_html import HTMLSession
from urllib.parse import urljoin

print('Starting process')
session = HTMLSession()


def get_all_forms(url):
    """Returns all form tags found on a web page's `url`"""
    # GET request
    print("getting page")
    res = session.get(url)
    # for JavaScript-driven websites
    print("Running Javascript")
    res.html.render()
    print("parsing url")
    soup = BeautifulSoup(res.html.html, "html.parser")
    return soup.find_all("form")


print(get_all_forms("https://blahblah"))

The result is a traceback when executing 'res.html.render':

Traceback (most recent call last):
  File "./donotcall.py", line 23, in <module>
    print(get_all_forms("https://www.donotcall.gov/report.html#step1"))
  File "./donotcall.py", line 19, in get_all_forms
    res.html.render()
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 586, in render
    self.browser = self.session.browser  # Automatically create a event loop and browser
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 730, in browser
    self._browser = self.loop.run_until_complete(super().browser)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 714, in browser
    self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

From what I can find in my searches, it has something to do with
pyppeteer (Chromium) and synchronization. Can someone help me debug
this or point me to a better place to ask? TIA.
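
The traceback ends inside pyppeteer's launcher, which suggests the browser
process it started either exited or never became reachable. One way to
narrow that down, offered only as a sketch, is to run the browser binary
by hand and look at its exit code and output. The helper name
probe_browser and the path passed to it are hypothetical; the code uses
only the standard library and does not call pyppeteer or requests_html.

```python
import subprocess


def probe_browser(executable, extra_args=("--version",), timeout=30):
    """Run a browser binary directly; return (exit_code, combined_output).

    `executable` is a path supplied by the caller (hypothetical here).
    exit_code is None when the process could not be started or did not
    exit within `timeout` seconds.
    """
    try:
        proc = subprocess.run(
            [executable, *extra_args],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the same stream
            timeout=timeout,
            text=True,
        )
    except OSError as exc:
        # Covers a missing file, missing execute permission, and a binary
        # built for a different platform.
        return None, f"could not start: {exc}"
    except subprocess.TimeoutExpired:
        return None, f"no exit within {timeout}s"
    return proc.returncode, proc.stdout
```

Calling probe_browser("/path/to/chrome") with the path of whichever
Chromium binary is installed (the path is a placeholder) returns the exit
code and any message the browser printed. A result of None with a
"could not start" message would point at the binary itself rather than at
the script.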