public inbox for cygwin@cygwin.com
From: Dennis Putnam <dap1@bellsouth.net>
To: cygwin <cygwin@cygwin.com>
Subject: pyppeteer error in Python3
Date: Thu, 23 Sep 2021 13:32:48 -0400
Message-ID: <d341cef6-cd51-67ec-0fec-7efdf19d4b13@bellsouth.net>
In-Reply-To: <d341cef6-cd51-67ec-0fec-7efdf19d4b13.ref@bellsouth.net>

I'm not sure this is really a Cygwin problem, but I don't know where
else to ask. I'm running a python3 script to extract a web page:

#!/usr/bin/python3

# This script auto-submits Do Not Call complaints

from bs4 import BeautifulSoup
from requests_html import HTMLSession
from urllib.parse import urljoin

print('Starting process')
session=HTMLSession()

def get_all_forms(url):
    """Returns all form tags found on a web page's `url` """
    # GET request
    print("getting page")
    res = session.get(url)
    # for javascript driven website
    print("Running Javascript")
    res.html.render()
    print("parsing url")
    soup = BeautifulSoup(res.html.html, "html.parser")
    return soup.find_all("form")
print(get_all_forms("https://blahblah"))

The result is a traceback when executing 'res.html.render()'.

Traceback (most recent call last):
  File "./donotcall.py", line 23, in <module>
    print(get_all_forms("https://www.donotcall.gov/report.html#step1"))
  File "./donotcall.py", line 19, in get_all_forms
    res.html.render()
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 586, in render
    self.browser = self.session.browser  # Automatically create a event loop and browser
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 730, in browser
    self._browser = self.loop.run_until_complete(super().browser)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 714, in browser
    self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

From what I can find in my searches, it has something to do with
pyppeteer (Chromium) and synchronization. Can someone help me debug
this or point me to a better place to ask? TIA.
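
Since the traceback ends inside pyppeteer's launcher with an empty message, one way to narrow this down might be to launch the browser directly, outside requests_html, with pyppeteer's `dumpio` option so that whatever Chromium prints before exiting becomes visible. The following is only a sketch under assumptions: the helper names `debug_launch_options` and `try_launch` are made up for illustration, and I have not confirmed that it reveals the cause on Cygwin.

```python
def debug_launch_options(executable_path=None):
    """Build keyword arguments for pyppeteer.launch() aimed at debugging.

    Hypothetical helper, not part of pyppeteer or requests_html.
    """
    options = {
        "headless": True,
        # Pipe the browser's stdout/stderr into this process, so any
        # message Chromium prints before it exits is shown.
        "dumpio": True,
    }
    if executable_path:
        # Optionally point at a specific browser binary instead of the
        # Chromium build that pyppeteer downloads on its own.
        options["executablePath"] = executable_path
    return options


async def try_launch(options):
    """Launch the browser with the given options and return its version."""
    import pyppeteer  # imported here so the helper above has no dependencies

    browser = await pyppeteer.launch(**options)
    try:
        return await browser.version()
    finally:
        await browser.close()
```

Running `asyncio.run(try_launch(debug_launch_options()))` in the same environment should either return a version string or raise the same BrowserError, hopefully preceded by Chromium's own output.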


