I'm using Python 3 and I'm trying to retrieve data from a website. However, this data is dynamically loaded, and the code I have right now doesn't work:
    from urllib import request

    url = evecentralbaseurl + str(mineral)
    print("url : %s" % url)
    response = request.urlopen(url)
    data = str(response.read(10000))
    data = data.replace("\\n", "\n")
    print(data)
Where I'm trying to find a particular value, I'm finding a template instead, e.g. "{{formatprice median}}" instead of "4.48".
How can I make it so that I can retrieve the value instead of the placeholder text?
Edit: This is the specific page I'm trying to extract information from. I'm trying to get the "median" value, which uses the template {{formatprice median}}.
Edit 2: I've installed Selenium and BeautifulSoup and set up my program to use them.
The code I have now is:
    from bs4 import BeautifulSoup
    from selenium import webdriver

    #...

    driver = webdriver.Firefox()
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html)

    print("finding...")
    for tag in soup.find_all('formatprice median'):
        print(tag.text)
Here is a screenshot of the program as it's executing. Unfortunately, it doesn't seem to be finding anything with "formatprice median" specified.
Assuming you are trying to get values from a page that is rendered using JavaScript templates (for instance, Handlebars), then this is what you will get with any of the standard solutions (i.e. BeautifulSoup or requests).
This is because the browser uses JavaScript to alter what it received and create new DOM elements. urllib does the requesting part like a browser, but not the template rendering part. A good description of the issues can be found here. The article discusses three main solutions:
- Parse the AJAX JSON directly
- Use an offline JavaScript interpreter to process the request (e.g. SpiderMonkey, crowbar)
- Use a browser automation tool (e.g. splinter)
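For option 1, here is a minimal sketch of what parsing the JSON directly might look like. It assumes you have found, in your browser's network tab, the request the page makes for its data; the payload shape and the "median" field name below are made up for illustration, and the real response may be structured differently.

```python
import json
from urllib import request


def fetch_text(url, timeout=10):
    """Download a URL and return its body decoded as UTF-8 text."""
    with request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def extract_median(payload):
    """Pull the 'median' field out of a JSON document (hypothetical shape)."""
    data = json.loads(payload)
    return float(data["median"])


# Offline demonstration with a made-up payload. A real call would look like
# extract_median(fetch_text(json_url)) once the JSON URL is known.
sample = '{"median": 4.48, "volume": 1000}'
print(extract_median(sample))
```

If the site does expose its data this way, this is usually the lightest option, since no browser or JavaScript engine is needed.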
This answer provides a few more suggestions for option 3, such as Selenium or Watir. I've used Selenium for automated web testing and it's pretty handy.
EDIT
From your comments it looks like it is a Handlebars-driven site. I'd recommend Selenium and Beautiful Soup. This answer gives a good code example which may be useful:
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('http://eve-central.com/home/quicklook.html?typeid=34')

    # page_source holds the HTML as the browser currently sees it
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")

    # check out the docs for the kinds of things you can 'find_all'
    # this (untested) snippet should find tags with a specific class ID
    # see: http://www.crummy.com/software/beautifulsoup/bs4/doc/#searching-by-css-class
    for tag in soup.find_all("a", class_="my_class"):
        print(tag.text)
Basically, Selenium gets the rendered HTML from your browser, and then you can parse it with BeautifulSoup via the page_source property. Good luck :)
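One thing to watch for: the page source can be read before the template has finished rendering, in which case you still see the placeholder. Below is a rough sketch of waiting for a specific element to be filled in first. The helper names and the CSS selector are hypothetical (I don't know the real markup of the page), so treat it as a starting point rather than tested code for that site.

```python
import re

# Matches unrendered Handlebars-style placeholders such as {{formatprice median}}.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def looks_rendered(text):
    """True when text is non-empty and contains no {{...}} placeholder."""
    return bool(text.strip()) and PLACEHOLDER.search(text) is None


def fetch_rendered_text(url, css_selector, timeout=15):
    """Open url in Firefox and return the element's text once it is rendered."""
    # Imported here so looks_rendered() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Poll until the element exists and its text has been filled in.
        # WebDriverWait ignores NoSuchElementException by default while polling.
        WebDriverWait(driver, timeout).until(
            lambda d: looks_rendered(
                d.find_element(By.CSS_SELECTOR, css_selector).text
            )
        )
        return driver.find_element(By.CSS_SELECTOR, css_selector).text
    finally:
        driver.quit()
```

You would call it as something like fetch_rendered_text(url, "#median"), where "#median" stands in for whatever selector matches the element that holds the value. The check is done on one element's text rather than on the whole page source because Handlebars templates often stay in the page inside script tags even after rendering.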