How to retrieve the values of dynamic HTML content using Python

I'm using Python 3, and I'm trying to retrieve data from a website. However, the data is dynamically loaded, and the code I have right now doesn't work:

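    from urllib import request

    # evecentralbaseurl and mineral are defined elsewhere in the program
    url = evecentralbaseurl + str(mineral)
    print("url : %s" % url)

    response = request.urlopen(url)
    data = str(response.read(10000))  # read at most 10000 bytes

    # str() of a bytes object shows newlines as a literal "\n"; turn them into real line breaks
    data = data.replace("\\n", "\n")
    print(data)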

Where I'm trying to find a particular value, I'm finding a template instead, e.g. "{{formatprice median}}" instead of "4.48".

How can I make it so that I can retrieve the value instead of the placeholder text?

Edit: This is the specific page I'm trying to extract information from. I'm trying to get the "median" value, which uses the template {{formatprice median}}.

Edit 2: I've installed Selenium and BeautifulSoup and set up my program to use them.

The code I have now is:

    from bs4 import BeautifulSoup
    from selenium import webdriver

    #...

    driver = webdriver.Firefox()
    driver.get(url)

    html = driver.page_source
    soup = BeautifulSoup(html)

    print("finding...")

    for tag in soup.find_all('formatprice median'):
        print(tag.text)

Here is a screenshot of the program as it's executing. Unfortunately, it doesn't seem to be finding anything with "formatprice median" specified.

Assuming you are trying to get values from a page that is rendered using JavaScript templates (for instance, Handlebars), then this is what you will get with any of the standard solutions (i.e. BeautifulSoup or requests).

This is because the browser uses JavaScript to alter what it received and create new DOM elements. urllib does the requesting part like a browser, but not the template rendering part. A description of the issues can be found here. The article discusses three main solutions:

Parse the AJAX JSON directly
Use an offline JavaScript interpreter to process the request (SpiderMonkey, crowbar)
Use a browser automation tool (splinter)
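As a rough illustration of the first option, here is a minimal sketch that requests the JSON the page loads in the background and reads the value from it. The endpoint and the {"sell": {"median": ...}} layout are assumptions made up for the example; the real URL and field names would have to be taken from the browser's network tab.

```python
import json
from urllib import request


def fetch_json(url, timeout=10):
    """Download a URL and decode the response body as JSON."""
    with request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


def median_from_payload(payload):
    """Pull the median out of an already-decoded payload.

    The {"sell": {"median": ...}} layout is hypothetical; adjust the keys
    to match whatever the site's AJAX response actually contains.
    """
    return float(payload["sell"]["median"])
```

With a real endpoint in hand, the call would look something like median_from_payload(fetch_json(api_url)), where api_url is whatever address the page itself requests.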

This answer provides a few more suggestions for option 3, such as Selenium or Watir. I've used Selenium for automated web testing and it's pretty handy.

Edit

From your comments, it looks like it is a Handlebars-driven site. I'd recommend Selenium and Beautiful Soup. This answer gives a code example which may be useful:

    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('http://eve-central.com/home/quicklook.html?typeid=34')

    html = driver.page_source
    soup = BeautifulSoup(html)

    # check out the docs for the kinds of things you can 'find_all'
    # this (untested) snippet should find tags with a specific class ID
    # see: http://www.crummy.com/software/beautifulsoup/bs4/doc/#searching-by-css-class
    for tag in soup.find_all("a", class_="my_class"):
        print(tag.text)

Basically, Selenium gets the rendered HTML from your browser, and then you can parse it using BeautifulSoup from the page_source property. Good luck :)
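One thing to watch for: the first argument of find_all is a tag name, so find_all('formatprice median') looks for a tag with that name and finds nothing. In the rendered page the placeholder has been replaced by the number itself, so the search has to target the element that holds it. A minimal sketch of that idea, using Selenium's explicit wait, is below. The CSS selector is hypothetical and would need to be read off the real page with the browser's inspector, and the comma handling in parse_price is only an assumption about how prices are formatted.

```python
def parse_price(text):
    """Convert rendered text such as '4.48' or '1,234.50' to a float."""
    return float(text.strip().replace(",", ""))


def read_rendered_price(url, css_selector, timeout=10):
    """Open the page in Firefox, wait for the element, and return its value."""
    # Imported here so parse_price can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until an element matching the selector is present in the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return parse_price(element.text)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

A call might look like read_rendered_price(url, "td.median"), where "td.median" stands in for the real selector. If the element exists before its text is filled in, a stricter wait condition would be needed.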

Brazie
