Web scraping - finding object links

Postby alexvestin » Fri Aug 05, 2016 11:14 am

Hi!

I followed a guide on how to scrape the links from a website, but the links I need don't show up in the page source, so they can't be pulled. The links I want to scrape are the ones shown in the attached image. Does anyone have pointers on what to use or search for?

I'm new to Python and web programming in general, and this is probably way over my head, but it's for a pretty important (and urgent!) project, so I would greatly appreciate any help :)

Here is the code I've written so far, in case it helps:
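import requests
from bs4 import BeautifulSoup

url = "https://www.studentbostader.se/sv/sok-bostad/lediga-bostader?pagination=1&paginationantal=10"
r = requests.get(url)
r.raise_for_status()  # stop early on HTTP errors such as 404 or 500

# Name the parser explicitly; bs4 warns when it has to guess one
soup = BeautifulSoup(r.content, "html.parser")

# Print every <a> element present in the static HTML
for link in soup.find_all("a"):
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text))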
Last edited by micseydel on Fri Aug 05, 2016 7:33 pm, edited 1 time in total.
Reason: Initial post lock. Fixed image.

Re: Web scraping - finding object links

Postby Ofnuts » Fri Aug 05, 2016 10:11 pm

On many modern sites, what you get from the page URL is an HTML skeleton, and JavaScript code then issues additional HTTP requests for the actual data, which it inserts into the page. You can check this with a browser extension that shows all the requests (such as LiveHTTPHeaders). This may actually make things easier for you, because these calls return raw data (usually as JSON).
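A minimal sketch of that approach, assuming you have already spotted the data request in the browser's list of requests: fetch that URL directly and pull the links out of the decoded JSON. The endpoint in `API_URL` and the `"url"` key are hypothetical placeholders, since the real ones depend on what the site's JavaScript actually requests. It uses the standard library's urllib so it runs without extra packages; with Requests, `requests.get(url).json()` does the same download-and-decode step.

```python
import json
import urllib.request
from typing import Any, List

# Hypothetical endpoint: replace it with the data URL you see in the
# browser's request list while the page loads.
API_URL = "https://example.com/api/listings?page=1"


def fetch_json(url: str) -> Any:
    """Download url and decode its body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_links(data: Any, key: str = "url") -> List[str]:
    """Recursively collect every string stored under `key` in decoded JSON.

    The key name "url" is an assumption; inspect the real response to find it.
    """
    found: List[str] = []
    if isinstance(data, dict):
        for name, value in data.items():
            if name == key and isinstance(value, str):
                found.append(value)
            else:
                found.extend(extract_links(value, key))
    elif isinstance(data, list):
        for item in data:
            found.extend(extract_links(item, key))
    return found


if __name__ == "__main__":
    # Offline demo with a made-up payload; call fetch_json(API_URL) for real data.
    sample = {"items": [{"title": "A", "url": "/a"}, {"title": "B", "url": "/b"}]}
    print(extract_links(sample))  # ['/a', '/b']
```

Walking the whole structure recursively means the sketch still works if the listings turn out to be nested a level or two deeper than expected.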
This forum has been moved to http://python-forum.io/. See you there.

Re: Web scraping - finding object links

Postby wavic » Sat Aug 06, 2016 11:13 pm

When JavaScript builds the page, you have to use a module that can run it. Requests can't do that: it only downloads the raw HTML and doesn't execute any JavaScript. A browser engine such as WebKit (through QtWebKit or QtWebEngine) will be able to do the job.
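A rough sketch of the render-then-parse idea. The post mentions Qt's web engines; this sketch swaps in Selenium, a different and commonly used library for driving a real browser, and assumes Selenium plus Firefox and geckodriver are installed. The browser runs the page's JavaScript, and the resulting HTML is parsed with the standard library's `html.parser`. The names `render_with_browser`, `links_from_html` and `LinkCollector` are illustrative helpers, not part of any library.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link we are tracking
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_from_html(html: str) -> List[Tuple[str, str]]:
    """Return the (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def render_with_browser(url: str) -> str:
    """Load url in Firefox so its JavaScript runs, then return the resulting DOM.

    Assumes Selenium, Firefox and geckodriver are installed; the import is done
    here so the parsing helper above works without Selenium.
    """
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    # Offline demo on a static string; for the real page use
    # links_from_html(render_with_browser(url)).
    demo = '<p><a href="/a">One</a> <a href="/b"> Two </a><a>no href</a></p>'
    for href, text in links_from_html(demo):
        print(href, text)
```

`driver.get` returns once the page's load event fires; if the links are inserted later by a slower request, you may need an explicit wait (Selenium's `WebDriverWait`) before reading `page_source`.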