

BeautifulSoup ingests all data, but .findAll() only returns links under one parent element

I'm trying to scrape a website with BeautifulSoup in Python. All the data seems to be downloaded, including all the links I want to access. However, when I use .findAll(), it returns only some of the links I'm looking for. Specifically, only the links under the following XPath are returned:

/html/body/div[1]/div/div[2]/div/div[2]/div[1]

Links in the sibling elements are ignored, for example:

/html/body/div[1]/div/div[2]/div/div[2]/div[2]
/html/body/div[1]/div/div[2]/div/div[2]/div[3]
and so on.

import requests
from bs4 import BeautifulSoup

url = "https://www.riksdagen.se/sv/ledamoter-och-partier/ledamoterna/"
response = requests.get(url)
content = BeautifulSoup(response.content, "html.parser")

# find_all is the current name (findAll is the older alias). A class string
# containing spaces only matches elements whose class attribute is exactly it.
mps = content.find_all(class_="sc-907102a3-0 sc-e6d2fd61-0 gOAsvA jBTDjv")

# Collect the href of every matched element
mp_pages = [x.get("href") for x in mps]

print(mp_pages)
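Matching on the full class string is brittle: when the string contains spaces, BeautifulSoup only matches elements whose class attribute is exactly that string, and names like sc-907102a3-0 look like generated styled-components hashes that may differ between elements or deployments. A minimal sketch of a less brittle alternative, assuming the wanted links share a common path fragment: select every <a> that has an href and filter on the URL. The fragment "/sv/mp/" and the sample HTML are hypothetical placeholders; on the real page you would pass response.text and whatever path the MP profile URLs actually use. This only finds links that exist as <a> elements in the downloaded HTML.

```python
from bs4 import BeautifulSoup


def collect_links(html: str, path_fragment: str) -> list:
    """Return unique hrefs of <a> elements whose href contains path_fragment, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    links = []
    # "a[href]" is a CSS selector matching every anchor that has an href attribute
    for a in soup.select("a[href]"):
        href = a["href"]
        if path_fragment in href and href not in seen:
            seen.add(href)
            links.append(href)
    return links


# Hypothetical sample: the anchors carry different generated class names
sample_html = """
<div><a class="sc-1 x" href="/sv/mp/anna">Anna</a></div>
<div><a class="sc-2 y" href="/sv/mp/bertil">Bertil</a><a href="/sv/other">Other</a></div>
"""
print(collect_links(sample_html, "/sv/mp/"))
```

Both profile links are found even though their class attributes differ, while the unrelated link is filtered out by the path check.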

I want all the links to be appended to the mp_pages list, but the code only collects links under one parent element (the MPs whose names start with A) and seems to stop after that parent's last child instead of continuing to the next parent.

I've seen similar questions where the answer was to use Selenium because the content is rendered by JavaScript, but since all the links appear to be in the downloaded content, that doesn't seem to apply here.
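Being present in the downloaded content is not the same as being an <a> element. If the remaining links only appear inside an embedded JSON blob, which is what the reply below points to, find_all never sees them as anchors, and no browser automation is needed to read them. A minimal sketch, assuming the JSON sits in a <script> tag with the id __NEXT_DATA__ (the convention Next.js pages use; whether this page uses that id is an assumption to verify in the page source). The id, the "/sv/mp/" prefix and the sample structure are hypothetical. Because the real key layout is unknown, the sketch walks every string in the JSON instead of relying on specific keys.

```python
import json

from bs4 import BeautifulSoup


def iter_strings(node):
    """Recursively yield every string value in a parsed JSON structure."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    elif isinstance(node, str):
        yield node


def links_from_embedded_json(html, script_id, prefix):
    """Parse the JSON in <script id=script_id> and return unique strings starting with prefix."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id=script_id)
    if script is None or script.string is None:
        return []
    data = json.loads(script.string)
    # dict.fromkeys de-duplicates while keeping first-seen order
    return list(dict.fromkeys(s for s in iter_strings(data) if s.startswith(prefix)))


# Hypothetical page: one link is an anchor, but the full list lives only in the JSON
page_html = """
<html><body>
<div><a href="/sv/mp/anna">Anna</a></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"members": [{"name": "Anna", "url": "/sv/mp/anna"},
                       {"name": "Bertil", "url": "/sv/mp/bertil"}]}}
</script>
</body></html>
"""

# What an anchor search sees: only the link present as markup
anchor_links = [a["href"] for a in BeautifulSoup(page_html, "html.parser").find_all("a", href=True)]
print(anchor_links)
print(links_from_embedded_json(page_html, "__NEXT_DATA__", "/sv/mp/"))
```

On the live page you would call links_from_embedded_json(response.text, ...) after checking the real script id and URL prefix in the page source.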

P粉654894952, 439 days ago

All replies (1)

  • P粉553428780, 2023-09-15 11:25:57

    The data you see on the page is stored as JSON in