
How to scrape specific Google Weather text using BeautifulSoup?

How can I extract the location text "New York City, USA" in Python using BeautifulSoup?

I tried to reproduce the code from a tutorial video for practice, but it no longer works.

I looked through the official documentation without success. Or is my get_html_content function not working properly because Google is blocking the request, so that I get an empty list / None?

This is my current code:

from django.shortcuts import render
import requests

def get_html_content(city):
    USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
    LANGUAGE = "en-US,en;q=0.5"
    session = requests.Session()
    session.headers['User-Agent'] = USER_AGENT
    session.headers['Accept-Language'] = LANGUAGE
    session.headers['Content-Language'] = LANGUAGE
    city = city.replace(" ", "+")  # str.replace returns a new string, so keep the result
    html_content = session.get(f"https://www.google.com/search?q=weather+in+{city}").text
    return html_content

def home(request):
    result = None
    if 'city' in request.GET: 
        city = request.GET.get('city')
        html_content = get_html_content(city)
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html_content, 'html.parser')
        soup.find_all('div', attrs={'class': 'wob_loc q8U8x'})
        # OR
        soup.find_all('div', attrs={'id': 'wob_loc'})

Both calls return an empty list (and the equivalent .find call returns None).

P粉275883973, 181 days ago

All replies (1)

  • P粉509383150, 2024-04-02 09:50:59

    The layout of the Google page may have changed in the meantime, so you need to update your code to get the weather data. For example:

    import requests
    from bs4 import BeautifulSoup

    # 'hl' asks for the English version of the results page.
    params = {'q': 'weather in New York City, New York, USA', 'hl': 'en'}
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0'}
    # Consent cookie, intended to skip the cookie-consent page.
    cookies = {'CONSENT': "YES+cb.20220419-08-p0.cs+FX+111"}

    url = 'https://www.google.com/search'

    response = requests.get(url, params=params, headers=headers, cookies=cookies)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each labelled element inside #wob_dp is one day of the forecast.
    for t in soup.select('#wob_dp [aria-label]'):
        how = t.find_next('img')['alt']                  # weather description
        temp = t.find_next('span').get_text(strip=True)  # first temperature shown
        print('{:<5} {:<20} {}'.format(t.text, how, temp))
    

    Output:

    Mon   Sunny                8
    Tue   Cloudy               7
    Wed   Partly cloudy        11
    Thu   Rain                 7
    Fri   Mostly cloudy        8
    Sat   Partly cloudy        6
    Sun   Scattered showers    8
    Mon   Showers              8
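
To get only the location text from the question, and to tell "Google returned a different page" apart from "my selector is wrong", something like the following may help. It is a minimal sketch, not a confirmed recipe: it assumes the weather widget still uses ids beginning with "wob_" and that the location sits in an element with id "wob_loc", as in the question. The helper names looks_blocked and extract_location and the sample_html string are made up for illustration; in practice you would pass in the text of the real response.

```python
from typing import Optional

from bs4 import BeautifulSoup


def looks_blocked(html: str) -> bool:
    """Rough heuristic: True if no weather-widget markup is present.

    Assumption: the widget's elements carry ids starting with "wob_"
    (taken from the question). A consent or CAPTCHA page would have none.
    """
    return "wob_" not in html


def extract_location(html: str) -> Optional[str]:
    """Return the text of the element with id="wob_loc", or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("#wob_loc")  # match by id only, whatever the tag name
    if node is None:
        return None
    return node.get_text(strip=True)


# Hand-written stand-in for a real response body, for illustration only.
sample_html = (
    '<div id="wob_wc">'
    '<div class="wob_loc q8U8x" id="wob_loc">New York City, USA</div>'
    '</div>'
)

print(looks_blocked(sample_html))
print(extract_location(sample_html))
print(extract_location("<html><body>consent page</body></html>"))
```

If looks_blocked returns True for the real response, the problem is the request (headers, cookies, or blocking) rather than the BeautifulSoup selector. Selecting by id alone also avoids depending on the tag name or on the exact class string, both of which may change.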
    
