
How to calculate the distance (time and miles) between the geographies of your portfolio and a comparator (Point A and Point B)

Linda Hamilton
2024-10-07 06:10:30

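Calculating Geographic Distance Using Python

This code is useful for anyone who wants to match a portfolio that has geographic data against any other set of locations in order to calculate the driving time and distance between two points. It was inspired by a task I was given at work: helping funders understand how close approved projects were to one another after they asked about the geographic spread of applicants.

This article walks through how to use API calls, together with built-in and custom functions, to match a list of charities (Point A) to their nearest train stations (Point B), and to calculate the distance in miles and the driving time in minutes.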

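Other use cases include, for example:

  • matching postcodes to the nearest school
  • matching postcodes to the nearest charity
  • matching postcodes to the nearest NHS provider
  • matching postcodes to the nearest national park
  • matching postcodes in list A to the nearest postcodes in list B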

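Requirements

Packages:

  • pandas
  • numpy
  • requests
  • json
  • haversine

Resources used in this article:

  • Charity data (for this example, I took the top 100 charities from the Charity Commission register among those with expenditure over 5 million)
  • UK train station data (as this is not readily available, I used a GitHub file containing UK train stations with their longitude, latitude and postcode)
  • Postcodes.io (an API for searching and extracting UK postcode data)
  • Project OSRM (an API for calculating routes)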

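Why use Python?

The steps discussed here may seem intricate, but the end result is a template that can be reused and reformatted to suit your needs whenever you have to calculate the geographic distance between Point A and Point B across many rows of data.

For example, say you are working with 100 charities. You want to know how close these charities are to nearby train stations as part of a wider analysis of their geographic location. You may want to map this data visually, or use it as a starting point for further analysis, such as looking at how accessible a charity is for people travelling from further away.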



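Whatever the use case, if you wanted to do this manually, the steps would be:

  1. Find the charity's postcode
  2. Use an online tool to look up the station nearest to the charity
  3. Use an online mapping tool to find the distance in miles and the driving time from the charity to the nearest station
  4. Record the results in a spreadsheet
  5. Repeat steps 1 to 4 for the remaining 99 charities

This might work for a handful of charities, but after a while the process becomes time-consuming, tedious and prone to human error.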



By using Python for this task, we can automate these steps and, with only a few additions required from the user, simply run our code at the end.

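What can Python do?

Let's break the task down into steps. The steps we need are:

  1. Find the nearest station to a given postcode
  2. Calculate the distance between the two
  3. Calculate the driving time for travel
  4. Produce a dataset containing all the required information

To complete step 1, we will use Python to:

  • import a dataset containing charity details, including their postcodes
  • use the Postcodes.io API to extract the longitude and latitude for each postcode
  • compile this information back into a dataframe containing the original information plus the longitude and latitude for each charity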

Step 1 - Find the nearest station to a given postcode

1 - import packages


# data manipulation
import numpy as np
import pandas as pd

# http requests
import requests

# handling json
import json

# calculating distances
import haversine as hs
from haversine import haversine, Unit


2 - import and clean the data


# import as a pandas dataframe, specifying which columns to import
charities = pd.read_excel('charity_list.xlsx', usecols='A, C, E')
stations = pd.read_csv('uk-train-stations.csv', usecols=[1,2,3])

# renaming stations columns for ease of use
stations = stations.rename(columns={'station_name':'Station Name','latitude':'Station Latitude', 'longitude':'Station Longitude'})


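The variable containing the charity dataset (named 'charities') will be our master dataframe, which we will use when merging in the data we extract.

Our next step is to create the function that extracts the longitude and latitude for the charity postcodes.

3 - convert the postcodes to a list for the matching function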


charities_pc = charities['Charity Postcode'].tolist()
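
A spreadsheet column often contains blank cells, stray spaces or mixed case, and pandas reads empty cells as NaN, which cannot be sent to the API. The following is a minimal sketch of a cleaning helper, not part of the original workflow; the name clean_postcodes is hypothetical. If you use something like it, apply the same cleaning to the 'Charity Postcode' column of the dataframe as well, so that the keys still match when the results are merged back in later.

```python
import math

def clean_postcodes(values):
    """Return upper-cased, whitespace-trimmed postcodes with blanks, NaN and duplicates removed."""
    cleaned = []
    seen = set()
    for value in values:
        # skip missing values: None, or the float NaN that pandas uses for empty cells
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # collapse runs of whitespace and standardise the case
        postcode = " ".join(str(value).split()).upper()
        if postcode and postcode not in seen:
            seen.add(postcode)
            cleaned.append(postcode)
    return cleaned

# small illustrative input, including a blank, a missing value and a duplicate
example = clean_postcodes([" sw1a 1aa", None, float("nan"), "", "SW1A  1AA", "ec1a 1bb"])
print(example)
```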


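4 - create a function that takes the postcodes, makes a request to postcodes.io, records the latitude and longitude, and returns the results so they can be loaded into a new dataframe


For more information, consult the postcodes.io documentation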


def bulk_pc_lookup(postcodes):

    # set up the api request
    url = "https://api.postcodes.io/postcodes"
    headers = {"Content-Type": "application/json"}

    # specify our input data and response, specifying that we are working with data in json format
    data = {"postcodes": postcodes}
    response = requests.post(url, headers=headers, data=json.dumps(data))

    # specify the information we want to extract from the api response

    if response.status_code == 200:
        results = response.json()["result"]
        postcode_data = []

        for result in results:
            postcode = result["query"]

            if result["result"] is not None:
                latitude = result["result"]["latitude"]
                longitude = result["result"]["longitude"]
                postcode_data.append({"Charity Postcode": postcode, "Latitude": latitude, "Longitude": longitude})

        return postcode_data

    # setting up a fail safe to capture any errors or results not found
    else:
        print(f"Error: {response.status_code}")
        return []


5 - pass our list of charity postcodes into the function to extract the required results


# specify where the postcodes are
postcodes = charities_pc

# save the results of the function as output
output = bulk_pc_lookup(postcodes)

# convert the results to a pandas dataframe
output_df = pd.DataFrame(output)
output_df.head()



Please note:

  1. If your Point B data (in this case, the UK rail stations) does not already contain latitude and longitude, you will also need to perform steps 3 to 5 on the Point B data
  2. postcodes.io allows bulk lookup requests of up to 100 postcodes at a time. If your dataset contains more than 100 postcodes, you will need to either manually create new Excel sheets containing only 100 rows per sheet, or write a function to break your dataset into the required length for the API call
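
For the second note, one possible way to split a longer list into batches of 100 is sketched below. The helper names chunk and lookup_in_batches are hypothetical, and the demonstration uses a stand-in lookup function so that it runs without calling the API.

```python
def chunk(items, size=100):
    """Yield successive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def lookup_in_batches(postcodes, lookup, size=100):
    """Call `lookup` once per batch and join the per-batch result lists."""
    combined = []
    for batch in chunk(postcodes, size):
        combined.extend(lookup(batch))
    return combined

# demonstration with made-up postcodes and a stand-in lookup (no network access)
fake_postcodes = [f"PC{i}" for i in range(250)]
batch_sizes = [len(batch) for batch in chunk(fake_postcodes)]
echoed = lookup_in_batches(fake_postcodes, lambda batch: [{"Charity Postcode": pc} for pc in batch])
print(batch_sizes, len(echoed))
```

With the function from step 4, the call would look like output = lookup_in_batches(charities_pc, bulk_pc_lookup), since bulk_pc_lookup returns a list for every batch, including an empty list on error.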

6 - we can now either merge output_df back into our original charity dataset or, to leave the original data untouched, create a new dataframe holding the merged results, which we will use for the rest of the project


charities_output = pd.merge(charities, output_df, on="Charity Postcode")

charities_output.head()



Step 1 Complete

We now have two dataframes which we will use for the next steps:

  1. Our original stations dataframe containing the latitude and longitude of the UK train stations
  2. Our new charities_output dataframe containing the original charity information and the new latitude and longitude information extracted from our API call

Step 2 - Calculate the distance between Point A (charity) and Point B (train station), and record the nearest result for Point A

In this section, we will be using the haversine distance formula to:

  • check the distance between a charity and every UK train station
  • match the nearest result i.e. the UK train station with the minimum distance from our charity
  • loop over our charities dataset to find the nearest match for each row
  • record our results in a dataframe

Please note, for further information on using the haversine module, consult the documentation
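
To show what the haversine calculation does under the hood, here is a rough pure-Python sketch of the formula. It is for illustration only and is not the haversine package's own code: the function name haversine_miles is hypothetical, and the Earth radius of 3958.8 miles is an assumed mean value, so results may differ very slightly from the package.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; a library may use a slightly different value

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # haversine of the central angle between the two points
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# sanity check: a quarter of the way round the equator equals (pi / 2) * radius
quarter = haversine_miles(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```

This is a straight-line ("as the crow flies") distance over the Earth's surface, which is why a separate routing step is needed later for driving time.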

1 - create a function for calculating the distance between Point A and Point B


def calc_distance(lat1, lon1, lat2, lon2):

    # specify data for location one, i.e. Point A
    loc1 = (lat1, lon1)

    # specify the data for location two, i.e. Point B
    loc2 = (lat2, lon2)

    # calculate the distance and specify the units as miles
    dist = haversine(loc1, loc2, unit=Unit.MILES)

    return dist


2 - create a loop that calculates the distance between Point A and every row in Point B, and match the result where Point B is nearest to Point A


# create an empty dictionary to store the results
results = {}

# begin with looping over the dataset containing the data for Point A
for index1, row1 in charities_output.iterrows():

    # specify the location of our data
    charity_name = row1['Charity Name']
    lat1 = row1['Latitude']
    lon1 = row1['Longitude']

    # track the minimum distance between Point A and every row of Point B
    min_dist = float('inf')
    # as the minimum distance i.e. nearest Point B is not yet known, create an empty string for storage
    min_station = ''

    # loop over the dataset containing the data for Point B
    for index2, row2 in stations.iterrows():

        # specify the location of our data
        lat2 = row2['Station Latitude']
        lon2 = row2['Station Longitude']

        # use our previously created distance function to calculate the distance
        dist = calc_distance(lat1, lon1, lat2, lon2)

        # check each distance - if it is lower than the last, this is the new low. this will repeat until the lowest distance is found
        if dist < min_dist:
            min_dist = dist
            min_station = row2['Station Name']

    results[charity_name] = {'Nearest Station': min_station, 'Distance (Miles)': min_dist}

# convert the results dictionary into a dataframe
res = pd.DataFrame.from_dict(results, orient="index")

res.head()
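
The nested loop above checks every station row by row, which can be slow for larger datasets. As an optional alternative, the sketch below computes the distance from one point to all stations at once with numpy and picks the smallest. The function name nearest_index, the assumed Earth radius and the sample coordinates are all made up for illustration.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def nearest_index(lat, lon, station_lats, station_lons):
    """Return (index, miles) of the station closest to one point, using the haversine formula on arrays."""
    phi1 = np.radians(lat)
    phi2 = np.radians(np.asarray(station_lats, dtype=float))
    d_phi = phi2 - phi1
    d_lambda = np.radians(np.asarray(station_lons, dtype=float) - lon)
    a = np.sin(d_phi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(d_lambda / 2) ** 2
    miles = 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))
    best = int(np.argmin(miles))  # position of the smallest distance
    return best, float(miles[best])

# made-up coordinates purely for illustration
names = ["North", "Centre", "South"]
lats = [53.0, 51.5, 50.0]
lons = [-1.0, -0.1, -1.0]
idx, dist = nearest_index(51.6, -0.2, lats, lons)
print(names[idx], round(dist, 1))
```

With the dataframes in this article, the arrays could come from stations['Station Latitude'].to_numpy() and stations['Station Longitude'].to_numpy(), and the matching name from stations['Station Name'].iloc[idx].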



3 - merge our new information with our charities_output dataframe


# as our dataframe output has used our charities as an index, we need to re-add it as a column
res['Charity Name'] = res.index

# merging with our existing output dataframe
charities_output = charities_output.merge(res, on="Charity Name")

charities_output.head()



Step 2 Complete

We now have all our information in one place, charities_output, containing:

  • Our charity information
  • The nearest station to each charity
  • The distance in miles

Step 3 - Calculate the driving time for travel

Our final step uses Project OSRM to find the driving time between each of our charities and its nearest station. This is helpful because distance in miles is not always a good indicator of how long a journey takes: in a city like London, for example, a 1 mile journey might take as long as a 5 mile journey in a rural area.

To prepare for this step, we must have one dataframe containing the following information:

  • charity information: name, longitude, latitude, nearest station, distance in miles
  • station information: name, longitude, latitude

1- create a data frame with the above information


drive_time_df = pd.merge(charities_output, stations, left_on='Nearest Station', right_on='Station Name')
drive_time_df = drive_time_df.drop(columns=['Station Name'])

drive_time_df.head()



2 - now that our dataframe is ready, we can set up our function for calculating drive time using Project OSRM



Please note: for further information, consult the Project OSRM documentation


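# OSRM route endpoint; coordinates are passed in longitude,latitude order
url = "http://router.project-osrm.org/route/v1/driving/{lon1},{lat1};{lon2},{lat2}"

# function to request the driving time for one row of the dataframe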

def calc_driveTime(row):

    # extract lat and lon
    lat1, lon1 = row['Latitude'], row['Longitude']
    lat2, lon2 = row['Station Latitude'], row['Station Longitude']

    # request
    response = requests.get(url.format(lat1=lat1, lon1=lon1, lat2=lat2, lon2=lon2))

    # parse response
    data = json.loads(response.content)

    # drive time in seconds
    drive_time_sec = data["routes"][0]["duration"]

    # convert to minutes
    drive_time = round((drive_time_sec) / 60, 0)

    return drive_time
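
The function above assumes every request returns a route. As a hedged sketch of a more defensive approach, the helper below takes an already parsed response and returns None when no route is available. It also reads the route's road distance, which OSRM reports in metres alongside the duration in seconds. The name summarise_route is hypothetical and the sample responses are hand-written, not real API output.

```python
METRES_PER_MILE = 1609.344

def summarise_route(data):
    """Return (minutes, miles) for the first route in a parsed OSRM response, or None if no route was found."""
    if data.get("code") != "Ok" or not data.get("routes"):
        return None
    route = data["routes"][0]
    minutes = round(route["duration"] / 60)                 # duration is in seconds
    miles = round(route["distance"] / METRES_PER_MILE, 1)   # distance is in metres
    return minutes, miles

# hand-written samples shaped like OSRM route responses (values are invented)
sample = {"code": "Ok", "routes": [{"duration": 900.0, "distance": 8046.72}]}
no_route = {"code": "NoRoute", "routes": []}
print(summarise_route(sample))
print(summarise_route(no_route))
```

Inside calc_driveTime, this could be called on the parsed response in place of indexing data["routes"][0] directly, so that a failed lookup leaves a gap in the column rather than stopping the whole run.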


3 - pass our data into our new function to calculate driving time in minutes


# apply the above function to our dataframe
driving_time_res = drive_time_df.apply(calc_driveTime, axis=1)

# add dataframe results as a new column
drive_time_df['Driving Time (Minutes)'] = driving_time_res

drive_time_df.head()



Step 3 Complete

We now have all our desired information in one compact dataframe. For layout purposes, and depending on what we want to do next with our data, we can create one final dataframe as output, containing the following information:

  • Charity Name
  • Nearest Station
  • Distance (Miles)
  • Driving Time (Minutes)

final_output = drive_time_df.drop(columns=['Charity Number', 'Charity Postcode', 'Latitude', 'Longitude', 'Station Latitude', 'Station Longitude'])

final_output.head()



Thank you for reading! I hope this was helpful. Please check out my website if you are interested in my work.
