
How to calculate the distance (time and miles) between the geographies of your portfolio and a comparator (Point A and Point B)

Linda Hamilton
2024-10-07 06:10:30

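Calculating geographical distance using Python

This code is useful for anyone who wants to match a portfolio that has geographical data against any other geography, in order to calculate the driving time and distance between two points. It was inspired by a task I was assigned at work: helping a funder understand how close approved projects were to each other, after they queried the geographical spread of applicants.

This article walks through how to use API calls, together with built-in and custom functions, to match a list of charities (Point A) to their nearest train station (Point B), and to calculate the distance in miles and the driving time in minutes.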

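Other use cases include, for example:

  • matching postcodes to the nearest school
  • matching postcodes to the nearest charity
  • matching postcodes to the nearest NHS provider
  • matching postcodes to the nearest national park
  • matching postcodes in list A to the nearest postcode in list B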

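Requirements

Packages:

  • pandas
  • numpy
  • requests
  • json
  • haversine

Resources used in this article:

  • Charity data (for this example, I took the top 100 charities from the Charity Commission register of charities with an expenditure of over 5 million)
  • UK train station data (as this is not readily available, I used a GitHub document containing UK train stations with their longitude, latitude and postcode)
  • Postcodes.io (an API for searching and extracting UK postcode data)
  • Project OSRM (an API for calculating routes)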

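Why use Python?

The steps discussed here may look intricate, but the end result is a template that can be reused and reformatted to suit your needs whenever you calculate the geographical distance between Point A and Point B across many rows of data.

For example, say you are working with 100 charities. You want to know how close these charities are to a nearby train station, as part of a wider analysis of their geographical location. You might want to map this data visually, or use it as a starting point for further analysis, such as examining how accessible a charity is to people travelling from further away.

Whatever the use case, if you wanted to do this manually, the steps would be:

  1. Look up the charity's postcode
  2. Use an online tool to see which station is nearest to the charity
  3. Use an online mapping tool to find the distance in miles and the driving time from the charity to its nearest station
  4. Record the results in a spreadsheet
  5. Repeat steps 1 to 4 for the remaining 99 charities

This might work for a handful of charities, but after a while the process becomes time-consuming, tedious and prone to human error.

By using Python for this task, we can automate these steps and, with only a few additions required from the user, simply run our code at the end.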

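What can Python do?

Let's break the task down into steps. The steps we need are:

  1. Find the nearest station to a given postcode
  2. Calculate the distance between the two
  3. Calculate the driving time for travel
  4. Produce a dataset containing all the required information

To complete Step 1, we will use Python to:

  • import a dataset containing the charity details, including their postcodes
  • use the Postcodes.io API to extract the longitude and latitude for each postcode
  • compile this information back into a dataframe containing the original information plus the longitude and latitude of each charity

Step 1 - Find the nearest station to a given postcode

1 - import packages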


# data manipulation
import numpy as np
import pandas as pd

# http requests
import requests

# handling json
import json

# calculating distances
import haversine as hs
from haversine import haversine, Unit


2 - import and clean data


# import as a pandas dataframe, specifying which columns to import
charities = pd.read_excel('charity_list.xlsx', usecols='A, C, E')
stations = pd.read_csv('uk-train-stations.csv', usecols=[1,2,3])

# renaming stations columns for ease of use
stations = stations.rename(columns={'station_name':'Station Name','latitude':'Station Latitude', 'longitude':'Station Longitude'})


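The variable containing the charity dataset (named "charities") will be our master dataframe, which we will merge with the data we extract.

Our next step is to create the function for extracting the longitude and latitude of the charity postcodes.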

3 - convert the postcodes into a list for the matching function


charities_pc = charities['Charity Postcode'].tolist()


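4 - create a function that takes a list of postcodes, makes a request to postcodes.io, records the latitude and longitude, and returns the data so it can be loaded into a new dataframe


For further information, consult the postcodes.io documentation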


def bulk_pc_lookup(postcodes):

    # set up the api request
    url = "https://api.postcodes.io/postcodes"
    headers = {"Content-Type": "application/json"}

    # specify our input data and response, specifying that we are working with data in json format
    data = {"postcodes": postcodes}
    response = requests.post(url, headers=headers, data=json.dumps(data))

    # specify the information we want to extract from the api response

    if response.status_code == 200:
        results = response.json()["result"]
        postcode_data = []

        for result in results:
            postcode = result["query"]

            if result["result"] is not None:
                latitude = result["result"]["latitude"]
                longitude = result["result"]["longitude"]
                postcode_data.append({"Charity Postcode": postcode, "Latitude": latitude, "Longitude": longitude})

        return postcode_data

    # setting up a fail safe to capture any errors or results not found
    else:
        print(f"Error: {response.status_code}")
        return []


5 - pass our list of charity postcodes into the function to extract the desired results


# specify where the postcodes are
postcodes = charities_pc

# save the results of the function as output
output = bulk_pc_lookup(postcodes)

# convert the results to a pandas dataframe
output_df = pd.DataFrame(output)
output_df.head()



Please note:

  1. If your Point B data (in this case, the UK train stations) does not already contain latitude and longitude, you will also need to perform steps 3 to 5 on the Point B data
  2. postcodes.io allows bulk lookup requests for up to 100 postcodes at a time. If your dataset contains more than 100 postcodes, you will need either to manually create new Excel sheets containing only 100 rows per sheet, or to write a function that breaks your dataset into batches of the required length for the API call
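
The batching mentioned in the second note could be sketched as below. This is a minimal illustration rather than part of the original workflow: chunk_postcodes and lookup_in_batches are hypothetical helper names, and lookup stands for any function that behaves like bulk_pc_lookup (takes a list of postcodes, returns a list of dictionaries).

```python
def chunk_postcodes(postcodes, size=100):
    """Split a list of postcodes into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [postcodes[i:i + size] for i in range(0, len(postcodes), size)]


def lookup_in_batches(postcodes, lookup, size=100):
    """Call `lookup` once per batch and combine the rows it returns.

    `lookup` is assumed to behave like bulk_pc_lookup: it takes a list of
    postcodes and returns a list of dictionaries (an empty list on failure).
    """
    rows = []
    for batch in chunk_postcodes(postcodes, size):
        rows.extend(lookup(batch))
    return rows
```

With this in place, step 5 would become output = lookup_in_batches(charities_pc, bulk_pc_lookup), and the rest of the code stays the same.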

6 - we can now either merge our output_df back into our original charity dataset or, to leave our original data untouched, create a new dataframe for the extracted results that we will use for the rest of the project


charities_output = pd.merge(charities, output_df, on="Charity Postcode")

charities_output.head()
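
One thing to be aware of: pd.merge performs an inner join by default, so any charity whose postcode returned no result from the lookup silently drops out of charities_output. A hedged sketch of how this could be checked is shown below, using a left join with indicator=True. The two small dataframes and their postcodes are made-up placeholders for illustration only, not real data.

```python
import pandas as pd

# placeholder data standing in for the real charities and output_df dataframes
charities = pd.DataFrame({
    "Charity Name": ["Charity A", "Charity B", "Charity C"],
    "Charity Postcode": ["AA1 1AA", "BB2 2BB", "CC3 3CC"],
})
output_df = pd.DataFrame({
    "Charity Postcode": ["AA1 1AA", "CC3 3CC"],
    "Latitude": [51.5, 53.4],
    "Longitude": [-0.1, -2.2],
})

# a left join keeps every charity; the _merge column records where each row came from
checked = pd.merge(charities, output_df, on="Charity Postcode", how="left", indicator=True)

# postcodes that have no latitude/longitude and need checking by hand
unmatched = checked.loc[checked["_merge"] == "left_only", "Charity Postcode"].tolist()

# keep only the matched rows for the next steps, as the inner join would
charities_output = checked[checked["_merge"] == "both"].drop(columns="_merge")
```

Here unmatched lists the postcodes to investigate (often a typo or a terminated postcode), while charities_output has the same contents as the default merge would produce.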



Step 1 Complete

We now have two dataframes which we will use for the next steps:

  1. Our original stations dataframe containing the UK train stations latitude and longitude
  2. Our new charities_output dataframe containing the original charity information and the new latitude and longitude information extracted from our API call

Step 2 - Calculate the distance between Point A (charity) and Point B (train station), and record the nearest result for Point A

In this section, we will be using the haversine distance formula to:

  • check the distance between a charity and every UK train station
  • match the nearest result i.e. the UK train station with the minimum distance from our charity
  • loop over our charities dataset to find the nearest match for each row
  • record our results in a dataframe

Please note, for further information on using the haversine module, consult the documentation
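
For readers who want to see what the haversine calculation does, here is a minimal standard-library sketch of the formula. It gives the great-circle ("as the crow flies") distance between two latitude/longitude pairs on a sphere. The function name haversine_miles and the Earth radius of 3958.8 miles are assumptions made for this illustration; the haversine package is what the rest of this article actually uses.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, for illustration


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)

    # haversine of the central angle between the two points
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2

    # clamp to 1.0 to guard against floating point rounding before asin
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(min(1.0, a)))
```

As a sanity check, one degree of latitude comes out at roughly 69 miles, which matches the usual rule of thumb.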

1 - create a function for calculating the distance between Point A and Point B


def calc_distance(lat1, lon1, lat2, lon2):

    # specify data for location one, i.e. Point A
    loc1 = (lat1, lon1)

    # specify the data for location two, i.e. Point B
    loc2 = (lat2, lon2)

    # calculate the distance and specify the units as miles
    dist = haversine(loc1, loc2, unit=Unit.MILES)

    return dist


2 - create a loop that calculates the distance between Point A and every row in Point B, and match the result where Point B is nearest to Point A


# create an empty dictionary to store the results
results = {}

# begin with looping over the dataset containing the data for Point A
for index1, row1 in charities_output.iterrows():

    # specify the location of our data
    charity_name = row1['Charity Name']
    lat1 = row1['Latitude']
    lon1 = row1['Longitude']

    # track the minimum distance between Point A and every row of Point B
    min_dist = float('inf')
    # as the minimum distance i.e. nearest Point B is not yet known, create an empty string for storage
    min_station = ''

    # loop over the dataset containing the data for Point B
    for index2, row2 in stations.iterrows():

        # specify the location of our data
        lat2 = row2['Station Latitude']
        lon2 = row2['Station Longitude']

        # use our previously created distance function to calculate the distance
        dist = calc_distance(lat1, lon1, lat2, lon2)

        # check each distance - if it is lower than the last, this is the new low. this will repeat until the lowest distance is found
        if dist < min_dist:
            min_dist = dist
            min_station = row2['Station Name']

    results[charity_name] = {'Nearest Station': min_station, 'Distance (Miles)': min_dist}

# convert the results dictionary into a dataframe
res = pd.DataFrame.from_dict(results, orient="index")

res.head()
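
The nested iterrows loops above are easy to follow, but they make one Python-level function call for every charity and station pair, which can become slow with larger datasets. As an optional alternative, the same nearest-match search could be sketched with numpy broadcasting. The function name nearest_indices and the Earth radius constant are assumptions for this illustration, and the distances may differ very slightly from the haversine package depending on the radius it uses.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, for illustration


def nearest_indices(a_lat, a_lon, b_lat, b_lon):
    """For each Point A, return the position of the nearest Point B and the distance in miles."""
    # reshape so that every Point A (rows) is compared with every Point B (columns)
    a_lat = np.radians(np.asarray(a_lat, dtype=float))[:, None]
    a_lon = np.radians(np.asarray(a_lon, dtype=float))[:, None]
    b_lat = np.radians(np.asarray(b_lat, dtype=float))[None, :]
    b_lon = np.radians(np.asarray(b_lon, dtype=float))[None, :]

    # haversine formula applied to the whole (n_a, n_b) grid at once
    h = (np.sin((b_lat - a_lat) / 2) ** 2
         + np.cos(a_lat) * np.cos(b_lat) * np.sin((b_lon - a_lon) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(np.clip(h, 0, 1)))

    # position of the smallest distance in each row, and that distance
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(dist.shape[0]), idx]
```

With the dataframes from this article, the call would look like idx, miles = nearest_indices(charities_output['Latitude'], charities_output['Longitude'], stations['Station Latitude'], stations['Station Longitude']), after which stations['Station Name'].to_numpy()[idx] gives the nearest station for each charity in row order.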



3 - merge our new information with our charities_output dataframe


# as our dataframe output has used our charities as an index, we need to re-add it as a column
res['Charity Name'] = res.index

# merging with our existing output dataframe
charities_output = charities_output.merge(res, on="Charity Name")

charities_output.head()



Step 2 Complete

We now have all our information in one place, charities_output, containing:

  • Our charity information
  • The nearest station to each charity
  • The distance in miles

Step 3 - Calculate the driving time for travel

This step uses Project OSRM to find the driving time between each of our charities and its nearest station. This is helpful because miles alone do not always describe how far apart two places are in practice: in a city like London, for example, a 1 mile journey might take as long as a 5 mile journey in a rural area.

To prepare for this step, we must have one dataframe containing the following information:

  • charity information: name, longitude, latitude, nearest station, distance in miles
  • station information: name, longitude, latitude

1- create a data frame with the above information


drive_time_df = pd.merge(charities_output, stations, left_on='Nearest Station', right_on='Station Name')
drive_time_df = drive_time_df.drop(columns=['Station Name'])

drive_time_df.head()



2 - now that our dataframe is ready, we can set up our function for calculating drive time using Project OSRM



please note: for further information, consult the documentation


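# note that OSRM expects each coordinate in longitude,latitude order
url = "http://router.project-osrm.org/route/v1/driving/{lon1},{lat1};{lon2},{lat2}"

# function to request a route for one row and return its drive time in minutes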

def calc_driveTime(row):

    # extract lat and lon
    lat1, lon1 = row['Latitude'], row['Longitude']
    lat2, lon2 = row['Station Latitude'], row['Station Longitude']

    # request
    response = requests.get(url.format(lat1=lat1, lon1=lon1, lat2=lat2, lon2=lon2))

    # parse response
    data = json.loads(response.content)

    # drive time in seconds
    drive_time_sec = data["routes"][0]["duration"]

    # convert to minutes
    drive_time = round((drive_time_sec) / 60, 0)

    return drive_time
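
The function above assumes every request succeeds and returns at least one route. If a request fails, or OSRM cannot find a route, it will raise an error partway through the dataframe. As a hedged sketch, the URL building and the response parsing could be separated so that a failed lookup returns empty values instead. The names build_route_url and parse_route are hypothetical helpers for this illustration; the fields used (code, routes, duration in seconds, distance in metres) follow the OSRM route response format.

```python
OSRM_ROUTE_URL = "http://router.project-osrm.org/route/v1/driving/{lon1},{lat1};{lon2},{lat2}"
METRES_PER_MILE = 1609.344


def build_route_url(lat1, lon1, lat2, lon2):
    """Build the request URL; OSRM expects each coordinate as longitude,latitude."""
    return OSRM_ROUTE_URL.format(lon1=lon1, lat1=lat1, lon2=lon2, lat2=lat2)


def parse_route(data):
    """Return (drive time in minutes, driving distance in miles) from a parsed OSRM response.

    Returns (None, None) when the response does not contain a usable route.
    """
    if data.get("code") != "Ok" or not data.get("routes"):
        return None, None

    route = data["routes"][0]
    minutes = round(route["duration"] / 60)                 # duration is in seconds
    miles = round(route["distance"] / METRES_PER_MILE, 1)   # distance is in metres
    return minutes, miles
```

Inside calc_driveTime, the request would then become requests.get(build_route_url(lat1, lon1, lat2, lon2), timeout=10).json(), with the result passed to parse_route. This also gives you the driving distance in miles, which can be compared with the straight-line distance from Step 2.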


3 - pass our data into our new function to calculate driving time in minutes


# apply the above function to our dataframe
driving_time_res = drive_time_df.apply(calc_driveTime, axis=1)

# add dataframe results as a new column
drive_time_df['Driving Time (Minutes)'] = driving_time_res

drive_time_df.head()



Step 4 Complete

We now have all our desired information in one compact dataframe. For layout purposes, and depending on what we want to do next with our data, we can create one final dataframe as output, containing the following information:

  • Charity Name
  • Nearest Station
  • Distance (Miles)
  • Driving Time (Minutes)

final_output = drive_time_df.drop(columns=['Charity Number', 'Charity Postcode', 'Latitude', 'Longitude', 'Station Latitude', 'Station Longitude'])

final_output.head()



Thank you for reading! I hope this was helpful. Please check out my website if you are interested in my work.
