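This code is useful for anyone who wants to match a portfolio that has geographic data against any other set of locations, in order to calculate the driving time and distance between two points. It was inspired by a task I was assigned at work: helping funders understand how close approved projects were to one another, following a query about the geographic spread of applicants.
This article walks through how to use API calls, together with built-in and custom functions, to match a list of charities (Point A) with their nearest train station (Point B), and to calculate the distance in miles and the driving time in minutes.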
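Other use cases include, for example:
Packages:
- numpy
- pandas
- requests
- json
- haversine
Resources used in this article:
- postcodes.io
- Project OSRM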
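The steps discussed here may look intricate, but the end result is a template that can be reused and reformatted to suit your needs whenever you calculate the geographic distance between Point A and Point B across multiple rows of data.
For example, suppose you are working with 100 charities. You want to know how close these charities are to nearby train stations, as part of a wider analysis of where they are located. You might want to map this data visually, or use it as a starting point for further analysis, such as examining how accessible the charities are to people travelling from further away.
Whatever the use case, if you wanted to do this manually, the steps would be as follows:
This might work for a handful of charities, but after a while the process becomes time-consuming, tedious, and prone to human error.
By using Python for this task, we can automate these steps. With only a few additions needed from the user, all that remains is to run the code at the end.
Let's break the task down into steps. The steps we need are as follows: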
To complete Step 1, we will use Python to:
1 - import packages

    # data manipulation
    import numpy as np
    import pandas as pd

    # http requests
    import requests

    # handling json
    import json

    # calculating distances
    import haversine as hs
    from haversine import haversine, Unit

2 - import and clean the data

    # import as a pandas dataframe, specifying which columns to import
    charities = pd.read_excel('charity_list.xlsx', usecols='A, C, E')
    stations = pd.read_csv('uk-train-stations.csv', usecols=[1,2,3])

    # renaming stations columns for ease of use
    stations = stations.rename(columns={'station_name':'Station Name','latitude':'Station Latitude', 'longitude':'Station Longitude'})
The variable containing the charity dataset (named 'charities') will be our master dataframe, which we will use when merging in the data we extract.
Our next step is to create the function for extracting the longitude and latitude of the charity postcodes.
3 - convert the postcodes to a list for use with the lookup function

    charities_pc = charities['Charity Postcode'].tolist()

4 - create a function that takes the postcodes, makes a request to postcodes.io, records the latitude and longitude, and returns the data ready to be loaded into a new dataframe
For more information, consult the postcodes.io documentation

    def bulk_pc_lookup(postcodes):
        # set up the api request
        url = "https://api.postcodes.io/postcodes"
        headers = {"Content-Type": "application/json"}

        # specify our input data and response, specifying that we are working with data in json format
        data = {"postcodes": postcodes}
        response = requests.post(url, headers=headers, data=json.dumps(data))

        # specify the information we want to extract from the api response
        if response.status_code == 200:
            results = response.json()["result"]
            postcode_data = []
            for result in results:
                postcode = result["query"]
                if result["result"] is not None:
                    latitude = result["result"]["latitude"]
                    longitude = result["result"]["longitude"]
                    postcode_data.append({"Charity Postcode": postcode, "Latitude": latitude, "Longitude": longitude})
            return postcode_data
        else:
            # setting up a fail safe to capture any errors or results not found
            print(f"Error: {response.status_code}")
            return []

5 - pass our charity postcode list into the function to extract the required results

    # specify where the postcodes are
    postcodes = charities_pc

    # save the results of the function as output
    output = bulk_pc_lookup(postcodes)

    # convert the results to a pandas dataframe
    output_df = pd.DataFrame(output)
    output_df.head()

Please note:
- if your Point B data (in this case, the UK rail stations) does not already contain latitude and longitude, you will also need to perform steps 3 to 5 on the Point B data
- postcodes.io allows bulk lookup requests for up to 100 postcodes at a time. If your dataset contains more than 100 postcodes, you will need to either manually create new Excel sheets containing only 100 rows per sheet, or write a function that breaks your dataset into chunks of the required length for the API call
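For the second option, here is a minimal sketch of what such a chunking function might look like. The names `chunk_postcodes` and `lookup_in_batches` are hypothetical, and the stand-in `fake_lookup` exists only so the example runs without network access; in practice you would pass in `bulk_pc_lookup` instead.

```python
def chunk_postcodes(postcodes, size=100):
    """Yield successive slices containing at most `size` postcodes."""
    for start in range(0, len(postcodes), size):
        yield postcodes[start:start + size]


def lookup_in_batches(postcodes, lookup, size=100):
    """Call `lookup` once per batch and combine the results into one list."""
    combined = []
    for batch in chunk_postcodes(postcodes, size):
        combined.extend(lookup(batch))
    return combined


# Stand-in for bulk_pc_lookup (an assumption, used so no API call is made)
def fake_lookup(batch):
    return [{"Charity Postcode": pc} for pc in batch]


# 250 made-up postcodes are split into batches of 100, 100 and 50
sample = [f"PC{i}" for i in range(250)]
batch_sizes = [len(batch) for batch in chunk_postcodes(sample)]
combined = lookup_in_batches(sample, fake_lookup)
```

With the real function, the call would look something like `output = lookup_in_batches(charities_pc, bulk_pc_lookup)`, and the rest of step 5 stays the same.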
6 - we can now either merge our output_df with our original charity dataset, or, to leave our original data untouched, create a new dataframe that we will use for the rest of the project for our extracted results
    charities_output = pd.merge(charities, output_df, on="Charity Postcode")
    charities_output.head()

Step 1 Complete
We now have two dataframes which we will use for the next steps:
- Our original stations dataframe containing the UK train stations latitude and longitude
- Our new charities_output dataframe containing the original charity information and the new latitude and longitude information extracted from our API call
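One thing that may be worth checking before moving on: `pd.merge` performs an inner join by default, so any charity whose postcode was not returned by the lookup will be missing from the merged dataframe. Below is a minimal sketch for listing those postcodes. The helper name `find_unmatched` and the sample values are assumptions for illustration only.

```python
def find_unmatched(requested, returned_rows, key="Charity Postcode"):
    """Return the requested postcodes that have no row in the lookup output."""
    found = {row[key] for row in returned_rows}
    return [pc for pc in requested if pc not in found]


# Made-up sample data: the second postcode was not returned by the lookup
requested = ["AB1 2CD", "ZZ9 9ZZ", "EF3 4GH"]
returned_rows = [
    {"Charity Postcode": "AB1 2CD", "Latitude": 51.5, "Longitude": -0.1},
    {"Charity Postcode": "EF3 4GH", "Latitude": 53.5, "Longitude": -2.2},
]
missing = find_unmatched(requested, returned_rows)
```

In the article's workflow, this would be called as `find_unmatched(charities_pc, output)`; any postcodes it returns can then be corrected in the source data before re-running the lookup.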
Step 2 - Calculate the distance between Point A (charity) and Point B (train station), and record the nearest result for Point A
In this section, we will be using the haversine distance formula to:
- check the distance between a charity and every UK train station
- match the nearest result i.e. the UK train station with the minimum distance from our charity
- loop over our charities dataset to find the nearest match for each row
- record our results in a dataframe
Please note, for further information on using the haversine module, consult the documentation
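The haversine package does the calculation for us, but for readers curious about what happens underneath, here is a rough pure-Python sketch of the haversine formula. The function name `haversine_miles` and the radius and conversion constants are assumptions for illustration, so results may differ very slightly from the package.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)
KM_TO_MILES = 0.621371       # approximate conversion factor


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    # the formula works in radians, so convert from degrees first
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    # haversine of the central angle between the two points
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2

    # convert the central angle to a distance along the Earth's surface
    distance_km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    return distance_km * KM_TO_MILES


# one degree of longitude along the equator is roughly 69 miles
one_degree = haversine_miles(0.0, 0.0, 0.0, 1.0)
```

Because this measures distance over the surface of a sphere, it is a straight-line ("as the crow flies") figure rather than a road distance, which is why driving time is calculated separately later on.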
1 - create a function for calculating the distance between Point A and Point B
    def calc_distance(lat1, lon1, lat2, lon2):
        # specify data for location one, i.e. Point A
        loc1 = (lat1, lon1)
        # specify the data for location two, i.e. Point B
        loc2 = (lat2, lon2)
        # calculate the distance and specify the units as miles
        dist = haversine(loc1, loc2, unit=Unit.MILES)
        return dist

2 - create a loop that calculates the distance between Point A and every row in Point B, and match the result where Point B is nearest to Point A

    # create an empty dictionary to store the results
    results = {}

    # begin with looping over the dataset containing the data for Point A
    for index1, row1 in charities_output.iterrows():
        # specify the location of our data
        charity_name = row1['Charity Name']
        lat1 = row1['Latitude']
        lon1 = row1['Longitude']

        # track the minimum distance between Point A and every row of Point B
        min_dist = float('inf')
        # as the minimum distance i.e. nearest Point B is not yet known, create an empty string for storage
        min_station = ''

        # loop over the dataset containing the data for Point B
        for index2, row2 in stations.iterrows():
            # specify the location of our data
            lat2 = row2['Station Latitude']
            lon2 = row2['Station Longitude']

            # use our previously created distance function to calculate the distance
            dist = calc_distance(lat1, lon1, lat2, lon2)

            # check each distance - if it is lower than the last, this is the new low. this will repeat until the lowest distance is found
            if dist < min_dist:
                min_dist = dist
                min_station = row2['Station Name']

        results[charity_name] = {'Nearest Station': min_station, 'Distance (Miles)': min_dist}

    # convert the results dictionary into a dataframe
    res = pd.DataFrame.from_dict(results, orient="index")
    res.head()

3 - merge our new information with our charities_output dataframe

    # as our dataframe output has used our charities as an index, we need to re-add it as a column
    res['Charity Name'] = res.index

    # merging with our existing output dataframe
    charities_output = charities_output.merge(res, on="Charity Name")
    charities_output.head()

Step 2 Complete
We now have all our information in one place, charities_output, containing:
- Our charity information
- The nearest station to each charity
- The distance in miles
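The nested loop above is easy to follow, but it may become slow as the number of charities grows, since every charity is compared against every station one row at a time. As an optional alternative, here is a sketch that computes all distances at once with NumPy broadcasting. This is not the approach used in the rest of the article: the function name `nearest_station_indices`, the Earth radius constant and the sample coordinates are all assumptions for illustration.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumed constant)


def nearest_station_indices(charity_lat, charity_lon, station_lat, station_lon):
    """Return (index of nearest station, distance in miles) for each charity."""
    # charities become a column and stations a row, so that broadcasting
    # produces a (number of charities) x (number of stations) grid
    lat1 = np.radians(np.asarray(charity_lat, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(charity_lon, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(station_lat, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(station_lon, dtype=float))[None, :]

    # haversine formula applied to every charity/station pair at once
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    dist = 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

    # for each charity (row), pick the station (column) with the smallest distance
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(idx)), idx]


# made-up coordinates: two charities and two stations
idx, miles = nearest_station_indices(
    [51.50, 53.50], [-0.10, -2.20],   # charity latitudes, longitudes
    [53.48, 51.53], [-2.24, -0.12],   # station latitudes, longitudes
)
```

With the article's dataframes, the inputs would be the 'Latitude' and 'Longitude' columns of charities_output and the 'Station Latitude' and 'Station Longitude' columns of stations, and the station names could then be looked up with `stations['Station Name'].to_numpy()[idx]`.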
Step 3 - Calculate the driving time for travel
Our final step uses Project OSRM to find the driving time between each of our charities and its nearest station. This is helpful because distance in miles is not always a good indicator of how long a journey takes: in a city like London, for example, a 1 mile journey might take as long as a 5 mile journey in a rural area.
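Before building the request, it can help to see the shape of what goes in and what comes back. The sketch below builds a route URL (note that OSRM expects coordinates in longitude,latitude order) and reads the duration from a response. The helper names `build_route_url` and `drive_minutes` are hypothetical, and `sample_payload` is a made-up, heavily trimmed response used so the example runs without calling the service.

```python
OSRM_ROUTE_URL = "http://router.project-osrm.org/route/v1/driving/{lon1},{lat1};{lon2},{lat2}"


def build_route_url(lat1, lon1, lat2, lon2):
    """Fill in the route URL; OSRM takes each point as longitude,latitude."""
    return OSRM_ROUTE_URL.format(lat1=lat1, lon1=lon1, lat2=lat2, lon2=lon2)


def drive_minutes(payload):
    """Return the driving time in whole minutes, or None if no route was found."""
    # OSRM reports success with code "Ok"; anything else means no usable route
    if payload.get("code") != "Ok" or not payload.get("routes"):
        return None
    # duration is given in seconds for the first (best) route
    return round(payload["routes"][0]["duration"] / 60)


# made-up example of the fields we care about (duration in seconds, distance in metres)
sample_payload = {"code": "Ok", "routes": [{"duration": 754.3, "distance": 5120.8}]}

example_url = build_route_url(51.5, -0.1, 51.53, -0.12)
example_minutes = drive_minutes(sample_payload)
```

Returning None when no route is found is a design choice made here so that a single failed lookup leaves a gap in the results instead of stopping the whole run.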
To prepare for this step, we must have one dataframe containing the following information:
- charity information: name, longitude, latitude, nearest station, distance in miles
- station information: name, longitude, latitude
1- create a data frame with the above information
    drive_time_df = pd.merge(charities_output, stations, left_on='Nearest Station', right_on='Station Name')
    drive_time_df = drive_time_df.drop(columns=['Station Name'])
    drive_time_df.head()

2 - now that our dataframe is ready, we can set up our function for calculating drive time using Project OSRM
Please note: for further information, consult the documentation

    url = "http://router.project-osrm.org/route/v1/driving/{lon1},{lat1};{lon2},{lat2}"

    # function
    def calc_driveTime(row):
        # extract lat and lon
        lat1, lon1 = row['Latitude'], row['Longitude']
        lat2, lon2 = row['Station Latitude'], row['Station Longitude']
        # request
        response = requests.get(url.format(lat1=lat1, lon1=lon1, lat2=lat2, lon2=lon2))
        # parse response
        data = json.loads(response.content)
        # drive time in seconds
        drive_time_sec = data["routes"][0]["duration"]
        # convert to minutes
        drive_time = round((drive_time_sec) / 60, 0)
        return drive_time

3 - pass our data into our new function to calculate driving time in minutes

    # apply the above function to our dataframe
    driving_time_res = drive_time_df.apply(calc_driveTime, axis=1)

    # add dataframe results as a new column
    drive_time_df['Driving Time (Minutes)'] = driving_time_res
    drive_time_df.head()

Step 3 Complete
We now have all our desired information in one compact dataframe. For layout purposes, and depending on what we want to do next with our data, we can create one final dataframe as output, containing the following information:
- Charity Name
- Nearest Station
- Distance (Miles)
- Driving Time (Minutes)
    final_output = drive_time_df.drop(columns=['Charity Number', 'Charity Postcode', 'Latitude', 'Longitude', 'Station Latitude', 'Station Longitude'])
    final_output.head()

Thank you for reading! I hope this was helpful. Please check out my website if you are interested in my work.