本文是澳大利亚天气预测项目的前置数据处理环节,在大项目中需要将观测点所在城市转化为气候区域,以方便探究气候区域与天气的关系。

但从公开数据中,只能查到主要城市的气候区域,所以处理思路是,通过观测点和主要城市的经纬度,计算出实际距离,然后近似找到观测点的气候区域。

代码示例

1、加载已有数据

import pandas as pd

# 样本城市经纬度
sample_city_ll = pd.read_csv('./datas/sample_city_ll.csv', index_col=0)
# 主要城市经纬度
city_ll = pd.read_csv('./datas/city_ll.csv', index_col=0)
# 主要城市气候区域
city_climate = pd.read_csv('./datas/city_climate.csv', index_col=0)

2、使用geopy库按经纬度计算两点距离

from geopy.distance import geodesic

sample_df = pd.DataFrame(index=sample_city_ll['City'])
# 遍历获取sample_city信息
for idx, sample_row in sample_city_ll.iterrows():
    sample_row = dict(sample_row)
    sample_city = sample_row['City']
    sample_lt = sample_row['Latitude'].strip('°')
    sample_lg = sample_row['Longitude'].strip('°')
    # 遍历获取city信息
    dists = []
    for idx, row in city_ll.iterrows():
        row = dict(row)
        city = row['City']
        lt = row['Latitude'].strip('°')
        lg = row['Longitude'].strip('°')
        # 计算距离
        dists.append([geodesic((sample_lt, sample_lg), (lt, lg)).km, city])
    # 获取距离最小值对应的城市
    _, city = min(dists)
    # 查找最近城市对应的气候区域,并填充dataframe
    climate = city_climate.loc[city, 'Climate']
    sample_df.loc[sample_city, 'Climate'] = climate

sample_df.to_csv('./datas/sample_city_climate.csv')
print(sample_df.shape)

本文为 陈华 原创,欢迎转载,但请注明出处:http://edu.ichenhua.cn/read/289