
Notes on Collecting Amap (Gaode) POI Data

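I. Business requirement

The goal is to collect POI data of every type for a given city, as input for traffic optimization. I spent a long time exploring this and read a lot of material, so these notes organize the approach I ended up with.

II. Collection approach

I use the POI search feature of the Amap (Gaode) open platform, specifically polygon search: given a polygon defined by longitude/latitude points, it returns the POIs inside that polygon.

Experimenting showed that the polygon cannot be too large, because the number of records returned per query is capped. For that reason I refined the approach as follows:

1. First, get the administrative boundary of the city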

https://restapi.amap.com/v3/config/district?parameters

Download the city code table

https://lbs.amap.com/api/webservice/download

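Using the city code table, request the URL above once per administrative area. I used the URL below; fill in your own key and replace the keywords parameter to get the boundary of the specified administrative area.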

https://restapi.amap.com/v3/config/district?platform=JS&key=&subdistrict=0&extensions=all&level=district&s=rsv3&output=json&keywords=天津市
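The request above can be sketched in Python roughly as follows. This is a minimal sketch, not the author's code: the function names are made up for illustration, it uses only the standard library, it leaves out the `platform`, `s` and `level` parameters from the URL above, and it assumes the JSON response carries a `districts` list whose entries have a `polyline` field. Check those assumptions against a real response before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DISTRICT_URL = "https://restapi.amap.com/v3/config/district"


def build_district_url(key, keywords):
    """Build the district-query URL for one administrative area."""
    params = {
        "key": key,
        "keywords": keywords,
        "subdistrict": 0,
        "extensions": "all",  # "all" asks for the boundary as well
        "output": "json",
    }
    return DISTRICT_URL + "?" + urlencode(params)


def fetch_district_polyline(key, keywords):
    """Return the raw boundary string of the first matching district.

    Assumption: the response has a `districts` list and each entry has a
    `polyline` string. Returns "" when nothing matches.
    """
    with urlopen(build_district_url(key, keywords), timeout=10) as resp:
        data = json.load(resp)
    districts = data.get("districts") or []
    return districts[0].get("polyline", "") if districts else ""
```

Calling `fetch_district_polyline(your_key, "天津市")` would then return the boundary string for that area, provided the key is valid and the quota is not exhausted.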

2. For each administrative boundary, take the minimum and maximum longitude and latitude to get its bounding rectangle
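A minimal sketch of this step, assuming the boundary arrives as a single string in which points are written `lng,lat`, separated by `;`, with separate rings separated by `|`. The function names are illustrative and not from the original post.

```python
def parse_polyline(polyline):
    """Split a boundary string into a flat list of (lng, lat) tuples.

    Assumption: points are separated by ';' and rings by '|'.
    """
    points = []
    for ring in polyline.split("|"):
        for pair in ring.split(";"):
            if pair:
                lng, lat = pair.split(",")
                points.append((float(lng), float(lat)))
    return points


def bounding_box(points):
    """Return (min_lng, min_lat, max_lng, max_lat) for a list of points."""
    lngs = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lngs), min(lats), max(lngs), max(lats)


def corners(box):
    """Format a box as 'lng,lat' strings: (upper-left, lower-right)."""
    min_lng, min_lat, max_lng, max_lat = box
    return "%.6f,%.6f" % (min_lng, max_lat), "%.6f,%.6f" % (max_lng, min_lat)
```

The two strings returned by `corners` have the same `"lng,lat"` shape as the upper-left and lower-right arguments used by the functions later in these notes.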

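3. Grid each bounding rectangle at a spacing of 100*100

My implementation is fairly clumsy and brute-force: it simply works through each direction, halving the rectangle until either side of a cell is no more than 0.008 degrees.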

from utils import gpsutils  # the author's own helper module (not shown in the post)


def getBound(p1, p2):
    # Build a closed rectangle "lng,lat;lng,lat;...;" from two opposite corners.
    lng1 = (p1.split(',')[0]) + ","
    lat1 = (p1.split(',')[1]) + ";"
    lng2 = (p2.split(',')[0]) + ","
    lat2 = (p2.split(',')[1]) + ";"
    b1 = lng1 + lat1
    b2 = lng2 + lat1
    b3 = lng2 + lat2
    b4 = lng1 + lat2
    return b1 + b2 + b3 + b4 + b1


def getAllRec(leftUp, rightDown):
    # Split the rectangle into quarters until a cell is small enough.
    recList = []  # stack of rectangles still to be examined
    leftlng = float(leftUp.split(',')[0])
    leftlat = float(leftUp.split(',')[1])
    rightlng = float(rightDown.split(',')[0])
    rightlat = float(rightDown.split(',')[1])
    recList.append([leftlng, leftlat, rightlng, rightlat])
    step_lat = 0.008
    step_lng = 0.008
    result_list = []
    while len(recList) > 0:
        item = recList.pop()
        leftlng, leftlat, rightlng, rightlat = item
        if rightlng - leftlng > step_lng and leftlat - rightlat > step_lat:
            midLng = round((leftlng + rightlng) / 2, 6)
            midLat = round((leftlat + rightlat) / 2, 6)
            recList.append([leftlng, leftlat, midLng, midLat])
            recList.append([leftlng, midLat, midLng, rightlat])
            recList.append([midLng, leftlat, rightlng, midLat])
            recList.append([midLng, midLat, rightlng, rightlat])
        else:
            # Print the diagonal length of the finished cell.
            print(gpsutils.calcDistance([leftlat, leftlng], [rightlat, rightlng]))
            result_list.append(item)
    return result_list


rect_list = getAllRec("118.09338700,24.52395700", "118.17166400,24.44209500")
for item in rect_list:
    print(getBound(str(item[0]) + "," + str(item[1]), str(item[2]) + "," + str(item[3])))
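The code above stops splitting at a step given in degrees, while the step heading talks about distance. As a rough aid, here is a sketch that converts a cell size in metres to degree steps and cuts a rectangle into a fixed grid. It is an alternative to the halving approach above, not the author's method. It assumes a spherical approximation in which one degree of latitude is about 111.32 km, and the function names are illustrative. Under that approximation, 0.008 degrees of latitude is roughly 0.9 km.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def degree_steps(cell_m, lat):
    """Convert a cell size in metres to (step_lng, step_lat) in degrees."""
    step_lat = cell_m / 1000.0 / KM_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    step_lng = step_lat / math.cos(math.radians(lat))
    return step_lng, step_lat


def fixed_grid(left_lng, top_lat, right_lng, bottom_lat, step_lng, step_lat):
    """Cut a rectangle into cells [left_lng, top_lat, right_lng, bottom_lat]."""
    cols = max(1, math.ceil((right_lng - left_lng) / step_lng))
    rows = max(1, math.ceil((top_lat - bottom_lat) / step_lat))
    cells = []
    for r in range(rows):
        top = top_lat - r * step_lat
        bottom = max(top - step_lat, bottom_lat)  # clip the last row
        for c in range(cols):
            left = left_lng + c * step_lng
            right = min(left + step_lng, right_lng)  # clip the last column
            cells.append([round(left, 6), round(top, 6),
                          round(right, 6), round(bottom, 6)])
    return cells
```

A fixed grid issues the same number of requests everywhere, whereas halving on demand (as in the later request function) spends requests only where POIs are dense.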

4. Download the POI type code table

https://lbs.amap.com/api/webservice/download

5. Iterate over the top-level categories in the type code table
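One way to get the top-level categories out of the code table is sketched below. It assumes the type codes are six-digit strings whose first two digits identify the top-level category, so that the category code is those two digits followed by `0000`. Verify this against the downloaded table; the function name is illustrative.

```python
def big_categories(type_codes):
    """Collapse detailed type codes to top-level category codes, in order.

    Assumption: six-digit codes, top-level category = first two digits.
    """
    seen = []
    for code in type_codes:
        # zfill restores a leading zero lost when a code was read as a number
        top = str(code).zfill(6)[:2] + "0000"
        if top not in seen:
            seen.append(top)
    return seen
```

Each returned code can then be passed as the POI type when searching a grid cell.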

6. Iterate over the grid cells and, for each cell, run a polygon search for POIs of the given top-level category

https://restapi.amap.com/v3/place/polygon?parameters

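7. Parse the response and extract the attributes of each POI

keywords is not used here and can be dropped. poi_type is the POI type, and leftUp and rightDown are the corner points of the rectangle. In the actual implementation, if the number of POIs reported for a cell exceeds 200, the cell is split further and the request is repeated.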

import json
import random
import time

import pandas as pd
import requests

# IDs of POIs saved by earlier runs. The original post uses this name without
# defining it, so initialising it as an empty set here is an assumption.
poi_id_list_have = set()


def requestPOIApiByPolygon(keyWords, key, poi_type, leftUp, rightDown):
    page_size = 25
    poi_list = []
    seen_ids = set()  # IDs collected during this call
    all_data = []  # all POI data
    recList = []  # stack of rectangles
    leftlng = float(leftUp.split(',')[0])
    leftlat = float(leftUp.split(',')[1])
    rightlng = float(rightDown.split(',')[0])
    rightlat = float(rightDown.split(',')[1])
    recList.append([leftlng, leftlat, rightlng, rightlat])
    while len(recList) > 0:
        leftlng, leftlat, rightlng, rightlat = recList.pop()
        polygon = str(leftlng) + "," + str(leftlat) + "|" + str(rightlng) + "," + str(rightlat)
        # getPOIApiCountByPolygon is the author's helper (not shown in the post).
        if getPOIApiCountByPolygon(keyWords, key, poi_type, polygon) > 200:
            # Too many POIs in this cell: split it into four and retry.
            midLng = round((leftlng + rightlng) / 2, 6)
            midLat = round((leftlat + rightlat) / 2, 6)
            recList.append([leftlng, leftlat, midLng, midLat])
            recList.append([leftlng, midLat, midLng, rightlat])
            recList.append([midLng, leftlat, rightlng, midLat])
            recList.append([midLng, midLat, rightlng, rightlat])
        else:
            for pageNum in range(page_size):
                URL = ("https://restapi.amap.com/v3/place/polygon?keywords=" + keyWords
                       + "&city=" + "xiamen"
                       + "&output=json"
                       + "&types=" + str(poi_type)
                       + "&key=" + key
                       + "&polygon=" + polygon
                       + "&offset=" + str(page_size)
                       + "&page=" + str(pageNum + 1))
                resp = requests.get(URL)
                res = json.loads(resp.text)
                if str(res['info']) == 'USER_DAILY_QUERY_OVER_LIMIT':
                    print("Daily request quota exceeded")
                    return
                count = int(str(res['count']))
                if pageNum == 0 and count > 0:
                    print("area#%s#count=%s" % (
                        getBound(str(leftlng) + "," + str(leftlat),
                                 str(rightlng) + "," + str(rightlat)),
                        res['count']))
                # Stop paging once a page comes back empty.
                if count > 0 and res['pois']:
                    for r in res['pois']:
                        poi = {}
                        poi['name'] = r['name']
                        poi['id'] = r['id']
                        if poi['id'] in poi_id_list_have:
                            continue
                        poi['location'] = r['location'] + ";"
                        poi['type'] = r['type']
                        poi['pname'] = r['pname']
                        poi['cityname'] = r['cityname']
                        if poi['cityname'] != '厦门市':
                            continue
                        poi['adname'] = r['adname']
                        poi['address'] = r['address']
                        poi['rec'] = polygon
                        # A set is used for de-duplication; `in` on a pandas
                        # Series tests the index, not the values.
                        if poi['id'] not in seen_ids:
                            seen_ids.add(poi['id'])
                            poi_list.append(list(poi.values()))
                    time.sleep(random.randint(30, 60))
                else:
                    break
    if len(poi_list) > 0:
        all_data = pd.DataFrame(poi_list)
    return all_data
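The function above calls getPOIApiCountByPolygon, which the post does not show. The following is a hypothetical stand-in, not the author's helper: it asks for a single record and reads the total from the `count` field, the same field the code above reads. The name `get_poi_count`, the `fetch` parameter and the `status` check are assumptions made for illustration. `fetch` exists so the function can be exercised without touching the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

POLYGON_URL = "https://restapi.amap.com/v3/place/polygon"


def http_get_json(url):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def get_poi_count(key, poi_type, polygon, fetch=http_get_json):
    """Return the total number of POIs reported for one rectangle.

    Hypothetical helper. Assumes a successful response has status "1"
    and a `count` field holding the total as a string.
    """
    params = {"key": key, "types": poi_type, "polygon": polygon,
              "offset": 1, "page": 1, "output": "json"}
    # Keep ',' and '|' unescaped so the polygon reads as in the URL above.
    res = fetch(POLYGON_URL + "?" + urlencode(params, safe=",|"))
    if str(res.get("status")) != "1":
        raise RuntimeError("polygon search failed: %s" % res.get("info"))
    return int(res.get("count", 0))
```

Raising on a failed status, rather than returning 0, keeps a quota or key error from being mistaken for an empty cell.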

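8. Getting the boundary points of POIs found through Amap search

The Amap search API does not return boundary points, yet the Amap map site can show them for some POIs. Inspecting the site shows that it requests them by POI ID: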

https://www.amap.com/detail/get/detail?id={}

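Requesting this URL directly gets throttled by Amap, so it cannot be used for bulk collection. After some research I found that Baidu's boundary lookup can be used to get the boundary points instead.

Procedure:

  • Iterate over the Amap POI names, request the Baidu URL below with each name, and parse the response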

    https://map.baidu.com/?newmap=1&reqflag=pcmap&biz=1&from=webmap&da_par=direct&pcevaname=pc4.1&qt=s&da_src=shareurl&wd=西雅图
    
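  • The coordinates Baidu returns are Mercator coordinates and need to be converted; for the conversion method, see
    "Baidu Mercator coordinate conversion" (百度墨卡托坐标转化)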