背景

在我们的场景中，行政区划是电子围栏的一个特例。日常处理和行政区划有关的业务场景，举例如下

根据当前GPS位置求地理位置，如求停车点逆地理信息
根据车辆历史轨迹判断途径区域，如求长跑线路
根据指定城市求近邻城市，如求线路的近邻起止点

使用PostGIS函数

使用到的PostGIS函数如下

数据来源

行政区域边界找到这样一份公开数据，CSV格式如下

字段	描述
id	行政区划代码的简写形式
geo	城市中心坐标，高德地图GCJ-02火星坐标系。格式："lng lat" or "EMPTY"，少量的EMPTY（仅台湾的城市、国外）代表此城市没有抓取到坐标信息
polygon	行政区域边界，高德地图GCJ-02火星坐标系。格式："lng lat,...;lng lat,..." or "EMPTY"，少量的EMPTY（仅台湾的城市、国外）代表此城市没有抓取到边界信息；存在多个地块(如飞地)时用;分隔，每个地块的坐标点用,分隔

原始数据入库

使用Kettle把数据批量入库，针对原始数据的特征进行清理（这部分代码不通用）

警告对国家领土边界进行数据抽稀，仅仅作为验证技术可行性的研究行为，不能用于其他任何场景！！！

drop table if exists demo.t_ok_geo;
create table if not exists demo.t_ok_geo(id int, ext_path text, geo text, polygon text);

-- 提前处理掉一部分无效数据。原始数据有超过6位的行政区划编码
delete from demo.t_ok_geo og where id >= 1000000 or geo = 'EMPTY';

-- 提前处理掉一部分无效数据。区划代码补齐之后的重复数据，保留省市区信息最多的一条
with duplicated_rows as
(
	select row_number() over (partition by rpad(og.id::text, 6 , '0') order by length(og.ext_path) desc) as rn, og.* 
	from demo.t_ok_geo og 
)
delete from demo.t_ok_geo og where og.id in (	select id from duplicated_rows du where rn >1)
;

查询样本数据

select id, ext_path, geo, left(polygon, 30) from demo.t_ok_geo limit 3 ;
   id   |       ext_path       |         geo          |              left
--------+----------------------+----------------------+--------------------------------
     11 | 北京市               | 116.405285 39.904989 | 116.812384 39.615914,116.81208
   1101 | 北京市 北京市        | 116.405285 39.904989 | 116.244694 39.517344,116.24463
 110101 | 北京市 北京市 东城区 | 116.418757 39.917544 | 116.387663 39.960923,116.38947
(3 rows)

初始化电子围栏

PostGIS中使用多边形Polygon类型保存围栏信息，根据我们的原始数据特征，需要注意

原始数据使用GCJ-02火星坐标系，需要转换成WGS84坐标系；
Polygon类型必须是闭合的多边形，但是原始数据并不是闭合线段，需要额外处理；
一个区划可能有多个独立的多边形，所以需要使用MultiPolygon类型；

GCJ-02转WGS84

初始化代码中需要使用 geoc_gcj02towgs84() 这个函数，把GCJ02转换为WGS84坐标系。这一套转码工具来自这里，我们对源码进行了一点点修改

默认函数是建立在public模式下，但是我们仅仅是临时使用，所以放在当前模式下面
屏蔽了raise语句，避免刷屏拖慢执行速度。因为执行速度真的很慢

建立围栏表

drop table if exists demo.t_area_fence;

create table if not exists demo.t_area_fence
(
	id int, 
	ext_path text, 
	area_level int,
	province text, 
	city text, 
	district text, 
	center_point geometry(point, 4326), 
	fence_polygon geometry(multipolygon, 4326),
	primary key (id)
);

初始化围栏数据

算法很简单，有几个点需要注意

一个区域有多个多边形的情况，需要按分隔符拆分为多条数据，分别处理之后再聚合；
非闭合线段，必须先改造成闭环线段，然后才能构造多边形；
初始化涉及到拓扑抽稀算法，会在另外的笔记中进行讲解；

警告对国家领土边界进行数据抽稀，仅仅作为验证技术可行性的研究行为，不能用于其他任何场景！！！

with fence_line as
(
	select 
		-- 统一原始id长度
		rpad(og.id::text, 6 , '0')::int as id,
		replace(ext_path, ' ', '-') as ext_path,  
		ST_GeomFromText( 'point('|| geo || ')', 4326 ) as center_point,
		-- 拆分多段数据为单行，然后构造Linestring对象
		ST_GeomFromText( 'LineString('|| unnest(string_to_array(polygon, ';')) || ')', 4326 ) as fence_line
	from demo.t_ok_geo og  
),
-- 构造闭合线段，根据需要进行拓扑抽稀
fence_close as
(
	select fl.*,
			-- 如果Linestring的首尾点不同，就无法构造Polygon。所以把第一个点复制到最后，形成闭环
			case
				-- 原始值
				when st_equals(st_startpoint(fl.fence_line), st_endpoint(fl.fence_line)) then fl.fence_line
				else st_addpoint(fl.fence_line, st_startpoint(fl.fence_line) )
				-- 抽稀值
				--when st_equals(st_startpoint(fl.fence_line), st_endpoint(fl.fence_line)) then st_simplifypreservetopology(fl.fence_line, 0.001)
				--else st_simplifypreservetopology( st_addpoint(fl.fence_line, st_startpoint(fl.fence_line) ), 0.001)
			end as closed_line
		from fence_line fl
)
-- 根据原始数据坐标进行转换，如GCJ02转WGS84。转换之后重新设置SRID。
insert into demo.t_area_fence
select fc.id, fc.ext_path,
	array_length(string_to_array(fc.ext_path, '-'),1) as area_level,
	split_part(fc.ext_path, '-', 1) as province, 
	split_part(fc.ext_path, '-', 2) as city ,
	split_part(fc.ext_path, '-', 3) as district,
	-- 原始值
	--fc.center_point,  
	--st_multi(st_union(array_agg(st_makepolygon(fc.closed_line))))  as fence_polygon
	-- 转换值
	st_setsrid(geoc_gcj02towgs84(fc.center_point),4326) as center_point,
	st_setsrid(geoc_gcj02towgs84(st_multi(st_union(array_agg(st_makepolygon(fc.closed_line))))),4326)  as fence_polygon
from fence_close fc
group by fc.id, fc.ext_path, fc.center_point
;

应用围栏数据

判断当前位置

判断当前位置是否在指定的区域内

select af.ext_path, 
	st_within(ST_GeomFromText( 'point(116.418757 39.917544)', 4326 ) , af.fence_polygon)
from demo.t_area_fence af 
where af.district ~ '(东城区|朝阳区)'
;
       ext_path       | st_within
----------------------+-----------
 北京市 北京市 东城区 | t
 北京市 北京市 朝阳区 | f
 吉林省 长春市 朝阳区 | f
(3 rows)

逆地理信息（粗略版）

根据坐标点查询地址，注意

因为我们只有区划的边界，所以无法查询详细的地址信息。如果要查询更详细地址，需要使用OSM数据或者在线API。
因为我们同时有省市区的边界，所以一个地址会被包括在多个区域内。
- 如东城区的中心，同时会出现在东城区和北京市的围栏中。
- 因为省市区的区划代码有大小顺序，所以我们取区划代码最大的一个，即最里面的围栏。

select af.id, af.ext_path as pass_through
from demo.t_area_fence af 
where 
	st_within('srid=4326;point(116.418757 39.917544)'::geometry , af.fence_polygon)
order by id desc
limit 1
;
   id   |     pass_through
--------+----------------------
 110101 | 北京市 北京市 东城区
(1 row)

求近邻城市

这个例子仅仅是演示一下可以这么做，更有效率的做法是采用KNN（K近邻）算法。

-- 成都市周边50km内的城市，转换为geography是为了直接使用米作为距离单位
select city
from demo.t_area_fence af 
where area_level = 2
	and st_dwithin( 
		(select center_point from demo.t_area_fence f where f.city = '成都市' and area_level = 2)::geography, 
		af.fence_polygon::geography, 
		1000*50)
;
  city
--------
 眉山市
 成都市
 德阳市
(3 rows)

KNN查询的语法在查询的ORDER BY子句中放置了一个特殊的"基于索引的距离运算符"，在本例中为"<->"。有两种基于索引的距离运算符:

<-> 表示边界框中心之间的距离
<#> 表示边界框边界之间的距离

-- KNN算法求近邻的5个城市
select city
from demo.t_area_fence af 
where area_level = 2
order by fence_polygon <-> (select center_point from demo.t_area_fence f where f.city = '成都市' and area_level = 2)
limit 5
;

求轨迹途径区域

结合已经整理过的车辆轨迹数据，可粗略的求车辆在指定时间内途径的区域。轨迹和围栏有交集，我们就认为是途径。

select distinct on (af.id) af.id, af.ext_path as pass_through
from demo.t_taxi_trajectory tr, demo.t_area_fence af 
where tr.tid = 28
	and tr.dt = '2008-02-02'
	and st_intersects(tr.traj_simp, af.fence_polygon)
order by id desc
;
   id   |     pass_through
--------+----------------------
 110114 | 北京市 北京市 昌平区
 110113 | 北京市 北京市 顺义区
 110105 | 北京市 北京市 朝阳区
 110101 | 北京市 北京市 东城区
 110100 | 北京市 北京市
 110000 | 北京市
(6 rows)

PostGIS_小白笔记_行政区划边界

背景