转载

自定义经纬度索引(非RTree、Morton Code[z order curve]、Geohash的方式)

Custom Indexing for Latitude-Longitude Data

网络上有一些经纬度索引的文章，讨论的方法是用Geohash、RTree、Z-Order Curve（morton Code）等算法来将经纬度由二维变为一维的索引。

这篇文章介绍一种使用二维数据索引的方式来创建经纬度索引。前半部分讨论一种自定义的、使用二维数组的方式将经纬度变为一维数据，来加快索引。

文章最后实现了查找附近几公里的方法。常见的一些应用是根据某个经纬度点，计算出一定范围内的其他坐标点。这些坐标点是景区、商场等。中文网络里，有讲GeoHash、RTree的一些文章，但都不够全面，只是介绍了如何实现Geohash或Rtree，但没有介绍到最后一个终极目标，即给定一个经纬度 latitude/longitude和距离radius，如何实现查找一定范围内的所有的点。

本文来源于MSDN杂志的一篇文章，请参考：

https://msdn.microsoft.com/en-us/magazine/jj133823.aspx

它有提供源代码下载，但我去下载的时候，提供的是错误的代码。

通常，我们在数据库里会有一张表保存如下的关系：

Id latitude - longitude

0001 47.610 - 122.330

0002 36.175 - 115.136

0003 47.613 - 122.332

Id代表某个主键，可以是一个小吃店、商场等。Latitude – longitude 表示该ID所对应的坐标。

一些应用场景包括：

1）要找出在一定经纬范围内的所有ID：

SELECT Id

FROM IdInfoTable

WHERE latitude >= 47.0 AND latitude < 48.0

AND longitude >= -123.0 AND longitude < -122.0

2）还有一些应用是用户会根据自己的坐标点来查找，比如我在 47.003 – 122.311附近，我要找附近5公里范围的所有小吃店。一种最笨的方法是逐一遍历表里每一条数据，如果2个坐标点的距离小于5就把这个ID留下作为搜索结果输出。根据2个经纬度坐标点来计算2个经纬度之间距离的方法，可以参考：

http://www.cnblogs.com/softfair/p/distance_of_two_latitude_and_longitude_points.html

另外一种简单的想法是：

我们可以考虑把一对经纬映射到一个区域ID(sectorID)。

latitude – longitude Sector ID

47.610 - 122.330 49377

36.175 - 115.136 45424

47.613 - 122.332 49377

Id Sector ID

如果某些经纬度都映射到统一个SectorID，那这个ID，我都可以返回。设想有如下一个应用。我根据地图上红色坐标点，得到一个SectorID，假设为49377，我本地保存的信息表IdInfoTable里，也把经纬度信息映射为SectorID，那我做查询：

   SELECT  Id

   FROM    IdInfoTable

   WHERE   sector = 49377

就能立刻找到我需要的哪些ID是与图上红色点经纬度相同的一个区域的。

自定义经纬度索引(非RTree、Morton Code[z order curve]、Geohash的方式)

经纬度一种简单的划分，就是二维数组：

首先第一步，就是考虑如何将全球的经纬度划分开，比如我这里是每0.5度划分一小格。如下图，fraction = 0.5 表示间隔数。即每一度间隔0.5。

纬度从-90 到 90度。可以划分为：

[-90.0, -89.5)

[-89.5, -80.0)

[-80.0, -79.5)

….

[89.5, 90.0]

最后一个是闭合区间，让它包含90度。如下表格的第一列所示，这样纬度间隔总共有360度。

90-(-90)/0.5=360

经度从-180 到 180度，按照0.5间隔划分，可以表示为：

[-180.0, -179.5)

[-179.5, -170.0)

…..

[179.5, 180.0]

最后一个是闭合区间，让它包含180度。如下表格第一行所示。这样纬度间隔总共有720度。

180-(-180)/0.5=720

表格的行代表纬度，一行代表0.5度的跨度。

表格的列代表经度，一列代表0.5度的跨度。

表格总共有360 * 720 = 259,200个，序号从 0 到259,199。

fraction = 0.5	[-180.0, -179.5) 0	[-179.5, -170.0) 1	[-170.0, -169.5) 2		[179.5, 180.0] 719
[-90.0, -89.5) -> 0	0	1	2	...	719
[-89.5, -89.0) -> 1	720	721	722	...	1439
[-80.0, -79.5) -> 2	1440	1441	1442	...
	...

[89.5, 90.0] -> 359	258,480			...	259,199

表1

上面使用了fraction=0.5来划分全球的经纬度。你也可以使用更小的间隔来划分全球的经纬度。这个会得到更多的sectorID。更甚者纬度行使用0.5来划分，经度列用0.2来划分。（最好不要使用0.1这类计算机无法精确表示的浮点数。）这里演示只用0.5。

如果有一个纬度latitude的row索引 ri，有一个经度longitude的列col索引ci，则sectorID是：

sid = (ri * 360 * 1/fraction) + ci

例如：上表格里，Sector 1441的行索引 ri 是2，列索引 ci 是1。

(2 * 360 * 1/0.5) + 1 = (2 * 720) + 1 = 1441.

360 * 1/fraction 这个因子决定了每一行有多少个列（经度360度分为多少份）。这个因子乘以ri就是这一行第一个sectorID。加上ci 就是往该行偏移ci个列，就是最终的SectorID。

下一个难题是如何找到经纬度的索引ri 和ci。笨的办法是遍历所有经纬度找到合适的索引，例如遍历一遍纬度，如上表格从0 到359遍历找到给定纬度的ri，再从经度的0到719遍历，找到给定经度的ci。遍历的多少取决于切分的粒度，切的越细，遍历区间越大。另外一种就是二分查找。

代码如下，纬度索引返回的是行ri值，经度索引返回的是列ci值。它们2个函数是相似的，区别只是在LonIndex函数中把常量180需要替换为360；-90 替换为-180；90替换为180。

static int LatIndex(double latitude, double fraction)     {  latitude = Math.Round(latitude, 8);  int lastIndex = (int)(180 * (1.0 / fraction) - 1);  double firstLat = -90.0;  if (latitude == -90.0) return 0;  if (latitude == 90.0) return lastIndex;  int lo = 0;  int hi = lastIndex;  int mid = (lo + hi) / 2;  int ct = 0; // To prevent infinite loop  while (ct < 10000)  {    ++ct;    double left = firstLat + (fraction * mid); // Left part interval     left = Math.Round(left, 8);    double right = left + fraction;    // Right part interval     right = Math.Round(right, 8);    if (latitude >= left && latitude < right)      return mid;    else if (latitude < left)    {      hi = mid - 1; mid = (lo + hi) / 2;    }    else    {      lo = mid + 1; mid = (lo + hi) / 2;    }  }  throw new Exception("LatIndex no return for input = " + latitude);     }

上面方法里经纬度用的double。Double、float在计算机里有精度问题，某些数据是表示不正确的，一般也不能用 = 来进行相等比较，例如 double a=-1, b=-1, 这时候计算机判断 a==b就是false。还有一些除不尽的数也是不能直接=比较的。但可以截取一个有效位数，然后进行比较。为避免equality-of-two-type-double 错误，2个double或float类型数据是不能比较大小的。所以这里取一个8位有效数字，然后进行比较。上面的函数的主循环里检查[a,b) 这个区间。所以需要显示的判断 90这个最后一个值。

有了上面的方法介绍，下面就可以把经纬度映射到sectorID了。

static int LatLonToSector(double latitude, double longitude,    double fraction)  {    int latIndex = LatIndex(latitude, fraction); // row    int lonIndex = LonIndex(longitude, fraction);  // col    return (latIndex * 360 * (int)(1.0 / fraction)) + lonIndex;  }

Sector 面积的问题。

因为根据上面fraction分割经纬度的方法是将地球平面平均的分割为若干个小块。但这些小块的面积是不一样大小的。如下图：纬线之间是平行的，每一纬度之间的距离，例如A的距离、B的距离大约是111KM。经线之间是在赤道上距离远，靠近极点附近的距离近，例如C的距离是比D的距离要小的。所以一个小块的面积在不同纬度上是不一样的。

自定义经纬度索引(非RTree、Morton Code[z order curve]、Geohash的方式)

图1

2个坐标点之间的距离计算方法如下：参考：经纬度距离计算（给定2个经纬度，计算距离） http://www.cnblogs.com/softfair/p/distance_of_two_latitude_and_longitude_points.html

public static double Distance(double lat1, double lon1,   double lat2, double lon2) {  double r = 6371.0; // approx. radius of earth in km  double lat1Radians = (lat1 * Math.PI) / 180.0;  double lon1Radians = (lon1 * Math.PI) / 180.0;  double lat2Radians = (lat2 * Math.PI) / 180.0;  double lon2Radians = (lon2 * Math.PI) / 180.0;  double d = r * Math.Acos((Math.Cos(lat1Radians) *  Math.Cos(lat2Radians) *  Math.Cos(lon2Radians - lon1Radians)) +    Math.Sin(lat1Radians) * Math.Sin(lat2Radians)));  return d; }

所以，如果知道一个Sector的起始经纬度(latitude, longitude)是能够计算这个sector的长、宽和面积的。

以下是把sectorID转化为起始纬度的函数：

static double SectorToLat(int sector,    double fraction)  {    int divisor = 360 * (int)(1.0 / fraction);    int row = sector / divisor;    return -90.0 + (fraction * row);  }

这个函数是LatLonToSector的一个逆向操作。例如给定sectorID=1441，fraction=0.5如上表格，

divisor = 360 * 1.0 / 0.5 = 720 表示有多少个列，是经度的间隔。1441 / 720 = 2 就是行row索引ri。-90.0 + 0.5 * 2 = -90.0 + 1.0 = -89.0 就是[-89.0, -79.5)这个区间的起始纬度。

获得起始经度的方法是类似的：

static double SectorToLon(int sector, double fraction)  {    int divisor = 360 * (int)(1.0 / fraction);    int col = sector % divisor;    return -180.0 + (fraction * col);  }

sector % divisor 取模表示所在的列ci。

根据以上2个方法，可以计算面积：

static double Area(int sector, double fraction)  {    double lat1 = SectorToLat(sector, fraction);    double lon1 = SectorToLon(sector, fraction);    double lat2 = lat1 + fraction;    double lon2 = lon1 + fraction;    double width = Distance(lat1, lon1, lat1, lon2);    double height = Distance(lat1, lon1, lat2, lon1);    return width * height;  }

临近区块(Adjacent Sectors)

有时候，找到当前Sector是不够的，还需要找到临近的Sector。例如在721块的时候，临近Sector是0, 1, 2, 720, 722, 1440, 1441 和 1442共八个块。左右Sector就是SectorId减或加1得到。上下的Sector就是SectorId减加一行所具有的列数总数即360 * (1 / fraction)。另外需要注意的几个特殊Sector，他们可能是四个角上的、或者是第一行的sector、或者是最后一行的sector，它们临近的sector是不全的。需要额外判断。以下是判断临近块的方法：

static bool IsLeftEdge(int sector, double fraction)  {    int numColumns = (int)(1.0 / fraction) * 360;    if (sector % numColumns == 0) return true;    else return false;  }  static bool IsRightEdge(int sector, double fraction)  {    if (IsLeftEdge(sector + 1, fraction) == true) return true;    else return false;  }  static bool IsTopRow(int sector, double fraction)  {    int numColumns = (int)(1.0 / fraction) * 360;    if (sector >= 0 && sector <= numColumns - 1) return true;    else return false;  }  static bool IsBottomRow(int sector, double fraction)  {    int numColumns = (int)(1.0 / fraction) * 360;    int numRows = (int)(1.0 / fraction) * 180;    int firstValueInLastRow = numColumns * (numRows - 1);    int lastValueInLastRow = numColumns * numRows - 1;    if (sector >= firstValueInLastRow && sector <= lastValueInLastRow)      return true;    else      return false;  }

下面的函数是返回临近点的函数。返回数组里，

result[0]是左上角的SectorId，

result[1]是正上方的SectorId，

result[2]是右上角的SectorId，

result[3]是正左方的SectorId，

result[4]是正右方的SectorId，

result[5]是左下角的SectorId，

result[6]是正下方的SectorId，

result[7]是右下角的SectorId，

static int[] AdjacentSectors(int sector, double fraction) {   int[] result = new int[8]; // Clockwise from upper-left   int numCols = (int)(1.0 / fraction) * 360;   int numRows = (int)(1.0 / fraction) * 180;   int firstValueInLastRow = numCols * (numRows - 1);   int lastValueInLastRow = numCols * numRows - 1;   // General case   if (IsLeftEdge(sector, fraction) == false &&  IsRightEdge(sector, fraction) == false &&  IsTopRow(sector, fraction) == false &&  IsBottomRow(sector, fraction) == false)   {  result[0] = sector - numCols - 1;  result[1] = sector - numCols;  result[2] = sector - numCols + 1;  result[3] = sector - 1;  result[4] = sector + 1;  result[5] = sector + numCols - 1;  result[6] = sector + numCols;  result[7] = sector + numCols + 1;  return result;   }   // Deal with special cases here. See code download.   throw new Exception("Unexpected logic path in AdjacentSectors"); }

有时候，仅仅查找临近的8个节点是不够的。例如：在高纬度地区。如下图所示，中心区域是红色的，它四周的8块Sector是绿色的。如果Fraction很小，导致红色区域的宽和高都比较小，例如有可能红色Secotor的宽高是2*10，则当需要查找附近10公里的时候，宽度需要向外延生，比如说左边，需要向外延生5个临近快。因为如果当前的点是红色Sector的起始点，即红色块左下角点时，左边就需要向外延生5*2公里。这样才能完全覆盖住半径10公里的范围。也就是下图中黑色线条标出的左边方向的5个Sector，每个Secotr的宽是2。黑色线条就一个10公里半径（以红色Sector左下角作为圆心）。

这样扩展后，会找到更多的点，包括哪些可能不是10公里范围内的点。那就需要再用距离公式对这些范围内的点再做精确的距离计算，来剔除掉一些额外点，保留下尽在范围的点。

自定义经纬度索引(非RTree、Morton Code[z order curve]、Geohash的方式)

/// <summary>         /// 根据经纬度、半径查找临近节点，会递归查找，找到该半径区域大小的所有Sector。         /// </summary>         /// <param name="latitude">纬度</param>         /// <param name="longitude">经度</param>         /// <param name="radius">半径</param>         /// <param name="fraction"></param>         /// <returns>SectorID</returns>         public HashSet<int> FindClosedAdjacentSectors(double latitude, double longitude, double radius)         {  if (radius > 200.0)  {      throw new Exception("半径太大"); //超过200公里，国内查找实际意义不太大。   }  //这里使用List来保存所有的sector，再最后输出的时候再去重复。  //因为下面的查找用的是深度优先递归查找，实际测试中发现，如果用HashSet会遗漏掉几个点。  var allFoundSectorIdSet = new List<int>();  var centralSectorId = MapLonLatCalc.LatLonToSector(latitude, longitude, Fraction);  allFoundSectorIdSet.Add(centralSectorId);  //计算当前sector的宽和高  var w = 0.0; var h = 0.0;  MapLonLatCalc.SectorArea(centralSectorId, Fraction, out w, out h);  GetAdjacentSectorsRecursively(centralSectorId, allFoundSectorIdSet,radius+w,radius+h, 200);//给宽度+w，高度+h，冗余，不同经纬度上多找一些。以免遗漏  return new HashSet<int>(allFoundSectorIdSet);         }

在这里，我做递归查找时，将宽和高都额外给了一个距离，这个距离是中心块的宽和高。上面介绍的 SectorToLat和 SectorToLon得到的 lat和 lon都是 Sector左上角的坐标位置（如下图蓝色点位置），一个 Sector是一个矩形。如果我们输入的源点是蓝色点，那递归查找时，无需额外的增加宽和高。但如果我们输入的源点是红色的点的位置，在做半径 10公里这类查找时，我们就需要额外的把红色点所在的 Sector的宽和高给补足进去。代码里计算的距离的参考点都是以蓝色的点为基准的。

自定义经纬度索引(非RTree、Morton Code[z order curve]、Geohash的方式)

以上代码使用List<int> 来保存所有找到的SectorId，会存在重复的Sector，原来使用的HashSet保存，直接在查找时就去重复，但测试后发现一些问题。例如上图，中心Sector是0，首先找到周边8个Sector并保存，然后开始深度搜索递归，从Sector1首先开始，而Sector1它会优先查找到天蓝色线条所在的Sector，当找到Sector7的时候，Sector7又深度优先，会直接把Sector2找到保存下来，当Sector1周边所有都找遍完了，轮到Sector2来递归查找时，发现Sector2已经找过了，就跳过了。导致Sector8没有被包含进来。这就出现了错误。所以现在使用List保存所有Sector，最后再去重复。

递归查找：

/// <summary> /// 递归查找临近区域的SectorId /// </summary> /// <param name="sectorId">当前SectorID</param> /// <param name="allSectors">已经存在的SectorId，每找到一个就存入</param> /// <param name="accumulatedWidth">累计的宽度，这里其实是剩余的宽度</param> /// <param name="accumulatedHeight">累计的高度，这里其实剩余的宽度</param> /// <param name="recursiveTime">递归次数，避免死死循环</param> /// <param name="fraction"></param> private void GetAdjacentSectorsRecursively(int centorSectorId, List<int> allSectors,double remainWidth,double remainHeight, int recursiveTime) {     if (recursiveTime <= 0)     {         return;     }     var result = MapLonLatCalc.AdjacentSectors(centorSectorId, Fraction);     for (int i = 0; i < result.Length; i++)     {         var sId = result[i];         if (sId > -1)         {  allSectors.Add(sId);  //计算当前sector的宽和高  var w = 0.0; var h = 0.0;  MapLonLatCalc.SectorArea(sId, Fraction, out w, out h);  var leftWidth = remainWidth - w;//还剩余多少没有找到？如果>0,就需要递归查找，低纬度与高纬度值不一样。  var leftHeight = remainHeight - h; //看看剩余的高，高是同一经线上的2个维度距离。所以高是平均相等的。  //只要有剩余的宽和高，就递归去找。可多找，不可少找  // 这里有个问题，在极点附近lat接近90 的时候，Sector就是几个，无论怎么递归查找，leftWidth 和leftHeight有可能永远>0.  //所以在数据源上要做限制，不允许有极点附近的坐标。还好，目前国内或国外的查找点在极点附近基本没有什么。  if (leftWidth > 0 || leftHeight > 0)  {      GetAdjacentSectorsRecursively(sId, allSectors,leftWidth,leftHeight, recursiveTime - 1);  }         }     } }

在极点附近的时候，这里有个问题，在极点附近lat接近90 的时候，Sector就是几个，无论怎么递归查找，leftWidth 和leftHeight有可能永远>0.

所以在数据源上要做限制，不允许有极点附近的坐标。还好，目前国内或国外的查找点在极点附近基本没有什么。

测试程序：

随机生成一批坐标点，纬度范围控制在 -85 到85之间。目前地球上85度纬度以上的地区基本无有效信息可查。微软Bing地图也在这个范围内。

static void InitLatLon() {  Random r = new Random(0);  //initialDataFile="..//..//UserIDLatLon.txt";  FileStream ofs = new FileStream(initialDataFile, FileMode.Create);  StreamWriter sw = new StreamWriter(ofs);  for (int i = 0; i < 1000000; ++i)  {   double lat = (85.0 - (-85.0)) * r.NextDouble() + (-85.0);   double lon = (180.0 - (-180.0)) * r.NextDouble() + (-180.0);   sw.WriteLine(i.ToString().PadLeft(6, '0') + "," +     lat.ToString("F8") + "," + lon.ToString("F8"));  }  sw.Close(); ofs.Close(); }

MyCusterInfo 是模拟现实中的一个实际业务中可能需要的一些地理信息。它除了ID、经纬度坐标外，可能还包含其他信息，这里用ExtraInfo表示。

public class MyCusterInfo {  public int Id  {   get;   set;  }  public double LAT  {   get;   set;  }  public double LON  {   get;   set;  }  /// <summary>  /// 额外信息  /// </summary>  public object ExtraInfo  {   get;   set;  } }

从文件中把经纬度信息读取出来放在内存中：

Dictionary<int, MyCusterInfo> myLatLonCusterInfo = new Dictionary<int, MyCusterInfo>(1000000); //initialDataFile = "..//..//UserIDLatLon.txt";                 FileStream ifs = new FileStream(initialDataFile, FileMode.Open);                 StreamReader sr = new StreamReader(ifs);                 string line = "";                 string[] tokens = null;                 while ((line = sr.ReadLine()) != null)                 {  tokens = line.Split(',');  int id = int.Parse(tokens[0]);  double lat = double.Parse(tokens[1]);  double lon = double.Parse(tokens[2]);  bool vld = false;  double vlat = -1.0, vlon = -1.0;  vld = findObjects.ConvertToValidLatLon(lat, lon, out vlat, out vlon);  if (vld)  {      var obj=new MyCusterInfo { Id = id, LAT = vlat, LON = vlon };      myLatLonCusterInfo[id]=obj;  }                 }