Cannot index some geo_shape geometries (before and after Geo Refactoring changes) #3909

@tommcintyre

Probably FAO @chilling.

I have a mapping that includes some geo_shape fields. My test data contains GeoJSON fields that represent points, but encode them as polygons whose ring repeats the same lon/lat 4 (or sometimes 3) times. Spatial4J throws a validation exception when it builds these polygons.

I am not sure whether this is technically invalid GeoJSON. However, the Twitter API generates this form frequently (around 1 in 1500 tweets in my dataset), and other libraries I have used parse it without complaint. It would be good if ES could do what it can with data like this rather than failing, for example by treating it as a more appropriate type such as a point, or by relaxing the validation.

I tested this against the branch that includes the Geo Refactoring improvements at chilling/elasticsearch@0369983. It also happens on master, if you can get that far!

The stack trace:

org.elasticsearch.index.mapper.MapperParsingException: failed to parse [place]
    at org.elasticsearch.index.mapper.geo.GeoShapeFieldMapper.parse(GeoShapeFieldMapper.java:232)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:515)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:457)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:515)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:457)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:515)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:457)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:508)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:452)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:341)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:203)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:418)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more
Caused by: com.spatial4j.core.exception.InvalidShapeException: Too few distinct points in geometry component at or near point (-81.872495, 36.163117, NaN)
    at com.spatial4j.core.shape.jts.JtsGeometry.<init>(JtsGeometry.java:90)
    at org.elasticsearch.common.geo.builders.BasePolygonBuilder.build(BasePolygonBuilder.java:153)
    at org.elasticsearch.index.mapper.geo.GeoShapeFieldMapper.parse(GeoShapeFieldMapper.java:219)
    ... 15 more
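The "Too few distinct points in geometry component" message matches JTS's topology validation error, which Spatial4J surfaces when it constructs the shape. As far as I can tell, JTS first drops consecutive repeated positions from each ring and then requires at least four to remain (three distinct corners plus the closing point); treat that threshold as an assumption rather than a documented contract. Here is a minimal, dependency-free sketch of that kind of check (`RingCheck` and its methods are hypothetical names, not JTS or Spatial4J API), applied to the ring from the reproduction below:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RingCheck {
    // Drop consecutive duplicate positions, similar in spirit to what JTS
    // does before validating a ring (assumed behaviour, not verified here).
    static List<double[]> removeRepeatedPoints(double[][] ring) {
        List<double[]> out = new ArrayList<>();
        for (double[] p : ring) {
            if (out.isEmpty() || !Arrays.equals(out.get(out.size() - 1), p)) {
                out.add(p);
            }
        }
        return out;
    }

    // Assumed threshold: a closed ring needs at least 4 positions after
    // de-duplication (3 distinct corners plus the closing point).
    static boolean hasTooFewPoints(double[][] ring) {
        return removeRepeatedPoints(ring).size() < 4;
    }

    public static void main(String[] args) {
        // The outer ring from the failing document: one position, repeated.
        double[][] degenerate = {
            {-81.872495, 36.163117}, {-81.872495, 36.163117},
            {-81.872495, 36.163117}, {-81.872495, 36.163117}
        };
        // A small closed square for comparison.
        double[][] square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}, {0, 0}};
        System.out.println(hasTooFewPoints(degenerate)); // true
        System.out.println(hasTooFewPoints(square));     // false
    }
}
```

The degenerate ring collapses to a single position, so it fails long before any question of self-intersection or orientation comes up.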

To reproduce:

curl -XPUT http://server:9200/dummyindex -d '
{
    "mappings": {
        "dummytype": {
            "properties": {
                "place": {
                    "type": "geo_shape",
                    "tree": "quadtree",
                    "precision": "10m"
                }
            }
        }
    }
}'

curl -XPOST http://server:9200/dummyindex/dummytype/1 -d '
{
    "place": {
        "coordinates": [[[-81.872495, 36.163117], [-81.872495, 36.163117], [-81.872495, 36.163117], [-81.872495, 36.163117]]],
        "type": "Polygon"
    }
}'

I wrote a quick and hacky workaround for this particular case (see below). It is obviously not an acceptable general-purpose solution: it does nothing for other shape types, or for other cases such as rings with only 2 or 3 distinct points. I am not familiar enough with the libraries to be sure, but could a real fix involve simplifying or normalizing each geometry before it is passed to Spatial4J?
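Generalizing the `HashSet` check from the workaround, one possible normalization step is to count the distinct positions in a polygon's outer ring and pick the GeoJSON type the ring effectively describes. This is a minimal sketch, not how Elasticsearch handles it: `RingNormalizer` and `effectiveType` are hypothetical names, and a ring with three or more distinct but collinear positions would still be classified as `Polygon` and then rejected by validation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RingNormalizer {
    // Hypothetical helper: report which GeoJSON geometry type a polygon's
    // outer ring effectively describes, by counting its distinct positions.
    static String effectiveType(double[][] ring) {
        // Boxed lists have value-based equals/hashCode, so identical
        // positions collapse into one entry, like HashSet<Coordinate> would.
        Set<List<Double>> distinct = new LinkedHashSet<>();
        for (double[] p : ring) {
            distinct.add(Arrays.asList(p[0], p[1]));
        }
        return switch (distinct.size()) {
            case 0 -> throw new IllegalArgumentException("empty ring");
            case 1 -> "Point";       // e.g. the Twitter-style "polygon"
            case 2 -> "LineString";  // a ring that just goes back and forth
            default -> "Polygon";    // may still be invalid, e.g. collinear points
        };
    }

    public static void main(String[] args) {
        double[][] twitterStyle = {
            {-81.872495, 36.163117}, {-81.872495, 36.163117},
            {-81.872495, 36.163117}, {-81.872495, 36.163117}
        };
        double[][] backAndForth = {{0, 0}, {1, 1}, {1, 1}, {0, 0}};
        double[][] triangle = {{0, 0}, {1, 0}, {1, 1}, {0, 0}};
        System.out.println(effectiveType(twitterStyle)); // Point
        System.out.println(effectiveType(backAndForth)); // LineString
        System.out.println(effectiveType(triangle));     // Polygon
    }
}
```

The caller would then build a point or line instead of a polygon. Whether silently changing the geometry type is acceptable for a geo_shape field is a separate design question.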

Alternatively, one could argue that this is not ES's job and that the GeoJSON should be better formed. However, I think this is likely to be a common problem, given where this data comes from and how differently GeoJSON libraries treat it.

ShapeBuilder.java:

-        protected static PolygonBuilder parsePolygon(CoordinateNode coordinates) {
+        protected static ShapeBuilder parsePolygon(CoordinateNode coordinates) {
             LineStringBuilder shell = parseLineString(coordinates.children.get(0));
+            
+            if (new HashSet<Coordinate>(shell.points).size() == 1)
+                return newPoint(shell.points.get(0));
+            
             PolygonBuilder polygon = new PolygonBuilder(shell.points);
             for (int i = 1; i < coordinates.children.size(); i++) {
                 polygon.hole(parseLineString(coordinates.children.get(i)));

...

         protected static MultiPolygonBuilder parseMultiPolygon(CoordinateNode coordinates) {
             MultiPolygonBuilder polygons = newMultiPolygon();
             for (CoordinateNode node : coordinates.children) {
-                polygons.polygon(parsePolygon(node));
+                polygons.polygon((PolygonBuilder) parsePolygon(node));
             }
             return polygons;
         }
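One caveat with the patch as written: if a member of a MultiPolygon is degenerate, `parsePolygon` now returns the builder from `newPoint`, and the `(PolygonBuilder)` cast throws a `ClassCastException`. Below is a minimal sketch of a type check in place of the cast. The sealed `Shape` interface and the `Point`/`Polygon` records are hypothetical stand-ins for the real ES builder classes, and the code assumes Java 17 (sealed interfaces, records, pattern matching for `instanceof`).

```java
import java.util.ArrayList;
import java.util.List;

public class MultiPolygonSketch {
    // Hypothetical stand-ins for the ES ShapeBuilder hierarchy.
    sealed interface Shape permits Point, Polygon {}
    record Point(double lon, double lat) implements Shape {}
    record Polygon(List<double[]> shell) implements Shape {}

    // Like the patched parsePolygon: a ring whose positions are all
    // identical comes back as a Point instead of a Polygon.
    static Shape parsePolygon(List<double[]> shell) {
        double[] first = shell.get(0);
        boolean allSame = shell.stream()
                .allMatch(p -> p[0] == first[0] && p[1] == first[1]);
        return allSame ? new Point(first[0], first[1]) : new Polygon(shell);
    }

    // Instead of casting every result to Polygon, split results by type so a
    // degenerate member cannot cause a ClassCastException.
    static List<Polygon> parseMultiPolygon(List<List<double[]>> members, List<Point> degenerate) {
        List<Polygon> polygons = new ArrayList<>();
        for (List<double[]> shell : members) {
            Shape s = parsePolygon(shell);
            if (s instanceof Polygon poly) {
                polygons.add(poly);
            } else if (s instanceof Point pt) {
                degenerate.add(pt);
            }
        }
        return polygons;
    }

    public static void main(String[] args) {
        List<double[]> triangle = List.of(
                new double[]{0, 0}, new double[]{1, 0}, new double[]{1, 1}, new double[]{0, 0});
        List<double[]> collapsed = List.of(
                new double[]{5, 5}, new double[]{5, 5}, new double[]{5, 5}, new double[]{5, 5});
        List<Point> points = new ArrayList<>();
        List<Polygon> polys = parseMultiPolygon(List.of(triangle, collapsed), points);
        System.out.println(polys.size() + " polygon(s), " + points.size() + " point(s)");
        // prints "1 polygon(s), 1 point(s)"
    }
}
```

What should happen to the collected points (dropped, indexed separately, or something else) is a policy decision the sketch deliberately leaves open; it only shows that a type check avoids the unchecked cast.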
