Skip to content

Conversation

@nknize
Copy link
Contributor

@nknize nknize commented May 14, 2015

While the GeoJSON spec does say a polygon is represented as an array of LinearRings (where a LinearRing is defined as a 'closed' array of points), it is a simple change to close the polygon for users. This addresses situations like those integrated with twitter (where GeoJSON polygons are not closed) such that our users do not have to write extra code to close the polygon.

closes #11131

@nknize nknize added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes v1.5.3 labels May 14, 2015
@rjernst
Copy link
Member

rjernst commented May 14, 2015

How can you tell the difference between they left off the closing point, vs their serialization to json didn't write the entire polygon? Maybe this behavior should be controlled by a setting, as it relaxes restrictions but allows potentially incorrect data to be added?

@lababidi
Copy link
Contributor

@rjernst I understand what you mean. That then puts the onus on the person managing elasticsearch to have the correct configuration or setting. I believe the responsibility should be on the the person handling the data in the first place. If they write data to elasticsearch and find it's incorrect, they can check their serialization method to verify it's producing correct outputs. This will still fail if the polygon only contains 1,2, or 3 points. I can see why this type of validation is necessary, but again, I believe that validation should be on the json serialization not in elasticsearch.

@nknize nknize added v1.6.1 and removed v1.5.3 labels Jun 9, 2015
@nknize
Copy link
Contributor Author

nknize commented Jun 9, 2015

Since before ElastiCon there has been a lot of talk about maturing ES to make it harder to "hurt yourself". The more I think about this issue the more I feel like we're opening a door for users to do exactly that.

In the spirit of user safety I like the idea of adding a setting to control this behavior. Though we're also trying to thin out the already massive number of settings for geo types. @clintongormley thoughts?

@lababidi
Copy link
Contributor

lababidi commented Jun 9, 2015

@nknize @clintongormley I hear what you're saying.

If you do add a setting, is the default going to allow "loose" GeoPoints, such as the non-closed ones?

To elaborate:
The use case I have in mind would be an out of the box install of elasticsearch to then push twitter/gnip data right in after setting the mapping. Prior to this merge request ElasticSearch will fail this use case on this ingest. If this setting is created, I would strongly encourage the default setting to allow GeoPoints to be non-strict/non-closed.

@clintongormley
Copy link
Contributor

@nknize we have two settings on (eg) numeric fields:

  • ignore_malformed - don't throw an exception if the data is malformed, but just ignore this field value
  • coerce - try to coerce eg a string to a number, or a double to a long

Wondering if we should allow the same settings for geo-shapes, where coerce:true would auto-close? (I think I'd default this setting to false though)

@rjernst
Copy link
Member

rjernst commented Jun 9, 2015

+1 to reusing settings (with documentation for what it means to "coerce" for geo points) and to keep the default sane (ie dont coerce by default).

@nknize
Copy link
Contributor Author

nknize commented Jun 9, 2015

++ I like this idea for consistency across mappings.

@lababidi
Copy link
Contributor

What's the status on this merge?

@clintongormley clintongormley changed the title [GEO] Update ShapeBuilder and GeoPolygonQueryParser to accept non-closed GeoJSON Update ShapeBuilder and GeoPolygonQueryParser to accept non-closed GeoJSON Jul 14, 2015
@nknize nknize force-pushed the enhancement/11131 branch from c8c4b9f to d5e6767 Compare July 14, 2015 21:33
@nknize nknize added v1.7.0 and removed v1.6.1 labels Jul 14, 2015
@nknize
Copy link
Contributor Author

nknize commented Jul 14, 2015

Code updated to use new mappings enhancements and optional coerce parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a map to store two options? can we not just store the options in two local variables here. If/when this expands into more parse options and its too big to pass around as separate variables we could create a class to hold these options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colings86 absolutely. I debated this myself. In the end I made the decision to put it in now for the flexibility of adding future options by just adding them to the hashmap. Since the overhead of the map is greater than the current parameters I can take it out and add it later when needed.

@colings86
Copy link
Contributor

Left a comment. I actually think coerce should default to true since thats the default for numbers. Should we not be consistent with numerics? /cc @clintongormley

@clintongormley
Copy link
Contributor

I think I agree. It'll also give a better OOB experience with the multitude of bad geodata out there.

@rjernst
Copy link
Member

rjernst commented Jul 15, 2015

I disagree. This would give a worse OOB experience. The point of this setting is to allow not following the standard (closing the shapes) which leaves room for corrupt serialization of shapes (eg omitted some points) leaving a corrupt shape (because we "auto closed" it). We shouldn't be silent about it, the shape is corrupt! We should call the setting something different if we can't agree on this. The OOB experience needs to be "follow the standard or you get an error", otherwise we would silently allow corrupt data into the index.

@nknize
Copy link
Contributor Author

nknize commented Jul 15, 2015

I agree that geodata differs from numerics in the context of 'coerce'. Even though its fundamentally two numeric elements they're also bounded by the spherical coordinate system. Defaulting coerce to true means ES will happily do more work just to accomodate non-sensical geodata. If the user really wants ES to "fix" bad data then they can explicitly set coerce to true. But IMHO I don't think we should take the default expectation that all geodata ES receives will be bad.

@clintongormley
Copy link
Contributor

OK - then coerce false

@martijnvg martijnvg added v1.7.1 and removed v1.7.0 labels Jul 16, 2015
@martijnvg
Copy link
Member

Bumping the version up to 1.7.1 for the today's release.

@colings86
Copy link
Contributor

LGTM

…d GeoJSON

While the GeoJSON spec does say a polygon is represented as an array of LinearRings (where a LinearRing is defined as a 'closed' array of points), the coerce parameter provides users with flexibility to have ES automatically close polygons. This addresses situations like those integrated with twitter (where GeoJSON polygons are not closed) such that our users do not have to write extra code to close the polygon. This code change adds the optional coerce parameter to the GeoShapeFieldMapper.

closes elastic#11131
@cnwarden
Copy link

coerce: https://www.elastic.co/guide/en/elasticsearch/reference/current/coerce.html, does it only support latest version? how about 1.7.3?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement v2.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Accept non-closed GeoJSON polygons

8 participants