Closed
Labels
:Data Management/Ingest Node, :Search Foundations/Mapping, >regression, Team:Search Foundations
Description
Elasticsearch version: 2.2.0
Description of the problem including expected versus actual behavior: The mapper-attachments plugin (or ingest-attachment) extracts content from plain text and PDF files, but not from the XML-based Office formats such as .docx.
Steps to reproduce:
- Install the mapper-attachments plugin
- Index a Word (.docx) document
- Look at the logs at DEBUG level.
Logs:
[2016-02-29 16:43:39,341][DEBUG][mapper.attachment ] Failed to extract [100000] characters of text for [null]: [Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@51667d8a]
...
Caused by: java.lang.IllegalStateException: access denied ("java.lang.RuntimePermission" "getClassLoader")
at org.apache.xmlbeans.XmlBeans.getContextTypeLoader(XmlBeans.java:336)
...
Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "getClassLoader")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
Analysis:
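Recent Office documents are XML-based (.docx, .xlsx, ...), so Tika parses them with its OOXML parser, which goes through XMLBeans. As the stack trace shows, XmlBeans.getContextTypeLoader needs the getClassLoader runtime permission, and that permission is denied when the code runs inside Elasticsearch. Tika therefore can no longer read these files.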
Reported by many users at https://discuss.elastic.co/t/no-hits-when-do-a-text-search-in-an-attachment-for-docx-file/41779
Workaround: saving the document in the legacy .doc format works.
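
To make the failure mode in the analysis concrete, here is a minimal sketch of how a Java permission set answers the question "is getClassLoader allowed?". It only uses the standard java.security permission classes and does not involve Elasticsearch or Tika. The class name PermissionCheckSketch and the contents of the grant set are hypothetical, since this report does not list the permissions the plugin actually receives.

```java
import java.security.Permission;
import java.security.Permissions;

public class PermissionCheckSketch {

    // Hypothetical grant set for illustration only. The real permissions
    // granted to the plugin are not listed in this report.
    static Permissions hypotheticalPluginGrants() {
        Permissions granted = new Permissions();
        granted.add(new RuntimePermission("accessDeclaredMembers"));
        return granted;
    }

    // True if the grant set covers the requested permission.
    static boolean isGranted(Permissions granted, Permission requested) {
        return granted.implies(requested);
    }

    public static void main(String[] args) {
        Permissions granted = hypotheticalPluginGrants();

        // The permission named in the AccessControlException above.
        Permission needed = new RuntimePermission("getClassLoader");

        // Not in the grant set, so a check for it would be refused.
        System.out.println("getClassLoader granted: " + isGranted(granted, needed));

        // Once the permission is part of the grant set, the same check passes.
        granted.add(needed);
        System.out.println("after adding it: " + isGranted(granted, needed));
    }
}
```

The sketch suggests why the legacy .doc path is unaffected: as long as the parser for a format never asks for a permission outside the grant set, no check fails. Whether the right fix is to grant the permission or to avoid the call is not decided here.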