Description
Status: NEW, Severity: enhancement, Component: indexer
Reported in version: unspecified, Platform: ANY/Generic
Assigned to: Trond Norbye
On 2012-02-15 13:52:01 +0000, Vladimir Kotal wrote:
Recent reindexing with 0.11 revealed that the indexer cannot cope with larger files and just blows up (JAVA_OPTS is default, set to 2 GB):
2012-02-15 14:30:53.572+0100 INFO t15 DefaultIndexChangedListener.fileAdd: Add: /foo.cpio (PlainAnalyzer)
2012-02-15 14:31:43.178+0100 SEVERE t15 IndexDatabase$1.run: Problem updating lucene index database:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at org.opensolaris.opengrok.analysis.plain.PlainAnalyzer.analyze(PlainAnalyzer.java:77)
at org.opensolaris.opengrok.analysis.TextAnalyzer.analyze(TextAnalyzer.java:60)
at org.opensolaris.opengrok.analysis.AnalyzerGuru.getDocument(AnalyzerGuru.java:262)
at org.opensolaris.opengrok.index.IndexDatabase.addFile(IndexDatabase.java:584)
at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:814)
at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:787)
at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:787)
at org.opensolaris.opengrok.index.IndexDatabase.update(IndexDatabase.java:354)
at org.opensolaris.opengrok.index.IndexDatabase$1.run(IndexDatabase.java:158)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2012-02-15 14:31:43.194+0100 INFO t10 Indexer.sendToConfigHost: Send configuration to: localhost:2424
2012-02-15 14:31:44.488+0100 INFO t10 Indexer.sendToConfigHost: Configuration update routine done, check log output for errors.
$ du -sh /foo.cpio
311M /foo.cpio
There should be an option telling the indexer to ignore files larger than a given number of bytes (similar to the -i option for filenames).
On 2012-02-15 13:54:37 +0000, Vladimir Kotal wrote:
Maybe there should even be some sane default, like 100 MB.
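A minimal sketch of what such a size check could look like, assuming the 100 MB default suggested above (the class name and wiring are hypothetical, not OpenGrok's actual API):

import java.io.File;

// Hypothetical helper, not part of OpenGrok: decides whether the indexer
// should skip a file because it exceeds a configurable size limit.
public class FileSizeFilter {

    // Assumed default of 100 MB, as suggested in the comment above.
    static final long DEFAULT_MAX_SIZE = 100L * 1024 * 1024;

    private final long maxSize;

    public FileSizeFilter(long maxSize) {
        this.maxSize = maxSize;
    }

    /** True if the file exists and is larger than the configured limit. */
    public boolean shouldSkip(File file) {
        return file.isFile() && file.length() > maxSize;
    }

    public static void main(String[] args) {
        FileSizeFilter filter = new FileSizeFilter(DEFAULT_MAX_SIZE);
        File f = new File(args.length > 0 ? args[0] : "/foo.cpio");
        // The 311 MB cpio archive from the report would be skipped here
        // instead of being buffered by PlainAnalyzer.
        System.out.println(f + ": skip=" + filter.shouldSkip(f));
    }
}

The check would presumably run in IndexDatabase.indexDown() before the file is handed to AnalyzerGuru, analogous to how the -i filename patterns are applied.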
On 2012-02-16 12:26:12 +0000, Knut Anders Hatlen wrote:
The analyzers don't really need to read the entire file into memory; they could also operate on streams. The reason they do read the file into memory, I think, is to avoid reading every file twice (once to add it to the Lucene indexes and once to build the xref). I'm not sure how important this optimization is (we should run some experiments to see).
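A rough illustration of that trade-off (standalone sketch, not OpenGrok code): buffering the whole file needs heap proportional to its size, while a streaming pass gets by with a fixed buffer but has to re-read the file once per consumer (index and xref):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class StreamVsBuffer {

    // Buffer-everything approach: heap usage grows with the file. The
    // Arrays.copyOf frame in the OOM stack trace above is roughly this
    // kind of array growth while the whole file is collected in memory.
    static char[] readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString().toCharArray();
    }

    // Streaming approach: constant heap, but each pass reopens the file.
    // A real pass would tokenize for Lucene or emit xref output instead
    // of just counting characters.
    static long streamPass(String path) throws IOException {
        long chars = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                chars += n;
            }
        }
        return chars;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/foo.cpio";
        System.out.println("index pass: " + streamPass(path) + " chars");
        System.out.println("xref pass:  " + streamPass(path) + " chars");
    }
}

Whether the extra read costs enough to matter is exactly the experiment suggested above.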