Given a document with a field that holds an array of free-text values, the significant_text aggregation treats each array element as a separate document when counting the doc frequencies of terms. This can lead to the following error from the significance heuristic:
"type": "illegal_argument_exception",
"reason": "subsetFreq > subsetSize, in JLHScore"
This is caused by a bug in where the set of previously seen tokens is allocated and destroyed.
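To illustrate how the scope of that set affects the counts, here is a minimal Python sketch. It is not the Elasticsearch code: the function names, the whitespace tokenizer, and the sample data are all assumptions made for illustration. It only shows that recreating the seen-token set for every array element can push a term's frequency above the number of documents, while keeping one set per document cannot.

```python
def doc_freq_per_value(docs):
    """Hypothetical buggy scoping: the seen-token set is recreated per array element."""
    freq = {}
    for values in docs:
        for value in values:
            seen = set()  # reset for each element, so repeats across elements are recounted
            for token in value.lower().split():
                if token not in seen:
                    seen.add(token)
                    freq[token] = freq.get(token, 0) + 1
    return freq


def doc_freq_per_doc(docs):
    """Hypothetical fixed scoping: one seen-token set per document."""
    freq = {}
    for values in docs:
        seen = set()  # lives for the whole document, across all array elements
        for value in values:
            for token in value.lower().split():
                if token not in seen:
                    seen.add(token)
                    freq[token] = freq.get(token, 0) + 1
    return freq


# Illustrative data: each document is an array of free-text values.
docs = [["quick fox", "quick fox jumps", "fox"], ["lazy dog"]]
subset_size = len(docs)

buggy = doc_freq_per_value(docs)
fixed = doc_freq_per_doc(docs)

print("subset size:", subset_size)
print("per-value scoping, freq of 'fox':", buggy["fox"])
print("per-doc scoping, freq of 'fox':", fixed["fox"])
```

With per-element scoping, "fox" is counted once for each of the three array elements in the first document, so its frequency exceeds the subset size of two documents, which is the condition the heuristic rejects. With per-document scoping, no term can be counted more often than there are documents.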