Expose dissect and grok to painless #67825
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -40,6 +40,8 @@ dependencies { | |
| api 'org.ow2.asm:asm-commons:7.2' | ||
| api 'org.ow2.asm:asm-analysis:7.2' | ||
| api 'org.ow2.asm:asm:7.2' | ||
| api project(':libs:elasticsearch-grok') | ||
| api project(':libs:elasticsearch-dissect') | ||
|
Member
Author
First big question: do we want this in "core painless" or do we want to make an extension point? Grok and dissect are libraries, but grok has a few dependencies. Second question, which affects the first: do we want to limit these regex flavors to certain contexts, like just runtime fields? I think it'd be complex to explain that to folks, though.
Contributor
@nik9000 I gave this specific issue a lot more thought over the weekend, and it's definitely my greatest concern. I think when we try to separate core-painless from the plugin, this will be really hard to pull apart if it's part of the grammar. (Though I guess we already have a way to turn regex on and off, so maybe this just needs something similar?) I do wonder if we should instead consider making grok/dissect instance bindings and have them called as static methods. This would allow the grok instance (a singleton for Painless) to have both the watchdog and a cache independent of core-painless, and then they become dependent on whitelisting as opposed to grammar changes.
Member
I think a grok method makes sense. While these are similar to regexes, they are also distinctly different. I would prefer we not add these dependencies directly to painless. Instead, as Jack suggested, we can make them available through our normal extension mechanisms. But we don't need grok in, e.g., score scripts, so it's not something that needs to be available to all contexts, and we should continue to strive to keep core painless unencumbered.
Member
Author
I'm not sure that is true. If we're comfortable exposing grok for runtime fields then I think they'd end up in all contexts anyway, if just transitively through runtime fields. I certainly understand wanting to keep painless modular. Y'all prefer grok and dissect to be methods that compile the pattern, instance-binding style? That'd work, and it'd keep all of the watchdog stuff out of core painless.
Contributor
I'm not overly concerned with the contexts simply because someone could use grok in a runtime field that could then be used as part of a score script. I get that it's hard to remove things once they're part of the context whitelist, but since it can be used indirectly anyway I don't have a strong desire to keep it out of other contexts.
Member
It's a fair point that through runtime fields it can probably effectively be used everywhere. I do think the modular argument is strong though; there is nothing about grok that implies to me it needs to be part of the language itself. It can work just as well as a method, which would keep painless free of additional external deps. |
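For readers following the thread, here is a rough sketch of the "method with its own cache" shape discussed above. It is not the Painless binding API: `PatternBinding`, `extract`, and `cachedPatterns` are hypothetical names, and `java.util.regex` named groups stand in for grok so the example has no external dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one shared binding instance owns the compiled-pattern cache,
// so the language grammar never needs to know about the pattern flavor.
final class PatternBinding {
    // Compiled patterns keyed by their source text; this state lives outside the language core.
    private final Map<String, Pattern> cache = new ConcurrentHashMap<>();

    // Compiles (or reuses) the pattern and returns the requested named groups,
    // or null when the input does not match.
    Map<String, String> extract(String pattern, String input, String... groups) {
        Pattern compiled = cache.computeIfAbsent(pattern, Pattern::compile);
        Matcher matcher = compiled.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String group : groups) {
            out.put(group, matcher.group(group));
        }
        return out;
    }

    // Number of distinct patterns compiled so far.
    int cachedPatterns() {
        return cache.size();
    }
}
```

With this shape, exposing the feature to a script context is a whitelisting decision for the method rather than a grammar change.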
||
| api project('spi') | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -109,7 +109,7 @@ INTEGER: ( '0' | [1-9] [0-9]* ) [lLfFdD]?; | |
| DECIMAL: ( '0' | [1-9] [0-9]* ) (DOT [0-9]+)? ( [eE] [+\-]? [0-9]+ )? [fFdD]?; | ||
|
|
||
| STRING: ( '"' ( '\\"' | '\\\\' | ~[\\"] )*? '"' ) | ( '\'' ( '\\\'' | '\\\\' | ~[\\'] )*? '\'' ); | ||
| REGEX: '/' ( '\\' ~'\n' | ~('/' | '\n') )+? '/' [cilmsUux]* { isSlashRegex() }?; | ||
| REGEX: [dg]? '/' ( '\\' ~'\n' | ~('/' | '\n') )+? '/' [cilmsUux]* { isSlashRegex() }?; | ||
|
Member
Author
If we made the regex flavor pluggable then we'd replace
Member
I would propose a more descriptive name if possible, e.g. grok and dissect rather than g and d. Thoughts?
Member
Author
I went with single letters because the regex flags are single letter and it "felt similar". I'd kind of prefer single letters over longer things, but I'm not super attached either way.
Contributor
If we do stick with grammar changes, I prefer the single letters as well. Something like grok/.../ seems quite awkward. |
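To make the proposed token shape concrete, here is a small sketch that splits a literal into flavor, body, and flags. It only mirrors the `[dg]? '/' ... '/' [cilmsUux]*` part of the lexer rule using `java.util.regex`; it does not model the `isSlashRegex()` predicate, and `PatternLiteral` and `Flavor` are hypothetical names, not part of the PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of how a prefixed pattern literal could be classified.
final class PatternLiteral {
    enum Flavor { REGEX, DISSECT, GROK }

    // Optional d/g prefix, a slash-delimited body (backslash escapes allowed), then regex flags.
    private static final Pattern LITERAL =
        Pattern.compile("([dg]?)/((?:\\\\[^\\n]|[^/\\n])+?)/([cilmsUux]*)");

    final Flavor flavor;
    final String body;
    final String flags;

    private PatternLiteral(Flavor flavor, String body, String flags) {
        this.flavor = flavor;
        this.body = body;
        this.flags = flags;
    }

    // Parses a whole token; rejects anything that is not exactly one literal.
    static PatternLiteral parse(String token) {
        Matcher m = LITERAL.matcher(token);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a pattern literal: " + token);
        }
        Flavor flavor;
        switch (m.group(1)) {
            case "d":
                flavor = Flavor.DISSECT;
                break;
            case "g":
                flavor = Flavor.GROK;
                break;
            default:
                flavor = Flavor.REGEX; // no prefix means a plain regex
        }
        return new PatternLiteral(flavor, m.group(2), m.group(3));
    }
}
```

For example, `PatternLiteral.parse("g/%{IP:client}/")` yields the `GROK` flavor with body `%{IP:client}`, while an unprefixed literal keeps today's regex meaning.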
||
|
|
||
| TRUE: 'true'; | ||
| FALSE: 'false'; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,7 @@ | |
| package org.elasticsearch.painless; | ||
|
|
||
| import org.elasticsearch.bootstrap.BootstrapInfo; | ||
| import org.elasticsearch.grok.MatcherWatchdog; | ||
| import org.elasticsearch.painless.antlr.Walker; | ||
| import org.elasticsearch.painless.ir.ClassNode; | ||
| import org.elasticsearch.painless.lookup.PainlessLookup; | ||
|
|
@@ -46,6 +47,7 @@ | |
| import java.util.HashMap; | ||
| import java.util.Map; | ||
| import java.util.concurrent.atomic.AtomicInteger; | ||
| import java.util.function.Supplier; | ||
|
|
||
| import static org.elasticsearch.painless.WriterConstants.CLASS_NAME; | ||
|
|
||
|
|
@@ -165,16 +167,29 @@ public Loader createLoader(ClassLoader parent) { | |
| */ | ||
| private final Map<String, Class<?>> additionalClasses; | ||
|
|
||
| /** | ||
| * Suppliers the watchdog that prevents grok from running forever. | ||
| */ | ||
| private final Supplier<MatcherWatchdog> grokWatchdog; | ||
|
Member
Author
This watchdog is what prevents grok from "running forever". It basically calls Another way to do it might be to plumb a listener for task cancellation into |
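As background on the watchdog idea, the following is a minimal sketch of the general technique: register the matching thread, and have a periodic task interrupt any thread that has been matching for too long. It is not the `MatcherWatchdog` implementation. `WatchdogSketch`, `InterruptibleInput`, `register`, and `unregister` are hypothetical names, and the sketch uses `java.util.regex` with an interrupt-checking `CharSequence` where the real code interrupts the grok matcher.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

// Hypothetical sketch of a watchdog that interrupts long-running matches.
final class WatchdogSketch {
    // Wraps the input so a match notices a pending interrupt on every character read.
    static final class InterruptibleInput implements CharSequence {
        private final CharSequence inner;
        InterruptibleInput(CharSequence inner) { this.inner = inner; }
        @Override public char charAt(int index) {
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("match interrupted by watchdog");
            }
            return inner.charAt(index);
        }
        @Override public int length() { return inner.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return new InterruptibleInput(inner.subSequence(start, end));
        }
        @Override public String toString() { return inner.toString(); }
    }

    // Start time (nanoTime) of every thread that is currently matching.
    private final Map<Thread, Long> running = new ConcurrentHashMap<>();
    private final long maxMillis;
    private final ScheduledExecutorService scheduler;

    WatchdogSketch(long intervalMillis, long maxMillis) {
        this.maxMillis = maxMillis;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "watchdog-sketch");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(this::interruptLongRunning, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Interrupts every registered thread that has exceeded the time budget.
    private void interruptLongRunning() {
        long now = System.nanoTime();
        for (Map.Entry<Thread, Long> e : running.entrySet()) {
            if (TimeUnit.NANOSECONDS.toMillis(now - e.getValue()) > maxMillis) {
                e.getKey().interrupt();
            }
        }
    }

    void register() { running.put(Thread.currentThread(), System.nanoTime()); }

    void unregister() {
        running.remove(Thread.currentThread());
        Thread.interrupted(); // clear the flag so it does not leak to the caller
    }

    // Runs a match under the watchdog; throws IllegalStateException if it is interrupted.
    boolean matches(Pattern pattern, String input) {
        register();
        try {
            return pattern.matcher(new InterruptibleInput(input)).matches();
        } finally {
            unregister();
        }
    }

    void close() { scheduler.shutdownNow(); }
}
```

The sketch leaves a small race between `unregister` and a late interrupt; a production version would need to close that gap.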
||
|
|
||
| /** | ||
| * Standard constructor. | ||
| * @param scriptClass The class/interface the script will implement. | ||
| * @param factoryClass An optional class/interface to create the {@code scriptClass} instance. | ||
| * @param statefulFactoryClass An optional class/interface to create the {@code factoryClass} instance. | ||
| * @param painlessLookup The whitelist the script will use. | ||
| * @param grokWatchdog Supplies the watchdog used to prevent grok from running forever | ||
| */ | ||
| Compiler(Class<?> scriptClass, Class<?> factoryClass, Class<?> statefulFactoryClass, PainlessLookup painlessLookup) { | ||
| Compiler( | ||
| Class<?> scriptClass, | ||
| Class<?> factoryClass, | ||
| Class<?> statefulFactoryClass, | ||
| PainlessLookup painlessLookup, | ||
| Supplier<MatcherWatchdog> grokWatchdog | ||
| ) { | ||
| this.scriptClass = scriptClass; | ||
| this.painlessLookup = painlessLookup; | ||
| this.grokWatchdog = grokWatchdog; | ||
| Map<String, Class<?>> additionalClasses = new HashMap<>(); | ||
| additionalClasses.put(scriptClass.getName(), scriptClass); | ||
| addFactoryMethod(additionalClasses, factoryClass, "newInstance"); | ||
|
|
@@ -218,7 +233,15 @@ ScriptScope compile(Loader loader, String name, String source, CompilerSettings | |
| String scriptName = Location.computeSourceName(name); | ||
| ScriptClassInfo scriptClassInfo = new ScriptClassInfo(painlessLookup, scriptClass); | ||
| SClass root = Walker.buildPainlessTree(scriptName, source, settings); | ||
| ScriptScope scriptScope = new ScriptScope(painlessLookup, settings, scriptClassInfo, scriptName, source, root.getIdentifier() + 1); | ||
| ScriptScope scriptScope = new ScriptScope( | ||
| painlessLookup, | ||
| settings, | ||
| scriptClassInfo, | ||
| scriptName, | ||
| source, | ||
| grokWatchdog, | ||
| root.getIdentifier() + 1 | ||
| ); | ||
| new PainlessSemanticHeaderPhase().visitClass(root, scriptScope); | ||
| new PainlessSemanticAnalysisPhase().visitClass(root, scriptScope); | ||
| // TODO: Make this phase optional #60156 | ||
|
|
@@ -254,7 +277,15 @@ byte[] compile(String name, String source, CompilerSettings settings, Printer de | |
| String scriptName = Location.computeSourceName(name); | ||
| ScriptClassInfo scriptClassInfo = new ScriptClassInfo(painlessLookup, scriptClass); | ||
| SClass root = Walker.buildPainlessTree(scriptName, source, settings); | ||
| ScriptScope scriptScope = new ScriptScope(painlessLookup, settings, scriptClassInfo, scriptName, source, root.getIdentifier() + 1); | ||
| ScriptScope scriptScope = new ScriptScope( | ||
| painlessLookup, | ||
| settings, | ||
| scriptClassInfo, | ||
| scriptName, | ||
| source, | ||
| grokWatchdog, | ||
| root.getIdentifier() + 1 | ||
| ); | ||
| new PainlessSemanticHeaderPhase().visitClass(root, scriptScope); | ||
| new PainlessSemanticAnalysisPhase().visitClass(root, scriptScope); | ||
| // TODO: Make this phase optional #60156 | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,6 +21,7 @@ | |
|
|
||
| import org.elasticsearch.common.settings.Setting; | ||
| import org.elasticsearch.common.settings.Setting.Property; | ||
| import org.elasticsearch.grok.Grok; | ||
| import org.elasticsearch.painless.api.Augmentation; | ||
|
|
||
| import java.util.HashMap; | ||
|
|
@@ -77,6 +78,9 @@ public final class CompilerSettings { | |
| * For testing. Do not use. | ||
| */ | ||
| private int initialCallSiteDepth = 0; | ||
|
|
||
| private Map<String, String> grokPatternBank = Grok.BUILTIN_PATTERNS; | ||
|
|
||
| private int testInject0 = 2; | ||
| private int testInject1 = 4; | ||
| private int testInject2 = 6; | ||
|
|
@@ -170,6 +174,20 @@ public int getRegexLimitFactor() { | |
| return regexLimitFactor; | ||
| } | ||
|
|
||
| /** | ||
| * Default grok "pattern bank". Mostly initialized here so | ||
| */ | ||
| public Map<String, String> getGrokPatternBank() { | ||
| return grokPatternBank; | ||
| } | ||
|
|
||
| public void addToGrokPatternBank(String name, String pattern) { | ||
|
Member
Author
Grok allows you to add things to the pattern bank and I started plumbing that through using the scripts |
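The method this comment is attached to uses a copy-on-first-write approach: settings share the built-in bank until something is added, and only then take a private copy. A standalone sketch of that idea follows. `PatternBankSettings` and the contents of `BUILTIN` are made up for illustration and are not the real `Grok.BUILTIN_PATTERNS`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of copy-on-first-write over a shared default map.
final class PatternBankSettings {
    // Stand-in for a shared, read-only bank of built-in patterns (illustrative contents).
    static final Map<String, String> BUILTIN = Collections.unmodifiableMap(builtin());

    private static Map<String, String> builtin() {
        Map<String, String> m = new HashMap<>();
        m.put("INT", "[+-]?[0-9]+");
        m.put("WORD", "\\b\\w+\\b");
        return m;
    }

    // Starts out pointing at the shared bank; no copy is made until the first add.
    private Map<String, String> bank = BUILTIN;

    Map<String, String> getBank() {
        return bank;
    }

    void add(String name, String pattern) {
        if (bank == BUILTIN) {
            // First write: take a private copy so the shared bank is never mutated.
            bank = new HashMap<>(bank);
        }
        bank.put(name, pattern);
    }
}
```

The identity check (`==`) is deliberate: it distinguishes "still sharing the default" from "already copied" without an extra flag.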
||
| if (grokPatternBank == Grok.BUILTIN_PATTERNS) { | ||
| grokPatternBank = new HashMap<>(grokPatternBank); | ||
| } | ||
| grokPatternBank.put(name, pattern); | ||
| } | ||
|
|
||
| /** | ||
| * Get compiler settings as a map. This is used to inject compiler settings into augmented methods with the {@code @inject_constant} | ||
| * annotation. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -33,9 +33,12 @@ | |
| import org.elasticsearch.common.settings.Setting; | ||
| import org.elasticsearch.common.settings.Settings; | ||
| import org.elasticsearch.common.settings.SettingsFilter; | ||
| import org.elasticsearch.common.unit.TimeValue; | ||
| import org.elasticsearch.common.util.LazyInitializable; | ||
| import org.elasticsearch.common.xcontent.NamedXContentRegistry; | ||
| import org.elasticsearch.env.Environment; | ||
| import org.elasticsearch.env.NodeEnvironment; | ||
| import org.elasticsearch.grok.MatcherWatchdog; | ||
| import org.elasticsearch.painless.action.PainlessContextAction; | ||
| import org.elasticsearch.painless.action.PainlessExecuteAction; | ||
| import org.elasticsearch.painless.spi.PainlessExtension; | ||
|
|
@@ -111,6 +114,8 @@ public final class PainlessPlugin extends Plugin implements ScriptPlugin, Extens | |
| } | ||
|
|
||
| private final SetOnce<PainlessScriptEngine> painlessScriptEngine = new SetOnce<>(); | ||
| private final SetOnce<ThreadPool> threadPool = new SetOnce<>(); | ||
| private final Supplier<MatcherWatchdog> grokWatchdog = new LazyInitializable<>(this::initGrokWatchdog)::getOrCompute; | ||
|
Member
Author
This stuff is the big "yikes" around the watchdog. |
||
|
|
||
| @Override | ||
| public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) { | ||
|
|
@@ -123,7 +128,7 @@ public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext< | |
| } | ||
| contextsWithWhitelists.put(context, contextWhitelists); | ||
| } | ||
| painlessScriptEngine.set(new PainlessScriptEngine(settings, contextsWithWhitelists)); | ||
| painlessScriptEngine.set(new PainlessScriptEngine(settings, contextsWithWhitelists, grokWatchdog)); | ||
| return painlessScriptEngine.get(); | ||
| } | ||
|
|
||
|
|
@@ -136,6 +141,7 @@ public Collection<Object> createComponents(Client client, ClusterService cluster | |
| Supplier<RepositoriesService> repositoriesServiceSupplier) { | ||
| // this is a hack to bind the painless script engine in guice (all components are added to guice), so that | ||
| // the painless context api. this is a temporary measure until transport actions do no require guice | ||
| this.threadPool.set(threadPool); | ||
| return Collections.singletonList(painlessScriptEngine.get()); | ||
| } | ||
|
|
||
|
|
@@ -178,4 +184,13 @@ public List<RestHandler> getRestHandlers(Settings settings, RestController restC | |
| handlers.add(new PainlessContextAction.RestAction()); | ||
| return handlers; | ||
| } | ||
|
|
||
| private MatcherWatchdog initGrokWatchdog() { | ||
| // TODO this is fairly unpleasant | ||
| ThreadPool threadPool = this.threadPool.get(); | ||
| return MatcherWatchdog.newInstance(1000, 1000, threadPool::relativeTimeInMillis, | ||
| (delay, command) -> threadPool.schedule( | ||
| command, TimeValue.timeValueMillis(delay), ThreadPool.Names.GENERIC | ||
| )); | ||
|
Member
Author
This is pretty much what ingest does, though it makes the two |
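The awkward part of this plumbing is ordering: the supplier is created with the plugin, but the thread pool only arrives later in `createComponents`. A stripped-down sketch of that lazy-initialization pattern is below. `LazySupplier` and `PluginSketch` are hypothetical stand-ins (they are not `LazyInitializable`, `SetOnce`, or `PainlessPlugin`), and strings replace the real thread pool and watchdog.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical memoizing supplier: runs the initializer on first successful use only.
final class LazySupplier<T> implements Supplier<T> {
    private final Supplier<T> init;
    private T value;

    LazySupplier(Supplier<T> init) {
        this.init = Objects.requireNonNull(init);
    }

    @Override
    public synchronized T get() {
        if (value == null) {
            value = Objects.requireNonNull(init.get());
        }
        return value;
    }
}

// Hypothetical plugin shape: the dependency arrives after construction.
final class PluginSketch {
    // Stands in for the thread pool reference that is set once, late.
    private final AtomicReference<String> threadPool = new AtomicReference<>();
    // Safe to hand out immediately; nothing is initialized until get() is called.
    final Supplier<String> watchdog = new LazySupplier<>(this::initWatchdog);
    int initCalls = 0;

    void createComponents(String pool) {
        threadPool.set(pool);
    }

    private String initWatchdog() {
        String pool = threadPool.get();
        if (pool == null) {
            throw new IllegalStateException("thread pool not set yet");
        }
        initCalls++;
        return "watchdog(" + pool + ")";
    }
}
```

The cost of this shape is that a call to `get()` before `createComponents` fails at runtime rather than at construction, which is part of why the TODO above calls it unpleasant.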
||
| } | ||
| } | ||
I bumped into a few small things in dissect (error messages) and figured I'd clean them up while I was looking at them. I can certainly break them into a separate PR if it'd make life easier.