Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ public final class DissectKey {
}

if (name == null || (name.isEmpty() && !skip)) {
throw new DissectException.KeyParse(key, "The key name could be determined");
throw new DissectException.KeyParse(key, "The key name could not be determined");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bumped into a few small things in disect - error messages and figured I'd clean them up while I was looking at them. I can certainly break them into a separate PR if it'd make life easier.

}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,9 @@
import java.util.stream.Collectors;

/**
* <p>Splits (dissects) a string into its parts based on a pattern.</p><p>A dissect pattern is composed of a set of keys and delimiters.
* Splits (dissects) a string into its parts based on a pattern.
* <p>
* A dissect pattern is composed of a set of keys and delimiters.
* For example the dissect pattern: <pre>%{a} %{b},%{c}</pre> has 3 keys (a,b,c) and two delimiters (space and comma). This pattern will
* match a string of the form: <pre>foo bar,baz</pre> and will result a key/value pairing of <pre>a=foo, b=bar, and c=baz.</pre>
* <p>Matches are all or nothing. For example, the same pattern will NOT match <pre>foo bar baz</pre> since all of the delimiters did not
Expand Down Expand Up @@ -171,11 +173,9 @@ public DissectParser(String pattern, String appendSeparator) {
*
* @param inputString The string to dissect
* @return the key/value Map of the results
* @throws DissectException if unable to dissect a pair into it's parts.
*/
public Map<String, String> parse(String inputString) {
/**
*
/*
* This implements a naive string matching algorithm. The string is walked left to right, comparing each byte against
* another string's bytes looking for matches. If the bytes match, then a second cursor looks ahead to see if all the bytes
* of the other string matches. If they all match, record it and advances the primary cursor to the match point. If it can not match
Expand Down Expand Up @@ -276,7 +276,19 @@ public Map<String, String> parse(String inputString) {
}
Map<String, String> results = dissectMatch.getResults();

if (!dissectMatch.isValid(results)) {
return dissectMatch.isValid(results) ? results : null;
}

/**
* <p>Entry point to dissect a string into it's parts.</p>
*
* @param inputString The string to dissect
* @return the key/value Map of the results
* @throws DissectException if unable to dissect a pair into it's parts.
*/
public Map<String, String> forceParse(String inputString) {
Map<String, String> results = parse(inputString);
if (results == null) {
throw new DissectException.FindMatch(pattern, inputString);
}
return results;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -344,11 +344,12 @@ public void testJsonSpecification() throws Exception {
}
}

private DissectException assertFail(String pattern, String input){
return expectThrows(DissectException.class, () -> new DissectParser(pattern, null).parse(input));
private DissectException assertFail(String pattern, String input) {
return expectThrows(DissectException.class, () -> new DissectParser(pattern, null).forceParse(input));
}

private void assertMiss(String pattern, String input) {
assertNull(new DissectParser(pattern, null).parse(input));
DissectException e = assertFail(pattern, input);
assertThat(e.getMessage(), CoreMatchers.containsString("Unable to find match for dissect pattern"));
assertThat(e.getMessage(), CoreMatchers.containsString(pattern));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ public IngestDocument execute(IngestDocument ingestDocument) {
} else if (input == null) {
throw new IllegalArgumentException("field [" + field + "] is null, cannot process it.");
}
dissectParser.parse(input).forEach(ingestDocument::setFieldValue);
dissectParser.forceParse(input).forEach(ingestDocument::setFieldValue);
return ingestDocument;
}

Expand Down
2 changes: 2 additions & 0 deletions modules/lang-painless/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,8 @@ dependencies {
api 'org.ow2.asm:asm-commons:7.2'
api 'org.ow2.asm:asm-analysis:7.2'
api 'org.ow2.asm:asm:7.2'
api project(':libs:elasticsearch-grok')
api project(':libs:elasticsearch-dissect')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First big question: do we want this in "core painless" or do we want to make an extension point? Grok and dissect are libraries, but grok has a few dependencies.

Second question that impacts on the first: do we want to limit these regex flavors to certain contexts? Like just runtime fields. I think it'd be complex to explain that to folks though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 I gave this specific issue a lot more thought over the weekend, and it's definitely my greatest concern. I think when we try to do the separation of core-painless from the plugin, this will be really hard to separate if it's part of the grammar. (Though, I guess we already have a way to turn on/off regex, so maybe this just needs a similar way to do that?). I do wonder if we should instead consider making grok/dissect instance bindings and have them called as static methods. This would allow the grok instance (singleton for Painless) to have both the watchdog and a cache independent of core-painless, and then they become dependent on whitelisting as opposed to grammar changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a grok method makes sense. While these are similar to regexes, they are also distinctly different. I would prefer we not add these dependencies directly to painless. Instead, as Jack suggested, we can make them available through our normal extension mechanisms. But we don't need grok in eg score scripts, so it's not something that needs to be available to all contexts, and we should continue to strive to keep core painless unencumbered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't need grok in eg score scripts, so it's not something that needs to be available to all contexts, and we should continue to strive to keep core painless unencumbered.

I'm not sure that is true. If we're comfortable exposing grok for runtime fields then I think they'd end up in all contexts anyway, if just transitively through runtime fields.

I certainly understand wanting to keep painless modular. Y'all prefer grok and dissect to be methods that compile the pattern instance binding style? That'd work, and it'd keep all of the watchdog stuff out of core painless.

Copy link
Contributor

@jdconrad jdconrad Jan 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not overly concerned with the contexts simply because someone could use grok in a runtime field that could then be used as part of a score script. I get that it's hard to remove things once they're part of the context whitelist, but since it can be used indirectly anyway I don't have a strong desire to keep it out of other contexts.

Copy link
Member

@rjernst rjernst Jan 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair point that through runtime fields it can probably effectively be used everywhere. I do think the modular argument is strong through; there is nothing about grok that implies to me it needs to be part of the language itself. It can work just as well as a method, which would keep painless free of additional external deps.

api project('spi')
}

Expand Down
2 changes: 1 addition & 1 deletion modules/lang-painless/src/main/antlr/PainlessLexer.g4
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ INTEGER: ( '0' | [1-9] [0-9]* ) [lLfFdD]?;
DECIMAL: ( '0' | [1-9] [0-9]* ) (DOT [0-9]+)? ( [eE] [+\-]? [0-9]+ )? [fFdD]?;

STRING: ( '"' ( '\\"' | '\\\\' | ~[\\"] )*? '"' ) | ( '\'' ( '\\\'' | '\\\\' | ~[\\'] )*? '\'' );
REGEX: '/' ( '\\' ~'\n' | ~('/' | '\n') )+? '/' [cilmsUux]* { isSlashRegex() }?;
REGEX: [dg]? '/' ( '\\' ~'\n' | ~('/' | '\n') )+? '/' [cilmsUux]* { isSlashRegex() }?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we made the regex flavor pluggable then we're replace [dg]? with [a-z]? or something like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose a more descriptive name if possible e.g. grok and dissect rather than g and d. Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with single letters because the regex flags are single letter and it "felt similar". I'd kind of prefer single letters over longer things, but I'm not super attached either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do stick with grammar changes, I prefer the single letters as well. Something like grok/.../ seems quite awkward.


TRUE: 'true';
FALSE: 'false';
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
package org.elasticsearch.painless;

import org.elasticsearch.bootstrap.BootstrapInfo;
import org.elasticsearch.grok.MatcherWatchdog;
import org.elasticsearch.painless.antlr.Walker;
import org.elasticsearch.painless.ir.ClassNode;
import org.elasticsearch.painless.lookup.PainlessLookup;
Expand All @@ -46,6 +47,7 @@
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

import static org.elasticsearch.painless.WriterConstants.CLASS_NAME;

Expand Down Expand Up @@ -165,16 +167,29 @@ public Loader createLoader(ClassLoader parent) {
*/
private final Map<String, Class<?>> additionalClasses;

/**
* Suppliers the watchdog that prevents grok from running forever.
*/
private final Supplier<MatcherWatchdog> grokWatchdog;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This watchdog is what prevents grok from "running forever". It basically calls Grok#interrupt when grok's have run too long. Grok itself periodically checks its own Grok#interrupt flag and Thread.currentThread.interrupted. It'd be better for us if we could register a callback that it checks. But it doesn't support that. So we need this watchdog thing. The watchdog is a shared component that tracks groks and interrupts them if they'd run too long. It seems heavy, especially for painless. And the plumbing is unpleasant. But without modifying joni we're stuck with it. And modifying joni doesn't seem like a small project.

Another way to do it might be to plumb a listener for task cancellation into Task and have it call any running Groks. The more I type this the more I'm thinking modifying joni is the right thing.....


/**
* Standard constructor.
* @param scriptClass The class/interface the script will implement.
* @param factoryClass An optional class/interface to create the {@code scriptClass} instance.
* @param statefulFactoryClass An optional class/interface to create the {@code factoryClass} instance.
* @param painlessLookup The whitelist the script will use.
* @param grokWatchdog Supplies the watchdog used to prevent grok from running forever
*/
Compiler(Class<?> scriptClass, Class<?> factoryClass, Class<?> statefulFactoryClass, PainlessLookup painlessLookup) {
Compiler(
Class<?> scriptClass,
Class<?> factoryClass,
Class<?> statefulFactoryClass,
PainlessLookup painlessLookup,
Supplier<MatcherWatchdog> grokWatchdog
) {
this.scriptClass = scriptClass;
this.painlessLookup = painlessLookup;
this.grokWatchdog = grokWatchdog;
Map<String, Class<?>> additionalClasses = new HashMap<>();
additionalClasses.put(scriptClass.getName(), scriptClass);
addFactoryMethod(additionalClasses, factoryClass, "newInstance");
Expand Down Expand Up @@ -218,7 +233,15 @@ ScriptScope compile(Loader loader, String name, String source, CompilerSettings
String scriptName = Location.computeSourceName(name);
ScriptClassInfo scriptClassInfo = new ScriptClassInfo(painlessLookup, scriptClass);
SClass root = Walker.buildPainlessTree(scriptName, source, settings);
ScriptScope scriptScope = new ScriptScope(painlessLookup, settings, scriptClassInfo, scriptName, source, root.getIdentifier() + 1);
ScriptScope scriptScope = new ScriptScope(
painlessLookup,
settings,
scriptClassInfo,
scriptName,
source,
grokWatchdog,
root.getIdentifier() + 1
);
new PainlessSemanticHeaderPhase().visitClass(root, scriptScope);
new PainlessSemanticAnalysisPhase().visitClass(root, scriptScope);
// TODO: Make this phase optional #60156
Expand Down Expand Up @@ -254,7 +277,15 @@ byte[] compile(String name, String source, CompilerSettings settings, Printer de
String scriptName = Location.computeSourceName(name);
ScriptClassInfo scriptClassInfo = new ScriptClassInfo(painlessLookup, scriptClass);
SClass root = Walker.buildPainlessTree(scriptName, source, settings);
ScriptScope scriptScope = new ScriptScope(painlessLookup, settings, scriptClassInfo, scriptName, source, root.getIdentifier() + 1);
ScriptScope scriptScope = new ScriptScope(
painlessLookup,
settings,
scriptClassInfo,
scriptName,
source,
grokWatchdog,
root.getIdentifier() + 1
);
new PainlessSemanticHeaderPhase().visitClass(root, scriptScope);
new PainlessSemanticAnalysisPhase().visitClass(root, scriptScope);
// TODO: Make this phase optional #60156
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@

import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;
import org.elasticsearch.grok.Grok;
import org.elasticsearch.painless.api.Augmentation;

import java.util.HashMap;
Expand Down Expand Up @@ -77,6 +78,9 @@ public final class CompilerSettings {
* For testing. Do not use.
*/
private int initialCallSiteDepth = 0;

private Map<String, String> grokPatternBank = Grok.BUILTIN_PATTERNS;

private int testInject0 = 2;
private int testInject1 = 4;
private int testInject2 = 6;
Expand Down Expand Up @@ -170,6 +174,20 @@ public int getRegexLimitFactor() {
return regexLimitFactor;
}

/**
* Default grok "pattern bank". Mostly initialized here so
*/
public Map<String, String> getGrokPatternBank() {
return grokPatternBank;
}

public void addToGrokPatternBank(String name, String pattern) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grok allows you to add things to the pattern bank and I started plumbing that through using the scripts options. But it turns out that we have an assertion that options contains some very specific stuff off in Script so I never finished it. I've left this here so we can talk about it. Grok really does want to let folks write a pattern bank and options seems like an OK way to do it, its just a change to how we'd been using options. So I left this in so we could discuss it.

if (grokPatternBank == Grok.BUILTIN_PATTERNS) {
grokPatternBank = new HashMap<>(grokPatternBank);
}
grokPatternBank.put(name, pattern);
}

/**
* Get compiler settings as a map. This is used to inject compiler settings into augmented methods with the {@code @inject_constant}
* annotation.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,9 +33,12 @@
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.settings.SettingsFilter;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.util.LazyInitializable;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
import org.elasticsearch.env.Environment;
import org.elasticsearch.env.NodeEnvironment;
import org.elasticsearch.grok.MatcherWatchdog;
import org.elasticsearch.painless.action.PainlessContextAction;
import org.elasticsearch.painless.action.PainlessExecuteAction;
import org.elasticsearch.painless.spi.PainlessExtension;
Expand Down Expand Up @@ -111,6 +114,8 @@ public final class PainlessPlugin extends Plugin implements ScriptPlugin, Extens
}

private final SetOnce<PainlessScriptEngine> painlessScriptEngine = new SetOnce<>();
private final SetOnce<ThreadPool> threadPool = new SetOnce<>();
private final Supplier<MatcherWatchdog> grokWatchdog = new LazyInitializable<>(this::initGrokWatchdog)::getOrCompute;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff is the big "yikes" around the watchdog.


@Override
public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) {
Expand All @@ -123,7 +128,7 @@ public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<
}
contextsWithWhitelists.put(context, contextWhitelists);
}
painlessScriptEngine.set(new PainlessScriptEngine(settings, contextsWithWhitelists));
painlessScriptEngine.set(new PainlessScriptEngine(settings, contextsWithWhitelists, grokWatchdog));
return painlessScriptEngine.get();
}

Expand All @@ -136,6 +141,7 @@ public Collection<Object> createComponents(Client client, ClusterService cluster
Supplier<RepositoriesService> repositoriesServiceSupplier) {
// this is a hack to bind the painless script engine in guice (all components are added to guice), so that
// the painless context api. this is a temporary measure until transport actions do no require guice
this.threadPool.set(threadPool);
return Collections.singletonList(painlessScriptEngine.get());
}

Expand Down Expand Up @@ -178,4 +184,13 @@ public List<RestHandler> getRestHandlers(Settings settings, RestController restC
handlers.add(new PainlessContextAction.RestAction());
return handlers;
}

private MatcherWatchdog initGrokWatchdog() {
// TODO this is fairly unpleasant
ThreadPool threadPool = this.threadPool.get();
return MatcherWatchdog.newInstance(1000, 1000, threadPool::relativeTimeInMillis,
(delay, command) -> threadPool.schedule(
command, TimeValue.timeValueMillis(delay), ThreadPool.Names.GENERIC
));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty much what ingest does, though it makes the two 1000 values configurable. If we're ok adding the watchdog like this then we probably should do that too, but I have no idea if we want to keep this at all so I didn't go to the trouble yet.

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@

import org.elasticsearch.SpecialPermission;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.grok.MatcherWatchdog;
import org.elasticsearch.painless.Compiler.Loader;
import org.elasticsearch.painless.lookup.PainlessLookup;
import org.elasticsearch.painless.lookup.PainlessLookupBuilder;
Expand All @@ -45,9 +46,11 @@
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

import static org.elasticsearch.painless.WriterConstants.OBJECT_TYPE;

Expand Down Expand Up @@ -79,7 +82,7 @@ public final class PainlessScriptEngine implements ScriptEngine {

/**
* Default compiler settings to be used. Note that {@link CompilerSettings} is mutable but this instance shouldn't be mutated outside
* of {@link PainlessScriptEngine#PainlessScriptEngine(Settings, Map)}.
* of {@link PainlessScriptEngine#PainlessScriptEngine}.
*/
private final CompilerSettings defaultCompilerSettings = new CompilerSettings();

Expand All @@ -90,7 +93,11 @@ public final class PainlessScriptEngine implements ScriptEngine {
* Constructor.
* @param settings The settings to initialize the engine with.
*/
public PainlessScriptEngine(Settings settings, Map<ScriptContext<?>, List<Whitelist>> contexts) {
public PainlessScriptEngine(
Settings settings,
Map<ScriptContext<?>, List<Whitelist>> contexts,
Supplier<MatcherWatchdog> grokWatchdog
) {
defaultCompilerSettings.setRegexesEnabled(CompilerSettings.REGEX_ENABLED.get(settings));
defaultCompilerSettings.setRegexLimitFactor(CompilerSettings.REGEX_LIMIT_FACTOR.get(settings));

Expand All @@ -101,7 +108,7 @@ public PainlessScriptEngine(Settings settings, Map<ScriptContext<?>, List<Whitel
ScriptContext<?> context = entry.getKey();
PainlessLookup lookup = PainlessLookupBuilder.buildFromWhitelists(entry.getValue());
contextsToCompilers.put(context,
new Compiler(context.instanceClazz, context.factoryClazz, context.statefulFactoryClazz, lookup));
new Compiler(context.instanceClazz, context.factoryClazz, context.statefulFactoryClazz, lookup, grokWatchdog));
contextsToLookups.put(context, lookup);
}

Expand Down Expand Up @@ -449,6 +456,15 @@ private CompilerSettings buildCompilerSettings(Map<String, String> params) {
compilerSettings.setInitialCallSiteDepth(Integer.parseInt(value));
}

for (Iterator<Map.Entry<String, String>> itr = copy.entrySet().iterator(); itr.hasNext();) {
Map.Entry<String, String> e = itr.next();
if (false == e.getKey().startsWith("grok.pattern.")) {
continue;
}
itr.remove();
compilerSettings.addToGrokPatternBank(e.getKey().substring("grok.pattern.".length()), e.getValue());
}

value = copy.remove(CompilerSettings.REGEX_ENABLED.getKey());
if (value != null) {
throw new IllegalArgumentException("[painless.regex.enabled] can only be set on node startup.");
Expand Down
Loading