524 changes: 271 additions & 253 deletions README.md

Large diffs are not rendered by default.

648 changes: 648 additions & 0 deletions doc/1.x/README.md

Large diffs are not rendered by default.

File renamed without changes.
181 changes: 181 additions & 0 deletions doc/1.x/compatibility.md
@@ -0,0 +1,181 @@
## Compatibility with JSON Schema versions

[![Supported Dialects](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fsupported_versions.json)](https://bowtie.report/#/implementations/java-networknt-json-schema-validator)
[![Draft 2020-12](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft2020-12.json)](https://bowtie.report/#/dialects/draft2020-12)
[![Draft 2019-09](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft2019-09.json)](https://bowtie.report/#/dialects/draft2019-09)
[![Draft 7](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft7.json)](https://bowtie.report/#/dialects/draft7)
[![Draft 6](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft6.json)](https://bowtie.report/#/dialects/draft6)
[![Draft 4](https://img.shields.io/endpoint?url=https%3A%2F%2Fbowtie.report%2Fbadges%2Fjava-com.networknt-json-schema-validator%2Fcompliance%2Fdraft4.json)](https://bowtie.report/#/dialects/draft4)

By default, the `pattern` and `format` `regex` validators use the JDK regular expression implementation, which is not ECMA-262 compliant and is thus not compliant with the JSON Schema specification. The library can, however, be configured to use an ECMA-262 compliant regular expression implementation such as `GraalJS` or `Joni`.

Annotation processing and reporting are implemented. Note that the collection of annotations will have an adverse performance impact.

The library implements the Flag, List and Hierarchical output formats defined in the [Specification for Machine-Readable Output for JSON Schema Validation and Annotation](https://github.com/json-schema-org/json-schema-spec/blob/8270653a9f59fadd2df0d789f22d486254505bbe/jsonschema-validation-output-machines.md).

The implementation supports the use of custom keywords, formats, vocabularies and meta-schemas.

### Known Issues

There are currently no known issues with the required functionality from the specification.

The following are the test results after running the [JSON Schema Test Suite](https://github.com/json-schema-org/JSON-Schema-Test-Suite) as at 18 Jun 2024 using version 1.4.1. As the test suite is continuously updated, subsequent runs may produce different results.

| Implementations | Overall | DRAFT_03 | DRAFT_04 | DRAFT_06 | DRAFT_07 | DRAFT_2019_09 | DRAFT_2020_12 |
|-----------------|-------------------------------------------------------------------------|-------------------------------------------------------------------|---------------------------------------------------------------------|--------------------------------------------------------------------|------------------------------------------------------------------------|----------------------------------------------------------------------|------------------------------------------------------------------------|
| NetworkNt | pass: r:4803 (100.0%) o:2372 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | | pass: r:610 (100.0%) o:251 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:822 (100.0%) o:318 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:906 (100.0%) o:541 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:1220 (100.0%) o:625 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) | pass: r:1245 (100.0%) o:637 (100.0%)<br>fail: r:0 (0.0%) o:0 (0.0%) |

### Legend

| Symbol | Meaning |
|:------:|:----------------------|
| 🟢 | Fully implemented |
| 🟡 | Partially implemented |
| 🔴 | Not implemented |
| 🚫 | Not defined |

### Keywords Support

| Keyword | Draft 4 | Draft 6 | Draft 7 | Draft 2019-09 | Draft 2020-12 |
|:---------------------------|:-------:|:-------:|:-------:|:-------------:|:-------------:|
| $anchor | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| $defs | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| $dynamicAnchor | 🚫 | 🚫 | 🚫 | 🚫 | 🟢 |
| $dynamicRef | 🚫 | 🚫 | 🚫 | 🚫 | 🟢 |
| $id | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| $recursiveAnchor | 🚫 | 🚫 | 🚫 | 🟢 | 🚫 |
| $recursiveRef | 🚫 | 🚫 | 🚫 | 🟢 | 🚫 |
| $ref | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| $vocabulary | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| additionalItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| additionalProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| allOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| anyOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| const | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| contains | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| contentEncoding | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| contentMediaType | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| contentSchema | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| definitions | 🟢 | 🟢 | 🟢 | 🚫 | 🚫 |
| dependencies | 🟢 | 🟢 | 🟢 | 🚫 | 🚫 |
| dependentRequired | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| dependentSchemas | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| enum | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| exclusiveMaximum (boolean) | 🟢 | 🚫 | 🚫 | 🚫 | 🚫 |
| exclusiveMaximum (numeric) | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| exclusiveMinimum (boolean) | 🟢 | 🚫 | 🚫 | 🚫 | 🚫 |
| exclusiveMinimum (numeric) | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| if-then-else | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| items | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| maxContains | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| minContains | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| maximum | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| maxItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| maxLength | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| maxProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| minimum | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| minItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| minLength | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| minProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| multipleOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| not | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| oneOf | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| pattern | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| patternProperties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| prefixItems | 🚫 | 🚫 | 🚫 | 🚫 | 🟢 |
| properties | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| propertyNames | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| readOnly | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| required | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| type | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| unevaluatedItems | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| unevaluatedProperties | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| uniqueItems | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| writeOnly | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |

In accordance with the specification, unknown keywords are treated as annotations. This can be customized by configuring an unknown keyword factory on the respective meta-schema.

#### Content Encoding

Since Draft 2019-09, the `contentEncoding` keyword does not generate assertions.

#### Content Media Type

Since Draft 2019-09, the `contentMediaType` keyword does not generate assertions.

#### Content Schema

The `contentSchema` keyword does not generate assertions.

#### Pattern

By default, the `pattern` keyword uses the JDK regular expression implementation to evaluate regular expressions.

This is not ECMA-262 compliant and is thus not compliant with the JSON Schema specification. However, this is likely to be the desired behavior in most cases, as downstream processing will most likely also use the default JDK regular expression implementation.
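To make the dialect difference concrete, the following sketch uses only `java.util.regex` to show three patterns that the JDK compiles without complaint but that are syntax errors in ECMA-262 when the `u` flag recommended by JSON Schema is used. The class name is illustrative, and the ECMA-262 behavior is stated in comments rather than executed, since no JavaScript engine is involved.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class JdkRegexDialect {

    // Returns true if java.util.regex accepts (compiles) the pattern.
    public static boolean jdkAccepts(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Each of these compiles in the JDK but is a syntax error in
        // ECMA-262 when the "u" flag is used:
        //   \a   JDK: alert (bell) character;   ECMA-262 /u: invalid escape
        //   \A   JDK: beginning-of-input anchor; ECMA-262 /u: invalid escape
        //   a*+  JDK: possessive quantifier;     ECMA-262: nothing to repeat
        List<String> patterns = List.of("\\a", "\\A", "a*+");
        for (String p : patterns) {
            System.out.println(p + " accepted by JDK: " + jdkAccepts(p));
        }
    }
}
```

A schema that relies on any of these constructs will therefore behave differently depending on which regular expression factory is configured.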

The library can be configured to use an ECMA-262 compliant regular expression validator implemented using [GraalJS](https://github.com/oracle/graaljs) or [Joni](https://github.com/jruby/joni). This is configured by passing the respective `GraalJSRegularExpressionFactory` or `JoniRegularExpressionFactory` instance to `setRegularExpressionFactory`.

This also requires adding the `org.graalvm.js:js` or `org.jruby.joni:joni` dependency.

```xml
<dependency>
    <!-- Used to validate ECMA-262 regular expressions -->
    <!-- Approximately 50 MB in dependencies -->
    <!-- GraalJSRegularExpressionFactory -->
    <groupId>org.graalvm.js</groupId>
    <artifactId>js</artifactId>
    <version>${version.graaljs}</version>
</dependency>

<dependency>
    <!-- Used to validate ECMA-262 regular expressions -->
    <!-- Approximately 2 MB in dependencies -->
    <!-- JoniRegularExpressionFactory -->
    <groupId>org.jruby.joni</groupId>
    <artifactId>joni</artifactId>
    <version>${version.joni}</version>
</dependency>
```

#### Format

Since Draft 2019-09, the `format` keyword by default only generates annotations and does not generate assertions.

This can be configured on a per-schema basis by using a meta-schema with the appropriate vocabulary.

| Version | Vocabulary | Value |
|:----------------------|---------------------------------------------------------------|-------------------|
| Draft 2019-09 | `https://json-schema.org/draft/2019-09/vocab/format` | `true` |
| Draft 2020-12 | `https://json-schema.org/draft/2020-12/vocab/format-assertion`| `true`/`false` |

This behavior can be overridden to generate assertions by setting the `setFormatAssertionsEnabled` option to `true`.

| Format | Draft 4 | Draft 6 | Draft 7 | Draft 2019-09 | Draft 2020-12 |
|:----------------------|:-------:|:-------:|:-------:|:-------------:|:-------------:|
| date | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| date-time | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| duration | 🚫 | 🚫 | 🚫 | 🟢 | 🟢 |
| email | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| hostname | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| idn-email | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| idn-hostname | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| ipv4 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| ipv6 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| iri | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| iri-reference | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| json-pointer | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| relative-json-pointer | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| regex | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| time | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |
| uri | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| uri-reference | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| uri-template | 🚫 | 🟢 | 🟢 | 🟢 | 🟢 |
| uuid | 🚫 | 🚫 | 🟢 | 🟢 | 🟢 |

##### Unknown Formats

When the format assertion vocabularies are used in a meta-schema, unknown formats will result in assertions, in accordance with the specification. If the format assertion vocabularies are not used, unknown formats will only result in assertions if format assertions are enabled and `setStrict("format", true)` has been set.
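The rules in this section can be summarized as a pair of boolean functions. The class below is a hypothetical model written for illustration only and is not part of the library's API. It covers Draft 2019-09 and later, and its parameters stand for whether the meta-schema uses a format assertion vocabulary, whether `setFormatAssertionsEnabled(true)` is set, and whether `setStrict("format", true)` is set.

```java
// Hypothetical model of the Draft 2019-09+ format rules described above.
// NOT the library's API; the class and parameter names are illustrative.
public class FormatAssertionRules {

    // Known formats: assert if the meta-schema uses a format assertion
    // vocabulary, or if format assertions have been explicitly enabled.
    public static boolean assertsKnownFormat(boolean formatAssertionVocabulary,
                                             boolean formatAssertionsEnabled) {
        return formatAssertionVocabulary || formatAssertionsEnabled;
    }

    // Unknown formats: always assert when a format assertion vocabulary is used;
    // otherwise only when assertions are enabled AND strict "format" is set.
    public static boolean assertsUnknownFormat(boolean formatAssertionVocabulary,
                                               boolean formatAssertionsEnabled,
                                               boolean strictFormat) {
        if (formatAssertionVocabulary) {
            return true;
        }
        return formatAssertionsEnabled && strictFormat;
    }

    public static void main(String[] args) {
        // Assertions enabled without strict "format": unknown formats are not asserted.
        System.out.println(assertsUnknownFormat(false, true, false));
        // Assertions enabled with strict "format": unknown formats are asserted.
        System.out.println(assertsUnknownFormat(false, true, true));
    }
}
```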

##### Footnotes
1. Note that validation is only optional for some of the keywords/formats.
2. Refer to the corresponding JSON Schema specification for more information on whether a keyword/format is optional.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
89 changes: 89 additions & 0 deletions doc/1.x/ecma-262.md
@@ -0,0 +1,89 @@
# Regular Expressions

For the `pattern` and `format` `regex` validators, the library provides three built-in options.

A custom implementation can be made by implementing `com.networknt.schema.regex.RegularExpressionFactory` to return a custom implementation of `com.networknt.schema.regex.RegularExpression`.

| Regular Expression Factory | Description |
|--------------------------------------------------|----------------------------------------------------|
| `JDKRegularExpressionFactory` | Uses Java's standard `java.util.regex` and calls the `find()` method. Note that `matches()` is not called as that attempts to match the entire string, implicitly adding anchors. This is the default implementation and does not require any additional libraries. |
| `JoniRegularExpressionFactory` | Uses `org.joni.Regex` with `Syntax.ECMAScript`. This requires adding the `org.jruby.joni:joni` dependency which will require about 2MB. |
| `GraalJSRegularExpressionFactory` | Uses GraalJS with `new RegExp(pattern, 'u')`. This requires adding the `org.graalvm.js:js` dependency which will require about 50MB. |
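A custom implementation might look like the sketch below. To keep it self-contained, the two nested interfaces are local stand-ins that assume the library's interfaces each have a single method, `RegularExpressionFactory.getRegularExpression(String)` and `RegularExpression.matches(String)`; verify the exact signatures against the library before implementing the real ones. The caching factory itself is hypothetical: it compiles each pattern once with `java.util.regex` and, like `JDKRegularExpressionFactory`, uses `find()` rather than `matches()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class CachingRegexFactoryDemo {

    // Stand-in for com.networknt.schema.regex.RegularExpression (assumed shape).
    interface RegularExpression {
        boolean matches(String value);
    }

    // Stand-in for com.networknt.schema.regex.RegularExpressionFactory (assumed shape).
    interface RegularExpressionFactory {
        RegularExpression getRegularExpression(String regex);
    }

    // Hypothetical custom factory: compiles each distinct pattern once, caches it,
    // and uses find() so that patterns are not implicitly anchored.
    static final class CachingJdkRegularExpressionFactory implements RegularExpressionFactory {
        private final Map<String, Pattern> cache = new ConcurrentHashMap<>();

        @Override
        public RegularExpression getRegularExpression(String regex) {
            Pattern pattern = cache.computeIfAbsent(regex, Pattern::compile);
            return value -> pattern.matcher(value).find();
        }

        int cachedPatterns() {
            return cache.size();
        }
    }

    public static void main(String[] args) {
        CachingJdkRegularExpressionFactory factory = new CachingJdkRegularExpressionFactory();
        RegularExpression re = factory.getRegularExpression("es");
        System.out.println(re.matches("expression")); // prints true: "es" is found
        factory.getRegularExpression("es");           // reuses the cached Pattern
        System.out.println(factory.cachedPatterns()); // prints 1
    }
}
```

Caching compiled patterns helps when the same schema patterns are evaluated repeatedly; the `ConcurrentHashMap` keeps the cache safe if validators are shared across threads.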

## Specification

The use of Regular Expressions is specified in JSON Schema at https://json-schema.org/draft/2020-12/json-schema-core#name-regular-expressions.

```
Keywords MAY use regular expressions to express constraints, or constrain the instance value to be a regular expression. These regular expressions SHOULD be valid according to the regular expression dialect described in ECMA-262, section 21.2.1 [ecma262].

Regular expressions SHOULD be built with the "u" flag (or equivalent) to provide Unicode support, or processed in such a way which provides Unicode support as defined by ECMA-262.

Furthermore, given the high disparity in regular expression constructs support, schema authors SHOULD limit themselves to the following regular expression tokens:

individual Unicode characters, as defined by the JSON specification [RFC8259];
simple character classes ([abc]), range character classes ([a-z]);
complemented character classes ([^abc], [^a-z]);
simple quantifiers: "+" (one or more), "*" (zero or more), "?" (zero or one), and their lazy versions ("+?", "*?", "??");
range quantifiers: "{x}" (exactly x occurrences), "{x,y}" (at least x, at most y, occurrences), {x,} (x occurrences or more), and their lazy versions;
the beginning-of-input ("^") and end-of-input ("$") anchors;
simple grouping ("(...)") and alternation ("|").
Finally, implementations MUST NOT take regular expressions to be anchored, neither at the beginning nor at the end. This means, for instance, the pattern "es" matches "expression".
```
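The non-anchoring rule can be checked with plain `java.util.regex`. This sketch (class and method names are illustrative) contrasts `Matcher.find()`, which gives the unanchored behavior the specification requires, with `Matcher.matches()`, which requires the whole input to match:

```java
import java.util.regex.Pattern;

public class UnanchoredMatching {

    // JSON Schema semantics: the pattern may match anywhere in the input.
    public static boolean schemaMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Whole-string semantics: Matcher.matches() behaves as if wrapped in ^...$.
    public static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(schemaMatches("es", "expression"));      // true
        System.out.println(wholeStringMatches("es", "expression")); // false
        // Schema authors who want anchoring must write the anchors explicitly:
        System.out.println(schemaMatches("^es$", "expression"));    // false
        System.out.println(schemaMatches("^expr", "expression"));   // true
    }
}
```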

## Considerations when selecting implementation

If strict compliance with the regular expression dialect described in ECMA-262 is required, then only the `GraalJS` implementation meets that criterion.

The `Joni` implementation is configured to attempt to match the ECMA-262 regular expression dialect. However, this dialect is not directly maintained by the Joni maintainers, as it does not come from its upstream, `Oniguruma`. The current implementation has known issues with matching inputs that contain newlines and with respecting the `^` and `$` anchors.

The `JDK` implementation is the default and uses `java.util.regex` with the `find()` method.

As the implementation is also used to validate regular expressions themselves (using `format` `regex`), one consideration is where those regular expressions will ultimately be used. For instance, if the system that consumes the input is implemented in JavaScript, the `GraalJS` implementation ensures that the regular expression will work there. If the system that consumes the input is implemented in Java, the `JDK` implementation may be the better choice.

## Configuration of implementation

The following test case shows how to pass a config object to use the `GraalJS` factory.

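```java
import static org.junit.jupiter.api.Assertions.assertFalse;

import com.networknt.schema.InputFormat;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SchemaValidatorsConfig;
import com.networknt.schema.SpecVersion.VersionFlag;
import com.networknt.schema.ValidationMessage;
import com.networknt.schema.regex.GraalJSRegularExpressionFactory;
import java.util.Set;
import org.junit.jupiter.api.Test;

public class RegularExpressionTest {
    @Test
    public void testInvalidRegexValidatorECMA262() throws Exception {
        // Use the ECMA-262 compliant GraalJS implementation instead of the JDK default
        SchemaValidatorsConfig config = SchemaValidatorsConfig.builder()
                .regularExpressionFactory(GraalJSRegularExpressionFactory.getInstance())
                .build();
        JsonSchemaFactory factory = JsonSchemaFactory.getInstance(VersionFlag.V202012);
        JsonSchema schema = factory.getSchema("{\r\n"
                + "  \"format\": \"regex\"\r\n"
                + "}", config);
        // Instance is the JSON string "\\a"; \a is an invalid escape in ECMA-262 with the "u" flag
        Set<ValidationMessage> errors = schema.validate("\"\\\\a\"", InputFormat.JSON, executionContext -> {
            // In Draft 2020-12, format only produces annotations unless assertions are enabled
            executionContext.getExecutionConfig().setFormatAssertionsEnabled(true);
        });
        assertFalse(errors.isEmpty());
    }
}
```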

## Performance

The following JMH results show the relative performance (throughput and allocation) of the different implementations.

```
Benchmark Mode Cnt Score Error Units
RegularExpressionBenchmark.graaljs thrpt 6 362696.226 ± 15811.099 ops/s
RegularExpressionBenchmark.graaljs:gc.alloc.rate thrpt 6 2584.386 ± 112.708 MB/sec
RegularExpressionBenchmark.graaljs:gc.alloc.rate.norm thrpt 6 7472.003 ± 0.001 B/op
RegularExpressionBenchmark.graaljs:gc.count thrpt 6 130.000 counts
RegularExpressionBenchmark.graaljs:gc.time thrpt 6 144.000 ms
RegularExpressionBenchmark.jdk thrpt 6 2776184.321 ± 41838.479 ops/s
RegularExpressionBenchmark.jdk:gc.alloc.rate thrpt 6 1482.565 ± 22.343 MB/sec
RegularExpressionBenchmark.jdk:gc.alloc.rate.norm thrpt 6 560.000 ± 0.001 B/op
RegularExpressionBenchmark.jdk:gc.count thrpt 6 74.000 counts
RegularExpressionBenchmark.jdk:gc.time thrpt 6 78.000 ms
RegularExpressionBenchmark.joni thrpt 6 1810229.581 ± 35230.798 ops/s
RegularExpressionBenchmark.joni:gc.alloc.rate thrpt 6 1463.887 ± 28.483 MB/sec
RegularExpressionBenchmark.joni:gc.alloc.rate.norm thrpt 6 848.003 ± 0.001 B/op
RegularExpressionBenchmark.joni:gc.count thrpt 6 73.000 counts
RegularExpressionBenchmark.joni:gc.time thrpt 6 77.000 ms
```
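Dividing the throughput scores above gives the approximate relative speeds on this benchmark. These ratios only reflect the patterns and inputs used by the benchmark, which are not shown here.

```math
\begin{aligned}
\frac{\text{jdk}}{\text{graaljs}} &= \frac{2\,776\,184.321}{362\,696.226} \approx 7.65 \\
\frac{\text{joni}}{\text{graaljs}} &= \frac{1\,810\,229.581}{362\,696.226} \approx 4.99 \\
\frac{\text{jdk}}{\text{joni}} &= \frac{2\,776\,184.321}{1\,810\,229.581} \approx 1.53
\end{aligned}
```

Per operation, the GraalJS implementation allocates about 13.3 times as much memory as the JDK implementation (7472 vs 560 B/op), and Joni about 1.51 times as much (848 vs 560 B/op).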

File renamed without changes.