---
date: 2020-05-21
title: "Accessing Data and Metadata"
linkTitle: "Accessing Data and Metadata"
weight: 60
description: >
  Access record data and metadata with the Simple Connect Expression Language (ScEL).
---

Some filters (e.g., [AppendFilter](#appendfilter)) can be configured using the *Simple Connect Expression Language*.

The *Simple Connect Expression Language* (ScEL for short) is a regex-based expression language that allows quick access to, and manipulation of, record fields and metadata.

The syntax to define an expression is of the form: "`{{ <expression string> }}`".

{{% alert title="Note" color="info" %}}
In some situations, the double brackets can be omitted if the expression is used to write a value into a target field.
{{% /alert %}}

ScEL supports the following capabilities:

* **Field Selector**
* **Nested Navigation**
* **String Substitution**
* **Functions**

## Field Selector

The expression language can be used to select a single field from the input record:

"`{{ username }}`"
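
For illustration only, the Python sketch below models a field selector, assuming the record value is a plain `dict`. FilePulse evaluates ScEL in Java. The `select_field` helper is hypothetical and is not part of its API.

```python
from typing import Any, Mapping


def select_field(record: Mapping[str, Any], field: str) -> Any:
    """Hypothetical helper mimicking {{ field }} on a record modeled as a dict.

    Returning None for a missing field is an assumption of this sketch.
    """
    return record.get(field)


record = {"username": "jdoe", "age": 42}
print(select_field(record, "username"))  # jdoe
```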

## Nested Navigation

To navigate down a struct value, use a period to indicate a nested field:

"`{{ address.city }}`"
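
The sketch below applies the same idea to nested navigation, again modeling structs as `dict`s. The `select_path` helper and its behavior for missing segments (returning `None`) are assumptions, not documented FilePulse behavior.

```python
from typing import Any, Mapping


def select_path(record: Mapping[str, Any], path: str) -> Any:
    """Hypothetical helper: follow a dotted path such as 'address.city'.

    Returns None as soon as a segment is missing or the current value is not
    a struct (modeled as a dict); that fallback is an assumption of this sketch.
    """
    current: Any = record
    for segment in path.split("."):
        if not isinstance(current, Mapping) or segment not in current:
            return None
        current = current[segment]
    return current


record = {"username": "jdoe", "address": {"city": "Paris", "zip": "75001"}}
print(select_path(record, "address.city"))  # Paris
```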

## String Substitution

The expression language can be used to build a new string field that concatenates multiple ones:

"`{{ <expression one> }}-{{ <expression two> }}`"
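
Because ScEL is regex-based, you can picture string substitution as a regex that finds each `{{ ... }}` and replaces it with the selected value. The following Python sketch is a hypothetical model. The `EXPRESSION` pattern, the `substitute` helper, and rendering missing values as empty strings are all assumptions, not FilePulse internals.

```python
import re
from typing import Any, Mapping

# Matches "{{ <expression> }}" and captures the expression without surrounding spaces.
EXPRESSION = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def select_path(record: Mapping[str, Any], path: str) -> Any:
    """Follow a dotted path; None when a segment is missing (assumption)."""
    current: Any = record
    for segment in path.split("."):
        if not isinstance(current, Mapping) or segment not in current:
            return None
        current = current[segment]
    return current


def substitute(template: str, record: Mapping[str, Any]) -> str:
    """Replace every {{ path }} in the template with the selected value as text.

    Rendering a missing value as an empty string is an assumption of this sketch.
    """
    def render(match: "re.Match[str]") -> str:
        value = select_path(record, match.group(1))
        return "" if value is None else str(value)

    return EXPRESSION.sub(render, template)


record = {"firstname": "John", "lastname": "Doe", "address": {"city": "Paris"}}
print(substitute("{{ firstname }}-{{ lastname }}", record))  # John-Doe
```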

## Built-in Functions

ScEL supports a number of predefined functions that can be used to apply a single transformation on a field.

| Function | Description | Syntax |
| ---------------| --------------|-----------|
| `contains` | Returns `true` if an array field's value contains the specified value | `{{ contains(array, value) }}` |
| `converts` | Converts a field's value into the specified type | `{{ converts(field, INTEGER) }}` |
| `ends_with` | Returns `true` if a string field's value ends with the specified string suffix | `{{ ends_with(field, suffix) }}` |
| `equals` | Returns `true` if a string or number field's value equals the specified value | `{{ equals(field, value) }}` |
| `exists` | Returns `true` if an object has the specified field | `{{ exists(obj, field) }}` |
| `extract_array`| Returns the element at the specified position of the specified array | `{{ extract_array(array, 0) }}` |
| `is_null` | Returns `true` if a field's value is null | `{{ is_null(field) }}` |
| `length` | Returns the number of elements in an array, or the length of a string field | `{{ length(array) }}` |
| `lowercase` | Converts all of the characters in a string field's value to lower case | `{{ lowercase(field) }}` |
| `matches` | Returns `true` if a field's value matches the specified regex | `{{ matches(field, regex) }}` |
| `nlv` | Sets a default value if a field's value is null | `{{ nlv(field, default) }}` |
| `replace_all` | Replaces every subsequence of the field's value that matches the given pattern with the given replacement string | `{{ replace_all(field, regex, replacement) }}` |
| `starts_with` | Returns `true` if a string field's value starts with the specified string prefix | `{{ starts_with(field, prefix) }}` |
| `trim` | Trims the spaces from the beginning and end of a string field's value | `{{ trim(field) }}` |
| `uppercase` | Converts all of the characters in a string field's value to upper case | `{{ uppercase(field) }}` |
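
To show what several of these functions compute, here is a hypothetical table of Python equivalents. This is a sketch under stated assumptions. FilePulse implements the functions in Java. `matches` is assumed to test the whole value, as Java's `String.matches` does. `replace_all` uses Python's replacement syntax for group references. `converts` and `exists` are omitted because their exact argument semantics are not described here.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical Python counterparts of some ScEL built-ins; FilePulse implements
# these in Java, so edge cases (nulls, type coercion) may differ.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "contains": lambda array, value: value in array,
    "ends_with": lambda field, suffix: field.endswith(suffix),
    "equals": lambda field, value: field == value,
    "extract_array": lambda array, index: array[index],
    "is_null": lambda field: field is None,
    "length": lambda field: len(field),  # array size or string length
    "lowercase": lambda field: field.lower(),
    # Assumption: whole-value matching, like Java's String.matches.
    "matches": lambda field, regex: re.fullmatch(regex, field) is not None,
    "nlv": lambda field, default: default if field is None else field,
    "replace_all": lambda field, regex, replacement: re.sub(regex, replacement, field),
    "starts_with": lambda field, prefix: field.startswith(prefix),
    "trim": lambda field: field.strip(),
    "uppercase": lambda field: field.upper(),
}

print(FUNCTIONS["nlv"](None, "unknown"))             # unknown
print(FUNCTIONS["ends_with"]("report.csv", ".csv"))  # True
print(FUNCTIONS["length"](["a", "b", "c"]))          # 3
```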

In addition, ScEL supports nested functions.

For example, the following expression replaces all whitespace characters after converting the field's value to lowercase:

```
{{ replace_all(lowercase(field), \\s, -) }}
```
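
In Python terms, nesting means the inner call is evaluated first and its result is passed to the outer call. The functions below are hypothetical stand-ins. The doubled backslash in the ScEL expression is assumed to be escaping required by the configuration format, so the sketch uses the raw regex `r"\s"`.

```python
import re


def lowercase(value: str) -> str:
    """Python stand-in for the ScEL lowercase function."""
    return value.lower()


def replace_all(value: str, regex: str, replacement: str) -> str:
    """Python stand-in for the ScEL replace_all function."""
    return re.sub(regex, replacement, value)


# The inner call is evaluated first and its result becomes the first argument
# of the outer call, mirroring {{ replace_all(lowercase(field), \\s, -) }}.
field = "Hello Connect File Pulse"
print(replace_all(lowercase(field), r"\s", "-"))  # hello-connect-file-pulse
```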

{{% alert title="Limitation" color="warning" %}}
Currently, FilePulse does not support user-defined functions (UDFs), so you cannot register your own functions to enrich the expression language.
{{% /alert %}}

## Scopes

In the previous sections, we have shown how to use the expression language to select a specific field.
The selected field was part of the current record being processed.

ScEL also gives you access to additional fields through the use of scopes.
A scope defines the root object on which a selector expression is evaluated.

The syntax to define an expression with a scope is of the form: "`{{ $<scope>.<selector expression string> }}`".

By default, if no scope is defined in the expression, the scope `$value` is implicitly used.
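
The scope rules above can be modeled with a small resolver. A leading `$<scope>` picks the root object, and a bare selector falls back to `$value`. This is a hypothetical Python sketch. The `resolve` function and the `context` layout (scope names used as dictionary keys without the `$`) are assumptions for illustration.

```python
from typing import Any, Mapping


def resolve(expression: str, context: Mapping[str, Any]) -> Any:
    """Hypothetical evaluator for '$<scope>.<selector>' or a bare '<selector>'.

    A bare selector is evaluated against the 'value' scope, mirroring $value.
    """
    if expression.startswith("$"):
        scope, _, selector = expression[1:].partition(".")
    else:
        scope, selector = "value", expression
    current: Any = context.get(scope)
    if selector:
        for segment in selector.split("."):
            if not isinstance(current, Mapping) or segment not in current:
                return None  # missing field (assumption of this sketch)
            current = current[segment]
    return current


context = {
    "value": {"username": "jdoe"},
    "metadata": {"name": "users.csv", "size": 1024},
    "topic": "users",
}
print(resolve("username", context))        # jdoe
print(resolve("$metadata.name", context))  # users.csv
print(resolve("$topic", context))          # users
```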

ScEL supports a number of predefined scopes that can be used, for example:

 - **To override the output topic.**
 - **To define the record key to be used.**
 - **To access the source file metadata.**
 - Etc.

| Scope | Description | Type |
|--- | --- |--- |
| `{{ $headers }}` | The record headers | - |
| `{{ $key }}` | The record key | `string` |
| `{{ $metadata }}` | The file metadata | `struct` |
| `{{ $offset }}` | The offset information of this record in the source file | `struct` |
| `{{ $system }}` | The system environment variables and runtime properties | `struct` |
| `{{ $timestamp }}` | The record timestamp | `long` |
| `{{ $topic }}` | The output topic | `string` |
| `{{ $value }}` | The record value | `struct` |
| `{{ $variables }}` | The contextual filter-chain variables | `map[string, object]` |

Note that in case of failures, more fields are added to the current filter context (see [Handling Failures](./handling-failures)).

### Record Headers

The scope `headers` allows you to define the headers of the output record.

### Record Key

The scope `key` allows you to define the key of the output record. Only string keys are currently supported.

### Source Metadata

The scope `metadata` allows read access to information about the file being processed.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $metadata.name }}` | The file name | `string` |
| `{{ $metadata.path }}` | The file directory path | `string` |
| `{{ $metadata.absolutePath }}` | The file absolute path | `string` |
| `{{ $metadata.hash }}` | The file CRC32 hash | `int` |
| `{{ $metadata.lastModified }}` | The file's last modification time | `long` |
| `{{ $metadata.size }}` | The file size | `long` |
| `{{ $metadata.inode }}` | The file Unix inode | `long` |
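
To show where these values come from, the hypothetical Python sketch below computes analogous fields for a local file using only the standard library. It is not how FilePulse collects metadata. Computing the CRC32 over the whole file content and expressing `lastModified` in epoch milliseconds are assumptions. Python's `zlib.crc32` also returns an unsigned value while the table lists an `int`, so the representation may differ.

```python
import os
import tempfile
import zlib
from typing import Any, Dict


def file_metadata(path: str) -> Dict[str, Any]:
    """Hypothetical: compute values analogous to the $metadata fields of a local file."""
    absolute = os.path.abspath(path)
    stat = os.stat(absolute)
    with open(absolute, "rb") as f:
        crc = zlib.crc32(f.read())  # assumption: CRC32 over the full file content
    return {
        "name": os.path.basename(absolute),
        "path": os.path.dirname(absolute),
        "absolutePath": absolute,
        "hash": crc,
        "lastModified": stat.st_mtime_ns // 1_000_000,  # assumption: epoch milliseconds
        "size": stat.st_size,
        "inode": stat.st_ino,
    }


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "users.csv")
    with open(csv_path, "wb") as f:
        f.write(b"id,name\n1,jdoe\n")
    meta = file_metadata(csv_path)

print(meta["name"], meta["size"])  # users.csv 15
```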

### Record Offset

The scope `offset` allows read access to information about the original position of the record in the source file.
The available fields depend on the configured FileInputReader.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.timestamp }}` | The creation time of the record (in milliseconds) | `long` |

Information only available if `RowFileInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.startPosition }}` | The start position of the record in the source file | `long` |
| `{{ $offset.endPosition }}` | The end position of the record in the source file | `long` |
| `{{ $offset.size }}` | The size in bytes | `long` |
| `{{ $offset.row }}` | The row number of the record in the source file | `long` |
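
The following hypothetical Python sketch shows how the four row-based offset fields relate to each other for a newline-delimited file. Its conventions are assumptions, not confirmed FilePulse behavior. Positions are treated as byte offsets, `endPosition` points just past the line terminator, `size` is `endPosition - startPosition`, and rows are numbered from 1.

```python
from typing import Dict, List


def row_offsets(data: bytes) -> List[Dict[str, int]]:
    """Hypothetical: per-line offsets analogous to $offset for a row-based reader.

    Assumptions: byte offsets, endPosition just past the line terminator,
    size = endPosition - startPosition, rows numbered from 1.
    """
    offsets = []
    start = 0
    for row, line in enumerate(data.splitlines(keepends=True), start=1):
        end = start + len(line)
        offsets.append({"startPosition": start, "endPosition": end, "size": end - start, "row": row})
        start = end
    return offsets


offsets = row_offsets(b"id,name\n1,jdoe\n2,asmith\n")
for offset in offsets:
    print(offset)
```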

Information only available if `BytesArrayInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.startPosition }}` | The start position of the record in the source file (always 0) | `long` |
| `{{ $offset.endPosition }}` | The end position of the record in the source file (equal to the file size) | `long` |

Information only available if `AvroFileInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.blockStart }}` | The start position of the current block | `long` |
| `{{ $offset.position }}` | The position within the current block | `long` |
| `{{ $offset.records }}` | The number of records read from the current block | `long` |

### System

The scope `system` allows read access to system environment variables and runtime properties.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $system.env }}` | The system environment variables | `map[string, string]` |
| `{{ $system.props }}` | The runtime system properties | `map[string, string]` |

### Timestamp

The scope `timestamp` allows you to define the timestamp of the output record.

### Topic

The scope `topic` allows you to define the target topic of the output record.

### Value

The scope `value` allows you to define the fields of the output record.

### Variables

The scope `variables` allows read/write access to a simple key-value map structure.
This scope can be used to share user-defined variables between filters.

Note: variables are not cached between records.
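
You can picture the per-record lifetime of variables with a toy filter chain in Python. The filters `extract_domain` and `append_domain`, the `run_chain` driver, and passing variables as a plain `dict` are all hypothetical. The sketch only illustrates two points: one filter can write a variable that a later filter reads, and each record gets a fresh map.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Variables = Dict[str, Any]
Filter = Callable[[Record, Variables], Record]


def extract_domain(record: Record, variables: Variables) -> Record:
    # Writes a user-defined variable, analogous to setting $variables.domain.
    variables["domain"] = record["email"].split("@")[-1]
    return record


def append_domain(record: Record, variables: Variables) -> Record:
    # Reads the variable set by a previous filter, analogous to {{ $variables.domain }}.
    return {**record, "domain": variables.get("domain")}


def run_chain(records: List[Record], filters: List[Filter]) -> List[Record]:
    results = []
    for record in records:
        variables: Variables = {}  # fresh map per record: nothing carries over
        for apply_filter in filters:
            record = apply_filter(record, variables)
        results.append(record)
    return results


output = run_chain(
    [{"email": "jdoe@example.com"}, {"email": "asmith@example.org"}],
    [extract_domain, append_domain],
)
print(output)
```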