Commit d3dc16a — update(site): add archives for release v1.3.x
1 parent 9e90720

24 files changed: +1638 −3 lines

site/config.toml

Lines changed: 6 additions & 2 deletions

```diff
@@ -72,7 +72,7 @@ copyright = "The Kafka Connect File Pulse Authors"
 # This menu appears only if you have at least one [params.versions] set.
 version_menu = "Releases"
 archived_version = false
-version = "v1.3.0"
+version = "v1.3.x"

 # Repository configuration (URLs for in-page links to opening issues and suggesting changes)
 github_repo = "https://github.com/streamthoughts/kafka-connect-file-pulse"
@@ -129,7 +129,11 @@ no = 'Sorry to hear that. Please <a href="https://github.com/streamthoughts/kafk

 [[params.versions]]
 version = "master"
-url = "./docs"
+url = "/kafka-connect-file-pulse/docs/"
+
+[[params.versions]]
+version = "v1.3.x"
+url = "/kafka-connect-file-pulse/v1-3/docs"

 # As of Hugo 0.60
 [markup]
```
Lines changed: 8 additions & 0 deletions

---
title: "Archives"
linkTitle: "Archives"
weight: 100
url: "/archives"
description: >
  The documentation of prior releases.
---
Lines changed: 12 additions & 0 deletions

---
date: 2020-05-21
title: "Developer Guide"
linkTitle: "Developer Guide"
weight: 20
description: >
  Learn about the concepts and the functionalities of the Connect File Pulse Plugin.
---

The Developer Guide section helps you learn about the functionalities of the File Pulse Connector and the concepts
File Pulse uses to process and transform your data, and helps you obtain a deeper understanding of how the File Pulse Connector works.
Lines changed: 196 additions & 0 deletions

---
date: 2020-05-21
title: "Accessing Data and Metadata"
linkTitle: "Accessing Data and Metadata"
weight: 60
description: >
  The common configuration for Connect File Pulse.
---

Some filters (e.g.: [AppendFilter](#appendfilter)) can be configured using the *Simple Connect Expression Language*.

*Simple Connect Expression Language* (ScEL for short) is a regex-based expression language that allows quickly accessing and manipulating record fields and metadata.

The syntax to define an expression is of the form: "`{{ <expression string> }}`".

{{% alert title="Note" color="info" %}}
In some situations, the double brackets can be omitted if the expression is used to write a value into a target field.
{{% /alert %}}
ScEL supports the following capabilities:

* **Field Selector**
* **Nested Navigation**
* **String substitution**
* **Functions**

## Field Selector

The expression language can be used to easily select one field from the input record:

"`{{ username }}`"

## Nested Navigation

To navigate down a struct value, just use a period to indicate a nested field value:

"`{{ address.city }}`"

## String substitution

The expression language can be used to easily build a new string field that concatenates multiple ones:

"`{{ <expression one> }}-{{ <expression two> }}`"

## Built-in Functions

ScEL supports a number of predefined functions that can be used to apply a single transformation on a field.
| Function | Description | Syntax |
| ---------------| --------------|-----------|
| `contains` | Returns `true` if an array field's value contains the specified value | `{{ contains(array, value) }}` |
| `converts` | Converts a field's value into the specified type | `{{ converts(field, INTEGER) }}` |
| `ends_with` | Returns `true` if a string field's value ends with the specified suffix | `{{ ends_with(field, suffix) }}` |
| `equals` | Returns `true` if a string or number field's value equals the specified value | `{{ equals(field, value) }}` |
| `exists` | Returns `true` if the specified field exists | `{{ exists(struct, field) }}` |
| `extract_array`| Returns the element at the specified position of the specified array | `{{ extract_array(array, 0) }}` |
| `is_null` | Returns `true` if a field's value is null | `{{ is_null(field) }}` |
| `length` | Returns the number of elements in an array, or the length of a string field | `{{ length(array) }}` |
| `lowercase` | Converts all of the characters in a string field's value to lower case | `{{ lowercase(field) }}` |
| `matches` | Returns `true` if a field's value matches the specified regex | `{{ matches(field, regex) }}` |
| `nlv` | Sets a default value if a field's value is null | `{{ nlv(field, default) }}` |
| `replace_all` | Replaces every subsequence of the field's value that matches the given pattern with the given replacement string | `{{ replace_all(field, regex, replacement) }}` |
| `starts_with` | Returns `true` if a string field's value starts with the specified prefix | `{{ starts_with(field, prefix) }}` |
| `trim` | Trims the spaces from the beginning and end of a string | `{{ trim(field) }}` |
| `uppercase` | Converts all of the characters in a string field's value to upper case | `{{ uppercase(field) }}` |
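
As an illustration, a function can be applied through the [AppendFilter](#appendfilter) to add a normalized field to each record. The filter alias and field names below are hypothetical, shown only to sketch the idea:

```
filters=NormalizeUser
filters.NormalizeUser.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.NormalizeUser.field=username_lc
filters.NormalizeUser.values={{ lowercase(username) }}
```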

In addition, ScEL supports nested functions.

For example, the following expression replaces all whitespace characters after transforming the field's value into lowercase:

```
{{ replace_all(lowercase(field), \\s, -) }}
```

{{% alert title="Limitation" color="warning" %}}
Currently, FilePulse does not support user-defined functions (UDFs). So you cannot register your own functions to enrich the expression language.
{{% /alert %}}

## Scopes

In the previous sections, we have shown how to use the expression language to select a specific field.
The selected field was part of the current record being processed.

ScEL also allows you to access additional fields through the use of scopes.
Basically, a scope defines the root object on which a selector expression is evaluated.

The syntax to define an expression with a scope is of the form: "`{{ $<scope>.<selector expression string> }}`".

By default, if no scope is defined in the expression, the scope `$value` is implicitly used.

ScEL supports a number of predefined scopes that can be used, for example:

- **To override the output topic.**
- **To define the record key to be used.**
- **To get access to the source file metadata.**
- Etc.

| Scope | Description | Type |
|--- | --- |--- |
| `{{ $headers }}` | The record headers | - |
| `{{ $key }}` | The record key | `string` |
| `{{ $metadata }}` | The file metadata | `struct` |
| `{{ $offset }}` | The offset information of this record in the source file | `struct` |
| `{{ $system }}` | The system environment variables and runtime properties | `struct` |
| `{{ $timestamp }}` | The record timestamp | `long` |
| `{{ $topic }}` | The output topic | `string` |
| `{{ $value }}` | The record value | `struct` |
| `{{ $variables }}` | The contextual filter-chain variables | `map[string, object]` |

Note that, in case of failures, more fields are added to the current filter context (see: [Handling Failures](./handling-failures)).
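
For example, writing to the `$topic` scope from a filter overrides the output topic per record. The sketch below is illustrative only (the filter alias and the derived topic name are assumptions, and it assumes the AppendFilter can target a scope as its field):

```
filters.SetTopic.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.SetTopic.field=$topic
filters.SetTopic.values=connect-logs-{{ lowercase($metadata.name) }}
```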

### Record Headers

The scope `headers` allows defining the headers of the output record.

### Record key

The scope `key` allows defining the key of the output record. Only string keys are currently supported.

### Source Metadata

The scope `metadata` allows read access to information about the file being processed.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $metadata.name }}` | The file name | `string` |
| `{{ $metadata.path }}` | The file directory path | `string` |
| `{{ $metadata.absolutePath }}` | The file absolute path | `string` |
| `{{ $metadata.hash }}` | The file CRC32 hash | `int` |
| `{{ $metadata.lastModified }}` | The file last modified time | `long` |
| `{{ $metadata.size }}` | The file size | `long` |
| `{{ $metadata.inode }}` | The file Unix inode | `long` |
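
For instance, the `$metadata` scope can be combined with the AppendFilter to keep track of the originating file in each record (the alias and target field below are illustrative):

```
filters.AddSourceFile.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.AddSourceFile.field=source_file
filters.AddSourceFile.values={{ $metadata.absolutePath }}
```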

## Record Offset

The scope `offset` allows read access to information about the original position of the record in the source file.
The available fields depend on the configured reader.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.timestamp }}` | The creation time of the record (milliseconds) | `long` |

The following information is only available if `RowFilterReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.startPosition }}` | The start position of the record in the source file | `long` |
| `{{ $offset.endPosition }}` | The end position of the record in the source file | `long` |
| `{{ $offset.size }}` | The size in bytes | `long` |
| `{{ $offset.row }}` | The row number of the record in the source | `long` |

The following information is only available if `BytesArrayInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.startPosition }}` | The start position of the record in the source file (always equal to 0) | `long` |
| `{{ $offset.endPosition }}` | The end position of the record in the source file (equal to the file size) | `long` |

The following information is only available if `AvroFilterInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.blockStart }}` | The start position of the current block | `long` |
| `{{ $offset.position }}` | The position in the current block | `long` |
| `{{ $offset.records }}` | The number of records read in the current block | `long` |

## System

The scope `system` allows read access to system environment variables and runtime properties.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $system.env }}` | The system environment variables | `map[string, string]` |
| `{{ $system.props }}` | The system runtime properties | `map[string, string]` |

## Timestamp

The scope `timestamp` allows defining the timestamp of the output record.

## Topic

The scope `topic` allows defining the target topic of the output record.

## Value

The scope `value` allows defining the fields of the output record.

## Variables

The scope `variables` allows read/write access to a simple key-value map structure.
This scope can be used to share user-defined variables between filters.

Note: variables are not cached between records.
Lines changed: 64 additions & 0 deletions

---
date: 2020-05-21
title: "File Cleanup Policies"
linkTitle: "File Cleanup Policies"
weight: 100
description: >
  The common configuration for Connect File Pulse.
---

The connector can be configured with a specific [FileCleanupPolicy](connect-file-pulse-api/src/main/java/io/streamthoughts/kafka/connect/filepulse/clean/FileCleanupPolicy.java) implementation.

The cleanup policy can be configured with the below connect property:

| Configuration | Description | Type | Default | Importance |
| --------------| --------------|-----------| --------- | ------------- |
|`fs.cleanup.policy.class` | The fully qualified name of the class which is used to clean up files | class | *-* | high |

## Available Cleanup Policies

### DeleteCleanPolicy

This policy deletes all files regardless of their final status (completed or failed).

To enable this policy, the property `fs.cleanup.policy.class` must be configured to:

```
io.streamthoughts.kafka.connect.filepulse.clean.DeleteCleanPolicy
```

#### Configuration

No configuration.

### LogCleanPolicy

This policy prints some information to the logs after file completion.

To enable this policy, the property `fs.cleanup.policy.class` must be configured to:

```
io.streamthoughts.kafka.connect.filepulse.clean.LogCleanPolicy
```

#### Configuration

No configuration.

### MoveCleanPolicy

This policy attempts to atomically move files to configurable target directories.

To enable this policy, the property `fs.cleanup.policy.class` must be configured to:

```
io.streamthoughts.kafka.connect.filepulse.clean.MoveCleanPolicy
```

#### Configuration

| Configuration | Description | Type | Default | Importance |
| --------------| --------------|-----------| --------- | ------------- |
|`cleaner.output.failed.path` | Target directory for files processed with failure | string | *.failure* | high |
|`cleaner.output.succeed.path` | Target directory for files processed successfully | string | *.success* | high |
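
For example, a connector could be configured as follows to move processed files out of the scanned directory (the target paths are illustrative):

```
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.MoveCleanPolicy
cleaner.output.succeed.path=/tmp/connect-data/.success
cleaner.output.failed.path=/tmp/connect-data/.failure
```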

## Implementing your own policy
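
This section is truncated in the scraped commit. As a rough sketch only: a custom policy implements the `FileCleanupPolicy` interface from the connector's API module and is enabled via `fs.cleanup.policy.class`. The interface shape below is an assumption made for illustration (the real method names and argument types may differ), and the example policy itself is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// NOTE: self-contained sketch. The real contract lives in
// io.streamthoughts.kafka.connect.filepulse.clean.FileCleanupPolicy;
// the method names and argument types below are assumptions.
interface FileCleanupPolicy {
    boolean onSuccess(Path file); // called after a file completes successfully
    boolean onFailure(Path file); // called after a file ends in failure
}

// Hypothetical policy: delete successfully processed files, but keep
// failed files on disk so they can be inspected and reprocessed.
class KeepFailedCleanupPolicy implements FileCleanupPolicy {
    @Override
    public boolean onSuccess(Path file) {
        try {
            return Files.deleteIfExists(file);
        } catch (IOException e) {
            return false; // report the cleanup as failed
        }
    }

    @Override
    public boolean onFailure(Path file) {
        return true; // intentionally leave the file in place
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("filepulse", ".txt");
        FileCleanupPolicy policy = new KeepFailedCleanupPolicy();
        System.out.println(policy.onSuccess(tmp)); // true: file deleted
        System.out.println(Files.exists(tmp));     // false
    }
}
```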
Lines changed: 45 additions & 0 deletions

---
date: 2020-05-21
title: "Conditional Execution"
linkTitle: "Conditional Execution"
weight: 60
description: >
  Learn how to conditionally execute a transformation filter.
---

A conditional property `if` can be configured on each filter to determine whether that filter should be applied or skipped.
When a filter is skipped, messages flow to the next filter without any modification.

The `if` configuration accepts a Simple Connect Expression that must evaluate to `true` or `false`.
If the configured expression does not evaluate to a boolean value, the filter chain will fail.

The `if` property supports [simple expressions](accessing-data-and-metadata).

The boolean value returned from the filter condition can be inverted by setting the property `invert` to `true`.

For example, the below filter will only be applied to messages whose log message contains "BadCredentialsException":

```
filters.TagSecurityException.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.TagSecurityException.if={{ contains(data.logmessage, BadCredentialsException) }}
filters.TagSecurityException.invert=false
filters.TagSecurityException.field=tags
filters.TagSecurityException.values=SecurityAlert
```

These boolean functions are available for use with the `if` configuration:

| Function | Description | Syntax |
| --------------| --------------|-----------|
| `contains` | Returns `true` if an array field's value contains the specified value | `{{ contains(field, value) }}` |
| `ends_with` | Returns `true` if a string field's value ends with the specified suffix | `{{ ends_with(field, suffix) }}` |
| `equals` | Returns `true` if a string or number field's value equals the specified value | `{{ equals(field, value) }}` |
| `exists` | Returns `true` if the specified field exists | `{{ exists(struct, field) }}` |
| `is_null` | Returns `true` if a field's value is null | `{{ is_null(field) }}` |
| `matches` | Returns `true` if a field's value matches the specified regex | `{{ matches(field, regex) }}` |
| `starts_with` | Returns `true` if a string field's value starts with the specified prefix | `{{ starts_with(field, prefix) }}` |

**Limitations**:
* The `if` property does not support binary operators, so only a single condition can be configured per filter.
* Conditions cannot be used to easily create pipeline branching.
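
Despite these limitations, `invert` can express simple "unless" conditions. For example, the hypothetical filter below appends a default `level` field only when that field is absent (the alias and field names are illustrative):

```
filters.SetDefaultLevel.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.SetDefaultLevel.if={{ exists($value, level) }}
filters.SetDefaultLevel.invert=true
filters.SetDefaultLevel.field=level
filters.SetDefaultLevel.values=INFO
```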
Lines changed: 34 additions & 0 deletions

---
date: 2020-05-25
title: "Basic Configuration"
linkTitle: "Basic Configuration"
weight: 20
description: >
  The common configuration for deploying a File Pulse connector.
---

## Common configuration

Whatever the kind of files you are processing, a connector should always be configured with the below properties.
These configurations are described in detail in subsequent chapters.

| Configuration | Description | Type | Default | Importance |
| --------------| --------------|-----------| --------- | ------------- |
|`fs.scanner.class` | The fully qualified name of the class which is used to scan the file system | class | *io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker* | medium |
|`fs.cleanup.policy.class` | The fully qualified name of the class which is used to clean up files | class | *-* | high |
|`fs.scan.directory.path` | The input directory to scan | string | *-* | high |
|`fs.scan.interval.ms` | Time interval (in milliseconds) at which to scan the input directory | long | *10000* | high |
|`fs.scan.filters` | Filters used to list eligible input files | list | *-* | medium |
|`filters` | List of filter aliases to apply to each data record (order is important) | list | *-* | medium |
|`internal.kafka.reporter.topic` | Name of the internal topic used by tasks and the connector to report and monitor file progression | string | *connect-file-pulse-status* | high |
|`internal.kafka.reporter.bootstrap.servers` | A list of host/port pairs used by the reporter for establishing the initial connection to the Kafka cluster | string | *-* | high |
|`task.reader.class` | The fully qualified name of the class which is used by tasks to read input files | class | *io.streamthoughts.kafka.connect.filepulse.reader.RowFileReader* | high |
|`offset.strategy` | The strategy to use for building a source offset from an input file; must be one of [name, path, name+hash] | string | *name+hash* | high |
|`topic` | The default output topic to write to | string | *-* | high |

### Prior to Connect FilePulse 1.3.x (deprecated)

| Configuration | Description | Type | Default | Importance |
| --------------| --------------|-----------| --------- | ------------- |
|`internal.kafka.reporter.id` | The reporter identifier to be used by tasks and the connector to report and monitor file progression (default null). This property must only be set by users that have run a connector in a version prior to 1.3.x, to ensure backward compatibility (when set, it must be unique for each connect instance). | string | *-* | high |
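
Putting the common properties together, a minimal connector configuration might look like the sketch below. The connector class name, topic, and paths are illustrative assumptions, not taken from this commit:

```
name=source-file-pulse
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
tasks.max=1
topic=connect-file-pulse-quickstart
fs.scan.directory.path=/tmp/connect-data
fs.scan.interval.ms=10000
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.LogCleanPolicy
task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.RowFileReader
offset.strategy=name+hash
internal.kafka.reporter.bootstrap.servers=localhost:9092
internal.kafka.reporter.topic=connect-file-pulse-status
```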
34+
