56 changes: 0 additions & 56 deletions docs/architecture.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/architecture/detect_and_troubleshoot.md
@@ -0,0 +1 @@
`sudo tcpdump -n -s 0 -S -i any -A -v 'port 514 and (tcp or udp)'`
Empty file.
Empty file.
3 changes: 3 additions & 0 deletions docs/architecture/ha.md
@@ -0,0 +1,3 @@
Load balancing for high availability does not work well for stateless, unacknowledged syslog traffic. More data is preserved when you use a simpler design, such as vMotioned VMs. With syslog, the protocol itself is prone to loss, and syslog data collection can be made "mostly available" at best.

The best deployment model for high availability is a [MicroK8s](https://microk8s.io/) based deployment with MetalLB in BGP mode. This model uses a special class of load balancer that is implemented as destination network address translation (DNAT).
21 changes: 21 additions & 0 deletions docs/architecture/lb.md
@@ -0,0 +1,21 @@
# Load balancers are not a best practice for SC4S
In syslog ingestion systems, load balancers are usually used for horizontal scaling and high availability.

It is a best practice to avoid load balancing in both cases. Instead of scaling horizontally, use a single robust server. For high availability, choose a shared-IP cluster instead.

While neither recommended nor supported, load balancers are still popular among SC4S users. This section discusses various load balancer solutions and their possible setups, together with well-known issues.

## General considerations regarding load balancers
When using load balancers, it is recommended to:
- Preserve the actual source IP of the sending machine. By default, L4 load balancers replace the client's source IP with their own.
- For high availability, use a load balancer solution that provides an HA mode.

Load balancing setup differs for TCP/TLS and UDP.

For TCP/TLS:
- There are two ways of preserving the source IP: using the "PROXY" protocol or IP transparency (DNAT configuration)
- For the "PROXY" configuration make sure to enable it on the SC4S side with `SC4S_SOURCE_PROXYCONNECT=yes`
- TCP/TLS load balancers do not consider the weight of individual connection load and are frequently biased toward one instance. Vertically scale all members in a single resource pool to accommodate the full workload.

For UDP:
- Load balancers for UDP can only use DNAT, for example with DSR (Direct Server Return)
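
To illustrate the "PROXY" option listed for TCP/TLS: with PROXY protocol version 1, the load balancer prepends a single text line carrying the original client address to each TCP connection, and the receiver reads the source IP from that line instead of from the socket. The following is a minimal sketch of that parsing step, not a description of how SC4S implements it. The function name `proxy_v1_source` and the sample addresses are made up for illustration.

```bash
# Sketch: extract the original client address from a PROXY protocol v1 header.
# The sample header used below is illustrative, not captured traffic.
proxy_v1_source() {
    # Expected format: PROXY <TCP4|TCP6> <src_ip> <dst_ip> <src_port> <dst_port>
    header=$(printf '%s' "$1" | tr -d '\r')
    # Split the header on whitespace into positional parameters.
    set -- $header
    if [ "$1" != "PROXY" ] || [ "$#" -ne 6 ]; then
        echo "not a PROXY v1 TCP header" >&2
        return 1
    fi
    # $3 is the client IP, $5 is the client port.
    printf '%s:%s\n' "$3" "$5"
}

proxy_v1_source 'PROXY TCP4 203.0.113.10 10.0.0.5 51234 514'
# prints: 203.0.113.10:51234
```

The printed value is the address of the sending device, which is what should appear as the host in Splunk rather than the load balancer's own IP.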
178 changes: 178 additions & 0 deletions docs/architecture/nginx.md
@@ -0,0 +1,178 @@
# Nginx
While load balancing syslog with NGINX Open Source is neither recommended nor supported by Splunk, it is still a "good enough" solution for some customers.

Note the main disadvantages of Nginx Open Source:
- Without high availability, an Nginx load balancer becomes a new single point of failure.
- Even with round-robin balancing, traffic distribution is often biased, which overloads some of the instances in the pool. Overloaded instances build up growing queues, which lead to delays, data drops, and memory and disk issues.
- Nginx Open Source doesn't provide active health checking, which is crucial for UDP DSR (Direct Server Return) load balancing.

## Install Nginx
1. Refer to Nginx documentation for instructions on installing Nginx **with the stream module**, which is necessary for TCP/UDP load balancing. For example on Ubuntu:
```bash
sudo apt update
sudo apt -y install nginx libnginx-mod-stream
```

2. In the main Nginx configuration, update the `events` section to increase performance, for example:
`/etc/nginx/nginx.conf`
```conf
events {
    worker_connections 20480;
    multi_accept on;
    use epoll;
}
```

## Preserving source IP
| Method | Protocol |
|----------------------------|------------|
| PROXY protocol | TCP/TLS |
| Transparent IP | TCP/TLS |
| Direct Server Return (DSR) | UDP |

## Option 1: Configure Nginx with the PROXY protocol
Advantages:
- easy to set up

Disadvantages:
- worse performance
- available only for TCP/TLS, not available for UDP
- overwriting the source IP in syslog-ng is not ideal. SOURCEIP is a hard macro and only HOST can be overwritten
- overwriting the source IP is available only in SC4S versions later than 3.4.0

1. On your LB node add a configuration similar to the following:
`/etc/nginx/modules-enabled/sc4s.conf`
```conf
stream {
    # Define upstream for each of SC4S hosts and ports
    # Default SC4S TCP ports are 514, 601, 5425, 6514
    # Include also your custom ports
    upstream stream_syslog_514 {
        server <SC4S_IP_1>:514;
        server <SC4S_IP_2>:514;
    }
    upstream stream_syslog_601 {
        server <SC4S_IP_1>:601;
        server <SC4S_IP_2>:601;
    }
    upstream stream_syslog_5425 {
        server <SC4S_IP_1>:5425;
        server <SC4S_IP_2>:5425;
    }
    upstream stream_syslog_6514 {
        server <SC4S_IP_1>:6514;
        server <SC4S_IP_2>:6514;
    }

    # Define a common configuration block for all servers
    map $server_port $upstream_name {
        514 stream_syslog_514;
        601 stream_syslog_601;
        5425 stream_syslog_5425;
    }

    # Define a virtual server for each upstream connection
    # make sure to set 'proxy_protocol' to 'on'
    server {
        listen 514;
        listen 601;
        listen 5425;
        proxy_pass $upstream_name;

        proxy_timeout 3s;
        proxy_connect_timeout 3s;

        proxy_protocol on;
    }

    server {
        listen 6514;
        proxy_pass stream_syslog_6514;

        proxy_timeout 3s;
        proxy_connect_timeout 3s;

        proxy_protocol on;

        proxy_ssl on;
    }
}
```
2. Refer to Nginx documentation to find the command to reload the service, for example `sudo nginx -s reload`.
3. Add the following parameter to the SC4S configuration and restart your instances:
`/opt/sc4s/env_file`
```conf
SC4S_SOURCE_PROXYCONNECT=yes
```
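
A quick way to confirm that the parameter from the last step is actually present before you restart is to search the env file for it. This is a small sketch only. The helper name `proxyconnect_enabled` is hypothetical, and the default path is the one shown in the step above.

```bash
# Sketch: check that PROXY protocol support is switched on in an SC4S env file.
proxyconnect_enabled() {
    # $1 = path to the env file (defaults to the location used in this guide)
    # -q: no output, -s: stay silent if the file is missing
    grep -Eqs '^[[:space:]]*SC4S_SOURCE_PROXYCONNECT=yes[[:space:]]*$' "${1:-/opt/sc4s/env_file}"
}

if proxyconnect_enabled; then
    echo "PROXY protocol support is enabled"
else
    echo "SC4S_SOURCE_PROXYCONNECT=yes not found"
fi
```

Run the same check on every SC4S instance in the pool, because an instance without the setting will not read the client address from the PROXY header.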

### Test your setup
1. Send TCP/TLS messages to the load balancer and ensure that they are being correctly received in Splunk with the correct host IP:
```bash
echo "hello world" | netcat <LB_IP> 514
```

2. Run performance tests based on [Check TCP Performance](tcp_performance_tests.md):

| Receiver                   | Performance                   |
|----------------------------|-------------------------------|
| Single SC4S Server | 4,341,000 (71,738.98 msg/sec) |
| Load Balancer + 2 Servers | 5,996,000 (99,089.03 msg/sec) |
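
To put the two rows above in perspective, the sketch below divides the load-balanced throughput by the single-server throughput. The figures are taken from the table. The function name `lb_speedup` and the notion of "scaling efficiency" (speedup divided by the number of servers) are illustrative additions, not part of the test procedure.

```bash
# Sketch: how much throughput the second server adds behind the load balancer.
lb_speedup() {
    # $1 = single-server msg/sec, $2 = load-balanced msg/sec, $3 = number of servers
    awk -v single="$1" -v balanced="$2" -v servers="$3" 'BEGIN {
        speedup = balanced / single
        printf "speedup: %.2fx, scaling efficiency: %.0f%%\n", speedup, speedup / servers * 100
    }'
}

lb_speedup 71738.98 99089.03 2
# prints: speedup: 1.38x, scaling efficiency: 69%
```

In this test, doubling the number of servers yields roughly 1.4 times the throughput rather than 2 times. One possible contributor is the biased traffic distribution listed among the disadvantages of Nginx Open Source.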


## Option 2: Configure Nginx with DSR (Direct Server Return)
Advantages:
- works for UDP
- more efficient (saves one hop)

Disadvantages:
- DSR setup requires active health checks because the load balancer cannot expect responses from the upstream. Active health checks are not available in Nginx Open Source, so either switch to Nginx Plus or implement your own active health checking
- requires superuser privileges
- in cloud environments, might require disabling Source/Destination Checking (tested with AWS)

1. In the main Nginx configuration update `user` to root, for example:
`/etc/nginx/nginx.conf`
```conf
user root;
```

2. Add a configuration similar to the following:
`/etc/nginx/modules-enabled/sc4s.conf`
```conf
stream {
    # Define upstream for each of SC4S hosts and ports
    # Default SC4S UDP port is 514
    # Include also your custom ports
    upstream stream_syslog_514 {
        server <SC4S_IP_1>:514;
        server <SC4S_IP_2>:514;
    }

    # Define connections to each of your upstreams.
    # Make sure to include `proxy_bind` and `proxy_responses 0`.
    server {
        listen 514 udp;
        proxy_pass stream_syslog_514;

        proxy_bind $remote_addr:$remote_port transparent;
        proxy_responses 0;
    }
}
```

3. Refer to Nginx documentation to find the command to reload the service, for example `sudo nginx -s reload`.

4. If you run on AWS, make sure to disable `Source/Destination Checking` on your load balancer's host.

### Test your setup
1. Send UDP messages to the load balancer and ensure that they are being correctly received in Splunk with the correct host IP:
```bash
echo "hello world" > /dev/udp/<LB_IP>/514
```

2. Run performance tests

| Receiver / Drop rate by EPS (msg/sec)    | 4,500  | 9,000  | 27,000 | 50,000 | 150,000 | 300,000 |
|------------------------------------------|--------|--------|--------|--------|---------|---------|
| Single SC4S Server | 0.33% | 1.24% | 52.31% | 74.71% | -- | -- |
| Load Balancer + 2 Servers | 1% | 1.19% | 6.11% | 47.64% | -- | -- |
| Single Finetuned SC4S Server | 0% | 0% | 0% | 0% | 47.37% | -- |
| Load Balancer + 2 Finetuned Servers | 0.98% | 1.14% | 1.05% | 1.16% | 3.56% | 55.54% |
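
The table above reports drop rates. Assuming a drop rate is the share of sent messages that never reached the receiver, it can be computed from two counts as sketched below. The function name `drop_rate` and the sample counts are hypothetical and are not the measurements behind the table.

```bash
# Sketch: percentage of messages lost between sender and receiver.
drop_rate() {
    # $1 = messages sent, $2 = messages received (for example, counted in Splunk)
    if [ "$1" -le 0 ]; then
        echo "sent count must be positive" >&2
        return 1
    fi
    awk -v sent="$1" -v received="$2" 'BEGIN {
        printf "%.2f%%\n", (sent - received) / sent * 100
    }'
}

drop_rate 10000 9875
# prints: 1.25%
```

Comparing this value at several send rates, as the table does, shows the rate at which a given setup starts to lose data.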
Empty file.
62 changes: 62 additions & 0 deletions docs/architecture/recommendations.md
@@ -0,0 +1,62 @@
# Architectural Considerations

Building a performant, highly available, and scalable syslog ingestion system is a non-trivial task.

The syslog protocol design prioritizes speed and efficiency, which can occur at the expense of resiliency and reliability. Because of these tradeoffs, traditional methods to provide scale and resiliency do not necessarily transfer to syslog.

## Syslog Architecture recommendations
The following subsections provide recommendations and suggestions for planning your syslog ingestion system based on SC4S.

### Recommended system design sequence
1. Locate your SC4S server
2. Choose your optimal hardware setup
3. Fine-tune your SC4S instance
4. Monitor and troubleshoot
5. Build a high-availability architecture

#### Locate your SC4S server
Syslog is a "send and forget" protocol, and it does not perform well when routed through substantial network infrastructure.

For centrally located syslog servers we often observe both UDP and TCP traffic problems and data loss.

Instead, provide for edge collection. Keep the client and server within a few hops of each other, ideally just one. Syslog should not cross a WAN, and the chance of failure increases with the number of Layer 4 devices in the path, including TCP/UDP load balancers.

#### Choose your optimal hardware setup
Hardware specification is a crucial part of designing a performant syslog ingestion system. See [Choose Your Hardware Setup](hardware.md).

#### Choose between UDP and TCP and fine-tune SC4S
While UDP is the protocol traditionally recommended for syslog, TCP is also an option provided by the standard and many vendors.

UDP reduces network load because it requires no receipt verification or window adjustment. TCP uses acknowledgement signals (ACKs) to avoid data loss. However, loss can still occur when:

* The TCP session is closed: Events published while the system is creating a new session are lost.
* The remote side is busy and cannot send an acknowledgement signal fast enough: Events are lost due to a full local buffer.
* A single acknowledgement signal is lost by the network and the client closes the connection: Local and remote buffer are lost.
* The remote server restarts for any reason: Local buffer is lost.
* The remote server restarts without closing the connection: Local buffer plus timeout time are lost.
* The client side restarts without closing the connection.
* Increased overhead on the network can lead to loss.

For example, use TCP only if the syslog event is larger than the maximum UDP packet size on your network (typically limited to web proxy, DLP, and IDS type sources).
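
As a rough way to apply this rule, the sketch below checks whether a message fits into a single unfragmented UDP datagram. It assumes IPv4 without header options and a default MTU of 1500 bytes. Both are assumptions about your network, and the function name `fits_in_udp` and the sample message are illustrative.

```bash
# Sketch: does a message fit into one unfragmented UDP datagram?
fits_in_udp() {
    # $1 = message text, $2 = path MTU in bytes (assumed default: 1500)
    mtu=${2:-1500}
    # Subtract the IPv4 header without options (20 bytes) and the UDP header (8 bytes).
    max_payload=$((mtu - 28))
    # Arithmetic expansion strips any padding that wc adds around the byte count.
    size=$(( $(printf '%s' "$1" | wc -c) ))
    [ "$size" -le "$max_payload" ]
}

msg='<14>lb-test: hello world'
if fits_in_udp "$msg"; then
    echo "UDP is sufficient"
else
    echo "consider TCP"
fi
```

Pass the MTU as the second argument if your path differs from the assumed default, for example `fits_in_udp "$msg" 9000` on a network with jumbo frames.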

Depending on your choice, check some or all of the following sections:
- [Check UDP Performance](udp_performance_tests.md)
- [Finetuning for UDP](finetuning_for_udp.md)
- [Check TCP Performance](tcp_performance_tests.md)
- [Finetuning for TCP](finetuning_for_tcp.md)

#### Avoid load balancers in front of SC4S
It is common to see syslog designs with various load balancers distributing traffic to multiple SC4S instances.

We are aware of the popularity of this solution. The [Load Balancers](lb.md) section documents best practices for load balancers, as well as the requirements and challenges of load balancing syslog.

However, Splunk does not support architectures that use load balancers for scaling.

As a best practice, do not co-locate syslog servers for horizontal scale and do not load balance to them with a front-side load balancer. Instead, make sure that every SC4S instance in your HA cluster can accommodate the full workload.

For the reasoning behind this recommendation, see the [Load Balancers](lb.md) section.

#### Monitor and troubleshoot

#### Build a high-availability architecture
Load balancing for high availability does not work well for stateless, unacknowledged syslog traffic. More data is preserved when you use a simpler design, such as vMotioned VMs. With syslog, the protocol itself is prone to loss, and syslog data collection can be made "mostly available" at best.
13 changes: 0 additions & 13 deletions docs/lb.md

This file was deleted.

21 changes: 18 additions & 3 deletions mkdocs.yml
@@ -33,8 +33,7 @@ theme:

nav:
- Home: "index.md"
- Architectural Considerations: "architecture.md"
- Load Balancers: "lb.md"

- Getting Started:
- Read First: "gettingstarted/index.md"
- Quickstart Guide: "gettingstarted/quickstart_guide.md"
@@ -59,7 +58,23 @@ nav:
- Read First: "sources/index.md"
- Basic Onboarding: "sources/base"
- Known Vendors: "sources/vendor"
- Performance: "performance.md"

- High Availability and Scalability:
- Architectural Recommendations: "architecture/recommendations.md"
- SC4S Performance:
- Check UDP Performance: "architecture/udp_performance_tests.md"
- Check TCP Performance: "architecture/tcp_performance_tests.md"
- Detect and Troubleshoot Data Losses: "architecture/detect_and_troubleshoot.md"
- Choose Your Hardware Setup: "architecture/hardware.md"
- Vertical Scaling:
- Finetuning for UDP: "architecture/finetuning_for_udp.md"
- Finetuning for TCP: "architecture/finetuning_for_tcp.md"
- High Availability:
- Recommendations: "architecture/ha.md"
- Load Balancers:
- Recommendations: "architecture/lb.md"
- Nginx: "architecture/nginx.md"

- SC4S Lite (Experimental):
- Intro: "lite.md"
- Pluggable modules: "pluggable_modules.md"