From 0eb4ecc7abca1c1ebb8c951ab6e29ec43ecfb7b2 Mon Sep 17 00:00:00 2001 From: mstopa-splunk Date: Thu, 29 Aug 2024 12:21:50 +0000 Subject: [PATCH 1/4] Scaling docs --- docs/architecture.md | 56 ------------------ docs/architecture/detect_and_troubleshoot.md | 0 docs/architecture/finetuning_for_tcp.md | 0 docs/architecture/finetuning_for_udp.md | 0 docs/architecture/ha.md | 1 + docs/architecture/lb.md | 3 + docs/architecture/performance_tests.md | 0 docs/architecture/recommendations.md | 62 ++++++++++++++++++++ mkdocs.yml | 19 +++++- 9 files changed, 82 insertions(+), 59 deletions(-) delete mode 100644 docs/architecture.md create mode 100644 docs/architecture/detect_and_troubleshoot.md create mode 100644 docs/architecture/finetuning_for_tcp.md create mode 100644 docs/architecture/finetuning_for_udp.md create mode 100644 docs/architecture/ha.md create mode 100644 docs/architecture/lb.md create mode 100644 docs/architecture/performance_tests.md create mode 100644 docs/architecture/recommendations.md diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index 1fe98b048d..0000000000 --- a/docs/architecture.md +++ /dev/null @@ -1,56 +0,0 @@ -# SC4S Architectural Considerations - -SC4S provides performant and reliable syslog data collection. When you are planning your configuration, review the following architectural considerations. These recommendations pertain to the Syslog protocol and age, and are not specific to Splunk Connect for Syslog. - -## The syslog Protocol - -The syslog protocol design prioritizes speed and efficiency, which can occur at the expense of resiliency and reliability. User Data Protocol (UDP) provides the ability to "send and forget" events over the network without regard to or acknowledgment of receipt. Transport Layer Secuirty (TLS) and Secure Sockets Layer (SSL) protocols are also supported, though UDP prevails as the preferred syslog transport for most data centers. 
- -Because of these tradeoffs, traditional methods to provide scale and resiliency do not necessarily transfer to syslog. - -## IP protocol - -By default, SC4S listens on ports using IPv4. IPv6 is also supported, see `SC4S_IPV6_ENABLE` in [source configuration options](https://splunk.github.io/splunk-connect-for-syslog/main/configuration/#syslog-source-configuration). - -## Collector Location - -Since syslog is a "send and forget" protocol, it does not perform well when routed through substantial network infrastructure. This -includes front-side load balancers and WAN. The most reliable way to collect syslog traffic is to provide for edge -collection rather than centralized collection. If you centrally locate your syslog server, the UDP and (stateless) -TCP traffic cannot adjust and data loss will occur. - -## syslog Data Collection at Scale -As a best practice, do not co-locate syslog-ng servers for horizontal scale and load balance to them with a front-side load balancer: - -* Attempting to load balance for scale can cause more data loss due to normal device operations -and attendant buffer loss. A simple, robust single server or shared-IP cluster provides the best performance. - -* Front-side load balancing causes inadequate data distribution on the upstream side, leading to uneven data load on the indexers. - -## High availability considerations and challenges - -Load balancing for high availability does not work well for stateless, unacknowledged syslog traffic. More data is preserved when you use a more simple design such as vMotioned VMs. With syslog, the protocol itself is prone to loss, and syslog data collection can be made "mostly available" at best. - -## UDP vs. TCP - -Run your syslog configuration on UDP rather than TCP. - -The syslogd daemon optimally uses UDP for log forwarding to reduce overhead. This is because UDP's streaming method does not require the overhead of establishing a network session. 
-UDP reduces network load on the network stream with no required receipt verification or window adjustment. - -TCP uses Acknowledgement Signals (ACKS) to avoid data loss, however, loss can still occur when: - -* The TCP session is closed: Events published while the system is creating a new session are lost. -* The remote side is busy and cannot send an acknowledgement signal fast enough: Events are lost due to a full local buffer. -* A single acknowledgement signal is lost by the network and the client closes the connection: Local and remote buffer are lost. -* The remote server restarts for any reason: Local buffer is lost. -* The remote server restarts without closing the connection: Local buffer plus timeout time are lost. -* The client side restarts without closing the connection. -* Increased overhead on the network can lead to loss. - -Use TCP if the syslog event is larger than the maximum size of the UDP packet on your network typically limited to Web Proxy, DLP, and IDs type sources. -To mitigate the drawbacks of TCP you can use TLS over TCP: - -* The TLS can continue a session over a broken TCP to reduce buffer loss conditions. -* The TLS fills packets for more efficient use of memory. -* The TLS compresses data in most cases. 
\ No newline at end of file diff --git a/docs/architecture/detect_and_troubleshoot.md b/docs/architecture/detect_and_troubleshoot.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/docs/architecture/finetuning_for_tcp.md b/docs/architecture/finetuning_for_tcp.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/docs/architecture/finetuning_for_udp.md b/docs/architecture/finetuning_for_udp.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/docs/architecture/ha.md b/docs/architecture/ha.md new file mode 100644 index 0000000000..ee285ff777 --- /dev/null +++ b/docs/architecture/ha.md @@ -0,0 +1 @@ +Load balancing for high availability does not work well for stateless, unacknowledged syslog traffic. More data is preserved when you use a more simple design such as vMotioned VMs. With syslog, the protocol itself is prone to loss, and syslog data collection can be made "mostly available" at best. \ No newline at end of file diff --git a/docs/architecture/lb.md b/docs/architecture/lb.md new file mode 100644 index 0000000000..9926ddfad8 --- /dev/null +++ b/docs/architecture/lb.md @@ -0,0 +1,3 @@ +* Attempting to load balance for scale can cause more data loss due to normal device operations +and attendant buffer loss. A simple, robust single server or shared-IP cluster provides the best performance. +* Front-side load balancing causes inadequate data distribution on the upstream side, leading to uneven data load on the indexers. 
\ No newline at end of file diff --git a/docs/architecture/performance_tests.md b/docs/architecture/performance_tests.md new file mode 100644 index 0000000000..e69de29bb2 diff --git a/docs/architecture/recommendations.md b/docs/architecture/recommendations.md new file mode 100644 index 0000000000..f2b37f7f46 --- /dev/null +++ b/docs/architecture/recommendations.md @@ -0,0 +1,62 @@ +# Architectural Considerations + +Building a performant, highly available, and scalable syslog ingestion system is a non-trivial task. + +The syslog protocol design prioritizes speed and efficiency, which can occur at the expense of resiliency and reliability. Because of these tradeoffs, traditional methods to provide scale and resiliency do not necessarily transfer to syslog. + +## Syslog Architecture recommendations +The following subsections provide recommendations and suggestions for planning your SC4S-based syslog ingestion system. + +### Recommended system design sequence +1. Locate your SC4S server +2. Choose your optimal hardware setup +3. Fine-tune your SC4S instance +4. Monitor and troubleshoot +5. Build a high-availability architecture + +#### Locate your SC4S server +Syslog is a "send and forget" protocol and it does not perform well when routed through substantial network infrastructure. + +For centrally located syslog servers we often observe both UDP and TCP traffic problems and data loss. + +Instead, provide for edge collection. Keep the client and server only a few hops apart, ideally one. Syslog should not cross a WAN, and the chance of a failure increases with the number of Layer 4 devices in the path, including TCP/UDP load balancers. + +#### Choose your optimal hardware setup +Hardware specification is a crucial part of designing a performant syslog ingestion system. See [Choose Your Hardware Setup](hardware.md). 
+ +#### Choose between UDP and TCP and fine-tune SC4S +While UDP is the protocol traditionally recommended for syslog, TCP is also an option provided by the standard and many vendors. + +UDP reduces network load because it requires no receipt verification or window adjustment. TCP uses acknowledgement signals (ACKs) to avoid data loss; however, loss can still occur when: + +* The TCP session is closed: Events published while the system is creating a new session are lost. +* The remote side is busy and cannot send an acknowledgement signal fast enough: Events are lost due to a full local buffer. +* A single acknowledgement signal is lost by the network and the client closes the connection: Local and remote buffer are lost. +* The remote server restarts for any reason: Local buffer is lost. +* The remote server restarts without closing the connection: Local buffer plus timeout time are lost. +* The client side restarts without closing the connection. +* Increased overhead on the network can lead to loss. + +For example, you can use TCP only if the syslog event is larger than the maximum size of the UDP packet on your network (typically limited to Web Proxy, DLP, and IDS-type sources). + +Depending on your choice, check some or all of the following subsections: +- [Check UDP Performance](udp_performance_tests.md) +- [Finetuning for UDP](finetuning_for_udp.md) +- [Check TCP Performance](tcp_performance_tests.md) +- [Finetuning for TCP](finetuning_for_tcp.md) + +#### Avoid load balancers in front of SC4S +It is common to see syslog designs with various load balancers distributing traffic to multiple SC4S instances. + +We are aware of the popularity of this solution. We document best practices related to load balancers in the [Load Balancers](lb.md) section, as well as requirements and challenges related to load balancing syslog. 
+ +However, Splunk does not support architectures utilizing load balancers for scaling. + +As a best practice, do not co-locate syslog servers for horizontal scale and do not load balance to them with a front-side load balancer. Instead, make sure that every SC4S instance in your HA cluster can accommodate the full workload. + +For the reasoning behind this recommendation, see the [Load Balancers](lb.md) section. + +#### Monitor and troubleshoot + +#### Build a high-availability architecture +Load balancing for high availability does not work well for stateless, unacknowledged syslog traffic. More data is preserved when you use a simpler design such as vMotioned VMs. With syslog, the protocol itself is prone to loss, and syslog data collection can be made "mostly available" at best. \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 54f426b628..33d4d21ee7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -33,8 +33,7 @@ theme: nav: - Home: "index.md" - - Architectural Considerations: "architecture.md" - - Load Balancers: "lb.md" + - Getting Started: - Read First: "gettingstarted/index.md" - Quickstart Guide: "gettingstarted/quickstart_guide.md" @@ -59,7 +58,21 @@ nav: - Read First: "sources/index.md" - Basic Onboarding: "sources/base" - Known Vendors: "sources/vendor" - - Performance: "performance.md" + + - High Availability and Scalability: + - Architectural Recommendations: "architecture/recommendations.md" + - SC4S Performance: + - Check UDP Performance: "architecture/udp_performance_tests.md" + - Check TCP Performance: "architecture/tcp_performance_tests.md" + - Detect and Troubleshoot Data Losses: "architecture/detect_and_troubleshoot.md" + - Choose Your Hardware Setup: "architecture/hardware.md" + - Vertical Scaling: + - Finetuning for UDP: "architecture/finetuning_for_udp.md" + - Finetuning for TCP: "architecture/finetuning_for_tcp.md" + - High Availability: + - Recommendations: "architecture/ha.md" + - Load Balancers: "architecture/lb.md" + - SC4S 
Lite (Experimental): - Intro: "lite.md" - Pluggable modules: "pluggable_modules.md" From ad7f2ff0ca3e978e195dcfab4ded1125e58a2a1f Mon Sep 17 00:00:00 2001 From: mstopa-splunk Date: Fri, 30 Aug 2024 11:07:57 +0000 Subject: [PATCH 2/4] Add nginx PROXY description --- docs/architecture/detect_and_troubleshoot.md | 1 + docs/architecture/ha.md | 4 +- docs/architecture/lb.md | 24 +++- docs/architecture/nginx.md | 115 +++++++++++++++++++ docs/lb.md | 13 --- mkdocs.yml | 4 +- 6 files changed, 143 insertions(+), 18 deletions(-) create mode 100644 docs/architecture/nginx.md delete mode 100644 docs/lb.md diff --git a/docs/architecture/detect_and_troubleshoot.md b/docs/architecture/detect_and_troubleshoot.md index e69de29bb2..b881911b33 100644 --- a/docs/architecture/detect_and_troubleshoot.md +++ b/docs/architecture/detect_and_troubleshoot.md @@ -0,0 +1 @@ +`sudo tcpdump -n -s 0 -S -i any -A -v 'port 514 and (tcp or udp)` \ No newline at end of file diff --git a/docs/architecture/ha.md b/docs/architecture/ha.md index ee285ff777..4fcc11c6f3 100644 --- a/docs/architecture/ha.md +++ b/docs/architecture/ha.md @@ -1 +1,3 @@ -Load balancing for high availability does not work well for stateless, unacknowledged syslog traffic. More data is preserved when you use a more simple design such as vMotioned VMs. With syslog, the protocol itself is prone to loss, and syslog data collection can be made "mostly available" at best. \ No newline at end of file +Load balancing for high availability does not work well for stateless, unacknowledged syslog traffic. More data is preserved when you use a more simple design such as vMotioned VMs. With syslog, the protocol itself is prone to loss, and syslog data collection can be made "mostly available" at best. + +The best deployment model for high availability is a [Microk8s](https://microk8s.io/) based deployment with MetalLB in BGP mode. This model uses a special class of load balancer that is implemented as destination network translation. 
\ No newline at end of file diff --git a/docs/architecture/lb.md b/docs/architecture/lb.md index 9926ddfad8..83dbc00ac9 100644 --- a/docs/architecture/lb.md +++ b/docs/architecture/lb.md @@ -1,3 +1,21 @@ -* Attempting to load balance for scale can cause more data loss due to normal device operations -and attendant buffer loss. A simple, robust single server or shared-IP cluster provides the best performance. -* Front-side load balancing causes inadequate data distribution on the upstream side, leading to uneven data load on the indexers. \ No newline at end of file +# Load balancers are not a best practice for SC4S +In syslog ingestion systems load balancers are usually used for horizontal scaling and high availability. + +It is a best practice to avoid load balancing in both cases. Instead of scaling horizontally, use a single, robust server. For high availability, choose a shared-IP cluster instead. + +While neither recommended nor supported, the usage of LBs is still popular among SC4S users. This section of documentation discusses various LB solutions and their possible setups together with well-known issues. + +## General considerations regarding load balancers +When using load balancers, it is recommended to: +- Preserve the actual source IP of the sending machine. By default, L4 LBs replace the client’s source IP with their own. +- Use an LB solution with an HA mode for high availability + +Load balancing setup differs for TCP/TLS and UDP. + +For TCP/TLS: +- There are two ways of preserving the source IP: using the "PROXY" protocol or IP transparency (DNAT configuration) +- For the "PROXY" configuration make sure to enable it on the SC4S side with `SC4S_SOURCE_PROXYCONNECT=yes` +- TCP/TLS load balancers do not consider the weight of individual connection load and are frequently biased to one instance. 
Vertically scale all members in a single resource pool to accommodate the full workload + +For UDP: +- Load balancers for UDP can only use DNAT, for example with DSR (Direct Server Return) \ No newline at end of file diff --git a/docs/architecture/nginx.md b/docs/architecture/nginx.md new file mode 100644 index 0000000000..a5e5bb19f1 --- /dev/null +++ b/docs/architecture/nginx.md @@ -0,0 +1,115 @@ +# Nginx +While using a bare Nginx load balancing is neither a recommended, nor a supported solution, it is still a "good enough" solution for some customers. + +It is a free and open source solution, well documented and with a big and active community of users. +The open source version of Nginx doesn't provide High Availability, so in fact an nginx LB becomes a new single point of failure. Even with the round-robin we also often observe bias in traffic distribution which results in overloading some of the instances in the pool. As the result customers report memory and disk issues, growing queues and delays in processing. + +## Preserving source IP +| Method | Protocol | +|----------------------------|------------| +| PROXY protocol | TCP/TLS | +| Transparent IP | TCP/TLS | +| Direct Server Return (DSR) | UDP | + +## Install Nginx +1. Refer to Nginx documentation for instructions on installing Nginx **with the stream module**, which is necessary for TCP/UDP load balancing. For example on Ubuntu: +```bash +sudo apt update +sudo apt -y install nginx libnginx-mod-stream +``` + +2. In the main Nginx configuration update `events` section to increase performance, for example: +`/etc/nginx/nginx.conf` +```conf +events { + worker_connections 20480; + multi_accept on; + use epoll; +} +``` + +## Configure Nginx with the PROXY protocol +Advantages: +- easy to set up + +Disadvantages: +- worse performance +- available only for TCP/TLS, not available for UDP +- overwriting the source IP in syslog-ng is not ideal. 
SOURCEIP is a hard macro and only HOST can be overwritten +- overwriting the source IP is available only in SC4S>3.4.0 + +1. On your LB node add a configuration similar to the following: +`/etc/nginx/modules-enabled/sc4s.conf` +```conf +stream { + # Define upstream for each of SC4S hosts and ports + # Default SC4S TCP ports are 514, 601, 5425, 6514 + # Include also your custom ports + upstream stream_syslog_514 { + server :514; + server :514; + } + upstream stream_syslog_601 { + server :601; + server :601; + } + upstream stream_syslog_5425 { + server :5425; + server :5425; + } + upstream stream_syslog_6514 { + server :6514; + server :6514; + } + + # Define a common configuration block for all servers + map $server_port $upstream_name { + 514 stream_syslog_514; + 601 stream_syslog_601; + 5425 stream_syslog_5425; + 6514 stream_syslog_6514; + } + + # Define a virtual server for each upstream connection + # make sure to set 'proxy_protocol' to 'on' + server { + listen 514; + listen 601; + listen 5425; + listen 6514; + proxy_pass $upstream_name; + + proxy_timeout 3s; + proxy_connect_timeout 3s; + + proxy_protocol on; + + # Enable SSL only on port 6514 + ssl_preread on; + if ($server_port = 6514) { + proxy_ssl on; + } + } +} +``` +3. Refer to Nginx documentation to find the command to reload the service, for example `sudo nginx -s reload`. +4. Add the following parameter to SC4S configuration and restart your instances: +`/opt/sc4s/env_file` +```conf +SC4S_SOURCE_PROXYCONNECT=yes +``` + +## Test your setup +1. Send TCP/TLS messages to the load balancer and ensure that they are being correctly received in Splunk with the correct host IP: +```bash +echo "hello world" | netcat 514 +``` + +2. 
Run performance tests based on [Check TCP Performance](tcp_performance_tests.md) +| Receiver | Same Subnet | WAN | +|----------------|-------------------------------|--------------------------------| +| Server 1 | 4,410,000 (72,879.48 msg/sec) | 4,280,000 (70,726.90 msg/sec) | +| Server 2 | 4,341,000 (71,738.98 msg/sec) | 4,255,000 (70,316.86 msg/sec) | +| Load Balancer | 5,996,000 (99,089.03 msg/sec) | 6,046,000 (99,917.23 msg/sec) | + + diff --git a/docs/lb.md b/docs/lb.md deleted file mode 100644 index 38522a010f..0000000000 --- a/docs/lb.md +++ /dev/null @@ -1,13 +0,0 @@ -# About using load balancers - -Load balancers are not a best practice for SC4S. The exception to this is a narrow use case where the syslog server is exposed to untrusted clients on the internet, for example, with Palo Alto Cortex. - -## Considerations - -* UDP can only pass a load balancer using DNAT and source IP must be preserved. If you use this configuration, the load balancer becomes a new single point of failure. -* TCP/TLS can use either a DNAT configuration or SNAT with "PROXY" Protocol enabled `SC4S_SOURCE_PROXYCONNECT=yes`. -* TCP/TLS load balancers do not consider the weight of individual connection load and are frequently biased to one instance. Vertically scale all members in a single resource pool to accommodate the full workload. - -## Alternatives - -The best deployment model for high availability is a [Microk8s](https://microk8s.io/) based deployment with MetalLB in BGP mode. This model uses a special class of load balancer that is implemented as destination network translation. 
\ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 33d4d21ee7..6bcabedd08 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -71,7 +71,9 @@ nav: - Finetuning for TCP: "architecture/finetuning_for_tcp.md" - High Availability: - Recommendations: "architecture/ha.md" - - Load Balancers: "architecture/lb.md" + - Load Balancers: + - Recommendations: "architecture/lb.md" + - Nginx: "architecture/nginx.md" - SC4S Lite (Experimental): - Intro: "lite.md" From 6a3b5f9913f4db5b5181743f53a40cab07fa1b07 Mon Sep 17 00:00:00 2001 From: mstopa-splunk Date: Fri, 30 Aug 2024 12:54:42 +0000 Subject: [PATCH 3/4] Add a UDP DSR config --- docs/architecture/detect_and_troubleshoot.md | 2 +- docs/architecture/nginx.md | 94 ++++++++++++++++---- 2 files changed, 78 insertions(+), 18 deletions(-) diff --git a/docs/architecture/detect_and_troubleshoot.md b/docs/architecture/detect_and_troubleshoot.md index b881911b33..8469c313db 100644 --- a/docs/architecture/detect_and_troubleshoot.md +++ b/docs/architecture/detect_and_troubleshoot.md @@ -1 +1 @@ -`sudo tcpdump -n -s 0 -S -i any -A -v 'port 514 and (tcp or udp)` \ No newline at end of file +`sudo tcpdump -n -s 0 -S -i any -A -v 'port 514 and (tcp or udp)'` \ No newline at end of file diff --git a/docs/architecture/nginx.md b/docs/architecture/nginx.md index a5e5bb19f1..450ffb80c5 100644 --- a/docs/architecture/nginx.md +++ b/docs/architecture/nginx.md @@ -28,7 +28,7 @@ events { } ``` -## Configure Nginx with the PROXY protocol +## Option 1: Configure Nginx with the PROXY protocol Advantages: - easy to set up @@ -46,20 +46,20 @@ stream { # Default SC4S TCP ports are 514, 601, 5425, 6514 # Include also your custom ports upstream stream_syslog_514 { - server :514; - server :514; + server :514; + server :514; } upstream stream_syslog_601 { - server :601; - server :601; + server :601; + server :601; } upstream stream_syslog_5425 { - server :5425; - server :5425; + server :5425; + server :5425; } upstream stream_syslog_6514 { - server 
:6514; - server :6514; + server :6514; + server :6514; } # Define a common configuration block for all servers @@ -67,7 +67,6 @@ stream { 514 stream_syslog_514; 601 stream_syslog_601; 5425 stream_syslog_5425; - 6514 stream_syslog_6514; } # Define a virtual server for each upstream connection @@ -76,19 +75,24 @@ stream { listen 514; listen 601; listen 5425; - listen 6514; proxy_pass $upstream_name; proxy_timeout 3s; proxy_connect_timeout 3s; proxy_protocol on; + } - # Enable SSL only on port 6514 - ssl_preread on; - if ($server_port = 6514) { - proxy_ssl on; - } + server { + listen 6514; + proxy_pass stream_syslog_6514; + + proxy_timeout 3s; + proxy_connect_timeout 3s; + + proxy_protocol on; + + proxy_ssl on; } } ``` @@ -99,7 +103,7 @@ stream { SC4S_SOURCE_PROXYCONNECT=yes ``` -## Test your setup +### Test your setup 1. Send TCP/TLS messages to the load balancer and ensure that they are being correctly received in Splunk with the correct host IP: ```bash echo "hello world" | netcat 514 @@ -113,3 +117,59 @@ echo "hello world" | netcat 514 | Load Balancer | 5,996,000 (99,089.03 msg/sec) | 6,046,000 (99,917.23 msg/sec) | +## Option 2: Configure Nginx with DSR (Direct Server Return) +Advantages: +- works for UDP +- more efficient (saves one hop) + +Disadvantages: +- DSR setup requires active health checks, because LB cannot expect responses from the upstream. Active health checks are not available in Nginx open source. Switch to Nginx Plus or implement your own active health checking +- requires superuser privileges +- for cloud users might require disabling Source/Destination Checking (tested with AWS) + +1. In the main Nginx configuration update `user` to root, for example: +`/etc/nginx/nginx.conf` +```conf +user root; +``` + +2. 
Add a configuration similar to the following: +`/etc/nginx/modules-enabled/sc4s.conf` +```conf +stream { + # Define upstream for each of SC4S hosts and ports + # Default SC4S UDP port is 514 + # Include also your custom ports + upstream stream_syslog_514 { + server :514; + server :514; + } + + # Define connections to each of your upstreams. + # Make sure to include `proxy_bind` and `proxy_responses 0`. + server { + listen 514 udp; + proxy_pass stream_syslog_514; + + proxy_bind $remote_addr:$remote_port transparent; + proxy_responses 0; + } +} +``` + +3. Refer to Nginx documentation to find the command to reload the service, for example `sudo nginx -s reload`. + +4. Make sure to disable `Source/Destination Checking` if you work on AWS + +### Test your setup +1. Send UDP messages to the load balancer and ensure that they are being correctly received in Splunk with the correct host IP: +```bash +echo "hello world" > /dev/udp//514 +``` + +2. Run performance tests +| Receiver | Maximum EPS without drops | +|----------------|---------------------------| +| Server 1 | | +| Server 2 | | +| LB | | \ No newline at end of file From 31744f59fc13f05d7dd8cb4777a0a54f22231813 Mon Sep 17 00:00:00 2001 From: mstopa-splunk Date: Tue, 3 Sep 2024 11:35:34 +0000 Subject: [PATCH 4/4] Add performance test results for DSR --- docs/architecture/nginx.md | 45 ++++++++++++++++++++------------------ 1 file changed, 24 insertions(+), 21 deletions(-) diff --git a/docs/architecture/nginx.md b/docs/architecture/nginx.md index 450ffb80c5..ba431fdb3c 100644 --- a/docs/architecture/nginx.md +++ b/docs/architecture/nginx.md @@ -1,15 +1,10 @@ # Nginx -While using a bare Nginx load balancing is neither a recommended, nor a supported solution, it is still a "good enough" solution for some customers. +While load balancing syslog with NGINX Open Source is neither recommended, nor supported by Splunk, it is still a "good enough" solution for some customers. 
-It is a free and open source solution, well documented and with a big and active community of users. -The open source version of Nginx doesn't provide High Availability, so in fact an nginx LB becomes a new single point of failure. Even with the round-robin we also often observe bias in traffic distribution which results in overloading some of the instances in the pool. As the result customers report memory and disk issues, growing queues and delays in processing. - -## Preserving source IP -| Method | Protocol | -|----------------------------|------------| -| PROXY protocol | TCP/TLS | -| Transparent IP | TCP/TLS | -| Direct Server Return (DSR) | UDP | +Note the main disadvantages of Nginx Open Source: +- Nginx Open Source provides no high availability, so an Nginx LB becomes a new single point of failure. +- Even with round-robin balancing, traffic distribution is often biased, which overloads some of the instances in the pool. The resulting queue growth leads to delays, dropped data, and memory and disk issues. +- Nginx Open Source doesn't provide active health checking, which is crucial for UDP DSR (Direct Server Return) load balancing. ## Install Nginx 1. Refer to Nginx documentation for instructions on installing Nginx **with the stream module**, which is necessary for TCP/UDP load balancing. For example on Ubuntu: @@ -28,6 +23,13 @@ events { } ``` +## Preserving source IP +| Method | Protocol | +|----------------------------|------------| +| PROXY protocol | TCP/TLS | +| Transparent IP | TCP/TLS | +| Direct Server Return (DSR) | UDP | + ## Option 1: Configure Nginx with the PROXY protocol Advantages: - easy to set up @@ -110,11 +112,10 @@ echo "hello world" | netcat 514 ``` 2. 
Run performance tests based on [Check TCP Performance](tcp_performance_tests.md) -| Receiver | Same Subnet | WAN | -|----------------|-------------------------------|--------------------------------| -| Server 1 | 4,410,000 (72,879.48 msg/sec) | 4,280,000 (70,726.90 msg/sec) | -| Server 2 | 4,341,000 (71,738.98 msg/sec) | 4,255,000 (70,316.86 msg/sec) | -| Load Balancer | 5,996,000 (99,089.03 msg/sec) | 6,046,000 (99,917.23 msg/sec) | +| Receiver | Performance | +|----------------------------|-------------------------------| +| Single SC4S Server | 4,341,000 (71,738.98 msg/sec) | +| Load Balancer + 2 Servers | 5,996,000 (99,089.03 msg/sec) | ## Option 2: Configure Nginx with DSR (Direct Server Return) @@ -159,7 +160,7 @@ stream { 3. Refer to Nginx documentation to find the command to reload the service, for example `sudo nginx -s reload`. -4. Make sure to disable `Source/Destination Checking` if you work on AWS +4. If you work on AWS, make sure to disable `Source/Destination Checking` on your LB's host ### Test your setup 1. Send UDP messages to the load balancer and ensure that they are being correctly received in Splunk with the correct host IP: @@ -168,8 +169,10 @@ echo "hello world" > /dev/udp//514 ``` 2. Run performance tests -| Receiver | Maximum EPS without drops | -|----------------|---------------------------| -| Server 1 | | -| Server 2 | | -| LB | | \ No newline at end of file + +| Receiver / Drop Rate by EPS (msgs/sec) | 4,500 | 9,000 | 27,000 | 50,000 | 150,000 | 300,000 | +|------------------------------------------|--------|--------|--------|--------|---------|---------| +| Single SC4S Server | 0.33% | 1.24% | 52.31% | 74.71% | -- | -- | +| Load Balancer + 2 Servers | 1% | 1.19% | 6.11% | 47.64% | -- | -- | +| Single Finetuned SC4S Server | 0% | 0% | 0% | 0% | 47.37% | -- | +| Load Balancer + 2 Finetuned Servers | 0.98% | 1.14% | 1.05% | 1.16% | 3.56% | 55.54% | \ No newline at end of file
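To illustrate how a drop rate like those in the last table might be measured, here is a minimal Python sketch. It is not the tooling used to produce the results in this patch series. The function names `send_udp_syslog` and `drop_rate_percent`, the message payload, and the definition of drop rate as `(sent - received) / sent` are all assumptions made for illustration. The received count would have to come from the Splunk side, for example from a search over the test messages.

```python
import socket
import time


def send_udp_syslog(host: str, port: int, count: int, eps: int) -> int:
    """Send `count` test messages over UDP at roughly `eps` messages per second.

    Hypothetical helper: pacing with time.sleep is coarse, so the real rate
    will fall below `eps` at high target rates.
    """
    interval = 1.0 / eps
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            # <14> is the syslog PRI for facility "user" (1) and severity "info" (6).
            # The sequence number makes gaps visible on the receiving side.
            payload = f"<14>sc4s-loadtest seq={seq}".encode()
            sock.sendto(payload, (host, port))
            sent += 1
            time.sleep(interval)
    return sent


def drop_rate_percent(sent: int, received: int) -> float:
    """Return the share of sent messages that never arrived, in percent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent
```

A run would call `send_udp_syslog` against the load balancer address and port 514, count the indexed test events, and pass both numbers to `drop_rate_percent`. Because UDP gives the sender no feedback, the sender cannot detect drops on its own, so the received count has to be taken at the destination.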