Skip to content

Conversation

@andrewvc
Copy link
Contributor

@andrewvc andrewvc commented Aug 23, 2016

This is a WIP. This commit has added locking, but I may have missed some locks in a few spots, so it needs another pass before its really ready.

This provides a nice boost:

Before:

time bin/logstash -e "input { generator { count => 3000000} } filter {  } output { file { path => '/tmp/newfileout'} }"
Settings: Default pipeline workers: 8
Pipeline main started
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
      139.95 real       223.61 user        28.93 sys

After

rm /tmp/newfileout; time bin/logstash -e "input { generator { count => 3000000} } filter {  } output { file { codec => json_lines path => '/tmp/newfileout'} }" ; ls -lh /tmp/newfileout
Settings: Default pipeline workers: 8
Pipeline main started
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
       56.12 real       192.99 user        17.38 sys


private
def write_event(event, data)
def write_event(event)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be renamed

@jsvd
Copy link
Member

jsvd commented Aug 24, 2016

my first results:

With logstash 5.0.0-alpha6:

/tmp/logstash-5.0.0-alpha6 % time bin/logstash -e "input { generator { count => 3000000} } filter {  } output { file { path => '/tmp/newfileout'} }" && wc -l /tmp/newfileout
Pipeline main started
Pipeline main has been shutdown
stopping pipeline {:id=>"main", :level=>:warn}
bin/logstash -e   165.91s user 26.94s system 230% cpu 1:23.64 total
 3000000 /tmp/newfileout

Logstash master + logstash-output-file w/ this PR:

~/projects/logstash (git)-[master] % time bin/logstash -e "input { generator { count => 3000000} } filter {  } output { file { path => '/tmp/newfileout_master'} }" && wc -l /tmp/newfileout_master
Pipeline main started
Pipeline main has been shutdown
stopping pipeline {:id=>"main", :level=>:warn}
bin/logstash -e   150.86s user 17.39s system 273% cpu 1:01.57 total
 2999898 /tmp/newfileout_master

I've tried this again:

2999899 /tmp/newfileout_master

@jsvd
Copy link
Member

jsvd commented Aug 24, 2016

Another error which I only got once I got more than once:

~/projects/logstash (git)-[master] % time bin/logstash -b 20 -e "input { generator { count => 3000000} } filter {  } output { file { path => '/tmp/newfileout_master'} }" && wc -l /tmp/newfileout_master
--- jar coordinate com.fasterxml.jackson.core:jackson-annotations already loaded with version 2.7.3 - omit version 2.7.0
Pipeline main started
An unexpected error occurred! {:error=>#<NameError: undefined method `write' for class `NilClass'>, :backtrace=>["org/jruby/RubyKernel.java:2148:in `method'", "/Users/joaoduarte/new_plugins/logstash-output-file/lib/logstash/outputs/file.rb:142:in `multi_receive_encoded'", "org/jruby/RubyHash.java:1342:in `each'", "/Users/joaoduarte/new_plugins/logstash-output-file/lib/logstash/outputs/file.rb:140:in `multi_receive_encoded'", "/Users/joaoduarte/projects/logstash/logstash-core/lib/logstash/outputs/base.rb:89:in `multi_receive'", "/Users/joaoduarte/projects/logstash/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:11:in `multi_receive'", "/Users/joaoduarte/projects/logstash/logstash-core/lib/logstash/output_delegator.rb:50:in `multi_receive'", "/Users/joaoduarte/projects/logstash/logstash-core/lib/logstash/pipeline.rb:302:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/Users/joaoduarte/projects/logstash/logstash-core/lib/logstash/pipeline.rb:301:in `output_batch'", "/Users/joaoduarte/projects/logstash/logstash-core/lib/logstash/pipeline.rb:252:in `worker_loop'", "/Users/joaoduarte/projects/logstash/logstash-core/lib/logstash/pipeline.rb:225:in `start_workers'"], :level=>:fatal}
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'
2016-08-24 14:29:01 +0100: Listen loop error: #<Errno::EBADF: Bad file descriptor - Bad file descriptor>
org/jruby/RubyIO.java:3705:in `select'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:322:in `handle_servers'
/Users/joaoduarte/projects/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/server.rb:296:in `run'bin/logstash -b 20 -e   57.31s user 3.50s system 301% cpu 20.166 total

@ph
Copy link
Contributor

ph commented Aug 24, 2016

@jsvd the error your reporting is strange to say the least, the first part of the message originate from the file output but the remaining part are from puma.

@jsvd
Copy link
Member

jsvd commented Aug 24, 2016

@ph I just wanted to post the whole error stack, but the puma one is happening frequently on logstash exit, I'm going to open an issue

@ph
Copy link
Contributor

ph commented Aug 24, 2016

@jsvd I am gonna take a look, we rewrote the puma integration not too long ago, I think we might be missing a rescue on close.

@jsvd
Copy link
Member

jsvd commented Aug 24, 2016

I created an issue for the puma errors in elastic/logstash#5822

@andrewvc andrewvc force-pushed the concurrent-and-sync branch 2 times, most recently from 6ab2a13 to eece460 Compare August 26, 2016 13:11
describe "ship lots of events to a file gzipped" do
Stud::Temporary.file('logstash-spec-output-file') do |tmp_file|
event_count = 10000 + rand(500)
event_count = 100000 + rand(500)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did catch some locking bugs here by upping this count to be a longer test.

@andrewvc
Copy link
Contributor Author

@jsvd I've fixed all the issues here as well as made the locking a bit more coarse. I was also able to simplify the code around a simple Mutex rather than a complex ReentrantReadWriteLock

@andrewvc andrewvc changed the title [WIP] Use concurrency :shared and sync codecs to optimize performance Use concurrency :shared and sync codecs to optimize performance Aug 26, 2016
@andrewvc andrewvc force-pushed the concurrent-and-sync branch from eece460 to 6c96b1d Compare August 26, 2016 14:08
As part of this refactor the `message_format` option had to be (finally)
obsoleted.It was previously deprecated for quite some time.

This provides a nice boost:

Before:

```
time bin/logstash -e "input { generator { count => 3000000} } filter {  } output { file { path => '/tmp/newfileout'} }"
Settings: Default pipeline workers: 8
Pipeline main started
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
      139.95 real       223.61 user        28.93 sys
```

After

```
rm /tmp/newfileout; time bin/logstash -e "input { generator { count => 3000000} } filter {  } output { file { codec => json_lines path => '/tmp/newfileout'} }" ; ls -lh /tmp/newfileout
Settings: Default pipeline workers: 8
Pipeline main started
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
       56.12 real       192.99 user        17.38 sys
```
@andrewvc andrewvc force-pushed the concurrent-and-sync branch from 6c96b1d to ee244b7 Compare August 26, 2016 17:01
@andrewvc
Copy link
Contributor Author

@jsvd I've bumped the plugin API so it will require LS 5.0+ to properly handle the obsolescence.

@jsvd
Copy link
Member

jsvd commented Aug 26, 2016

Tests pass, manual tests also good, nice work
LGTM

@andrewvc
Copy link
Contributor Author

Thanks @jsvd !

@elasticsearch-bot
Copy link

Andrew Cholakian merged this into the following branches!

Branch Commits
master 4e8ca37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants