
Conversation

@turtlesoupy
Contributor

Hey,

I ran a "kill the broker" test for producers and got into a state where the topic metadata wasn't being refreshed, because client._send_broker_aware_request does a conn.recv that doesn't propagate the correct exception upon socket timeout. I believe the change below is the correct behaviour.

Integration tests still pass.
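For reference, a minimal sketch of the idea (not the actual patch; the connection wrapper and exception names here are illustrative): catch socket.timeout in recv and re-raise it as a connection-level error, so callers such as _send_broker_aware_request can mark the broker dead and refresh metadata instead of hanging.

import socket

class ConnectionError(Exception):
    # Stand-in for the library's connection-level error.
    pass

class KafkaConnection(object):
    # Hypothetical wrapper around a raw socket, for illustration only.
    def __init__(self, sock):
        self._sock = sock

    def recv(self, size):
        try:
            data = self._sock.recv(size)
        except socket.timeout:
            # Propagate the timeout as a connection failure so the caller
            # can refresh metadata and retry against another broker.
            raise ConnectionError("socket timed out while reading from broker")
        if not data:
            raise ConnectionError("broker closed the connection")
        return data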

@dpkp
Owner

dpkp commented Jan 8, 2014

We ran into the same problems in #88 and fixed them in a couple of different places (the last was b4c20ac).

@turtlesoupy
Contributor Author

Doh, okay, I should have checked the other issues first. I'll take a glance at that review and then close this once it is accepted.

Joe Crobak and others added 26 commits January 13, 2014 14:19
When running on Linux with code on a case-insensitive file system,
imports of the `Queue` module fail because Python resolves the
wrong file (it tries to use a relative import of `queue.py` in
the kafka directory). This change forces absolute imports via PEP 328.
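The fix amounts to the PEP 328 future import, after which a queue.py (or Queue.py on a case-insensitive file system) inside the package no longer shadows the standard library module:

# At the top of the affected module (Python 2):
from __future__ import absolute_import

# This now resolves to the standard library, not a file in the kafka package:
from Queue import Empty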
Previously, if you tried to consume a message with a timeout greater than 10 seconds
but received no data within those 10 seconds, a socket.timeout exception was raised.
This change allows a higher socket timeout to be set, or even None for no timeout.
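Underneath, this is just standard socket timeout semantics; a sketch of the two configurations the commit message mentions (the surrounding client code is elided):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(120.0)   # allow a socket timeout longer than the old 10s cap
# or:
sock.settimeout(None)    # no timeout: recv() blocks until data arrives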
According to the protocol documentation, the 4-byte integer at the beginning
of a response represents the size of the payload only, not including those 4 bytes.
See http://goo.gl/rg5uom
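So a response is framed as a 4-byte big-endian size followed by exactly that many payload bytes; a minimal sketch of reading one response, assuming a blocking socket:

import struct

def read_exact(sock, n):
    # Keep calling recv until exactly n bytes have arrived (or the peer closes).
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise IOError("connection closed before %d bytes were read" % n)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_response(sock):
    (size,) = struct.unpack(">i", read_exact(sock, 4))  # size excludes these 4 bytes
    return read_exact(sock, size)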
…y time

* Remove bufsize from client and conn, since it is not actually enforced in either place

Notes:

This commit changes behavior a bit by raising a BufferUnderflowError, rather than a
ConnectionError, when no data is received for the message size.

Since bufsize on the socket is not actually enforced, but it is used by the consumer
when creating requests, move it to the consumer until a better solution is implemented.
* Combine partition fetch requests into a single request
* Put the messages received in a queue and update offsets
* Grab as many messages from the queue as requested
* When the queue is empty, request more
* timeout param for get_messages() is the actual timeout for getting those messages
* Based on #74 -
  don't increase min_bytes if the consumer fetch buffer size is too small.

Notes:

Change MultiProcessConsumer and _mp_consume() accordingly.

Previously, when querying each partition separately, it was possible to
block waiting for messages on partition 0 even if there were new ones in partition 1.
These changes allow us to block while waiting for messages on all partitions,
and they reduce the total number of Kafka requests.
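As a usage sketch of the resulting behaviour (the consumer setup is elided, and the get_messages keyword names are taken from the description above rather than quoted from the code):

# Ask for up to 100 messages across all partitions, waiting at most 5 seconds.
# One combined fetch request is issued; results are queued internally and the
# consumer only fetches again once that queue is drained.
messages = consumer.get_messages(count=100, block=True, timeout=5)
for message in messages:
    handle(message)   # handle() is a stand-in for application code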

Use Queue.Queue for the single-process queue instead of the already imported
multiprocessing.Queue, because the latter doesn't seem to guarantee immediate
availability of items after a put:

>>> from multiprocessing import Queue
>>> q = Queue()
>>> q.put(1); q.get_nowait()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 152, in get_nowait
    return self.get(False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 134, in get
    raise Empty
Queue.Empty
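By contrast, the stdlib queue gives the expected behaviour for the single-process case, which is what motivated the switch:

>>> from Queue import Queue
>>> q = Queue()
>>> q.put(1); q.get_nowait()
1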
… iterator to exit when reached.

Also put constant timeout values in pre-defined constants
Will remove once any error handling issues are resolved.
This is pretty much a rewrite. The tests that involve offset requests/responses
are not implemented, since that API is not yet supported in Kafka 0.8.
Only kafka.codec and kafka.protocol are currently tested, so there is more work to be done here.
We always store the offset of the next available message, so we shouldn't decrement
the offset deltas by an extra 1 when seeking.
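A worked example of the off-by-one this removes: if the last message consumed had offset 9, the stored offset is already 10 (the next available message), so a relative seek of -3 should target offset 7, with no extra -1 adjustment:

stored_offset = 10           # offset of the next message to be fetched
delta = -3                   # relative seek requested by the caller
target = stored_offset + delta
assert target == 7           # decrementing by an extra 1 would wrongly give 6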
…data

Also, log.exception() is unhelpfully noisy. Use log.error() with some error details in the message instead.
This differentiates between errors that occur when sending the request
and receiving the response, and adds BufferUnderflowError handling.
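Schematically, the request path now distinguishes the two failure modes like this (a sketch only; the exception and helper names are illustrative):

import logging

log = logging.getLogger(__name__)

class BufferUnderflowError(Exception):
    # Illustrative stand-in for the library's "not enough bytes" error.
    pass

def send_and_receive(conn, request_id, request, broker):
    # Failure while sending: the broker never saw the request.
    try:
        conn.send(request_id, request)
    except Exception:
        log.error("Failed to send request %d to broker %s", request_id, broker)
        raise
    # Failure while receiving: the request may have been processed,
    # but the response was short or never arrived.
    try:
        return conn.recv(request_id)
    except BufferUnderflowError:
        log.error("Short or empty response for request %d from broker %s",
                  request_id, broker)
        raise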
…tch size is too small

Note: this can cause fetching a message to exceed a given timeout, but timeouts are not guaranteed anyway, and in this case it is the client's fault for not requesting a big enough buffer size rather than the Kafka server's. This can be bad if max_fetch_size is None (no limit) and some message in Kafka is enormous, but that is exactly why we should have a max_fetch_size.
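The retry logic being described looks roughly like this (a sketch under assumed names; fetch() stands in for the actual fetch request):

def fetch_messages(fetch, partition, offset, buffer_size, max_fetch_size=None):
    # fetch(partition, offset, buffer_size) is a stand-in for the real fetch
    # request; assume it returns [] when the next message doesn't fit.
    while True:
        messages = fetch(partition, offset, buffer_size)
        if messages:
            return messages
        if max_fetch_size is not None and buffer_size >= max_fetch_size:
            raise RuntimeError("message exceeds max_fetch_size=%d" % max_fetch_size)
        # Double the buffer (capped at max_fetch_size) and try again.
        buffer_size *= 2
        if max_fetch_size is not None:
            buffer_size = min(buffer_size, max_fetch_size)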
…ed in integration tests

If some of the tests stop brokers and then error out, the teardown method will try to close the
same brokers and fail. This change allows it to continue.
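In other words, the teardown wraps each close in a try/except so an already-stopped broker doesn't abort cleanup of the rest (a sketch, not the actual test code):

def tearDown(self):
    for broker in self.brokers:
        try:
            broker.close()
        except Exception:
            # The test may already have stopped this broker; keep going so
            # the remaining brokers still get shut down.
            pass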
This is better since the tests stop/start brokers, and if something goes wrong
they can affect each other.
@turtlesoupy
Contributor Author

Closing since upstream has the fix.
