Skip to content

Conversation

@steveloughran
Copy link
Contributor

HADOOP-13327 Output Stream Specification

Change-Id: I1b6bc258a40a8bd57879d9edc3e5bb1303f0fff2

@steveloughran
Copy link
Contributor Author

HDFS test have run out of memory


[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 64.999 s <<< FAILURE! - in org.apache.hadoop.TestRefreshCallQueue
[ERROR] testRefreshCallQueueWithFairCallQueue(org.apache.hadoop.TestRefreshCallQueue)  Time elapsed: 4.751 s  <<< ERROR!
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:717)
	at org.apache.hadoop.fs.FileSystem$Statistics.<clinit>(FileSystem.java:3707)
	at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:4175)
	at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:295)
	at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:99)
	at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3324)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:136)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3373)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3341)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:491)
	at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:447)
	at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:159)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2785)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2698)
	at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1697)
	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:913)

@steveloughran
Copy link
Contributor Author

I consider that to be an HDFS problem: not mine. This test is ready for review

Copy link

@bgaborg bgaborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first pass of the review. I haven't run any integration tests yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if the file is still open for writing, correct? Is it possible to tell if a given path is open for writing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no way, I'm afraid.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...is not a valid algorithm when working with object stores.

So shouldn't this be disallowed as the feature is not supported?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can lock and then throw. Should it be inside the try so the finally block can unlock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving this all to a separate patch, eventually

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the overwrite== above in the document...

Suggested change
1. The check for existing data in a `create()` call with `overwrite=False`, may
1. The check for existing data in a `create()` call with `overwrite==false`, may

@steveloughran steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec branch from 13dc093 to 8947901 Compare March 22, 2019 22:49
@steveloughran steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec branch from 8947901 to 455203f Compare June 7, 2019 12:02
@steveloughran
Copy link
Contributor Author

just rebased to trunk; not tested. I see there's comments I've got to address -will do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

@apache apache deleted a comment from hadoop-yetus Sep 3, 2019
shanthoosh pushed a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
…vro schema types

Author: Aditya Toomula <[email protected]>

Reviewers: Srinivasulu Punuru <[email protected]>

Closes apache#575 from atoomula/bytes1 and squashes the following commits:

855a03d7 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
df4886d8 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
80268fc1 [Aditya Toomula] Fix SamzaSql AvroRelConverter to handle BYTES and FIXED avro schema types
steveloughran and others added 2 commits October 22, 2019 14:22
Change-Id: I1b6bc258a40a8bd57879d9edc3e5bb1303f0fff2
* Address code-side comments from reviewers; markdown is still WiP.
* move all inner stream propagation of hasCapability() tests to the StoreImplementationUtils operation.

Change-Id: Ic5be4ac09ddd36b8804eacfee5ad0587a819eaf0
@steveloughran steveloughran force-pushed the filesystem/HADOOP-13327-outputstream-spec branch from 455203f to 8a2154d Compare October 22, 2019 13:22
@steveloughran
Copy link
Contributor Author

Had some time to look at the comments here after too long break; more work is needed

Output stream spec

Needs more review of the comments, especiallys Sean's

Stream Model + impl.

I now suspect that I need to do a bit more resilience in the failure handling of S3ABlockOutputStream.

Specifically: if 1+ POST has failed in a multipart write, we should make a best effort attempt to abort the upload. Admittedly, the same network or authentication errors which cause the post to fail are likely to cause the abort operation to fail too -that should not stop us up from at least trying.

Proposed: close() will attempt to abort the upload if a failure occurred earlier.

This is going to be fun to test. Indeed, it's a good argument for WriteOperationHelper to be converted to an interface with a mocking implementation alongside the production one. I might play with some subclassing instead;

Plan

I think these two bits of work can be separated; the S3ABlockOutput stream model/locking/aborting is complex enough to merit isolation.

Will follow up with that in a separate PR while retaining this purely for the output stream spec and tests.

This one doesn't apply to trunk no more; once I have gone through the comments I may produce a new PR for the output stream changes too.

Change-Id: I92788063536a4bf15ee28f17187c7e09fa09940f
private IOException exception;

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

}

/**
* Create for a path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

/**
* Create for a path.
* @param path optional path for exception messages.
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace:end of line

@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@apache apache deleted a comment from hadoop-yetus Nov 5, 2019
@steveloughran
Copy link
Contributor Author

I'm closing this and opening up a peer without any of the S3A changes, #1694. It currently addresses most of the comments of people's reviews, excluding Sean's request about restructuring the model

Could this be in a section about visibility and not in the model definition? Maybe later. "here's the model, here's how that model works with creation, here's how it works when reading/writing" flows much better and visibility should be in that third part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants