@bilaharith
Contributor

This is a draft PR, not ready for review.

@steveloughran
Contributor

we should talk about this in 2021. For now

I think it makes sense to have an overall "optimise abfs incremental listings" JIRA and create issues underneath, as a lot is unified.

private final Path path;
private final AzureBlobFileSystemStore abfsStore;
private final Queue<ListIterator<FileStatus>> iteratorsQueue =

ArrayBlockingQueue

Contributor Author

Done

public ListStatusRemoteIterator(final Path path,
final AzureBlobFileSystemStore abfsStore) throws IOException {
this.path = path;
this.abfsStore = abfsStore;

Set ioException and currentIterator to null

Contributor Author

Done

@steveloughran
Contributor

I need you to use org.apache.hadoop.util.functional.RemoteIterators as the wrapper iterators. These are only in trunk but will be backported with the rest of HADOOP-16380 after a few days of stabilisation.

These iterators propagate the IOStatisticsSource interface, so when the innermost iterator collects cost/count of list calls, the stats will be visible to and collectable by callers.

continuation = getIsNamespaceEnabled()
? generateContinuationTokenForXns(startFrom)
: generateContinuationTokenForNonXns(relativePath, startFrom);
if (continuation == null || continuation.length() < 1) {
Contributor

continuation.isEmpty()

Contributor Author

Done

updateCurrentIterator();
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
Contributor

We are here because the thread has already been interrupted. Why call interrupt() again?
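
For context on the question above: in Java, the interrupt status is cleared when InterruptedException is thrown, so calling Thread.currentThread().interrupt() in the catch block is the usual way to let code further up the stack see that an interrupt happened. A minimal, self-contained sketch (the class and method names are hypothetical, not code from this PR):

```java
// Illustrative only: InterruptRestoreDemo is a hypothetical name, not part of the PR.
final class InterruptRestoreDemo {

  /**
   * Simulates an interrupt, catches the resulting exception, and reports
   * whether the interrupt flag is still visible after the catch block.
   */
  static boolean flagAfterCatch(boolean restore) {
    Thread.currentThread().interrupt();       // simulate an interrupt arriving
    try {
      Thread.sleep(10);                       // throws at once and clears the flag
    } catch (InterruptedException e) {
      if (restore) {
        Thread.currentThread().interrupt();   // re-assert the flag for callers
      }
    }
    // Thread.interrupted() reads AND clears the flag, leaving the thread clean.
    return Thread.interrupted();
  }

  public static void main(String[] args) {
    System.out.println("flag visible without restore: " + flagAfterCatch(false));
    System.out.println("flag visible with restore:    " + flagAfterCatch(true));
  }
}
```

Without the restore call, a caller that polls the interrupt status would never learn that the thread was asked to stop.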

Contributor Author

Contributor

@snvijaya left a comment

Please check the comments

@steveloughran added the enhancement and fs/azure (changes related to azure; submitter must declare test endpoint) labels Jan 13, 2021
@steveloughran
Contributor

  1. What is the JIRA ID?
  2. As discussed, you need an uber-JIRA to cover the whole set of list optimisations you can do, including an overall goal.

I would recommend something like "ABFS listing to support asynchronous prefetch and optimise for incremental listing of large directories and deep/wide directory trees"

That is: if that hurts performance of listing empty directories, or calling the listX calls against files, that is acceptable.

@apache apache deleted a comment from hadoop-yetus Jan 13, 2021
Contributor

@steveloughran left a comment

I fear you are making the same mistake we made in the S3A codebase: giving all the helper classes a reference back to the ABFS Store class, which makes them too intermingled. It is unsustainable. See: https://github.com/steveloughran/engineering-proposals/blob/trunk/refactoring-s3a.md

Proposed

  • Use a specific listing callback which the store can directly/indirectly implement: org.apache.hadoop.fs.s3a.impl.ListingOperationCallbacks.
  • This will also help you write unit tests against a stub implementation without having to use reflection and manipulating access modifiers in a way that is over-complex and brittle.
  • Use the IOStatistics API to track the duration of calls and serve this up. It is really interesting to know how long lists take.
  • add a close() operator to cancel queue & wait for completion.
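
A minimal sketch of the callback idea proposed above, assuming a paged listing with a string continuation token. All names here (ListingCallback, PagedLister, StubListing) are hypothetical and are not the ABFS or S3A interfaces; the point is only that the lister depends on a narrow interface, so a stub can stand in for the store in unit tests without reflection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical narrow callback: the only thing the lister knows about the store.
interface ListingCallback {
  /**
   * Appends one page of entry names to {@code out} and returns the
   * continuation token for the next page, or null when the listing is done.
   */
  String listPage(String path, String continuation, List<String> out)
      throws IOException;
}

// Hypothetical helper that depends on the callback, not on a store class.
final class PagedLister {
  private final ListingCallback callback;

  PagedLister(ListingCallback callback) {
    this.callback = callback;
  }

  /** Drains every page of a listing into a single list. */
  List<String> listAll(String path) throws IOException {
    List<String> all = new ArrayList<>();
    String token = null;
    do {
      token = callback.listPage(path, token, all);
    } while (token != null);
    return all;
  }
}

// Stub used in place of a real store: two fixed pages, no network calls.
final class StubListing implements ListingCallback {
  @Override
  public String listPage(String path, String continuation, List<String> out) {
    if (continuation == null) {
      out.addAll(Arrays.asList(path + "/a", path + "/b"));
      return "page2";
    }
    out.add(path + "/c");
    return null;
  }
}
```

In a real connector the store would implement the callback directly or hand out an adapter; the lister itself never needs the wider store API.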

The other incremental listX calls (listFileStatus, listLocatedStatus) are all similar and should be done in the same uber-JIRA. listLocatedStatus is used in LocatedFileStatusFetcher, which is used during MR and Spark file scanning. listFileStatus should be used more for deep tree scans.

I'd also prefer it if you used a shared thread pool on that ABFS store instance:

  • stops many, many list iterators overloading things
  • may offer faster startup
  • custom thread names help identify origin of less-helpful stack traces
  • could have a gauge on the instance to measure pool load
  • filesystem.close() can interrupt the pool to shut things down.
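
The shared-pool points above could look roughly like this. It is a sketch under assumptions: the class name, the thread-name prefix and the fixed pool size are all made up for illustration, and nothing here is taken from the ABFS code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: one bounded pool per store instance, shared by all list iterators.
final class SharedListingPool implements AutoCloseable {
  private final ExecutorService pool;

  SharedListingPool(int threads, String storeName) {
    AtomicInteger counter = new AtomicInteger();
    // Custom names make the origin of a stack trace easier to identify.
    ThreadFactory factory = runnable -> {
      Thread t = new Thread(runnable,
          "abfs-list-" + storeName + "-" + counter.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
    // A fixed size caps how many listings can run concurrently.
    this.pool = Executors.newFixedThreadPool(threads, factory);
  }

  ExecutorService executor() {
    return pool;
  }

  /** Intended to be called when the owning filesystem closes; interrupts running tasks. */
  @Override
  public void close() {
    pool.shutdownNow();
  }
}
```

Iterators would submit their prefetch tasks to executor() instead of creating their own threads, and closing the filesystem would close the pool.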

Have a look at org.apache.hadoop.fs.s3a.Listing in trunk to see what's been done there. There's a lot you don't need (S3Guard reconciliation), and it currently only prefetches the next page of results. But it does include the stats collection, a restricted callback to the FS, and use of org.apache.hadoop.util.functional.RemoteIterators for functional-programming style use of RemoteIterator classes. Take a look at its org.apache.hadoop.util.functional.RemoteIterators#foreach method too.

I know, I'm giving extra homework. But I'll only keep asking for these, so better to start now.

}

private void fetchBatchesAsync() {
CompletableFuture.runAsync(() -> asyncOp());
Contributor

should only be scheduled if there isn't one already in progress

Contributor Author

Done
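
One way to honour "only schedule if none is in progress" is a compare-and-set guard around the async call. This is an illustrative sketch with hypothetical names (SingleFlightFetcher, fetchAsync), not the change that was actually made in the PR.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard: at most one background fetch runs at any time.
final class SingleFlightFetcher {
  private final AtomicBoolean inProgress = new AtomicBoolean(false);
  private final AtomicInteger started = new AtomicInteger();
  private final Runnable fetchOp;

  SingleFlightFetcher(Runnable fetchOp) {
    this.fetchOp = fetchOp;
  }

  /** Schedules a fetch unless one is already running; empty means "skipped". */
  Optional<CompletableFuture<Void>> fetchAsync() {
    if (!inProgress.compareAndSet(false, true)) {
      return Optional.empty();
    }
    started.incrementAndGet();
    return Optional.of(CompletableFuture.runAsync(() -> {
      try {
        fetchOp.run();
      } finally {
        inProgress.set(false);   // always release, even if the fetch fails
      }
    }));
  }

  /** Number of fetches that were actually scheduled. */
  int startedCount() {
    return started.get();
  }
}
```

The flag is released in a finally block so that a failed fetch does not block all later ones.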

@bilaharith bilaharith marked this pull request as draft January 14, 2021 18:18
@bilaharith
Contributor Author

I will raise the JIRA as suggested. For now keeping the PR as a draft.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 30m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 33m 29s trunk passed
+1 💚 compile 0m 38s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 checkstyle 0m 27s trunk passed
+1 💚 mvnsite 0m 39s trunk passed
+1 💚 shadedclient 16m 28s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javadoc 0m 29s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+0 🆗 spotbugs 0m 59s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 57s trunk passed
-0 ⚠️ patch 1m 16s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
+1 💚 javac 0m 29s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
+1 💚 javac 0m 26s the patch passed
-0 ⚠️ checkstyle 0m 16s /diff-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 5 new + 4 unchanged - 0 fixed = 9 total (was 4)
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 14m 55s patch has no errors when building and testing our client artifacts.
-1 ❌ javadoc 0m 27s /diff-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 generated 3 new + 15 unchanged - 2 fixed = 18 total (was 17)
-1 ❌ javadoc 0m 25s /diff-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 generated 3 new + 15 unchanged - 2 fixed = 18 total (was 17)
-1 ❌ findbugs 1m 1s /new-findbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 unit 1m 31s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
108m 12s
Reason Tests
FindBugs module:hadoop-tools/hadoop-azure
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsListStatusRemoteIterator.ioException; locked 66% of time. Unsynchronized access at AbfsListStatusRemoteIterator.java:[line 128]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/8/artifact/out/Dockerfile
GITHUB PR #2548
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux be81f3ee1b30 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 630f8dd
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/8/testReport/
Max. process+thread count 546 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/8/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bilaharith
Contributor Author

Hi @steveloughran
I have created a separate JIRA for IOStatistics collection and linked it to HADOOP-17475. We will pick that up afterwards.

Contributor

@steveloughran left a comment

Very close to getting in. Other than some minor checkstyle/javadoc complaints, findbugs needs to be made to stop complaining. Usually it indicates a real risk of a problem, so I'd like to see what can be done about it, rather than just editing the XML file to turn off the check.

continuation = listingSupport
.listStatus(fileStatus.getPath(), null, fileStatuses, FETCH_ALL_FALSE,
continuation);
if(!fileStatuses.isEmpty()) {
Contributor

nit: space after if

Contributor Author

Done

* "/folder/hfile" and "/folder/ifile".
* @return the entries in the path start from "startFrom" in lexical order.
*/
@InterfaceStability.Unstable
Contributor

Add that at the actual interface, along with @Private. Not that we'd expect anyone to use it.

Contributor Author

Done

for (Future<Void> task : tasks) {
task.get();
}
es.shutdownNow();
Contributor

this should be in a finally block

Contributor Author

Done
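
The "finally block" request can be illustrated as follows. This is a sketch that assumes the executor is passed in, with a hypothetical class name; it is not the test code from the PR. If any task.get() throws, the pool is still shut down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper showing shutdownNow() placed in a finally block.
final class FinallyShutdownDemo {

  /** Runs all tasks on the pool and shuts it down even if one of them fails. */
  static void runAll(ExecutorService es, List<Callable<Void>> work)
      throws InterruptedException, ExecutionException {
    try {
      List<Future<Void>> tasks = new ArrayList<>();
      for (Callable<Void> c : work) {
        tasks.add(es.submit(c));
      }
      for (Future<Void> task : tasks) {
        task.get();   // rethrows a task failure wrapped in ExecutionException
      }
    } finally {
      es.shutdownNow();   // reached on success and on failure alike
    }
  }
}
```

Without the finally block, a failing task would leave the pool's threads running after the test method had exited.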

@bilaharith
Contributor Author

Very close to getting in. Other than some minor checkstyle/javadoc complaints, findbugs needs to be made to stop complaining. Usually it indicates a real risk of a problem, so I'd like to see what can be done about it, rather than just editing the XML file to turn off the check.

There is one findbugs warning remaining, which is of medium priority. It can be ignored: at line 147 the continuation token returned by the ListingOperation is set outside the synchronized block, since obtaining it involves an HTTP call.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 2s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 10s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 30s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 0m 27s trunk passed
+1 💚 mvnsite 0m 37s trunk passed
+1 💚 shadedclient 14m 19s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 29s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 0m 59s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 58s trunk passed
-0 ⚠️ patch 1m 16s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 28s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 0m 25s the patch passed
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 28s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 shadedclient 12m 46s patch has no errors when building and testing our client artifacts.
-1 ❌ javadoc 0m 26s /diff-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04.txt hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 generated 3 new + 15 unchanged - 2 fixed = 18 total (was 17)
-1 ❌ javadoc 0m 25s /diff-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01.txt hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 3 new + 15 unchanged - 2 fixed = 18 total (was 17)
-1 ❌ findbugs 1m 0s /new-findbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 unit 1m 55s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 32s The patch does not generate ASF License warnings.
78m 53s
Reason Tests
FindBugs module:hadoop-tools/hadoop-azure
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsListStatusRemoteIterator.continuation; locked 50% of time. Unsynchronized access at AbfsListStatusRemoteIterator.java:[line 147]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/17/artifact/out/Dockerfile
GITHUB PR #2548
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 4d0b871b529a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 18978f2
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/17/testReport/
Max. process+thread count 536 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/17/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Feb 2, 2021
@apache apache deleted a comment from hadoop-yetus Feb 2, 2021
@apache apache deleted a comment from hadoop-yetus Feb 2, 2021
@steveloughran
Contributor

There is one findbugs warning remaining, which is of medium priority. It can be ignored: at line 147 the continuation token returned by the ListingOperation is set outside the synchronized block, since obtaining it involves an HTTP call.

Well, it needs to be dealt with, either by fixing it or through findbugs.xml. Define "medium priority" here.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 6m 10s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 33m 28s trunk passed
+1 💚 compile 0m 39s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 29s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 0m 26s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 shadedclient 14m 6s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 30s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 1m 8s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 6s trunk passed
-0 ⚠️ patch 1m 25s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 0m 26s the patch passed
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 shadedclient 12m 36s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 28s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 generated 0 new + 15 unchanged - 2 fixed = 15 total (was 17)
+1 💚 javadoc 0m 24s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 0 new + 15 unchanged - 2 fixed = 15 total (was 17)
-1 ❌ findbugs 1m 7s /new-findbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 unit 1m 56s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 32s The patch does not generate ASF License warnings.
79m 34s
Reason Tests
FindBugs module:hadoop-tools/hadoop-azure
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsListStatusRemoteIterator.continuation; locked 50% of time. Unsynchronized access at AbfsListStatusRemoteIterator.java:[line 147]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/18/artifact/out/Dockerfile
GITHUB PR #2548
Optional Tests dupname asflicense xml compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux c0cbb8a0f34f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 394b9f7
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/18/testReport/
Max. process+thread count 664 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/18/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 2m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 33m 37s trunk passed
+1 💚 compile 0m 37s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 38s trunk passed
+1 💚 shadedclient 14m 6s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 37s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 0m 59s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 57s trunk passed
-0 ⚠️ patch 1m 15s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 0m 27s the patch passed
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 shadedclient 12m 48s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 26s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 generated 0 new + 15 unchanged - 2 fixed = 15 total (was 17)
+1 💚 javadoc 0m 23s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 0 new + 15 unchanged - 2 fixed = 15 total (was 17)
-1 ❌ findbugs 1m 4s /new-findbugs-hadoop-tools_hadoop-azure.html hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 unit 2m 0s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
76m 16s
Reason Tests
FindBugs module:hadoop-tools/hadoop-azure
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsListStatusRemoteIterator.continuation; locked 50% of time. Unsynchronized access at AbfsListStatusRemoteIterator.java:[line 147]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/19/artifact/out/Dockerfile
GITHUB PR #2548
Optional Tests dupname asflicense xml compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 58c7d8478c29 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 394b9f7
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/19/testReport/
Max. process+thread count 616 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/19/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 5s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 0m 0s test4tests The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 51s trunk passed
+1 💚 compile 0m 39s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 35s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 checkstyle 0m 29s trunk passed
+1 💚 mvnsite 0m 40s trunk passed
+1 💚 shadedclient 14m 6s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 31s trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 30s trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+0 🆗 spotbugs 0m 59s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 56s trunk passed
-0 ⚠️ patch 1m 15s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 29s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
+1 💚 javac 0m 26s the patch passed
+1 💚 checkstyle 0m 18s the patch passed
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 shadedclient 12m 46s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 27s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 generated 0 new + 15 unchanged - 2 fixed = 15 total (was 17)
+1 💚 javadoc 0m 25s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 0 new + 15 unchanged - 2 fixed = 15 total (was 17)
+1 💚 findbugs 0m 58s the patch passed
_ Other Tests _
+1 💚 unit 1m 56s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
73m 52s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/20/artifact/out/Dockerfile
GITHUB PR #2548
Optional Tests dupname asflicense xml compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname Linux 2471c0bd7764 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 26b9d48
Default Java Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/20/testReport/
Max. process+thread count 723 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2548/20/console
versions git=2.25.1 maven=3.6.3 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@bilaharith
Contributor Author

There is one findbugs warning remaining, which is of medium priority. It can be ignored: at line 147 the continuation token returned by the ListingOperation is set outside the synchronized block, since obtaining it involves an HTTP call.

Well, it needs to be dealt with, either by fixing it or through findbugs.xml. Define "medium priority" here.

"Medium priority" is the term used in the findbugs report.
The member variable continuation is set from the result of an external HTTP call. That call is kept outside the synchronized block because it is costly.
The warning is now excluded in the findbugs.xml file.
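
For comparison only: the PR resolved the warning through an exclusion in findbugs.xml, as stated above. An alternative pattern that keeps the slow call outside the lock while still accessing the field consistently under it is sketched below. The names are hypothetical, the remote call is modelled as a plain function, and the sketch assumes a single fetching thread (two concurrent fetchNext() calls could otherwise reuse the same token).

```java
import java.util.function.UnaryOperator;

// Hypothetical holder: every field access is locked, the slow call is not.
final class ContinuationHolder {
  private final Object lock = new Object();
  private final UnaryOperator<String> remoteCall;   // stands in for the HTTP call
  private String continuation;                      // guarded by lock

  ContinuationHolder(UnaryOperator<String> remoteCall) {
    this.remoteCall = remoteCall;
  }

  /** Fetches the next token; the lock is held only for the brief read and write. */
  void fetchNext() {
    String current;
    synchronized (lock) {
      current = continuation;
    }
    String next = remoteCall.apply(current);   // slow call, no lock held
    synchronized (lock) {
      continuation = next;
    }
  }

  String continuation() {
    synchronized (lock) {
      return continuation;
    }
  }
}
```

Whether this is worth the extra code compared with an exclusion entry is a judgment call; it only matters if another thread can actually read the field.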

@steveloughran
Contributor

That's fine: the key is "findbugs must be happy". Sometimes it overreacts, but you need to justify ignoring its complaints. Sometimes it's a losing battle with the sync stuff; I had that with the MeanStatistic stuff in IOStatistics and just made everything synchronized even when it was overkill.

@steveloughran
Contributor

+1, Merging

@steveloughran steveloughran merged commit 5f34271 into apache:trunk Feb 4, 2021
@steveloughran
Contributor

Merged to trunk; cherry-picked to branch-3.3 and will push up if a recompile works. I am not testing that. Can you do that, and if there are problems we can do a followup?

asfgit pushed a commit that referenced this pull request Feb 4, 2021
The ABFS connector now implements listStatusIterator() with
asynchronous prefetching of the next page(s) of results.
For listing large directories this can provide tangible speedups.

If for any reason this needs to be disabled, set
fs.azure.enable.abfslistiterator to false.

Contributed by Bilahari T H.

Change-Id: Ic9a52b80df1d0ffed4c81beae92c136e2a12698c
@bilaharith
Contributor Author

Merged to trunk; cherry-picked to branch-3.3 and will push up if a recompile works. I am not testing that. Can you do that, and if there are problems we can do a followup?

Sure

jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…tor (apache#2548)