[SPARK-32130][SQL][FOLLOWUP] Enable timestamps inference in JsonBenchmark #28981
Closed
MaxGekk wants to merge 3 commits into apache:master from MaxGekk:json-inferTimestamps-disable-by-default-followup
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106 | |
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| No encoding 69219 69342 116 1.4 692.2 1.0X | ||
| UTF-8 is set 143950 143986 55 0.7 1439.5 0.5X | ||
| No encoding 73307 73400 141 1.4 733.1 1.0X | ||
| UTF-8 is set 143834 143925 152 0.7 1438.3 0.5X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| No encoding 57828 57913 136 1.7 578.3 1.0X | ||
| UTF-8 is set 83649 83711 60 1.2 836.5 0.7X | ||
| No encoding 50894 51065 292 2.0 508.9 1.0X | ||
| UTF-8 is set 98462 99455 1173 1.0 984.6 0.5X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| No encoding 64560 65193 1023 0.2 6456.0 1.0X | ||
| UTF-8 is set 102925 103174 216 0.1 10292.5 0.6X | ||
| No encoding 64011 64969 1001 0.2 6401.1 1.0X | ||
| UTF-8 is set 102757 102984 311 0.1 10275.7 0.6X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| No encoding 131002 132316 1160 0.0 262003.1 1.0X | ||
| UTF-8 is set 152128 152371 332 0.0 304256.5 0.9X | ||
| No encoding 132559 133561 1010 0.0 265117.3 1.0X | ||
| UTF-8 is set 151458 152129 611 0.0 302915.4 0.9X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| Select 10 columns 19376 19514 160 0.5 1937.6 1.0X | ||
| Select 1 column 24089 24156 58 0.4 2408.9 0.8X | ||
| Select 10 columns 21148 21202 87 0.5 2114.8 1.0X | ||
| Select 1 column 24701 24724 21 0.4 2470.1 0.9X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| Short column without encoding 8131 8219 103 1.2 813.1 1.0X | ||
| Short column with UTF-8 13464 13508 44 0.7 1346.4 0.6X | ||
| Wide column without encoding 108012 108598 914 0.1 10801.2 0.1X | ||
| Wide column with UTF-8 150988 151369 412 0.1 15098.8 0.1X | ||
| Short column without encoding 6945 6998 59 1.4 694.5 1.0X | ||
| Short column with UTF-8 11510 11569 51 0.9 1151.0 0.6X | ||
| Wide column without encoding 95004 95795 790 0.1 9500.4 0.1X | ||
| Wide column with UTF-8 149223 149409 276 0.1 14922.3 0.0X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| Text read 753 765 18 13.3 75.3 1.0X | ||
| from_json 23182 23446 230 0.4 2318.2 0.0X | ||
| json_tuple 31129 31304 181 0.3 3112.9 0.0X | ||
| get_json_object 22821 23073 225 0.4 2282.1 0.0X | ||
| Text read 649 652 3 15.4 64.9 1.0X | ||
| from_json 22284 22393 99 0.4 2228.4 0.0X | ||
| json_tuple 32310 32824 484 0.3 3231.0 0.0X | ||
| get_json_object 22111 22751 568 0.5 2211.1 0.0X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| Text read 3078 3101 26 16.2 61.6 1.0X | ||
| schema inferring 30225 30434 333 1.7 604.5 0.1X | ||
| parsing 32237 32308 63 1.6 644.7 0.1X | ||
| Text read 2894 2903 8 17.3 57.9 1.0X | ||
| schema inferring 26724 26785 62 1.9 534.5 0.1X | ||
| parsing 37502 37632 131 1.3 750.0 0.1X | ||
|
|
||
| Preparing data for benchmarking ... | ||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| Text read 10835 10900 86 4.6 216.7 1.0X | ||
| Schema inferring 37720 37805 110 1.3 754.4 0.3X | ||
| Parsing without charset 35464 35538 100 1.4 709.3 0.3X | ||
| Parsing with UTF-8 67311 67738 381 0.7 1346.2 0.2X | ||
| Text read 10994 11010 16 4.5 219.9 1.0X | ||
| Schema inferring 45654 45677 37 1.1 913.1 0.2X | ||
| Parsing without charset 34476 34559 73 1.5 689.5 0.3X | ||
| Parsing with UTF-8 56987 57002 13 0.9 1139.7 0.2X | ||
|
|
||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| Create a dataset of timestamps 2208 2222 14 4.5 220.8 1.0X | ||
| to_json(timestamp) 14299 14570 285 0.7 1429.9 0.2X | ||
| write timestamps to files 12955 12969 13 0.8 1295.5 0.2X | ||
| Create a dataset of dates 2297 2323 30 4.4 229.7 1.0X | ||
| to_json(date) 8509 8561 74 1.2 850.9 0.3X | ||
| write dates to files 6786 6827 45 1.5 678.6 0.3X | ||
| Create a dataset of timestamps 2150 2188 35 4.7 215.0 1.0X | ||
| to_json(timestamp) 17874 18080 294 0.6 1787.4 0.1X | ||
| write timestamps to files 12518 12538 34 0.8 1251.8 0.2X | ||
| Create a dataset of dates 2298 2310 18 4.4 229.8 0.9X | ||
| to_json(date) 11673 11703 27 0.9 1167.3 0.2X | ||
| write dates to files 7121 7135 12 1.4 712.1 0.3X | ||
|
|
||
| OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws | ||
| Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz | ||
| Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative | ||
| ------------------------------------------------------------------------------------------------------------------------ | ||
| read timestamp text from files 2598 2613 18 3.8 259.8 1.0X | ||
| read timestamps from files 42007 42028 19 0.2 4200.7 0.1X | ||
| infer timestamps from files 18102 18120 28 0.6 1810.2 0.1X | ||
| read date text from files 2355 2360 5 4.2 235.5 1.1X | ||
| read date from files 17420 17458 33 0.6 1742.0 0.1X | ||
| timestamp strings 3099 3101 3 3.2 309.9 0.8X | ||
| parse timestamps from Dataset[String] 48188 48215 25 0.2 4818.8 0.1X | ||
| infer timestamps from Dataset[String] 22929 22988 102 0.4 2292.9 0.1X | ||
| date strings 4090 4103 11 2.4 409.0 0.6X | ||
| parse dates from Dataset[String] 24952 25068 139 0.4 2495.2 0.1X | ||
| from_json(timestamp) 66038 66352 413 0.2 6603.8 0.0X | ||
| from_json(date) 43755 43782 27 0.2 4375.5 0.1X | ||
| read timestamp text from files 2616 2641 34 3.8 261.6 1.0X | ||
| read timestamps from files 37481 37517 58 0.3 3748.1 0.1X | ||
| infer timestamps from files 84774 84964 201 0.1 8477.4 0.0X | ||
| read date text from files 2362 2365 3 4.2 236.2 1.1X | ||
| read date from files 16583 16612 29 0.6 1658.3 0.2X | ||
| timestamp strings 3927 3963 40 2.5 392.7 0.7X | ||
| parse timestamps from Dataset[String] 52827 53004 243 0.2 5282.7 0.0X | ||
| infer timestamps from Dataset[String] 101108 101644 769 0.1 10110.8 0.0X | ||
|
Review comment (Member): Thanks! |
||
| date strings 4886 4906 26 2.0 488.6 0.5X | ||
| parse dates from Dataset[String] 27623 27694 62 0.4 2762.3 0.1X | ||
| from_json(timestamp) 71764 71887 124 0.1 7176.4 0.0X | ||
| from_json(date) 46200 46314 99 0.2 4620.0 0.1X | ||
|
|
||
|
|
||
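The derived columns in the tables above (Rate, Per Row, Relative) can be recomputed from the Best Time column. The sketch below does this for the "infer timestamps from files" row and compares it with the earlier result. It rests on assumptions that are not stated in the diff itself: that the first group of rows in each table is the removed (old) side and the second group the added (new) side, since the +/- markers were lost; that the row count is 10,000,000, inferred from 2616 ms corresponding to 261.6 ns per row; and that Relative is the first row's best time divided by the current row's best time. The function and variable names are illustrative only.

```python
def derived_columns(best_ms, num_rows, baseline_best_ms):
    """Recompute Rate(M/s), Per Row(ns) and Relative from a best time in ms."""
    rate_m_per_s = num_rows / (best_ms / 1000.0) / 1e6  # millions of rows per second
    per_row_ns = best_ms * 1e6 / num_rows               # nanoseconds per row
    relative = baseline_best_ms / best_ms               # speed relative to the first row
    return round(rate_m_per_s, 1), round(per_row_ns, 1), round(relative, 1)


# Assumption: row count inferred from the table (2616 ms <-> 261.6 ns per row).
NUM_ROWS = 10_000_000

# Best times (ms) copied from the "Read dates and timestamps" table.
BASELINE_NEW_MS = 2616      # "read timestamp text from files", assumed new side
INFER_FILES_OLD_MS = 18102  # "infer timestamps from files", assumed old side
INFER_FILES_NEW_MS = 84774  # "infer timestamps from files", assumed new side

new_row = derived_columns(INFER_FILES_NEW_MS, NUM_ROWS, BASELINE_NEW_MS)

# Ratio of the two best times for the same benchmark case.
slowdown = INFER_FILES_NEW_MS / INFER_FILES_OLD_MS

print(new_row)
print(f"new best time is about {slowdown:.1f}x the old best time")
```

Under these assumptions the recomputed values agree with the printed row (0.1 M/s, 8477.4 ns, 0.0X). The gap between the two best times is consistent with the PR title: the new numbers would reflect timestamp inference being enabled in the benchmark.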
Thanks.