generated from ksclarke/vertx-template
Add script chaining A/V Pairtree, Metagetter, and Festerize #4
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # A/V Pairtree Scripts | ||
|
|
||
| Various scripts for use with the A/V Pairtree application. | ||
|
|
||
| ## avpt_data_pipeline.sh | ||
|
|
||
| This script constructs a data processing pipeline consisting of A/V Pairtree, Metagetter, and Festerize (in that order), in which the output CSV files of each component application are passed to the next one for further processing. | ||
|
|
||
| ### Installation | ||
|
|
||
| Dependencies: | ||
| - GNU bash (written for version 4.2.46(2)-release (x86_64-redhat-linux-gnu)) | ||
| - GNU coreutils | ||
| - curl | ||
| - ffmpeg (provides `ffprobe`, whose path the script passes to Metagetter) | ||
| - [inotifywait](https://github.com/inotify-tools/inotify-tools) | ||
| - [UCLALibrary/services-metagetter](https://github.com/UCLALibrary/services-metagetter) | ||
| - [UCLALibrary/festerize](https://github.com/UCLALibrary/festerize) | ||
|
|
||
| ### Usage | ||
|
|
||
| The following environment variables must be set: | ||
|
|
||
| Environment variable|Description | ||
| ---|--- | ||
| AVPTDP_INPUT_DIRECTORY|directory where A/V Pairtree puts .out files; this is the input directory for the pipeline (and thus, for Metagetter) | ||
| AVPTDP_FESTERIZE_OUTPUT_DIRECTORY|directory where Festerize puts .csv files | ||
| AVPTDP_METAGETTER_MEDIA_DIRECTORY|directory where Metagetter will search for A/V media files | ||
| AVPTDP_METAGETTER_OUTPUT_DIRECTORY|directory where Metagetter puts .out files (which are then renamed as .csv); this is the input directory for Festerize | ||
| AVPTDP_SLACK_WEBHOOK_URL|URL of the webhook for posting to Slack | ||
|
|
||
| The script takes a single optional positional argument: the alias of the ingest Fester instance that Festerize should use. If the argument is omitted, or if the alias is unknown, the script points Festerize at http://localhost:8888. | ||
|
|
||
| Known aliases: | ||
|
|
||
| Alias|Fester base URL | ||
| ---|--- | ||
| prod|https://ingest.iiif.library.ucla.edu | ||
| test|https://test-iiif.library.ucla.edu | ||
|
|
||
| For example: | ||
|
|
||
| ```bash | ||
| #!/bin/bash | ||
|
|
||
| export AVPTDP_INPUT_DIRECTORY="avpt_output/" | ||
| export AVPTDP_FESTERIZE_OUTPUT_DIRECTORY="festerize_output/" | ||
| export AVPTDP_METAGETTER_MEDIA_DIRECTORY="metagetter_media/" | ||
| export AVPTDP_METAGETTER_OUTPUT_DIRECTORY="metagetter_output/" | ||
| export AVPTDP_SLACK_WEBHOOK_URL="https://hooks.slack.com/services/0123456789" | ||
|
|
||
| ./avpt_data_pipeline.sh prod | ||
| ``` | ||
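The Dependencies list in the README above can be checked before the pipeline is started. The following is a minimal sketch, not part of this pull request: `check_commands` is a hypothetical helper, and the command names are assumed from the Dependencies list and from the commands the script invokes (`java` for Metagetter, `ffprobe` from ffmpeg).

```bash
#!/bin/bash

# Hypothetical pre-flight helper (not part of the PR): reports each command
# that is not on the PATH and returns non-zero if any are missing.
function check_commands {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "${cmd}" >/dev/null 2>&1; then
            echo "Missing dependency: ${cmd}" >&2
            missing=1
        fi
    done
    return ${missing}
}

# Command names are assumptions based on the Dependencies list and the script body
check_commands curl ffprobe inotifywait java festerize ||
    echo "Install the missing commands before running avpt_data_pipeline.sh." >&2
```

Running such a check first reports a missing tool immediately, instead of the pipeline failing on the first `.out` file that arrives.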
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| #!/bin/bash | ||
|
|
||
| function get_av_metadata { | ||
| # Runs the CSV at the path provided via $1 through services-metagetter and outputs the path of the result CSV | ||
| 2>/dev/null 1>&2 \ | ||
| java -jar UCLALibrary/services-metagetter/target/build-artifact/services-metagetter-0.0.1-SNAPSHOT.jar \ | ||
| $1 ${AVPTDP_METAGETTER_MEDIA_DIRECTORY} `which ffprobe` ${AVPTDP_METAGETTER_OUTPUT_DIRECTORY} && | ||
| echo `strip_trailing_slash ${AVPTDP_METAGETTER_OUTPUT_DIRECTORY}`/`basename $1` | ||
| } | ||
|
|
||
| function change_filename_extension { | ||
| # Change the filename extension of the provided path (piped to stdin) from .out to .csv, since festerize only looks | ||
| # at .csv files | ||
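| # Chain every step with && so that nothing is moved or echoed when the upstream stage produced no path | ||
| read filename_dot_out && | ||
| filename_dot_csv=`sed -e "s/\.out$/.csv/" <<< ${filename_dot_out}` && | ||
| mv ${filename_dot_out} ${filename_dot_csv} && | ||
| echo ${filename_dot_csv} | ||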
| } | ||
|
|
||
| function festerize_ { | ||
| # Runs the CSV at the provided path (piped to stdin) through festerize (using the base URL provided via $1) and | ||
| # outputs the path of the result CSV | ||
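| # The "y" answers are fed via process substitution: under `set -o pipefail`, `yes | festerize` would report the | ||
| # SIGPIPE exit status of `yes` as a failure and skip the echo below | ||
| read csv_filename && | ||
| 2>/dev/null 1>&2 \ | ||
| festerize --iiif-api-version 3 --server $1 --out ${AVPTDP_FESTERIZE_OUTPUT_DIRECTORY} ${csv_filename} < <(yes) && | ||
| echo `strip_trailing_slash ${AVPTDP_FESTERIZE_OUTPUT_DIRECTORY}`/`basename ${csv_filename}` | ||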
| } | ||
|
|
||
| function send_slack_notification { | ||
| # Posts a notification to a Slack channel with a message about the input CSV ($1), the ingest Fester base URL ($2), | ||
| # and the output CSV (stdin), and then outputs the message | ||
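| read csv_filename && | ||
| message="Input CSV $1 was updated successfully, and after Festerizing with $2 is now available at ${csv_filename}." && | ||
| # Double quotes let ${message} expand (single quotes would send the literal text "${message}"); the escaped inner | ||
| # quotes make the value a JSON string. Assumes the message contains no double quotes or backslashes. | ||
| curl -s -X POST -H 'Content-type: application/json' --data "{\"text\":\"${message}\"}" ${AVPTDP_SLACK_WEBHOOK_URL} && | ||
| echo ${message} | ||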
| } | ||
|
|
||
| function get_ingest_fester_base_url { | ||
| # Outputs the base URL of the ingest Fester instance associated with the provided alias | ||
| case $1 in | ||
| prod) | ||
| echo "https://ingest.iiif.library.ucla.edu" | ||
| ;; | ||
| test) | ||
| echo "https://test-iiif.library.ucla.edu" | ||
| ;; | ||
| *) | ||
| echo "http://localhost:8888" | ||
| ;; | ||
| esac | ||
| } | ||
|
|
||
| function strip_trailing_slash { | ||
| # Outputs the provided path with any trailing slash removed | ||
| sed -e "s/\/$//" <<< $1 | ||
| } | ||
|
|
||
| # Check if the required env vars are set | ||
| if [ -z "${AVPTDP_INPUT_DIRECTORY}" ] | ||
| then | ||
| echo "The env var AVPTDP_INPUT_DIRECTORY must be set." | ||
| exit 1 | ||
| elif [ -z "${AVPTDP_FESTERIZE_OUTPUT_DIRECTORY}" ] | ||
| then | ||
| echo "The env var AVPTDP_FESTERIZE_OUTPUT_DIRECTORY must be set." | ||
| exit 1 | ||
| elif [ -z "${AVPTDP_METAGETTER_MEDIA_DIRECTORY}" ] | ||
| then | ||
| echo "The env var AVPTDP_METAGETTER_MEDIA_DIRECTORY must be set." | ||
| exit 1 | ||
| elif [ -z "${AVPTDP_METAGETTER_OUTPUT_DIRECTORY}" ] | ||
| then | ||
| echo "The env var AVPTDP_METAGETTER_OUTPUT_DIRECTORY must be set." | ||
| exit 1 | ||
| elif [ -z "${AVPTDP_SLACK_WEBHOOK_URL}" ] | ||
| then | ||
| echo "The env var AVPTDP_SLACK_WEBHOOK_URL must be set." | ||
| exit 1 | ||
| fi | ||
|
|
||
| ingest_fester_base_url=`get_ingest_fester_base_url $1` | ||
| >&2 echo "Using Fester instance at ${ingest_fester_base_url} for ingest." | ||
|
|
||
| # Get a more informative return status from our pipeline in the main loop | ||
| set -o pipefail | ||
|
|
||
| inotifywait -mr \ | ||
| --timefmt '%d/%m/%y %H:%M' --format '%T %w %f' \ | ||
| -e close_write \ | ||
| ${AVPTDP_INPUT_DIRECTORY} | | ||
| while read -r date time dir file; do | ||
| # Only process files with a ".out" filename extension | ||
| case ${file} in | ||
| *.out) | ||
| abs_path=${dir}${file} | ||
|
|
||
| get_av_metadata ${abs_path} | | ||
| change_filename_extension | | ||
| festerize_ ${ingest_fester_base_url} | | ||
| send_slack_notification ${abs_path} ${ingest_fester_base_url} | ||
|
||
| ;; | ||
| *) | ||
| ;; | ||
| esac | ||
| done | ||
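The `send_slack_notification` function in the script above posts its message as the `text` field of a JSON body, and that message embeds file paths. The following is a minimal sketch of building the body so that double quotes and backslashes in a path do not break the JSON. `json_escape` and `build_slack_payload` are hypothetical helpers, not part of this pull request, and control characters such as newlines are deliberately not handled.

```bash
#!/bin/bash

# Hypothetical helper (not part of the PR): escapes a string for use inside a
# JSON string literal. Only backslashes and double quotes are handled.
function json_escape {
    local s=$1
    s=${s//\\/\\\\}   # backslash    -> two backslashes
    s=${s//\"/\\\"}   # double quote -> backslash + double quote
    printf '%s' "${s}"
}

# Hypothetical helper: wraps the escaped message in the {"text": ...} object
# that the script sends to the Slack webhook.
function build_slack_payload {
    printf '{"text":"%s"}' "$(json_escape "$1")"
}

build_slack_payload 'Input CSV "a.out" was updated successfully.'
echo
```

The resulting string could then be passed to `curl --data` in place of a hand-assembled body.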