8 changes: 8 additions & 0 deletions .github/workflows/build.yml
Expand Up @@ -11,6 +11,11 @@ jobs:
name: Maven PR Builder (JDK ${{ matrix.java }})
runs-on: ubuntu-latest
env:
AUDIOWAVEFORM_S3_BUCKET: ${{ secrets.AUDIOWAVEFORM_S3_BUCKET }}
AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE: ${{ secrets.AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_DEFAULT_REGION: us-west-2
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
MAVEN_CACHE_KEY: ${{ secrets.MAVEN_CACHE_KEY }}
strategy:
matrix:
Expand All @@ -24,6 +29,8 @@ jobs:
with:
distribution: 'zulu'
java-version: ${{ matrix.java }}
- name: Install Audio Waveform Image Generator
run: sudo add-apt-repository ppa:chris-needham/ppa && sudo apt-get install audiowaveform
# If running locally in act, install Maven
- name: Set up Maven if needed
if: ${{ env.ACT }}
Expand Down Expand Up @@ -52,3 +59,4 @@ jobs:
maven_args: >
-V -ntp -Dorg.slf4j.simpleLogger.log.net.sourceforge.pmd=error
-Dsurefire.skipAfterFailureCount=1 -Dfailsafe.skipAfterFailureCount=1
-DghActions
8 changes: 8 additions & 0 deletions .github/workflows/release.yml
Expand Up @@ -10,7 +10,12 @@ jobs:
name: Maven Artifact Publisher (JDK 11)
runs-on: ubuntu-latest
env:
AUDIOWAVEFORM_S3_BUCKET: ${{ secrets.AUDIOWAVEFORM_S3_BUCKET }}
AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE: ${{ secrets.AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE }}
AUTORELEASE_ARTIFACT: ${{ secrets.AUTORELEASE_ARTIFACT }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_DEFAULT_REGION: us-west-2
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
SKIP_JAR_DEPLOYMENT: ${{ secrets.SKIP_JAR_DEPLOYMENT }}
MAVEN_CACHE_KEY: ${{ secrets.MAVEN_CACHE_KEY }}
steps:
Expand All @@ -20,6 +25,8 @@ jobs:
uses: actions/setup-java@d202f5dbf7256730fb690ec59f6381650114feb2 # v1
with:
java-version: 11
- name: Install Audio Waveform Image Generator
run: sudo add-apt-repository ppa:chris-needham/ppa && sudo apt-get install audiowaveform
# If running locally in act, install Maven
- name: Set up Maven if needed
if: ${{ env.ACT }}
Expand Down Expand Up @@ -62,3 +69,4 @@ jobs:
-Ddocker.registry.username=${{ secrets.DOCKER_USERNAME }}
-Ddocker.registry.account=${{ secrets.DOCKER_REGISTRY_ACCOUNT}}
-Ddocker.registry.password=${{ secrets.DOCKER_PASSWORD }}
-DghActions
11 changes: 7 additions & 4 deletions ARCHITECTURE.md
Expand Up @@ -6,15 +6,18 @@ This document describes the high level architecture of the av-pairtree program.

The av-pairtree program is an application that watches a "drop box" for new or updated CSV files with a certain structure. When a CSV file with A/V files has been put into the drop box, the av-pairtree program picks them up and processes their audio and video files.

The processing of video files (which are MP4s by default) involves putting them into a [Pairtree](https://tools.ietf.org/html/draft-kunze-pairtree-01) structure that's accessible to a media server. The processing of audio files (which are WAVs by default) involves converting the WAV files into MP4 files and, then, putting them into the Pairtree structure.
The processing of video files (which are MP4s by default) involves putting them into a [Pairtree](https://tools.ietf.org/html/draft-kunze-pairtree-01) structure that's accessible to a media server. The processing of audio files (which are WAVs by default) involves two conversions:

After all the A/V files in a CSV file have been processed, the input CSV is updated to include the resources' new access URLs (i.e. the URLs of the media files as served by the media server) and written back out to the file system.
* to MP4 format, the results of which are put into the Pairtree structure; and
* to binary [audiowaveform](https://github.com/bbc/audiowaveform) format, the results of which are deposited to AWS S3.

After all the A/V files in a CSV file have been processed, the input CSV is updated to include the resources' new access URLs (i.e. the URLs of the media files as served by the media server) and audiowaveform URLs, then written back out to the file system.
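The second conversion above relies on the `audiowaveform` command line tool. As a rough illustration only, the sketch below shows one way such a command could be assembled and launched from Java. The class name `WaveformCommandSketch`, the 8-bit resolution, and the use of `ProcessBuilder` are assumptions made for this example, not a description of how av-pairtree's WaveformVerticle is implemented; only the `-i`, `-o`, and `-b` options of `audiowaveform` are relied on.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper for illustration; not part of av-pairtree. */
final class WaveformCommandSketch {

    private WaveformCommandSketch() {
    }

    /**
     * Builds an audiowaveform command that reads a WAV file and writes waveform data.
     * The output format is inferred by audiowaveform from the output file's extension
     * (e.g. ".dat" for binary); the 8-bit resolution is an assumption for this sketch.
     */
    static List<String> buildCommand(final Path aWavFile, final Path aDatFile) {
        return List.of("audiowaveform", "-i", aWavFile.toString(), "-o", aDatFile.toString(), "-b", "8");
    }

    public static void main(final String[] aArgs) throws Exception {
        if (aArgs.length != 2) {
            System.err.println("Usage: WaveformCommandSketch <input.wav> <output.dat>");
            return;
        }

        // Requires the audiowaveform executable to be on the PATH
        final List<String> command = buildCommand(Path.of(aArgs[0]), Path.of(aArgs[1]));
        final Process process = new ProcessBuilder(command).inheritIO().start();

        System.exit(process.waitFor());
    }
}
```

Keeping the command construction separate from process execution makes the argument list easy to check without having the tool installed.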

![Overview diagram for av-pairtree's components](docs/images/av_pairtree_overview.svg)

## Expected CSV Structure

A CSV file that is going to be processed by av-pairtree should have two required columns: `File Name` and `ItemARK`. The first is used to retrieve the media file to be processed and the second is used to create the Pairtree structure. If the file has been previously processed (or has been processed by the Bucketeer application), it will also have a `IIIF Access URL` column. That's fine. However, older CSV files may have `iiif_access_url` as a column header. Any CSVs with that column header should be manually updated before processing with av-pairtree.
A CSV file that is going to be processed by av-pairtree should have two required columns: `File Name` and `ItemARK`. The first is used to retrieve the media file to be processed and the second is used to create the Pairtree structure. If the file has been previously processed (or has been processed by the Bucketeer application), it will also have an `IIIF Access URL` column; if it was previously processed by av-pairtree, it may also have a `Waveform` column. That's fine. However, older CSV files may have `iiif_access_url` as a column header. Any CSVs with that column header should be manually updated before processing with av-pairtree.
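The header rules in this section can be sketched as a small pre-flight check. This is an illustrative example only: the class `CsvHeaderCheckSketch` and its method are hypothetical and do not exist in av-pairtree, and the check assumes the header row has already been split into column names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for illustration; not part of av-pairtree. */
final class CsvHeaderCheckSketch {

    /** The two columns the section above describes as required. */
    static final List<String> REQUIRED_HEADERS = List.of("File Name", "ItemARK");

    /** The legacy header that should be manually updated before processing. */
    static final String LEGACY_HEADER = "iiif_access_url";

    private CsvHeaderCheckSketch() {
    }

    /**
     * Returns a list of human-readable problems found in the CSV headers; empty if none.
     */
    static List<String> findProblems(final List<String> aHeaders) {
        final List<String> problems = new ArrayList<>();

        for (final String required : REQUIRED_HEADERS) {
            if (!aHeaders.contains(required)) {
                problems.add("Missing required column: " + required);
            }
        }

        if (aHeaders.contains(LEGACY_HEADER)) {
            problems.add("Legacy column header found: " + LEGACY_HEADER + " (update the CSV manually)");
        }

        return problems;
    }
}
```

Optional columns such as `IIIF Access URL` and `Waveform` are deliberately ignored by the check, since their presence is acceptable.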

## Code Map

Expand All @@ -33,7 +36,7 @@ The basic structure of this particular Vert.x program includes: verticles, handl
| CsvItem | This is an object which represents a single item (or row) from the CSV file | [src/main/java/edu/ucla/library/avpairtree/CsvItem.java](https://github.com/UCLALibrary/av-pairtree/blob/main/src/main/java/edu/ucla/library/avpairtree/CsvItem.java) |
| CsvItemCodec | This codec implements a JSON (de)serialization of CsvItem so that it can be sent over the event bus | [src/main/java/edu/ucla/library/avpairtree/CsvItemCodec.java](https://github.com/UCLALibrary/av-pairtree/blob/main/src/main/java/edu/ucla/library/avpairtree/CsvItemCodec.java) |

Actions (e.g., the parsing of CSV files, conversion of media files, or storage of media files in a Pairtree structure, etc.) are performed by the application's various verticles (e.g., WatcherVerticle, ConverterVerticle, PairtreeVerticle, etc.) Cf. the `verticles` directory for examples.
Actions (e.g., the parsing of CSV files, conversion of media files, or storage of media files in a Pairtree structure, etc.) are performed by the application's various verticles (e.g., WatcherVerticle, ConverterVerticle, PairtreeVerticle, WaveformVerticle, etc.). Cf. the `verticles` directory for examples.

## Sequence Diagram

Expand Down
46 changes: 42 additions & 4 deletions README.md
@@ -1,13 +1,38 @@
# A/V Pairtree

This project processes A/V files into a collection of Pairtree structures.
This project processes A/V files into a collection of Pairtree structures and generates waveform data (for visualization) from audio files.

## Pre-requisites

The only hard requirement is that you need a [JDK (>= 11)](https://adoptopenjdk.net/) installed and configured.
In addition to a [JDK (>= 11)](https://adoptopenjdk.net/) installed and configured, you will also need the following in order to generate waveform data for audio files:

* the BBC's [audiowaveform](https://github.com/bbc/audiowaveform) command line tool, with the executable on your PATH; and
* an AWS S3 bucket for storing the audio waveform data (which you'll likely want to be world-readable), and write credentials for that bucket.

There are two sets of build instructions: one for systems with [Maven](https://maven.apache.org/) pre-installed and one for systems without Maven.

## Setting AWS credentials for integration tests

In order to run the integration tests that use AWS S3, you should have an entry like the following in the `profiles` section of your `~/.m2/settings.xml` (or another settings file elsewhere):

```xml
<profile>
  <id>av-pairtree</id>
  <activation>
    <property>
      <name>!skipDefaultProfile</name>
    </property>
  </activation>
  <properties>
    <avpt.s3.access_key>myAwsAccessKey</avpt.s3.access_key>
    <avpt.s3.bucket>myAwsS3Bucket</avpt.s3.bucket>
    <avpt.s3.object.url.template>https://${avpt.s3.bucket}.s3-${avpt.s3.region}.amazonaws.com/{}</avpt.s3.object.url.template>
    <avpt.s3.region>us-west-2</avpt.s3.region>
    <avpt.s3.secret_key>myAwsSecretKey</avpt.s3.secret_key>
  </properties>
</profile>
```
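The object URL templates in this README end in a `{}` placeholder. Assuming that placeholder stands for the S3 object key of an uploaded waveform file (an assumption; the README does not spell this out), the substitution might look like the following sketch. The class `ObjectUrlTemplateSketch` is hypothetical, and real object keys may additionally need percent-encoding, which is left out here.

```java
/** Hypothetical helper for illustration; not part of av-pairtree. */
final class ObjectUrlTemplateSketch {

    /** The placeholder used by the templates shown in this README. */
    private static final String PLACEHOLDER = "{}";

    private ObjectUrlTemplateSketch() {
    }

    /**
     * Fills the template's placeholder with the supplied object key (no encoding is applied).
     */
    static String toObjectUrl(final String aTemplate, final String aObjectKey) {
        return aTemplate.replace(PLACEHOLDER, aObjectKey);
    }

    public static void main(final String[] aArgs) {
        final String template = "https://myAwsS3Bucket.s3-us-west-2.amazonaws.com/{}";

        System.out.println(toObjectUrl(template, "example.dat"));
    }
}
```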

## Building and testing locally without Maven pre-installed

To build the project the first time, type:
Expand Down Expand Up @@ -42,9 +67,22 @@ To process one of the test CSVs, you can copy a CSV file from `src/test/resource

## Running in production

To run av-pairtree from the Jar file, one needs to type the following:
To run av-pairtree from the Jar file, one must set AWS S3 credentials and then run the JAR:

```bash
#!/bin/bash

export AUDIOWAVEFORM_S3_BUCKET=myAwsS3Bucket
export AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE=http://example.com/{}
export AWS_ACCESS_KEY_ID=myAwsAccessKey
export AWS_DEFAULT_REGION=us-west-2
export AWS_SECRET_ACCESS_KEY=myAwsSecretKey

java -Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.SLF4JLogDelegateFactory -Dvertx-config-path=config.properties -jar target/av-pairtree-0.0.1-SNAPSHOT.jar run edu.ucla.library.avpairtree.verticles.MainVerticle
java \
-Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.SLF4JLogDelegateFactory \
-Dvertx-config-path=config.properties \
-jar target/av-pairtree-0.0.1-SNAPSHOT.jar run edu.ucla.library.avpairtree.verticles.MainVerticle
```
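Because the application depends on the environment variables exported above, it can be useful to verify them before launching. The following is a minimal sketch of such a check, assuming all five variables are needed; the class `RequiredEnvSketch` is hypothetical and is not something av-pairtree provides.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical start-up check for illustration; not part of av-pairtree. */
final class RequiredEnvSketch {

    /** The variables exported in the script above. */
    static final List<String> REQUIRED = List.of(
            "AUDIOWAVEFORM_S3_BUCKET",
            "AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE",
            "AWS_ACCESS_KEY_ID",
            "AWS_DEFAULT_REGION",
            "AWS_SECRET_ACCESS_KEY");

    private RequiredEnvSketch() {
    }

    /**
     * Returns the names of required variables that are unset or blank in the supplied environment.
     */
    static List<String> findMissing(final Map<String, String> aEnv) {
        return REQUIRED.stream()
                .filter(name -> aEnv.get(name) == null || aEnv.get(name).isBlank())
                .collect(Collectors.toList());
    }

    public static void main(final String[] aArgs) {
        final List<String> missing = findMissing(System.getenv());

        if (!missing.isEmpty()) {
            System.err.println("Missing environment variables: " + missing);
            System.exit(1);
        }

        System.out.println("All required environment variables are set.");
    }
}
```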

The application is configured by the value of `vertx-config-path`, which in the example above is a config file residing in the same directory as the Jar file.

Expand Down
40 changes: 39 additions & 1 deletion pom.xml
Expand Up @@ -36,6 +36,7 @@
<slf4j.ext.version>1.7.30</slf4j.ext.version>
<dir.watcher.version>0.15.0</dir.watcher.version>
<freelib.utils.version>2.3.0</freelib.utils.version>
<awssdk.version>2.15.15</awssdk.version>

<!-- Build plugin versions -->
<vertx.plugin.version>1.0.23</vertx.plugin.version>
Expand Down Expand Up @@ -67,6 +68,13 @@
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>bom</artifactId>
<version>${awssdk.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>

Expand Down Expand Up @@ -121,6 +129,10 @@
<artifactId>jave-core</artifactId>
<version>${jave.version}</version>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3</artifactId>
</dependency>

<!-- Below is a dependency that needs updating due to security issue (may be able to remove in future) -->
<dependency>
Expand Down Expand Up @@ -150,6 +162,7 @@
<excludes>
<exclude>synanon/video/synanon.mp4</exclude>
<exclude>soul/audio/uclapasc.wav</exclude>
<exclude>soul/audio/uclapasc.dat</exclude>
</excludes>
</testResource>
</testResources>
Expand Down Expand Up @@ -378,8 +391,33 @@
</plugins>
</build>
</profile>
</profiles>

<!-- A profile for setting AWS environment variables for Maven Surefire when we're not running the build on GitHub Actions -->
<profile>
<id>surefire-config-aws-environment-variables</id>
<activation>
<property>
<name>!ghActions</name>
</property>
</activation>
<build>
<plugins>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<environmentVariables>
<AUDIOWAVEFORM_S3_BUCKET>${avpt.s3.bucket}</AUDIOWAVEFORM_S3_BUCKET>
<AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE>${avpt.s3.object.url.template}</AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE>
<AWS_ACCESS_KEY_ID>${avpt.s3.access_key}</AWS_ACCESS_KEY_ID>
<AWS_DEFAULT_REGION>${avpt.s3.region}</AWS_DEFAULT_REGION>
<AWS_SECRET_ACCESS_KEY>${avpt.s3.secret_key}</AWS_SECRET_ACCESS_KEY>
</environmentVariables>
</configuration>
</plugin>
</plugins>
</build>
</profile>
</profiles>
<parent>
<artifactId>freelib-parent</artifactId>
<groupId>info.freelibrary</groupId>
Expand Down
32 changes: 32 additions & 0 deletions src/main/java/edu/ucla/library/avpairtree/AvPtUtils.java
@@ -0,0 +1,32 @@
package edu.ucla.library.avpairtree;

import java.nio.file.Path;

import info.freelibrary.util.Constants;

/**
 * A class for utility methods.
 */
public final class AvPtUtils {

    private AvPtUtils() {
    }

    /**
     * Gets the input file path from available variables.
     *
     * @param aCsvItem A CSV item sent to us over the wire
     * @param aSourceDir A pre-configured source files directory
     * @return A file system path for the input file
     */
    public static Path getInputFilePath(final CsvItem aCsvItem, final String aSourceDir) {
        final String relativeFilePath = aCsvItem.getFilePath();

        // Use our source folder unless we receive a file path that is absolute
        if (!relativeFilePath.startsWith(Constants.SLASH)) {
            return Path.of(aSourceDir, relativeFilePath);
        } else {
            return Path.of(relativeFilePath);
        }
    }
}
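The behavior of `getInputFilePath` can be illustrated with a stand-alone sketch. To avoid project dependencies, this version takes the file path as a plain string instead of a `CsvItem` and assumes that `Constants.SLASH` is the string `"/"`; the class `InputPathSketch` is hypothetical, and the sample paths assume a Unix-like file system.

```java
import java.nio.file.Path;

/** Hypothetical stand-alone variant of the path resolution logic, for illustration only. */
final class InputPathSketch {

    private InputPathSketch() {
    }

    /**
     * Resolves a file path against the source directory unless the file path starts with a slash.
     */
    static Path resolve(final String aFilePath, final String aSourceDir) {
        // Assumption: Constants.SLASH in the original code is "/"
        if (!aFilePath.startsWith("/")) {
            return Path.of(aSourceDir, aFilePath);
        }

        return Path.of(aFilePath);
    }

    public static void main(final String[] aArgs) {
        // A relative path is placed under the configured source directory
        System.out.println(resolve("soul/audio/uclapasc.wav", "/data/src"));

        // An absolute path is used as-is
        System.out.println(resolve("/mnt/media/uclapasc.wav", "/data/src"));
    }
}
```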
10 changes: 10 additions & 0 deletions src/main/java/edu/ucla/library/avpairtree/Config.java
Expand Up @@ -77,6 +77,16 @@ public final class Config {
*/
public static final String CONVERSION_WORKERS = "conversion.workers";

/**
* The environment variable for the S3 bucket for audio waveforms.
*/
public static final String AUDIOWAVEFORM_S3_BUCKET = "AUDIOWAVEFORM_S3_BUCKET";

/**
* The environment variable for the S3 object URL template for audio waveforms.
*/
public static final String AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE = "AUDIOWAVEFORM_S3_OBJECT_URL_TEMPLATE";

// Constant classes should have private constructors.
private Config() {
}
Expand Down
10 changes: 9 additions & 1 deletion src/main/java/edu/ucla/library/avpairtree/CsvItem.java
Expand Up @@ -22,11 +22,19 @@
public class CsvItem {

/**
* The CSV header column for the IIIF access URL.
* The CSV header column for the IIIF access URL. Note that this is not used for deserialization; see
* WatcherVerticle.updateCSV for its use in serialization.
*/
@CsvIgnore
public static final String IIIF_ACCESS_URL_HEADER = "IIIF Access URL";

/**
* The CSV header column for the Waveform. Note that this is not used for deserialization; see
* WatcherVerticle.updateCSV for its use in serialization.
*/
@CsvIgnore
public static final String WAVEFORM_HEADER = "Waveform";

/**
* The CSV header column for the item's identifier.
*/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import info.freelibrary.util.Logger;
import info.freelibrary.util.LoggerFactory;

import edu.ucla.library.avpairtree.AvPtUtils;
import edu.ucla.library.avpairtree.Config;
import edu.ucla.library.avpairtree.CsvItem;
import edu.ucla.library.avpairtree.MessageCodes;
Expand Down Expand Up @@ -79,7 +80,7 @@ public void start(final Promise<Void> aPromise) {
final CsvItem csvItem = message.body();

try {
final Path inputFilePath = getInputFilePath(csvItem, sourceDir);
final Path inputFilePath = AvPtUtils.getInputFilePath(csvItem, sourceDir);
final Path outputFilePath = getOutputFilePath(inputFilePath, outputFormat);
final EncodingAttributes encoding = new EncodingAttributes();
final AudioAttributes audio = new AudioAttributes();
Expand Down Expand Up @@ -138,22 +139,4 @@ private Path getOutputFilePath(final Path aInputFilePath, final String aOutputFo

return Path.of(SYSTEM_TMP_DIR, SCRATCH_SPACE, outputFileName);
}

/**
* Gets the input file path from available variables.
*
* @param aCsvItem A CSV item sent to us over the wire
* @param aSourceDir A pre-configured source files directory
* @return A file system path for the input file
*/
private Path getInputFilePath(final CsvItem aCsvItem, final String aSourceDir) {
final String relativeFilePath = aCsvItem.getFilePath();

// Use our source folder unless we receive a file path that is absolute
if (!relativeFilePath.startsWith(Constants.SLASH)) {
return Path.of(aSourceDir, relativeFilePath);
} else {
return Path.of(relativeFilePath);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import edu.ucla.library.avpairtree.handlers.StatusHandler;

import io.methvin.watcher.DirectoryWatcher;

import io.vertx.config.ConfigRetriever;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.CompositeFuture;
Expand Down Expand Up @@ -131,6 +132,7 @@ private void configureServer(final JsonObject aConfig, final Promise<Void> aProm
futures.add(deployVerticle(new WatcherVerticle(), aConfig));
futures.add(deployVerticle(new PairtreeVerticle(), aConfig));
futures.add(deployVerticle(new ConverterVerticle(), aConfig.copy().put(WORKER, true)));
futures.add(deployVerticle(new WaveformVerticle(), aConfig));

CompositeFuture.all(futures).onSuccess(result -> {
startCsvDirWatcher(aConfig).onComplete(startup -> {
Expand Down