This repository was archived by the owner on Jul 3, 2024. It is now read-only.

Commit 9d67bd5

Merge pull request #5 from hathitrust/DEV-667-rebased-squashed
DEV-667: stage item in repo & index with catalog & full-text
2 parents 5592be7 + 336d1ac commit 9d67bd5

18 files changed

+673
-377
lines changed

.gitignore

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+vendor
+.bundle
+.env
+stage-item/*.xml
+stage-item/*.zip
+
+# other repositories
+
+catalog/
+common/
+hathitrust_catalog_indexer/
+ht-pairtree/
+imgsrv-sample-data/
+imgsrv/
+lss_solr_configs/
+pt/
+sample-data/
+slip/
+ssd/
+logs/
+cache/

Dockerfile

Lines changed: 0 additions & 219 deletions
This file was deleted.

README.md

Lines changed: 80 additions & 31 deletions
@@ -8,64 +8,113 @@ Clone all the repositories in a working directory.
 We're going to be running docker from this working directory,
 so `babel-local-dev` has access to the other repositories.
 
-There's a lot, because we're replicating running on the
-dev servers with `debug_local=1` enabled.
-
-```
-$ mkdir workdir
-$ cd workdir
-$ git clone git@github.com:hathitrust/babel-local-dev.git
-$ git clone git@github.com:hathitrust/catalog.git
-$ git clone git@github.com:hathitrust/common.git
-$ git clone git@github.com:hathitrust/imgsrv.git
-$ git clone git@github.com:hathitrust/pt.git
-$ git clone git@github.com:hathitrust/mdp-lib.git
-$ git clone git@github.com:hathitrust/slip-lib.git
-$ git clone git@github.com:hathitrust/plack-lib.git
-$ git clone git@github.com:hathitrust/imgsrv-sample-data.git
-# more to come
+First clone this repository:
+```bash
+git clone git@github.com:hathitrust/babel-local-dev.git babel
 ```
 
-## Step 2: intialize all the submodules
+Then run:
 
-*Insert fancy one liner if available.*
+```bash
+cd babel
+./setup.sh
+```
+
+This will check out the other repositories along with their submodules.
+There's a lot, because we're replicating running on the dev servers with
+`debug_local=1` enabled.
 
 ## Step 3: build the `babel-local-dev` environment
 
 In your workdir:
 
 ```
-docker-compose -f ./babel-local-dev/docker-compose.yml build
+docker-compose build
 ```
 
 ## Step 4: run `babel-local-dev`:
 
 In your workdir:
 
 ```
-docker-compose -f ./babel-local-dev/docker-compose.yml up
+docker-compose up
 ```
 
 In your browser:
 
-* http://localhost:8080/Search/Home
-* http://localhost:8080/cgi/pt?id=test.pd_open
+* catalog: `http://localhost:8080/Search/Home`
+* catalog solr: `http://localhost:9033`
+* full-text solr: `http://localhost:8983`
+
+PageTurner & imgsrv:
 
+* `http://localhost:8080/cgi/pt?id=test.pd_open`
+* `http://localhost:8080/cgi/imgsrv/cover?id=test.pd_open`
+* `http://localhost:8080/cgi/imgsrv/image?id=test.pd_open&seq=1`
+* `http://localhost:8080/cgi/imgsrv/html?id=test.pd_open&seq=1`
+* `http://localhost:8080/cgi/imgsrv/download/pdf?id=test.pd_open&seq=1&attachment=0`
+
+mysql is exposed at 127.0.0.1:3307. The default username & password with write
+access is `mdp-admin` / `mdp-admin` (needless to say, do not use this image in
+production!)
+
+```bash
+mysql -h 127.0.0.1 -P 3307 -u mdp-admin -p
+```
 Huzzah!
 
+Not yet configured:
+* `http://localhost:8080/cgi/mb`
+* `http://localhost:8080/cgi/ls`
+* `http://localhost:8080/cgi/whoami`
+* `http://localhost:8080/cgi/ping`
+* etc
+
 ## How this works (for now)
 
-The `docker-commpose` provides a custom catalog configuration to the `nginx` service to
-proxy `babel` CGI requests to the `apache-cgi` service, and serve `common` requests from
-the local `common` checkout.
+* catalog runs nginx + php
+* babel cgi apps run under apache in a single container
+* imgsrv plack/psgi process runs in its own container
+
+## Staging an Item
 
-`apache-cgi` is there because `nginx` can only speak FastCGI/HTTP and running *all* the babel
-apps under FastCGI/HTTP is still aspirational.
+First, get a HathiTrust ZIP and METS. The easiest way to do this is probably by
+using the [Data API client](https://babel.hathitrust.org/cgi/htdc) to download
+a public domain item unencumbered by any contractual restrictions, for example
+`uc2.ark:/13960/t4mk66f1d`. Select "Download" and in turn select "Item METS
+file" and "entire item" and submit the form; this will download the ZIP and
+METS respectively.
+
+Running the stage item script requires a Ruby runtime. It will automate putting
+the item in the appropriate location under `imgsrv-sample-data`, fetch the
+bibliographic data, and extract and index the full text.
+
+First make sure all the dependencies are running:
+
+```bash
+docker-compose build
+docker-compose up
+```
+
+Then, install dependencies for the `stage-item` script and run it with the
+downloaded zip and METS:
+
+```bash
+docker-compose run traject bundle install
+cd stage-item
+bundle config set --local path 'vendor/bundle'
+bundle install
+bundle exec ruby stage_item.rb uc2.ark:/13960/t4mk66f1d ark+=13960=t4mk66f1d.zip ark+=13960=t4mk66f1d.mets.xml
+```
+
+Note that the zip and METS must be named as they are in the actual
+repository -- if you name them "foo.zip" or "foo.xml" they will not be renamed,
+and full-text indexing and PageTurner will not be able to find the item.
 
 ## TODO
 
-- [ ] merge the `imgsrv` DEV-231-grok branch and update the `Dockerfile`s to include `grok`
-- [ ] update `slip-lib/Searcher.pm` to set `wt=xml` because the new solr defaults return JSON
-- [ ] adding `pt` requires filling out more of the `ht_web` tables (namely `mb_*`)
+- [ ] add `mb` and `ls`
+- [ ] ensure database user can write to relevant tables
+- [ ] link to documentation for important tasks - e.g. running apps under debugging, updating css/js, etc
 - [ ] easy mechanism to generate placeholder volumes in `imgsrv-sample-data` that correspond to the records in the catalog
-
+- [ ] make it easier to fetch real volumes
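
The README's note on file naming can be checked before running the stage script. The following is a minimal sketch, assuming the repository names follow the Pairtree character mapping (`/` to `=`, `:` to `+`, `.` to `,`) applied to the part of the identifier after the namespace prefix, which is consistent with the `ark+=13960=t4mk66f1d` example in the README. `expected_names` is a hypothetical helper and is not part of `stage_item.rb`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the ZIP and METS file names that an HTID
# is assumed to map to (Pairtree-style character cleaning).
expected_names() {
  local htid="$1"
  # Drop the namespace prefix, i.e. everything up to the first dot ("uc2.").
  local id="${htid#*.}"
  # Assumed mapping: "/" -> "=", ":" -> "+", "." -> ","
  local clean
  clean="$(printf '%s' "$id" | tr '/:.' '=+,')"
  printf '%s.zip\n%s.mets.xml\n' "$clean" "$clean"
}

expected_names 'uc2.ark:/13960/t4mk66f1d'
# prints:
#   ark+=13960=t4mk66f1d.zip
#   ark+=13960=t4mk66f1d.mets.xml
```

Comparing this output with the names of the downloaded files is a quick way to catch a "foo.zip" style mismatch before indexing.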

cache/.keep

Whitespace-only changes.
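
The URLs that this commit adds to the README can be smoke-tested once `docker-compose up` has finished starting the services. This is a rough sketch, assuming `curl` is installed on the host; `check_endpoints` is a hypothetical helper, and the status codes each service returns are not specified by the README.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the HTTP status code (or "unreachable")
# for each URL passed as an argument.
check_endpoints() {
  local url status
  for url in "$@"; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")" \
      || status="unreachable"
    printf '%s %s\n' "$status" "$url"
  done
}

# Example call, using URLs listed in the README:
# check_endpoints \
#   "http://localhost:8080/Search/Home" \
#   "http://localhost:9033" \
#   "http://localhost:8983" \
#   "http://localhost:8080/cgi/pt?id=test.pd_open"
```

The example call is left commented out so that sourcing the file has no side effects.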
