This repository was archived by the owner on Jul 3, 2024. It is now read-only.

aelkiss commented Mar 23, 2023

A couple questions that are coming up as I work through this:

  • To what extent should individual repositories have their own docker-compose files, how many of their dependencies should those files include, and what should the default practice for port binding be? For the babel apps my inclination is to keep it fairly limited: enough to run any tests that might be present, without replicating the full stack of dependencies for each app, and to keep the full stack here instead.

  • Where should the Apache configuration go? My inclination is to keep it here rather than in imgsrv, since it will (or should) serve the other CGI applications as well.

  • Should we rename https://github.com/hathitrust/imgsrv-sample-data ? Does it make sense to keep it separate, or should we merge it with this repository?

  • References to mysql-sdr vs. mariadb are somewhat inconsistent. For consistency with current production and to minimize pain, I think it makes sense to standardize on the existing mysql-sdr (and likewise on solr-sdr-catalog, perhaps?), but I could be convinced otherwise.

aelkiss added 3 commits March 22, 2023 17:00
Almost working - just need to get it to index the output from solr
(probably an issue with load_into_solr.sh)
Get it up to date with new imgsrv & slip stuff
aelkiss requested a review from respinos March 23, 2023 15:36
aelkiss commented Mar 23, 2023

I don't think it makes sense to containerize stage-item, largely because it runs things with docker-compose itself. I suspect that just installing Ruby and the dependencies (they're fairly minimal here) on the Mac is easier than figuring out how to use the Docker API to start containers from within a Docker container. I'm sure it can be done; it just seems like a harder yak to shave.

gem "marc", "~> 1.2"
gem "faraday", "~> 2.7"
gem "faraday-follow_redirects"
gem "ht-pairtree", git: "../../ht-pairtree"

aelkiss commented Mar 23, 2023

Needs to be updated to point to https://github.com/hathitrust/ht-pairtree/ after hathitrust/ht-pairtree#1 is merged.
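
A minimal sketch of what the updated Gemfile entries could look like, assuming the change is only swapping the local checkout for the GitHub URL above (the comment does not say whether a branch or tag should be pinned, so none is):

```ruby
# Gemfile (sketch): fetch ht-pairtree from GitHub instead of a local sibling
# checkout once hathitrust/ht-pairtree#1 is merged. Other gems are unchanged.
gem "marc", "~> 1.2"
gem "faraday", "~> 2.7"
gem "faraday-follow_redirects"
gem "ht-pairtree", git: "https://github.com/hathitrust/ht-pairtree"
```

Without a `branch:`, `ref:` or `tag:` option, recent Bundler versions use the repository's default branch and record the resolved commit in Gemfile.lock, so later ht-pairtree changes are only picked up after an explicit `bundle update ht-pairtree`.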

aelkiss commented Mar 23, 2023

aelkiss requested a review from moseshll March 23, 2023 15:42
aelkiss commented Mar 23, 2023

@moseshll I think your task (once this is all ready) will be to pull it down and run through the instructions (which are not up to date yet).

aelkiss commented Mar 23, 2023

Regarding DEV-663 and pageturner: there was some existing support here for pageturner, which this change stomps on. For DEV-663, I think it will make sense to re-add the Apache support here, reusing the imgsrv work where possible, and then clean up some of the Docker and Apache configuration in imgsrv?

Out of scope for now, but it should be addressed in DEV-663: item links in the catalog don't point to port 8888, where the babel apps are running.

aelkiss force-pushed the DEV-667-stage-item branch from f32bb53 to f3442b2 March 23, 2023 20:20
aelkiss force-pushed the DEV-667-stage-item branch from f3442b2 to 5b3d9d9 March 23, 2023 20:21
trying to get it all working w/ a clean checkout
aelkiss commented Mar 23, 2023

Building the imgsrv Apache setup from here isn't working; the most expedient option might be to move that part into this repo now (I was planning to do that anyway).

aelkiss added 5 commits March 24, 2023 16:30
This attempts to reconcile earlier work here with later work in the
imgsrv repo and more recent work in this branch.

It uses:

- nginx for catalog, imgsrv fastcgi, and static files
- proxy to apache for cgi

So far working:

- catalog incl. CSS & JS (via shared checked-out common repo)
- imgsrv fcgi
- imgsrv cgi

I was also able to clone pt & see that it at least attempted it (it had
an error about missing GeoIP data)
* Add ssd to checkout list

* Enumerate directories to mount for apache - otherwise directories we
  have in the image (geoip, cache, etc) get masked. Could change this in
  the future if we move more of the infrastructure directly to this repo
  rather than relying on checkouts in the parent dir
* ensure usage is actually printed
* update ht-pairtree to make sure that namespace dir is created
  with correct prefix
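
For context on the ht-pairtree change, here is a rough Ruby sketch of how a HathiTrust-style id maps onto a pairtree namespace directory. It is not the ht-pairtree gem's API: `PairtreeSketch` and its methods are hypothetical, and it assumes (a) ids split on the first `.` into a namespace such as `mdp` and the item id, (b) the Pairtree spec's character cleaning followed by two-character directory splitting, and (c) that the namespace directory's `pairtree_prefix` file should contain the namespace followed by `.`, which is how "correct prefix" is read here.

```ruby
require "fileutils"

# Hypothetical sketch, NOT the ht-pairtree gem's API. Assumes ids of the form
# "<namespace>.<id>" stored under <root>/<namespace>/pairtree_root/...
module PairtreeSketch
  # Characters the pairtree spec hex-escapes as ^xx, in addition to anything
  # outside the visible ASCII range 0x21..0x7e.
  HEX_ESCAPED = '"*+,<=>?\\^|'.chars.freeze
  # Single-character substitutions applied after hex escaping.
  SUBSTITUTIONS = { "/" => "=", ":" => "+", "." => "," }.freeze

  def self.escape_char(c)
    if c.ord < 0x21 || c.ord > 0x7e || HEX_ESCAPED.include?(c)
      c.bytes.map { |b| format("^%02x", b) }.join
    else
      c
    end
  end

  # Pairtree "cleaning": hex-escape first, then substitute / : and .
  def self.clean(id)
    id.each_char.map { |c| escape_char(c) }.join.gsub(%r{[/:.]}, SUBSTITUTIONS)
  end

  # "mdp.39015012345678" -> <root>/mdp/pairtree_root/39/01/50/12/34/56/78/39015012345678
  def self.path_for(root, htid)
    namespace, id = htid.split(".", 2)
    cleaned = clean(id)
    File.join(root, namespace, "pairtree_root", *cleaned.scan(/.{1,2}/), cleaned)
  end

  # Creates <root>/<namespace>/pairtree_root and writes pairtree_prefix,
  # assumed here to hold "<namespace>." so paths map back to full ids.
  def self.ensure_namespace(root, namespace)
    dir = File.join(root, namespace)
    FileUtils.mkdir_p(File.join(dir, "pairtree_root"))
    File.write(File.join(dir, "pairtree_prefix"), "#{namespace}.")
    dir
  end
end
```

Under these assumptions, `PairtreeSketch.path_for("/data/obj", "uc2.ark:/13960/t0dv1g69b")` (with `/data/obj` as an arbitrary example root) yields `/data/obj/uc2/pairtree_root/ar/k+/=1/39/60/=t/0d/v1/g6/9b/ark+=13960=t0dv1g69b`.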
aelkiss commented Mar 27, 2023

At this point, the following is working:

  • Clone this repo
  • Run setup.sh
  • Run docker-compose build / docker-compose up
  • Given an item zip & mets - use the instructions from README.md to run stage_item
  • Verified that the item is indexed in the dev full-text solr
  • Verified that catalog record works
  • Verified pageturner + imgsrv works
  • Verified ssd works
  • Verified download works

Still to-do:

  • see why catalog links to pageturner use https instead of http
  • see why pageturner does not show catalog info
  • consider moving imgsrv docker config to this repo (as 'hathitrust-babel-base' or something)
  • clean up other docker config in imgsrv - see DEV-667: remove Docker machinery imgsrv#12
  • use database user that has permission to ht sessions table

Next steps:

  • Try adding the ptsearch Solr and the mb and ls apps
  • Verify it all works without emulation on ARM (we've done this for imgsrv and, I think, separately for the catalog, so it should work, but it's worth verifying)

aelkiss commented Mar 27, 2023

Note that we should not merge this (even if working) until these PRs are merged:

hathitrust/ht-pairtree#1
https://github.com/hathitrust/slip/pull/1
hathitrust/lss_solr_configs#5
hathitrust/imgsrv#11

aelkiss commented Mar 27, 2023

I also see a complaint from pageturner that it can't read the pod table.

aelkiss commented Mar 27, 2023

Solved the pt metadata issue. I think it was a combination of not using the branch that uses wt=xml for the VuFind Solr, plus caching.

aelkiss added 2 commits March 28, 2023 13:38
* Don't give instructions to clutter parent dir
* Move dockerfile for perl apps here
aelkiss force-pushed the DEV-667-stage-item branch from 9b5b88d to 98d9d81 March 28, 2023 19:12
aelkiss commented Mar 28, 2023

As of right now the catalog links seem to magically be using http and not https... Not sure what's going on

aelkiss added 9 commits March 29, 2023 14:26
- ensures data dir is owned by current user
- mount log dir outside

We could try to use a Docker volume for this, but the problem is that it
still wouldn't be owned by the solr user by default. If we were using a
Dockerfile instead of mounting config directories in, we would have some
other options. There might be other ways to work around this in the
future, but this works for now.
Avoids issues with permissions w/ cache, logs, etc
the web server now runs as the user running setup.sh, so cache needs to
be writable by that user - it was being created & owned by root
* Fix mount for slip output
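
The root-ownership problem these commits work around is easy to hit: with a root-run Docker daemon and the short `host:container` volume syntax, a host directory that doesn't exist yet is created by Docker itself, owned by root. A minimal sketch of the "pre-create it as the invoking user, then check" step, written as a hypothetical Ruby helper (the actual fix lives in setup.sh and the compose configuration, and `prepare_host_dir` is not part of either):

```ruby
require "fileutils"

# Hypothetical helper, not part of setup.sh: create a host directory that will
# be bind-mounted into a container before Docker does, so the invoking user
# owns it, then report ownership and writability for that user.
def prepare_host_dir(path)
  FileUtils.mkdir_p(path) # no-op if it already exists (ownership unchanged)
  stat = File.stat(path)
  {
    path: path,
    owned_by_current_user: stat.uid == Process.euid,
    writable: File.writable?(path)
  }
end
```

Calling it for the cache, log, and Solr data directories before `docker-compose up` creates any that are missing as the current user; a `false` in the result points at a directory left behind, root-owned, by an earlier run.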
aelkiss commented Mar 30, 2023

@moseshll @respinos I was able to run through the instructions in the README without additional fiddling, and I believe everything now works both over ssh and https, and without any permissions weirdness. I think we still need to get these PRs merged:

hathitrust/lss_solr_configs#5
https://github.com/hathitrust/pt/pull/4
https://github.com/hathitrust/ssd/pull/1

Once those are merged, I will remove the branch references from setup.sh, squash the commits here, and then I think this should be ready to merge!

aelkiss commented Mar 30, 2023

> As of right now the catalog links seem to magically be using http and not https... Not sure what's going on

This seems to be an issue with my browser: in a new private/incognito window it's fine. The browser must somehow be pinning localhost to https 😩

aelkiss commented Mar 30, 2023

Still unresolved and worth tracking. It probably doesn't need to be solved right this minute, but it should be soon, and it will need to be for mb etc. (I'll make a note in that issue):

  • use database user that has permission to ht sessions table

Also @moseshll @carylwyatt when you have a chance it would be good to make sure this all works on ARM without emulation/rigmarole - I think it should but I don't know for sure.

aelkiss added 2 commits March 30, 2023 15:07
- build indexer rather than using image (which may not exist)
- take pt & ssd off of branches
aelkiss commented Apr 3, 2023

Now just waiting on hathitrust/lss_solr_configs#5

aelkiss commented Apr 4, 2023

Superseded by #5

aelkiss closed this Apr 4, 2023
aelkiss deleted the DEV-667-stage-item branch May 3, 2023 16:13

4 participants