Implementation of search features and the endpoint /api/v1/search for the o2r API.
The finder utilizes Elasticsearch to provide means for
- a simple auto-suggest search functionality,
- spatial search,
- temporal search,
- and other Elasticsearch queries.
The auto-suggest search is not readily available with MongoDB (though it has full-text search).
Since we don't want to worry about keeping things in sync, the finder simply re-indexes the whole database at startup and then subscribes to changes in MongoDB, using node-elasticsearch-sync for both steps.
The /api/v1/search endpoint allows two types of queries:
- Simple queries via GET, as an Elasticsearch query string
- Complex queries via POST, using the Elasticsearch Query DSL
For more details and examples see the Search API documentation.
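A minimal sketch of what the two request types look like from Node.js (the base URL and query values are illustrative, not part of the API specification):

```javascript
// Sketch of the two query types against a locally running finder
// (base URL and example query values are illustrative).
const baseUrl = 'http://localhost:8084/api/v1/search';

// Simple query via GET: the q parameter is an Elasticsearch query string.
const simpleQuery = `${baseUrl}?q=${encodeURIComponent('reproducible research')}`;

// Complex query via POST: the request body is an Elasticsearch Query DSL object.
const complexQuery = {
  query: {
    query_string: { query: 'reproducible research' }
  }
};
```

The GET variant is convenient for interactive use; the POST variant gives full access to the Query DSL.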
The finder supports searching for special characters for these fields:
- metadata.o2r.identifier.doi
- metadata.o2r.identifier.doiurl
To support additional fields with special characters, the mapping in `config/mapping.js` has to be updated in order to copy the fields into the group field `_special`.
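Assuming Elasticsearch's `copy_to` mapping parameter is used for this, an excerpt of such a mapping might look like the following sketch (the nesting mirrors the `metadata.o2r.identifier.doi` field above; consult `config/mapping.js` for the actual definition):

```javascript
// Hypothetical mapping excerpt: copy_to gathers field values into the
// _special group field so it can be searched as one unit.
const mappingExcerpt = {
  properties: {
    _special: { type: 'text' },
    metadata: {
      properties: {
        o2r: {
          properties: {
            identifier: {
              properties: {
                doi: { type: 'text', copy_to: '_special' }
              }
            }
          }
        }
      }
    }
  }
};
```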
- When doing a simple query via a query string, both the `_special` and the `_all` fields are searched:
  `/api/v1/search?q=10.1006%2Fjeem.1994.1031`
- When doing a complex query, the user has control over which fields are searched. To search both fields, nest the queries like this:

```
"query": {
    "bool": {
        "should": [
            {"query_string": {"default_field": "_all", "query": [...]}},
            {"query_string": {"default_field": "_special", "query": [...]}}
        ]
    }
}
```
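As a concrete sketch, the nested query above could be filled in with the DOI from the simple-query example and used as the POST body (a hypothetical client-side construction, not code from this project):

```javascript
// The nested query from above, filled in with the DOI from the simple-query
// example; this object would be the body of a POST to /api/v1/search.
const doi = '10.1006/jeem.1994.1031';
const body = {
  query: {
    bool: {
      should: [
        { query_string: { default_field: '_all', query: doi } },
        { query_string: { default_field: '_special', query: doi } }
      ]
    }
  }
};
```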
The following data is indexed:
- the whole database muncher (a cluster or instance of Elasticsearch)
- all compendia (collection in MongoDB, an index in Elasticsearch)
  - text documents (detected via the MIME type of the files) as fields in Elasticsearch
- all jobs (collection in MongoDB, an index in Elasticsearch)
The MongoDB id is stored as the entry id to allow deletion in Elasticsearch when an element is removed from MongoDB.
The "public" ID for the compendium is stored in compendium_id.
Example:
```
(...)
"hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
        {
            "_score": 1,
            "_source": {
                "user": "0000-0001-6230-4374",
                "metadata": {},
                "jobs": [],
                "created": "2017-08-21T14:31:27.376Z",
                "files": {},
                "compendium_id": "mQryh"
            }
        },
        {
            "_score": 1,
            "_source": {
                "user": "0000-0001-6230-4374",
                "metadata": {},
                "jobs": [],
                "created": "2017-08-21T14:31:47.623Z",
                "files": {},
                "compendium_id": "Ks1Bc"
            }
        }
    ]
    (...)
}
```
Note: If you update the metadata structure of compendia or jobs and have already indexed them in Elasticsearch, you have to drop the Elasticsearch o2r index via

```bash
curl -XDELETE 'http://172.17.0.3:9200/o2r'
```

Otherwise, new compendia will not be indexed anymore.
Requirements:
- Elasticsearch server
- Docker
- Node.js
- MongoDB, running with a replica set (!)
This project includes a Dockerfile, which can be built and run as follows. This is not a complete configuration and is useful for testing only.

```bash
docker build -t finder .

# start databases in containers (optional)
docker run --name mongodb -d mongo:3.4 mongod --replSet rso2r --smallfiles
docker exec $(docker ps -qf "name=mongodb") bash -c "sleep 5; mongo --verbose --host mongodb --eval 'printjson(rs.initiate()); printjson(rs.conf()); printjson(rs.status()); printjson(rs.slaveOk());'"
docker run --name es -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:5.6.3

docker run -it --link mongodb --link es -e ELASTIC_SEARCH_URL=es:9200 -e FINDER_MONGODB=mongodb://mongodb -e MONGO_OPLOG_URL=mongodb://mongodb/muncher -e MONGO_DATA_URL=mongodb://mongodb/muncher -e DEBUG=finder -p 8084:8084 finder
```

The image can then be configured via environment variables.
- `FINDER_PORT` Required. Port for HTTP requests, defaults to `8084`.
- `FINDER_MONGODB` Required. Location of the MongoDB. Defaults to `mongodb://localhost:27017/`. You will very likely need to change this (and maybe include the MongoDB port).
- `FINDER_MONGODB_DATABASE` Which database inside the MongoDB should be used. Defaults to `muncher`.
- `FINDER_MONGODB_COLL_COMPENDIA` Name of the MongoDB collection for compendia, default is `compendia`.
- `FINDER_MONGODB_COLL_JOBS` Name of the MongoDB collection for jobs, default is `jobs`.
- `FINDER_MONGODB_COLL_SESSION` Name of the MongoDB collection for session information, default is `sessions` (must match other microservices).
- `FINDER_ELASTICSEARCH_INDEX_COMPENDIA` Name of the Elasticsearch index for compendia, default is `compendia`.
- `FINDER_ELASTICSEARCH_INDEX_JOBS` Name of the Elasticsearch index for jobs, default is `jobs`.
- `SESSION_SECRET` Secret used for session encryption, must match other services, default is `o2r`.
- `FINDER_STATUS_LOGSIZE` Number of transformation results in the status log, default is `20`.
- node-elasticsearch-sync parameters:
  - `ELASTIC_SEARCH_URL` Required, default is `http://localhost:9200`.
  - `MONGO_OPLOG_URL` Required, defaults to `FINDER_MONGODB` + `FINDER_MONGODB_DATABASE`, e.g. `mongodb://localhost/muncher`.
  - `MONGO_DATA_URL` Required, defaults to `FINDER_MONGODB` + `FINDER_MONGODB_DATABASE`, e.g. `mongodb://localhost/muncher`.
  - `BATCH_COUNT` Required, defaults to `20`.
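A sketch of how these variables might resolve to their documented defaults (illustrative only; the project's actual configuration code may structure this differently):

```javascript
// Illustrative resolution of the documented environment variables and
// their defaults; not the project's actual config file.
const config = {
  port: process.env.FINDER_PORT || 8084,
  mongodb: process.env.FINDER_MONGODB || 'mongodb://localhost:27017/',
  database: process.env.FINDER_MONGODB_DATABASE || 'muncher',
  elasticsearchUrl: process.env.ELASTIC_SEARCH_URL || 'http://localhost:9200',
  batchCount: parseInt(process.env.BATCH_COUNT, 10) || 20
};
// MONGO_OPLOG_URL and MONGO_DATA_URL default to the combination of
// FINDER_MONGODB and FINDER_MONGODB_DATABASE:
config.oplogUrl = process.env.MONGO_OPLOG_URL || (config.mongodb + config.database);
```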
Start an Elasticsearch instance, exposing the default port on the host:

```bash
docker run -it --name elasticsearch -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" -p 9200:9200 docker.elastic.co/elasticsearch/elasticsearch:5.6.3
```

Important: Starting with Elasticsearch 5, the virtual memory configuration of the system (in our case, of the host) requires some configuration, particularly of the vm.max_map_count setting, see https://www.elastic.co/guide/en/elasticsearch/reference/5.0/vm-max-map-count.html
You can then explore the state of Elasticsearch, e.g.
- http://localhost:9200/
- http://localhost:9200/_nodes
- http://localhost:9200/_cat/health?v
- http://localhost:9200/_cat/indices?v
Start finder (potentially adjusting the Elasticsearch container's IP, see `docker inspect elasticsearch`):

```bash
npm install
DEBUG=finder FINDER_ELASTICSEARCH=localhost:9200 npm start
```

You can set `DEBUG=*` to see MongoDB oplog messages.
Now check out the transferred documents:
- http://localhost:9200/o2r
- http://localhost:9200/o2r/compendia/_search?q=*&pretty
- http://localhost:9200/o2r/compendia/57b2eabfa0cd335b5d1192cc (use an ID from before)
- Looking at this response, you can also see the `_version` field, which is increased every time you restart finder (and full batch processing takes place) or a document is changed.
Delete the index with

```bash
curl -XDELETE 'http://172.17.0.3:9200/o2r/'
```

If you run the web service proxy from the project o2r-platform, you can run queries directly at the o2r API:
http://localhost/api/v1/search?q=*
The following command assumes the Docker host is available under IP 172.17.0.1 within the container. Required are running instances of Elasticsearch, MongoDB, and the o2r-finder as described above.

```bash
docker run -it -e DEBUG=finder -e FINDER_MONGODB=mongodb://172.17.0.1 -e ELASTIC_SEARCH_URL=http://172.17.0.1:9200 -p 8084:8084 finder
```
To run the included tests, execute

```bash
npm test
```

o2r-finder is licensed under Apache License, Version 2.0, see file LICENSE.
Copyright (C) 2017 - o2r project.