Merged

Changes from all commits (28 commits)
* effbf56: Test in Python 3.6 (ajnelson-nist, Apr 12, 2021)
* 02e37ac: Fix timestamp test in Python 3.6 (ajnelson-nist, Apr 12, 2021)
* e59ea52: For Github Action CI, pass action's Python 3 in test call (ajnelson-nist, Apr 12, 2021)
* c69d0fc: Merge pull request #1 from casework/feature/test_in_python_3_6 (asovern-mitre, Apr 14, 2021)
* b24dd96: Ensure XML Schema types are bound to xsd prefix (ajnelson-nist, Apr 22, 2021)
* fbc542f: Regenerate sample file (ajnelson-nist, Apr 22, 2021)
* 88d8822: Merge pull request #2 from casework/import_compactpy_fixes (asovern-mitre, Apr 23, 2021)
* f7d04ee: Recognize .jsonld extension (ajnelson-nist, May 4, 2021)
* 6ea9f57: Update tests to repeated expected .json behavior for .jsonld (ajnelson-nist, May 4, 2021)
* 01e1a97: Merge pull request #3 from casework/recognize_jsonld_extension (ajnelson-nist, May 4, 2021)
* 4436f49: Add tests for hexBinary (ajnelson-nist, May 10, 2021)
* f530add: Merge pull request #4 from casework/AC-139 (asovern-mitre, May 10, 2021)
* cf31296: Update CASE-Examples-QC pointer to include CASE 0.4.0 updates (ajnelson-nist, Jun 9, 2021)
* 491aee6: Have clean recipe remove download flag files (ajnelson-nist, Jun 9, 2021)
* 034874c: Update rdf-toolkit pointer to use update to original download location (ajnelson-nist, Jun 9, 2021)
* 0e875c9: Remove xsd:long usage (ajnelson-nist, Jun 9, 2021)
* 4ec8086: Update demonstration output to show removed xsd:long (ajnelson-nist, Jun 9, 2021)
* 5e3f5d1: Bump implemented CASE version (ajnelson-nist, Jun 9, 2021)
* b7ea257: Merge pull request #5 from casework/AC-183 (ajnelson-nist, Jun 9, 2021)
* 37e3854: Add case_sparql_construct, case_sparql_select, and tests (ajnelson-nist, Jun 11, 2021)
* 8d3118e: Fix copy-paste error (ajnelson-nist, Jun 11, 2021)
* 1deba2f: Exclude pandas installation for Python < 3.7 (ajnelson-nist, Jun 11, 2021)
* c902db3: Merge pull request #6 from casework/AC-178 (ajnelson-nist, Jun 11, 2021)
* 3dc1218: Use pip -e flag (ajnelson-nist, Jun 11, 2021)
* 8f26896: Merge pull request #7 from casework/develop_with_pip_editable (ajnelson-nist, Jun 11, 2021)
* 337ae08: Add JSON-LD output code path test for case_sparql_construct (ajnelson-nist, Jun 11, 2021)
* 09468f0: Merge pull request #8 from casework/AC-178 (ajnelson-nist, Jun 11, 2021)
* 02369a2: Bump version (ajnelson-nist, Jun 11, 2021)
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -23,7 +23,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.8]
+        python-version: [ 3.6, 3.8 ]

     steps:
       - uses: actions/checkout@v2
@@ -38,4 +38,4 @@ jobs:
       - name: Start from clean state
         run: make clean
       - name: Run tests
-        run: make check
+        run: make PYTHON3=python check
9 changes: 9 additions & 0 deletions Makefile
@@ -13,6 +13,8 @@

SHELL := /bin/bash

PYTHON3 ?= $(shell which python3.9 2>/dev/null || which python3.8 2>/dev/null || which python3.7 2>/dev/null || which python3.6 2>/dev/null || which python3)

all:

.PHONY: \
@@ -38,13 +40,20 @@ all:
check: \
  .git_submodule_init.done.log
	$(MAKE) \
	  PYTHON3=$(PYTHON3) \
	  --directory tests \
	  check

clean:
	@$(MAKE) \
	  --directory tests \
	  clean
	@rm -f \
	  .git_submodule_init.done.log
	@#Remove flag files that are normally set after deeper submodules and rdf-toolkit are downloaded.
	@rm -f \
	  dependencies/CASE-Examples-QC/.git_submodule_init.done.log \
	  dependencies/CASE-Examples-QC/.lib.done.log

distclean: \
  clean
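The `PYTHON3 ?=` line above can be overridden from the command line. A minimal sketch of the pattern (GNU Make assumed; the `version-check` recipe here is hypothetical, not part of this repository's Makefile): `?=` assigns only when the variable was not already set in the environment or on the command line, so `make PYTHON3=python check` -- as the updated CI workflow invokes it -- bypasses the `which` search entirely.

```make
# Hypothetical fragment illustrating conditional assignment.
PYTHON3 ?= $(shell which python3.9 2>/dev/null || which python3)

version-check:
	$(PYTHON3) --version
```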
34 changes: 32 additions & 2 deletions README.md
@@ -35,6 +35,36 @@ case_file --disable-hashes sample.txt.json sample.txt
```


### SPARQL executors

Two commands are provided to generate output from a SPARQL query run against one or more input graphs. An input graph can be any RDF graph, such as instance data or a supplementary ontology file that supplies custom class definitions or other external ontologies.


#### `case_sparql_construct`

To use a SPARQL `CONSTRUCT` query to make a supplementary graph file from one or more input graphs:

```bash
case_sparql_construct output.json input.sparql input.json [input-2.json ...]
```
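For illustration, a hypothetical `input.sparql` for this command might look like the following (the prefix IRI and class name are assumptions; use whatever vocabulary your input graphs actually contain):

```sparql
# Hypothetical CONSTRUCT query: attach a comment triple to every File node.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uco-observable: <https://unifiedcyberontology.org/ontology/observable#>

CONSTRUCT {
  ?nFile rdfs:comment "Reviewed" .
}
WHERE {
  ?nFile a uco-observable:File .
}
```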


#### `case_sparql_select`

To use a SPARQL `SELECT` query to make a table from one or more input graphs:

```bash
# HTML output with Bootstrap classes
# (e.g. for Jekyll-backed websites)
case_sparql_select output.html input.sparql input.json [input-2.json ...]

# Markdown, Github-flavored
case_sparql_select output.md input.sparql input.json [input-2.json ...]
```

Note that `case_sparql_select` is not guaranteed to function on Python versions below 3.7, because its `pandas` dependency is only installed for Python >= 3.7.
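As an illustration, a hypothetical `input.sparql` for this command (the prefix IRI and property placement are assumptions). The script's docstring notes that the outer `SELECT` clause must list plain variables separated by single spaces, as here:

```sparql
# Hypothetical SELECT query producing a two-column table.
PREFIX uco-observable: <https://unifiedcyberontology.org/ontology/observable#>

SELECT ?nFacet ?lSizeInBytes
WHERE {
  ?nFacet uco-observable:sizeInBytes ?lSizeInBytes .
}
```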


### `local_uuid`

This [module](case_utils/local_uuid.py) provides a wrapper UUID generator, `local_uuid()`. Its main purpose is to let example data generate consistent identifiers; it intentionally includes mechanisms that make this non-random mode difficult to activate without the caller being aware of it.
@@ -58,8 +88,8 @@ This project follows [SEMVER 2.0.0](https://semver.org/) where versions are declared

This repository supports the ontology versions that are linked as submodules in the [CASE Examples QC](https://github.com/ajnelson-nist/CASE-Examples-QC) repository. Currently, the ontology versions are:

-* CASE - 0.3.0
-* UCO - 0.5.0
+* CASE - 0.4.0
+* UCO - 0.6.0


## Repository locations
4 changes: 3 additions & 1 deletion case_utils/__init__.py
@@ -11,7 +11,7 @@
#
# We would appreciate acknowledgement if the software is used.

-__version__ = "0.1.0"
+__version__ = "0.2.0"

import rdflib.util

@@ -37,6 +37,8 @@ def guess_format(fpath, fmap=None):
        updated_fmap = {key:rdflib.util.SUFFIX_FORMAT_MAP[key] for key in rdflib.util.SUFFIX_FORMAT_MAP}
        if not "json" in updated_fmap:
            updated_fmap["json"] = "json-ld"
+       if not "jsonld" in updated_fmap:
+           updated_fmap["jsonld"] = "json-ld"
    else:
        updated_fmap = {k:fmap[k] for k in fmap}
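The intent of the `.jsonld` change can be shown with a standalone sketch. The `SUFFIX_FORMAT_MAP` below is an abridged stand-in for rdflib's real `rdflib.util.SUFFIX_FORMAT_MAP`; only the extension-mapping logic is illustrated.

```python
# Abridged stand-in for rdflib.util.SUFFIX_FORMAT_MAP.
SUFFIX_FORMAT_MAP = {"ttl": "turtle", "rdf": "xml"}

def build_updated_fmap(fmap=None):
    """Copy the default suffix map, ensuring both .json and .jsonld select the JSON-LD parser."""
    if fmap is None:
        updated_fmap = dict(SUFFIX_FORMAT_MAP)
        if "json" not in updated_fmap:
            updated_fmap["json"] = "json-ld"
        if "jsonld" not in updated_fmap:
            updated_fmap["jsonld"] = "json-ld"
    else:
        # A caller-supplied map is copied as-is.
        updated_fmap = dict(fmap)
    return updated_fmap

print(build_updated_fmap()["jsonld"])  # → json-ld
```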
4 changes: 2 additions & 2 deletions case_utils/case_file/__init__.py
@@ -87,7 +87,7 @@ def create_file_node(graph, filepath, node_iri=None, node_prefix=DEFAULT_PREFIX,
    graph.add((
        n_file_facet,
        NS_UCO_OBSERVABLE.sizeInBytes,
-       rdflib.Literal(file_stat.st_size, datatype=NS_XSD.long)
+       rdflib.Literal(int(file_stat.st_size))
    ))
    graph.add((
        n_file,

@@ -174,7 +174,7 @@ def create_file_node(graph, filepath, node_iri=None, node_prefix=DEFAULT_PREFIX,
    graph.add((
        n_contentdata_facet,
        NS_UCO_OBSERVABLE.sizeInBytes,
-       rdflib.Literal(successful_hashdict["filesize"], datatype=NS_XSD.long)
+       rdflib.Literal(successful_hashdict["filesize"])
    ))

    # Add confirmed hashes into graph.
87 changes: 87 additions & 0 deletions case_utils/case_sparql_construct/__init__.py
@@ -0,0 +1,87 @@
#!/usr/bin/env python3

# This software was developed at the National Institute of Standards
# and Technology by employees of the Federal Government in the course
# of their official duties. Pursuant to title 17 Section 105 of the
# United States Code this software is not subject to copyright
# protection and is in the public domain. NIST assumes no
# responsibility whatsoever for its use by other parties, and makes
# no guarantees, expressed or implied, about its quality,
# reliability, or any other characteristic.
#
# We would appreciate acknowledgement if the software is used.

"""
This script executes a SPARQL CONSTRUCT query, returning a graph of the generated triples.
"""

__version__ = "0.1.0"

import argparse
import os
import logging

import rdflib.plugins.sparql

import case_utils

_logger = logging.getLogger(os.path.basename(__file__))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("--disallow-empty-results", action="store_true", help="Raise error if no results are returned for query.")
    parser.add_argument("--output-format", help="Override extension-based format guesser.")
    parser.add_argument("out_graph")
    parser.add_argument("in_sparql")
    parser.add_argument("in_graph", nargs="+")
    args = parser.parse_args()

    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)

    in_graph = rdflib.Graph()
    for in_graph_filename in args.in_graph:
        in_graph.parse(in_graph_filename, format=case_utils.guess_format(in_graph_filename))
    _logger.debug("len(in_graph) = %d.", len(in_graph))

    out_graph = rdflib.Graph()

    # Inherit prefixes defined in input context dictionary.
    nsdict = {k:v for (k,v) in in_graph.namespace_manager.namespaces()}
    for prefix in sorted(nsdict.keys()):
        out_graph.bind(prefix, nsdict[prefix])

    _logger.debug("Running query in %r." % args.in_sparql)
    construct_query_text = None
    with open(args.in_sparql, "r") as in_fh:
        construct_query_text = in_fh.read().strip()
    assert not construct_query_text is None

    construct_query_object = rdflib.plugins.sparql.prepareQuery(construct_query_text, initNs=nsdict)

    # https://rdfextras.readthedocs.io/en/latest/working_with.html
    construct_query_result = in_graph.query(construct_query_object)
    _logger.debug("type(construct_query_result) = %r." % type(construct_query_result))
    _logger.debug("len(construct_query_result) = %d." % len(construct_query_result))
    for (row_no, row) in enumerate(construct_query_result):
        if row_no == 0:
            _logger.debug("row[0] = %r." % (row,))
        out_graph.add(row)

    output_format = None
    if args.output_format is None:
        output_format = case_utils.guess_format(args.out_graph)
    else:
        output_format = args.output_format

    serialize_kwargs = {
        "format": output_format
    }
    if output_format == "json-ld":
        context_dictionary = {k:v for (k,v) in out_graph.namespace_manager.namespaces()}
        serialize_kwargs["context"] = context_dictionary

    out_graph.serialize(args.out_graph, **serialize_kwargs)

if __name__ == "__main__":
    main()
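The JSON-LD branch at the end of the script can be illustrated in isolation. The `namespaces` list below is a hard-coded stand-in for the (prefix, IRI) pairs that rdflib's `graph.namespace_manager.namespaces()` yields; only the construction of the `context` serialization argument is sketched.

```python
# Stand-in for the (prefix, IRI) pairs from namespace_manager.namespaces().
namespaces = [
    ("rdfs", "http://www.w3.org/2000/01/rdf-schema#"),
    ("xsd", "http://www.w3.org/2001/XMLSchema#"),
]

# Mirror the script's serialize_kwargs construction for the "json-ld" case:
# the prefix-to-IRI dict becomes the JSON-LD context, so output IRIs compact
# to prefixed names.
serialize_kwargs = {"format": "json-ld"}
context_dictionary = {k: v for (k, v) in namespaces}
serialize_kwargs["context"] = context_dictionary

print(serialize_kwargs["context"]["xsd"])  # → http://www.w3.org/2001/XMLSchema#
```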
116 changes: 116 additions & 0 deletions case_utils/case_sparql_select/__init__.py
@@ -0,0 +1,116 @@
#!/usr/bin/env python3

# This software was developed at the National Institute of Standards
# and Technology by employees of the Federal Government in the course
# of their official duties. Pursuant to title 17 Section 105 of the
# United States Code this software is not subject to copyright
# protection and is in the public domain. NIST assumes no
# responsibility whatsoever for its use by other parties, and makes
# no guarantees, expressed or implied, about its quality,
# reliability, or any other characteristic.
#
# We would appreciate acknowledgement if the software is used.

"""
This script executes a SPARQL SELECT query, returning a table representation. The design of the workflow is based on this example built on SPARQLWrapper:
https://lawlesst.github.io/notebook/sparql-dataframe.html

Note that this assumes a limited syntax style in the outer SELECT clause of the query - only named variables, no aggregations, and a single space character separating all variable names. E.g.:

SELECT ?x ?y ?z
WHERE
{ ... }

The word "DISTINCT" will also be cut from the query, if present.

Should a more complex query be necessary, an outer, wrapping SELECT query would let this script continue to function.
"""

__version__ = "0.3.0"

import argparse
import binascii
import os
import logging

import pandas as pd
import rdflib.plugins.sparql

import case_utils

NS_XSD = rdflib.XSD

_logger = logging.getLogger(os.path.basename(__file__))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("--disallow-empty-results", action="store_true", help="Raise error if no results are returned for query.")
    parser.add_argument("out_table", help="Expected extensions are .html for HTML tables or .md for Markdown tables.")
    parser.add_argument("in_sparql")
    parser.add_argument("in_graph", nargs="+")
    args = parser.parse_args()

    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)

    graph = rdflib.Graph()
    for in_graph_filename in args.in_graph:
        graph.parse(in_graph_filename, format=case_utils.guess_format(in_graph_filename))

    # Inherit prefixes defined in input context dictionary.
    nsdict = {k:v for (k,v) in graph.namespace_manager.namespaces()}

    select_query_text = None
    with open(args.in_sparql, "r") as in_fh:
        select_query_text = in_fh.read().strip()
    _logger.debug("select_query_text = %r." % select_query_text)

    # Build columns list from SELECT line.
    select_query_text_lines = select_query_text.split("\n")
    select_line = [line for line in select_query_text_lines if line.startswith("SELECT ")][0]
    variables = select_line.replace(" DISTINCT", "").replace("SELECT ", "").split(" ")

    tally = 0
    records = []
    select_query_object = rdflib.plugins.sparql.prepareQuery(select_query_text, initNs=nsdict)
    for (row_no, row) in enumerate(graph.query(select_query_object)):
        tally = row_no + 1
        record = []
        for (column_no, column) in enumerate(row):
            if column is None:
                column_value = ""
            elif isinstance(column, rdflib.term.Literal) and column.datatype == NS_XSD.hexBinary:
                # Use hexlify to convert xsd:hexBinary to ASCII.
                # The render to ASCII is in support of this script rendering results for website viewing.
                # .decode() is because hexlify returns bytes.
                column_value = binascii.hexlify(column.toPython()).decode()
            else:
                column_value = column.toPython()
            if row_no == 0:
                _logger.debug("row[0]column[%d] = %r." % (column_no, column_value))
            record.append(column_value)
        records.append(record)
    if tally == 0:
        if args.disallow_empty_results:
            raise ValueError("Failed to return any results.")

    df = pd.DataFrame(records, columns=variables)

    table_text = None
    if args.out_table.endswith(".html"):
        # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_html.html
        # Add CSS classes for CASE website Bootstrap support.
        table_text = df.to_html(classes=("table", "table-bordered", "table-condensed"))
    elif args.out_table.endswith(".md"):
        # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_markdown.html
        # https://pypi.org/project/tabulate/
        # Assume Github-flavored Markdown.
        table_text = df.to_markdown(tablefmt="github")
    if table_text is None:
        raise NotImplementedError("Unsupported output extension for output filename %r." % args.out_table)

    with open(args.out_table, "w") as out_fh:
        out_fh.write(table_text)

if __name__ == "__main__":
    main()
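The `xsd:hexBinary` branch above can be exercised in isolation. `binascii.hexlify` is a standard-library call; the `raw` bytes value below stands in for what rdflib's `Literal.toPython()` returns for an `xsd:hexBinary`-typed literal.

```python
import binascii

# Stand-in for the bytes value returned by Literal.toPython() on an
# xsd:hexBinary-typed literal.
raw = b"\x01\xab\xff"

# hexlify returns bytes, hence the .decode() to get a str suitable for
# rendering in an HTML or Markdown table.
column_value = binascii.hexlify(raw).decode()
print(column_value)  # → 01abff
```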
2 changes: 1 addition & 1 deletion dependencies/CASE-Examples-QC
Submodule CASE-Examples-QC updated 138 files
6 changes: 6 additions & 0 deletions setup.cfg
@@ -18,12 +18,18 @@ classifiers =
# TODO The constraint on pyparsing can be removed when rdflib Issue #1190 is resolved.
# https://github.com/RDFLib/rdflib/issues/1190
install_requires =
    # Note that numpy (pandas dependency) is only supported in Python >= 3.7.
    pandas;python_version>='3.7'
    pyparsing < 3.0.0
    rdflib-jsonld
    requests
    tabulate
packages = find:
python_requires = >=3.6

[options.entry_points]
console_scripts =
    case_file = case_utils.case_file:main
    case_sparql_construct = case_utils.case_sparql_construct:main
    # Note that numpy (pandas dependency, and pandas is dependency of case_sparql_select) is only supported in Python >= 3.7.
    case_sparql_select = case_utils.case_sparql_select:main