Commit eb071de

Merge pull request #4 from Wordseer/update-to-coreNLP-3.5.2
Update to coreNLP 3.5.2
2 parents 754c062 + 6030814 commit eb071de

4 files changed: +16 -22 lines changed

README.md

Lines changed: 4 additions & 11 deletions
@@ -1,10 +1,11 @@
 # A Python wrapper for the Java Stanford Core NLP tools
 
-This is a fork of Dustin Smith's [stanford-corenlp-python](https://github.com/dasmith/stanford-corenlp-python), a Python interface to [Stanford CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml). It can either use as python package, or run as a JSON-RPC server.
+This is a Wordseer-specific fork of Dustin Smith's [stanford-corenlp-python](https://github.com/dasmith/stanford-corenlp-python), a Python interface to [Stanford CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml). It can either use as python package, or run as a JSON-RPC server.
 
 ## Edited
+* Tested only with the current annotator configuration: not a general-purpose wrapper
+* Update to Stanford CoreNLP v3.5.2
 * Added multi-threaded load balancing
-* Update to Stanford CoreNLP v3.2.0
 * Fix many bugs & improve performance
 * Using jsonrpclib for stability and performance
 * Can edit the constants as argument such as Stanford Core NLP directory
@@ -21,15 +22,6 @@ This is a fork of Dustin Smith's [stanford-corenlp-python](https://github.com/da
 
 To use this program you must [download](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpack the zip file containing Stanford's CoreNLP package. By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of where the script is being run.
 
-
-In other words:
-
-sudo pip install pexpect unidecode jsonrpclib # jsonrpclib is optional
-git clone https://bitbucket.org/torotoki/corenlp-python.git
-cd corenlp-python
-wget http://nlp.stanford.edu/software/stanford-corenlp-full-2013-06-20.zip
-unzip stanford-corenlp-full-2013-06-20.zip
-
 Then, to launch a server:
 
 python corenlp/corenlp.py
@@ -164,4 +156,5 @@ The function uses XML output feature of Stanford CoreNLP, and you can take all i
 * Robert Elwell [[email protected]]
 * Tristan Chong [[email protected]]
 * Aditi Muralidharan [[email protected]]
+* Ian MacFarland [[email protected]]

corenlp/corenlp.py

File mode changed: 100755 → 100644
Lines changed: 5 additions & 8 deletions
@@ -153,7 +153,8 @@ def parse_parser_results(text):
     """
     results = {"sentences": []}
     state = STATE_START
-    for line in unidecode(text.decode('utf-8')).split("\n"):
+    lines = unidecode(text.decode('utf-8')).split("\n")
+    for index, line in enumerate(lines):
         line = line.strip()
 
         if line.startswith("Sentence #"):
@@ -170,15 +171,11 @@ def parse_parser_results(text):
                 raise ParserError('Parse error. Could not find "[Text=" in: %s' % line)
             for s in WORD_PATTERN.findall(line):
                 sentence['words'].append(parse_bracketed(s))
-            state = STATE_TREE
-
-        elif state == STATE_TREE:
-            if len(line) == 0:
+            if not lines[index + 1].startswith("[Text="):
                 state = STATE_DEPENDENCY
-                sentence['parsetree'] = " ".join(sentence['parsetree'])
-            else:
-                sentence['parsetree'].append(remove_escapes(line))
+                # skipping TREE because the new depparse annotator doesn't make a parse tree
 
+
         elif state == STATE_DEPENDENCY:
             if len(line) == 0:
                 state = STATE_COREFERENCE
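
Review note: the new `STATE_WORDS` branch peeks at `lines[index + 1]`, which would raise `IndexError` if a `[Text=` line were ever the last line of the parser output. The sketch below shows the same lookahead with a bounds guard. It is a simplified illustration in Python 3, not the code from this commit: `next_state_after_words` is a hypothetical helper, the state constants are stand-in strings, and the extra `.strip()` on the next line is an assumption about how leading whitespace should be treated.

```python
# Stand-in state constants for illustration; the real module defines its own.
STATE_WORDS = "words"
STATE_DEPENDENCY = "dependency"


def next_state_after_words(lines, index):
    """Return the state that follows the token line at lines[index].

    Hypothetical helper, not part of the commit. Stay in STATE_WORDS while
    the next line is another "[Text=" token line; otherwise move on to
    STATE_DEPENDENCY. A missing next line counts as the end of the tokens.
    """
    has_next = index + 1 < len(lines)
    if has_next and lines[index + 1].strip().startswith("[Text="):
        return STATE_WORDS
    return STATE_DEPENDENCY


# Made-up parser output: two token lines, a blank line, one dependency line.
sample = [
    "[Text=Hello PartOfSpeech=UH]",
    "[Text=world PartOfSpeech=NN]",
    "",
    "root(ROOT-0, world-2)",
]
states = [next_state_after_words(sample, i) for i in range(2)]
```

With this guard, a token line at the very end of the output moves the parser to `STATE_DEPENDENCY` instead of failing on the lookahead.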

corenlp/default.properties

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,8 @@
1-
annotators = tokenize, ssplit, pos, lemma, parse
1+
annotators = tokenize, ssplit, pos, lemma, depparse
2+
3+
# specify Stanford Dependencies format for backwards compatibility
4+
# (new default is Universal Dependencies in 3.5.2)
5+
depparse.model = edu/stanford/nlp/models/parser/nndep/english_SD.gz
26

37
# A true-casing annotator is also available (see below)
48
#annotators = tokenize, ssplit, pos, lemma, truecase
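
Review note: because the README now says the wrapper is tested only with this annotator configuration, a small sanity check on the properties file may be useful. The sketch below is a minimal Python 3 illustration, not part of the commit. `parse_properties` is a hypothetical helper that handles only plain `key = value` lines and `#` comments; it ignores the rest of the Java properties syntax, such as `:` separators and line continuations.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Contents of corenlp/default.properties as shown in the hunk above.
PROPERTIES_TEXT = """\
annotators = tokenize, ssplit, pos, lemma, depparse

# specify Stanford Dependencies format for backwards compatibility
# (new default is Universal Dependencies in 3.5.2)
depparse.model = edu/stanford/nlp/models/parser/nndep/english_SD.gz

# A true-casing annotator is also available (see below)
#annotators = tokenize, ssplit, pos, lemma, truecase
"""

props = parse_properties(PROPERTIES_TEXT)
annotators = [name.strip() for name in props["annotators"].split(",")]
# True when the Stanford Dependencies model is selected explicitly.
uses_sd_model = props.get("depparse.model", "").endswith("english_SD.gz")
```

Checking `"depparse" in annotators` together with `uses_sd_model` confirms that the file selects the dependency parser with the Stanford Dependencies model, which is the combination the updated parser code expects.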

setup.py

Lines changed: 2 additions & 2 deletions
@@ -4,10 +4,10 @@
 PACKAGE = "corenlp"
 NAME = "stanford-corenlp-python"
 DESCRIPTION = "A Stanford Core NLP wrapper (wordseer fork)"
-AUTHOR = "Hiroyoshi Komatsu, Dustin Smith, Aditi Muralidharan"
+AUTHOR = "Hiroyoshi Komatsu, Dustin Smith, Aditi Muralidharan, Ian MacFarland"
 AUTHOR_EMAIL = "[email protected]"
 URL = "https://github.com/Wordseer/stanford-corenlp-python"
-VERSION = "3.3.9"
+VERSION = "3.3.10"
 INSTALLATION_REQS = ["unidecode >= 0.04.12", "xmltodict >= 0.4.6"]
 
 PEXPECT = "pexpect >= 2.4"
