Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions docs/mllib-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,5 @@ depends on native Fortran routines. You may need to install the
if it is not already present on your nodes. MLlib will throw a linking error if it cannot
detect these libraries automatically.

To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.7 or newer
and Python 2.7.
To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.7 or newer.

2 changes: 1 addition & 1 deletion docs/python-programming-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -152,7 +152,7 @@ Many of the methods also contain [doctests](http://docs.python.org/2/library/doc
# Libraries

[MLlib](mllib-guide.html) is also available in PySpark. To use it, you'll need
[NumPy](http://www.numpy.org) version 1.7 or newer, and Python 2.7. The [MLlib guide](mllib-guide.html) contains
[NumPy](http://www.numpy.org) version 1.7 or newer. The [MLlib guide](mllib-guide.html) contains
some example applications.

# Where to Go from Here
Expand Down
6 changes: 1 addition & 5 deletions python/pyspark/mllib/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,11 +19,7 @@
Python bindings for MLlib.
"""

# MLlib currently needs Python 2.7+ and NumPy 1.7+, so complain if lower

import sys
if sys.version_info[0:2] < (2, 7):
raise Exception("MLlib requires Python 2.7+")
# MLlib currently needs and NumPy 1.7+, so complain if lower

import numpy
if numpy.version.version < '1.7':
Expand Down
11 changes: 10 additions & 1 deletion python/pyspark/serializers.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@
from itertools import chain, izip, product
import marshal
import struct
import sys
from pyspark import cloudpickle


Expand Down Expand Up @@ -113,6 +114,11 @@ class FramedSerializer(Serializer):
where C{length} is a 32-bit integer and data is C{length} bytes.
"""

def __init__(self):
# On Python 2.6, we can't write bytearrays to streams, so we need to convert them
# to strings first. Check if the version number is that old.
self._only_write_strings = sys.version_info[0:2] <= (2, 6)

def dump_stream(self, iterator, stream):
for obj in iterator:
self._write_with_length(obj, stream)
Expand All @@ -127,7 +133,10 @@ def load_stream(self, stream):
def _write_with_length(self, obj, stream):
serialized = self.dumps(obj)
write_int(len(serialized), stream)
stream.write(serialized)
if self._only_write_strings:
stream.write(str(serialized))
else:
stream.write(serialized)

def _read_with_length(self, stream):
length = read_int(stream)
Expand Down