Description
I am using a 2021 iMac with the Apple M1 chip and macOS Monterey 12.4.
So far, to set up PySpark, I have installed pyspark with pip3, cloned this repo and installed the dependencies from its requirements.txt, and downloaded Java from its homepage. I'm using Python 3.8.9.
I added the path to the pip3 installation of pyspark to SPARK_HOME in my .zshrc and sourced it:
% echo $SPARK_HOME
/Users/julius/Library/Python/3.8/lib/python/site-packages/pyspark
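As a sanity check on that variable, I also verified it from inside Python (just standard-library calls, nothing extra to install):

```python
import os

# SPARK_HOME should point at the pyspark package directory so that
# $SPARK_HOME/bin/spark-submit resolves (assuming it was exported in .zshrc).
spark_home = os.environ.get("SPARK_HOME", "")
print(spark_home)
print(os.path.isdir(os.path.join(spark_home, "bin")))
```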
I then executed the following command:
$SPARK_HOME/bin/spark-submit ./server_count.py \
--num_output_partitions 1 --log_level WARN \
./input/test_warc.txt servernames
I had to execute this from inside the cc-pyspark repo; otherwise the script could not find server_count.py.
It returns this error message:
julius@Juliuss-iMac cc-pyspark % $SPARK_HOME/bin/spark-submit ./server_count.py \
--num_output_partitions 1 --log_level WARN \
./input/test_warc.txt servernames
Traceback (most recent call last):
File "/Users/julius/cc-pyspark/server_count.py", line 1, in <module>
import ujson as json
ImportError: dlopen(/Users/julius/Library/Python/3.8/lib/python/site-packages/ujson.cpython-38-darwin.so, 0x0002): tried: '/Users/julius/Library/Python/3.8/lib/python/site-packages/ujson.cpython-38-darwin.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'))
22/07/06 15:04:13 INFO ShutdownHookManager: Shutdown hook called
22/07/06 15:04:13 INFO ShutdownHookManager: Deleting directory /private/var/folders/xv/yzpjb77s2qg14px8dc7g4m_80000gn/T/spark-80c476e9-b5ba-4710-b292-e367dd387ece
There seems to be something wrong with my installation of "ujson": the compiled extension is built for arm64, but the Python that PySpark launches expects x86_64. Is that correct?
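For reference, this is how I checked which architecture my Python interpreter itself reports (platform.machine is in the standard library):

```python
import platform
import sys

# Architecture the running interpreter was built for: "arm64" for a
# native Apple-silicon build, "x86_64" for an Intel build (which would
# run under Rosetta 2 on an M1 Mac).
print(platform.machine())

# Path to the interpreter binary, handy for a `file` or `lipo` inspection.
print(sys.executable)
```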
What is the simplest way to fix this? Should I try running PySpark under x86 emulation with Rosetta? Or has PySpark simply not been designed for the M1 chip?
Could this instead be the fault of my Java installation? I took the first one offered, which I believe was x86_64, but when I tested PySpark on its own, it seemed to work fine.
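In case it helps, I also inspected the compiled extension's Mach-O header directly to confirm what architecture it contains. This is just an illustrative sketch (the magic number and CPU-type constants come from Apple's mach-o/loader.h and mach/machine.h headers; the example feeds it a synthetic header rather than my actual ujson file):

```python
import struct

# Constants from Apple's Mach-O headers.
MH_MAGIC_64 = 0xFEEDFACF        # little-endian 64-bit Mach-O magic
CPU_TYPE_X86_64 = 0x01000007    # CPU_TYPE_I386 | CPU_ARCH_ABI64
CPU_TYPE_ARM64 = 0x0100000C     # CPU_TYPE_ARM  | CPU_ARCH_ABI64

def macho_arch(header: bytes) -> str:
    """Return the CPU architecture of a 64-bit Mach-O header (first 8 bytes)."""
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return "not a little-endian 64-bit Mach-O file"
    return {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}.get(
        cputype, hex(cputype)
    )

# Synthetic arm64 header, standing in for the first 8 bytes of a real .so:
fake_header = struct.pack("<II", MH_MAGIC_64, CPU_TYPE_ARM64)
print(macho_arch(fake_header))  # arm64
```

With the real file, `macho_arch(open(path, "rb").read(8))` on the ujson.cpython-38-darwin.so path from the traceback should report "arm64", matching the error message.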
Thanks very much