Blog post about this analysis can be found here
Founded in 2008, Stack Overflow has evolved into a vital resource for developers worldwide, providing a platform for learning, knowledge sharing, collaboration, and career development. Every year, Stack Overflow conducts the largest global Developer Survey, collecting responses from tens of thousands of developers. This openly available survey data is a valuable resource for in-depth analysis, allowing us to explore real-world questions and challenges.
In this project, we focus on analyzing the 2022 Stack Overflow Developer Survey dataset. The 2022 survey gathered responses from over 70,000 developers, shedding light on how developers learn, the tools they use, and their preferences and demographics.
You can access the survey data in CSV format for each annual developer survey conducted since 2011 from the following link:
Stack Overflow Developer Surveys
Additional insights provided by Stack Overflow for the 2022 survey can be found here:
Stack Overflow Survey 2022 Insights
In this data science project, we aim to address several research questions:
- What additional responsibilities do Data Scientists commonly take on in their current positions?
- Which programming languages are most frequently used by Data Scientists?
- Which programming languages do Data Scientists want to work with over the next year?
- Does holding a higher degree correlate with earning a higher salary?
- Is there a gender-based salary disparity among Data Scientists, with male Data Scientists earning higher salaries than their female counterparts?
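As a rough illustration of how these questions can be approached with pandas, here is a minimal sketch. It is not the notebook's actual code. The column names (`DevType`, `LanguageHaveWorkedWith`, `LanguageWantToWorkWith`, `EdLevel`, `Gender`, `ConvertedCompYearly`), the role label, and the file name `survey_results_public.csv` are assumptions based on the public 2022 survey schema and should be checked against the CSV header. The helper function names are hypothetical.

```python
import pandas as pd

# Assumed role label from the 2022 survey; verify against the DevType column.
DS_LABEL = "Data scientist or machine learning specialist"


def data_scientists(df: pd.DataFrame) -> pd.DataFrame:
    """Keep respondents whose semicolon-separated DevType includes DS_LABEL."""
    roles = df["DevType"].fillna("").str.split(";")
    mask = roles.apply(lambda items: DS_LABEL in items)
    return df[mask]


def count_choices(df: pd.DataFrame, column: str) -> pd.Series:
    """Count each option in a semicolon-separated multi-select column."""
    return df[column].dropna().str.split(";").explode().value_counts()


def median_salary_by(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Median yearly compensation per group, skipping rows with missing values."""
    paid = df.dropna(subset=["ConvertedCompYearly", group_col])
    medians = paid.groupby(group_col)["ConvertedCompYearly"].median()
    return medians.sort_values(ascending=False)


# Example usage (requires the downloaded CSV; the file name is an assumption):
# survey = pd.read_csv("survey_results_public.csv")
# ds = data_scientists(survey)
# print(count_choices(ds, "DevType").drop(DS_LABEL).head(10))   # other roles held
# print(count_choices(ds, "LanguageHaveWorkedWith").head(10))   # languages used
# print(count_choices(ds, "LanguageWantToWorkWith").head(10))   # languages wanted
# print(median_salary_by(ds, "EdLevel"))                        # salary by degree
# print(median_salary_by(ds, "Gender"))                         # salary by gender
```

The median is used here rather than the mean because self-reported salaries tend to contain extreme values. That is a design choice for this sketch, not necessarily the one made in the notebook.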
- Clone this repository to your local machine:
  git clone https://github.com/your-username/stackoverflow-survey-analysis.git
- Navigate to the project directory:
  cd stackoverflow-survey-analysis
- Explore the Jupyter notebook stackoverflow-survey-analysis.ipynb to follow the analysis process.
- Review the final reports and visualizations in the "Reports" directory for the project's key findings.
Python 3.11.5
To run the Jupyter notebooks and scripts in this project, you may need the following Python libraries:
- pandas
- matplotlib
- seaborn
- numpy
You can install these dependencies using pip:
pip install pandas matplotlib seaborn numpy
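After installing, a quick way to confirm that the libraries are visible to your Python environment is to query their installed versions. The snippet below is a small sketch that uses only the standard library. The helper name `installed_versions` is hypothetical and not part of this project.

```python
from importlib import metadata

# Packages listed in this README.
REQUIRED = ["pandas", "matplotlib", "seaborn", "numpy"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as "not installed" can be added with the pip command above.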