This repository contains simple, from-scratch implementations of various machine learning algorithms. The goal is to provide clear, understandable code and explanations for these fundamental algorithms, helping learners understand how they work internally.
The repository is updated regularly with new implementations and explanations. Feel free to take a look, and open an issue or pull request for any problems or suggested improvements.
Linear regression is a fundamental algorithm in machine learning, used for predicting continuous values based on the linear relationship between input variables and the target variable. In this implementation, we calculate the parameters (slope and intercept) using the least squares method.
- File:
LinearRegression.ipynb
- Key Concepts: Approximation, Gradient Descent, Least Squares, Model Fitting
- Results:
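As a quick illustration of the closed-form least-squares fit for a single feature, here is a minimal sketch; the toy data and variable names are my own, not the notebook's exact code:

```python
import numpy as np

# Toy data: y = 3x + 2 plus noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 50)
y = 3.0 * X + 2.0 + rng.normal(0.0, 1.0, 50)

# Closed-form least-squares estimates of slope and intercept
slope = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)
intercept = y.mean() - slope * X.mean()

y_pred = slope * X + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```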
Logistic regression is used for binary classification tasks. It models the probability of an event occurring based on input features. This implementation uses the sigmoid function to output probabilities.
- File:
LogisticRegression.ipynb
- Key Concepts: Sigmoid Function, Cross-Entropy Loss, Gradient Descent, ROC Curve & AUC
- Results:
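A minimal sketch of the training loop, assuming toy 1-D data and batch gradient descent on the cross-entropy loss (the notebook's exact code may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two 1-D clusters as toy data (illustrative only)
rng = np.random.default_rng(0)
X = np.r_[rng.normal(-2, 1, (50, 1)), rng.normal(2, 1, (50, 1))]
y = np.r_[np.zeros(50), np.ones(50)]

w, b, lr = np.zeros(X.shape[1]), 0.0, 0.1
for _ in range(1000):
    p = sigmoid(X @ w + b)            # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)   # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("accuracy:", np.mean((sigmoid(X @ w + b) >= 0.5) == y))
```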
Naive Bayes is a probabilistic classifier based on Bayes’ Theorem, which assumes that features are independent given the class. It is especially useful for text classification and spam filtering.
- File:
NaiveBayes.ipynb
- Key Concepts: Conditional Probability, Bayes' Theorem, Gaussian Naive Bayes
- Results:
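The Gaussian variant can be sketched in a few lines: fit a per-class mean and variance for each feature, then pick the class with the highest log-posterior. The data and names below are illustrative, not the notebook's code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))]
y = np.r_[np.zeros(50, int), np.ones(50, int)]

# Per-class feature means, variances, and class priors
classes = np.unique(y)
means = np.array([X[y == c].mean(axis=0) for c in classes])
vars_ = np.array([X[y == c].var(axis=0) for c in classes])
priors = np.array([np.mean(y == c) for c in classes])

def predict(x):
    # Log-posterior up to a constant: log prior + summed log Gaussian likelihoods
    log_lik = -0.5 * (np.log(2 * np.pi * vars_) + (x - means) ** 2 / vars_).sum(axis=1)
    return classes[np.argmax(np.log(priors) + log_lik)]

preds = np.array([predict(x) for x in X])
print("accuracy:", np.mean(preds == y))
```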
KNN is a simple, instance-based learning algorithm in which the class of a data point is determined by the majority class among its nearest neighbors. The algorithm is lazy: it builds no model during training and instead classifies each point by its proximity to labeled examples.
- File:
KNN.ipynb
- Key Concepts: Euclidean Distance, Majority Voting, Lazy Learning
- Results:
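The whole algorithm fits in a short function, sketched here with Euclidean distance and majority voting on toy data (names and data are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each point
    nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
    return Counter(nearest).most_common(1)[0][0]  # majority vote

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))]
y = np.r_[np.zeros(50, int), np.ones(50, int)]
print(knn_predict(X, y, np.array([2.5, 2.5])))    # expected: 1
```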
The perceptron is a linear classifier and the foundation of neural networks. It updates its weights through an iterative process to classify linearly separable data.
- File:
Perceptron.ipynb
- Key Concepts: Linear Classifier, Weight Updates, Gradient Descent
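A compact sketch of the perceptron update rule on toy, linearly separable data (illustrative, not the notebook's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.r_[rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))]
y = np.r_[-np.ones(50), np.ones(50)]    # labels in {-1, +1}

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                     # epochs over the training set
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:      # misclassified point
            w += lr * yi * xi           # nudge the boundary toward it
            b += lr * yi

print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```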
SVM is a powerful classifier that finds the hyperplane that best separates data into classes. It aims to maximize the margin between classes and uses the kernel trick for non-linear classification.
- File:
SupportVectorMachine.ipynb
- Key Concepts: Hyperplanes, Margin Maximization, Kernel Trick
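For brevity, the sketch below trains a linear SVM by subgradient descent on the hinge loss with L2 regularization, omitting kernels; the notebook's approach and hyperparameters may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.r_[rng.normal(-2, 0.8, (50, 2)), rng.normal(2, 0.8, (50, 2))]
y = np.r_[-np.ones(50), np.ones(50)]            # labels in {-1, +1}

w, b, lr, lam = np.zeros(2), 0.0, 0.01, 0.01    # lam: regularization strength
for _ in range(500):
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) < 1:               # inside the margin: hinge loss active
            w += lr * (yi * xi - 2 * lam * w)
            b += lr * yi
        else:                                   # correct side: only shrink w
            w -= lr * 2 * lam * w

print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```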
A decision tree is a supervised learning algorithm that splits the data into subsets based on feature values to create a model that predicts the target variable. It builds a tree-like structure with decision nodes.
- File:
DecisionTree.ipynb
- Key Concepts: Entropy, Gini Index, Recursive Partitioning
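A condensed sketch of recursive partitioning with the Gini index, on toy data with names of my own choosing (the notebook's implementation may differ):

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Scan every feature/threshold pair for the lowest weighted Gini impurity
    best_f, best_t, best_score = None, None, gini(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            if left.all():
                continue
            score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t

def build(X, y, depth=0, max_depth=3):
    if depth == max_depth or len(set(y)) == 1:
        return np.bincount(y).argmax()          # leaf: majority class
    f, t = best_split(X, y)
    if f is None:                               # no split reduces impurity
        return np.bincount(y).argmax()
    left = X[:, f] <= t
    return (f, t, build(X[left], y[left], depth + 1, max_depth),
            build(X[~left], y[~left], depth + 1, max_depth))

def predict(tree, x):
    while isinstance(tree, tuple):              # descend until a leaf is reached
        f, t, l, r = tree
        tree = l if x[f] <= t else r
    return tree

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))]
y = np.r_[np.zeros(50, int), np.ones(50, int)]
tree = build(X, y)
preds = np.array([predict(tree, x) for x in X])
print("accuracy:", np.mean(preds == y))
```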
Random Forest is an ensemble learning method that constructs a collection of decision trees and outputs the majority class prediction (classification) or average prediction (regression).
- File:
RandomForest.ipynb
- Key Concepts: Bagging, Feature Randomness, Ensemble Learning
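To keep the sketch short, it borrows scikit-learn's DecisionTreeClassifier as the base learner (the notebook presumably builds its own trees); the bagging and majority-voting logic is the point here:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # base learner for brevity only

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (100, 2)), rng.normal(2.5, 1, (100, 2))]
y = np.r_[np.zeros(100, int), np.ones(100, int)]

trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))            # bootstrap sample (bagging)
    t = DecisionTreeClassifier(max_features="sqrt")  # feature randomness per split
    trees.append(t.fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])      # each tree votes
pred = (votes.mean(axis=0) >= 0.5).astype(int)       # majority class wins
print("accuracy:", np.mean(pred == y))
```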
PCA is a dimensionality reduction technique that transforms data into a set of orthogonal components, reducing the number of variables while retaining the maximum variance.
- File:
PrincipalComponentAnalysis.ipynb
- Key Concepts: Eigenvectors, Eigenvalues, Singular Value Decomposition (SVD)
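A minimal eigendecomposition-based sketch on made-up correlated data (the notebook may use SVD instead, as the key concepts suggest):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]       # sort components by variance
components = eigvecs[:, order[:2]]      # keep the top 2 directions

X_reduced = Xc @ components             # project onto the principal components
print("explained variance ratio:", eigvals[order[:2]] / eigvals.sum())
```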
K-Means is an unsupervised learning algorithm that clusters data points into a predefined number of clusters based on their similarity, measured by Euclidean distance.
- File:
K-Means-Clustering.ipynb
- Key Concepts: Centroids, Euclidean Distance, Iterative Optimization
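The assign-then-update loop in a minimal sketch, with toy data and a random initialization scheme of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2)),
          rng.normal(6, 0.5, (50, 2))]

k = 3
centroids = X[rng.choice(len(X), k, replace=False)]  # init from random data points
for _ in range(100):
    # Assignment step: each point joins its nearest centroid (Euclidean distance)
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
    # Update step: move each centroid to the mean of its cluster
    new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j]          # keep old if cluster empty
                              for j in range(k)])
    if np.allclose(new_centroids, centroids):            # converged
        break
    centroids = new_centroids

print(centroids.round(2))
```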
AdaBoost is an ensemble method that combines multiple weak classifiers to create a strong classifier by giving more weight to misclassified instances in subsequent rounds.
- File:
AdaBoost.ipynb
- Key Concepts: Weak Learners, Weighted Voting, Iterative Training
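A condensed sketch using decision stumps as the weak learners (a common choice; the notebook's weak learner may differ). Names and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.r_[rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))]
y = np.r_[-np.ones(50), np.ones(50)]    # labels in {-1, +1}

n = len(y)
w = np.ones(n) / n                      # uniform instance weights
stumps, alphas = [], []
for _ in range(20):
    # Find the stump (feature, threshold, sign) with the lowest weighted error
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                pred = s * np.where(X[:, f] <= t, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, f, t, s, pred)
    err, f, t, s, pred = best
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # stump's vote weight
    w *= np.exp(-alpha * y * pred)      # upweight misclassified instances
    w /= w.sum()
    stumps.append((f, t, s)); alphas.append(alpha)

# Final strong classifier: weighted vote of all stumps
agg = sum(a * s * np.where(X[:, f] <= t, 1, -1)
          for a, (f, t, s) in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(agg) == y))
```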
LDA is a dimensionality reduction technique used to find the linear combination of features that best separates two or more classes.
- File:
LinearDiscriminantAnalysis.ipynb
- Key Concepts: Between-Class Variance, Within-Class Variance, Eigenvectors
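A minimal two-class Fisher's LDA sketch, projecting onto the single direction that maximizes between-class relative to within-class variance (illustrative names and data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))]
y = np.r_[np.zeros(50, int), np.ones(50, int)]

# Within-class scatter: sum of per-class (unnormalized) covariances
mean0, mean1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
Sw = sum((X[y == c] - m).T @ (X[y == c] - m)
         for c, m in [(0, mean0), (1, mean1)])

# Fisher's direction: Sw^{-1} (mean1 - mean0)
w = np.linalg.solve(Sw, mean1 - mean0)
w /= np.linalg.norm(w)

X_proj = X @ w                          # project data onto 1-D
print("projected class means:", X_proj[y == 0].mean(), X_proj[y == 1].mean())
```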
To run these notebooks, ensure the following prerequisites are installed:
- Scikit-Learn
- NumPy
- Matplotlib
- Clone the repository:

```bash
git clone https://github.com/THAMIZH-ARASU/Implementing-ML-Algorithms-from-Scratch.git
```

- Install the required dependencies:

```bash
pip install -r requirements.txt
```

- Open a notebook, for example:

```bash
jupyter notebook LinearRegression.ipynb
```
This project is licensed under the MIT License. See the LICENSE
file for details.