Built an unsupervised machine learning pipeline to detect anomalies in Bitcoin transactions, starting from 700 features and selecting 19 key ones. Used PCA and t-SNE for dimensionality reduction, Isolation Forest for anomaly detection, and K-Means/DBSCAN for clustering. Applied a Hampel filter for noise correction, evaluated cluster quality with silhouette scores, and used Random Forest feature importances to interpret the detected anomalies.
- Unsupervised Learning: No labeled data required.
- Dimensionality Reduction: Visualization and structure discovery.
- Clustering & Isolation: Identify anomalous transactions.
- Feature Analysis: Understand the key drivers of anomalies.
 
- Python 3.x
- NumPy / Pandas
- scikit-learn
- Matplotlib / Seaborn
- t-SNE / PCA
- Isolation Forest / DBSCAN / K-Means
- Hampel filter for outlier preprocessing
 
- Transaction data is cleaned and normalized.
- A Hampel filter flags extreme outliers (points far from a rolling median) and corrects them to reduce noise.
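A minimal sketch of this preprocessing step, assuming the features sit in a pandas DataFrame of numeric columns. It uses the classic Hampel rule: a point is replaced by the rolling median when it lies more than `n_sigmas` scaled MADs away from it. The window length (7), the 3-sigma threshold, standardization via scikit-learn's `StandardScaler`, and the helper names `hampel_filter` and `preprocess` are illustrative assumptions, not the project's documented settings.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def hampel_filter(series: pd.Series, window: int = 7, n_sigmas: float = 3.0) -> pd.Series:
    """Hypothetical helper: replace outliers in `series` with the rolling median.

    A point is flagged when it lies more than `n_sigmas` scaled MADs
    from the median of a centered window of `window` points.
    """
    k = 1.4826  # scales MAD to a standard-deviation estimate for Gaussian data
    rolling = series.rolling(window=window, center=True, min_periods=1)
    median = rolling.median()
    # Median absolute deviation inside each window
    mad = rolling.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    outliers = (series - median).abs() > n_sigmas * k * mad
    cleaned = series.copy()
    cleaned[outliers] = median[outliers]  # correct rather than drop the outliers
    return cleaned


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Hampel-filter every column, then standardize it."""
    filtered = df.apply(hampel_filter)
    scaled = StandardScaler().fit_transform(filtered)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)
```

Correcting outliers to the local median keeps every transaction row in the dataset, which matters here because extreme rows are exactly what the later anomaly detectors should still see; only isolated spikes within a column are smoothed.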
 
- PCA reduces the feature space while retaining most of the variance.
- t-SNE is used to visualize complex, high-dimensional patterns.
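The two reduction steps could be chained as below. This is a sketch assuming scikit-learn's `PCA` and `TSNE`; the 95% variance target, the perplexity cap, running t-SNE on the PCA output rather than on the raw features, and the helper name `reduce_dimensions` are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def reduce_dimensions(X: np.ndarray, variance: float = 0.95, random_state: int = 0):
    """Hypothetical helper: PCA for compression, then a 2-D t-SNE embedding for plots."""
    # A float n_components keeps the fewest components whose cumulative
    # explained variance exceeds `variance`.
    pca = PCA(n_components=variance, svd_solver="full")
    X_pca = pca.fit_transform(X)
    # t-SNE requires perplexity < n_samples; cap it for small inputs.
    perplexity = min(30.0, (len(X) - 1) / 3)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=random_state)
    X_2d = tsne.fit_transform(X_pca)
    return pca, X_pca, X_2d
```

`X_pca` is the input one would feed to the clustering and Isolation Forest steps, while `X_2d` is only for visualization: t-SNE distorts global distances, so cluster sizes and gaps in its plot should not be read quantitatively.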
 
- K-Means clustering identifies common behavior groups.
- DBSCAN performs density-based clustering and separates noise points, which serve as anomaly candidates.
- The silhouette score is used to evaluate cluster quality.
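A sketch of both clustering passes with silhouette evaluation, assuming scikit-learn. Picking `k` by maximizing the silhouette score, the DBSCAN `eps`/`min_samples` defaults, and the helper names `best_kmeans` and `dbscan_noise` are illustrative assumptions; a suitable `eps` depends on the scale of the reduced features.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score


def best_kmeans(X: np.ndarray, k_values=range(2, 9), random_state: int = 0):
    """Hypothetical helper: fit K-Means for each k, keep the best silhouette score."""
    best = None
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        score = silhouette_score(X, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best  # (k, silhouette score, cluster labels)


def dbscan_noise(X: np.ndarray, eps: float = 0.5, min_samples: int = 5):
    """Hypothetical helper: run DBSCAN; label -1 marks noise, i.e. candidate anomalies."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    noise_mask = labels == -1
    clustered = ~noise_mask
    # Silhouette is defined only for 2 <= n_clusters <= n_samples - 1,
    # so score the non-noise points only when that holds.
    n_clusters = len(set(labels[clustered]))
    score = None
    if 2 <= n_clusters < clustered.sum():
        score = silhouette_score(X[clustered], labels[clustered])
    return labels, noise_mask, score
```

Excluding DBSCAN's noise points before scoring keeps the silhouette a measure of how well the dense groups separate, rather than penalizing the model for the very points it flags as anomalous.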
 
- Isolation Forest detects anomalous transactions by isolating rare patterns.
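A minimal sketch with scikit-learn's `IsolationForest`. Anomalies need fewer random splits to isolate, so they end up with shorter average path lengths. The `contamination=0.01` (expected share of anomalies), `n_estimators=200`, and the helper name `isolation_forest_scores` are assumed values for illustration, not figures from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def isolation_forest_scores(X: np.ndarray, contamination: float = 0.01, random_state: int = 0):
    """Hypothetical helper: return (anomaly mask, anomaly score) for each row of X."""
    forest = IsolationForest(
        n_estimators=200, contamination=contamination, random_state=random_state
    )
    labels = forest.fit_predict(X)  # -1 = anomaly, 1 = normal
    # score_samples is lower for more abnormal points; negate it so that
    # a higher score means "more anomalous".
    scores = -forest.score_samples(X)
    return labels == -1, scores
```

Keeping the continuous score alongside the binary mask lets anomalies be ranked and cross-checked against DBSCAN's noise points instead of relying on a single contamination cut-off.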
 
- A Random Forest model ranks the most influential features post-clustering to help interpret anomaly causes (e.g., transaction value, frequency, mining difficulty, sentiment metrics).
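One way to do this, sketched under the assumption that the labels come from an earlier step (for example the Isolation Forest anomaly mask or the K-Means cluster ids) and that the original features are in a pandas DataFrame. The forest learns to reproduce the unsupervised labels, and its impurity-based `feature_importances_` then rank the features. The helper name `rank_features` and the column names in the usage note are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(features: pd.DataFrame, labels: np.ndarray,
                  top_n: int = 10, random_state: int = 0) -> pd.Series:
    """Hypothetical helper: fit a Random Forest on the unsupervised labels
    and return the features sorted by impurity-based importance."""
    forest = RandomForestClassifier(n_estimators=300, random_state=random_state)
    forest.fit(features, labels)
    importances = pd.Series(forest.feature_importances_, index=features.columns)
    return importances.sort_values(ascending=False).head(top_n)
```

For example, `rank_features(features_df, anomaly_mask.astype(int))` would list which of the selected features best separate flagged from normal transactions. Impurity-based importances can favor high-cardinality features; scikit-learn's `permutation_importance` is a common cross-check.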