Neural Network Language Model: Brain-Inspired AI

An Open-Source AI System Inspired by the Human Brain

This project implements a neural network-based language model that mimics aspects of human brain learning. Like the human brain, our model learns from interactions, refines itself with each new experience, and builds increasingly sophisticated representations of language.

Core Philosophy

  • Brain-Inspired Learning: Our neural network architecture draws inspiration from the human brain's ability to learn from context and adapt over time.
  • Continuous Improvement: The system refines itself with each interaction, gradually improving its understanding and predictions.
  • Open Source: We believe in the democratization of AI. This project is fully open-source, allowing anyone to use, modify, and contribute to its development.
  • Transparency: The inner workings of the model are accessible and understandable, unlike black-box commercial AI systems.

Key Features

  • Adaptive Learning: The model learns from each interaction, continuously refining its internal representations.
  • Context-Aware Predictions: Uses previous words as context to predict the next word, similar to how humans anticipate language.
  • Self-Attention Mechanism: Implements attention mechanisms inspired by how the human brain focuses on relevant information.
  • Interactive UI: A user-friendly interface for training, testing, and interacting with the model.
  • Real-time Visualization: Watch the learning process unfold with real-time loss graphs and metrics.

How It Works

  1. Neural Architecture: The system uses a perceptron-based neural network with configurable hidden layers.
  2. Word Embeddings: Words are converted into numerical vectors that capture semantic relationships.
  3. Context Processing: The model analyzes sequences of words to understand context.
  4. Adaptive Learning: With each interaction, the model adjusts its internal weights through backpropagation.
  5. Self-Refinement: The system continuously improves its predictions based on feedback and new data.
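
The workflow above can be sketched end to end with the classes that appear later in this README (MultiLayerPerceptron from multi_layer_perceptron.py). Treat this as an illustrative sketch rather than an exact API reference; method names follow the usage examples shown further below.

# Minimal sketch of the learning workflow (illustrative; see the usage examples below)
from multi_layer_perceptron import MultiLayerPerceptron

text = "the cat sat on the mat while the dog walked by"

# 1-2. Perceptron-based network: context words -> embeddings -> hidden layers -> softmax
model = MultiLayerPerceptron(context_size=2, embedding_dim=50, hidden_layers=[64, 32])

# 3-4. Learn weights by backpropagation over (context, next-word) pairs from the text
model.fit(text.split())

# 5. Use the refined model to continue a context
print(model.generate(context=['the', 'cat'], n_words=5))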

UI Applications

The project includes several enhanced UI applications:

  1. Complete MLP UI (complete_mlp_ui.py): The most comprehensive UI with all features:

    • Training with visualization of training vs validation loss
    • Next word prediction with probability display
    • Text generation with temperature control
    • Model saving and loading
  2. Basic MLP UI (basic_mlp_ui.py): A simpler UI with basic functionality

    • Training with progress tracking
    • Model saving and loading
  3. Standard MLP UI (run_standard_mlp.py): A launcher for the standard MLP UI

Running the UI Applications

# Activate the virtual environment
source mlp_env/bin/activate

# Run the complete UI with all features
python complete_mlp_ui.py

# Or run the basic UI
python basic_mlp_ui.py

# Or run the standard MLP UI
python run_standard_mlp.py

Learning from Interactions

The true power of this system lies in its ability to learn from interactions:

  • Each Training Session: Improves the model's understanding of language patterns
  • Every Prediction: Helps refine the internal representations
  • User Feedback: Can be incorporated to guide the learning process
  • Continuous Evolution: The model never stops learning and improving

Brain-Inspired Architecture

Our neural network architecture is inspired by key aspects of the human brain:

Perceptron as Neuron

The perceptron, our basic building block, mimics the behavior of biological neurons:

  • Inputs: Like dendrites receiving signals
  • Weights: Like synaptic strengths
  • Activation: Like neuron firing threshold
  • Output: Like axon transmitting signals

Multi-Layer Networks for Complex Representations

Similar to how the brain builds hierarchical representations:

  • Input Layer: Initial sensory processing
  • Hidden Layers: Abstract feature extraction (like association areas in the brain)
  • Output Layer: Decision making and prediction

Attention Mechanisms

Inspired by how the human brain selectively focuses on relevant information:

  • Self-Attention: Weighs the importance of different parts of the input
  • Multi-Head Attention: Processes information from multiple perspectives simultaneously

Advanced Tokenization

The system uses advanced tokenization algorithms that mimic how humans break down unfamiliar words:

Byte Pair Encoding (BPE)

  • Learns common subword patterns
  • Breaks down unknown words into familiar components
  • Similar to how humans recognize morphemes and word parts

WordPiece

  • Identifies meaningful word fragments
  • Handles compound words and complex vocabulary
  • Resembles human strategies for understanding new terminology

Project Structure

singleLayerPerceptron/
├── perceptron.py         # Main perceptron implementation
├── data_utils.py         # Utilities for data generation and visualization
├── tokenizers.py         # BPE and WordPiece tokenization implementations
├── embeddings.py         # Word embeddings implementation
├── simple_language_model.py # Simple language model implementation
├── multi_layer_perceptron.py # Multi-layer perceptron implementation
├── attention_perceptron.py # Attention-enhanced perceptron
├── complete_mlp_ui.py    # Comprehensive UI with all features
├── basic_mlp_ui.py       # Basic UI implementation
├── requirements.txt      # Required dependencies
└── README.md             # Project documentation

Future Directions

  • Enhanced Learning Algorithms: Implementing more sophisticated learning mechanisms
  • Multi-modal Learning: Extending beyond text to include images and other data types
  • Distributed Processing: Parallel processing inspired by the distributed nature of the brain
  • Memory Systems: Implementing short and long-term memory components
  • Emotional Intelligence: Adding affective computing capabilities

Contributing

We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or proposing new features, your help is appreciated.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Join us in building an AI system that truly learns and grows with each interaction, inspired by the remarkable capabilities of the human brain!

Single Layer Perceptron for Binary Classification and Language Modeling

This project implements a single layer perceptron for binary classification tasks and extends it to language modeling. The perceptron is one of the simplest forms of artificial neural network: a single neuron with adjustable weights and a bias. For language modeling, the project includes advanced tokenization algorithms (Byte Pair Encoding and WordPiece) to handle out-of-vocabulary words.


UI Screenshots

Model Training

The training interface shows real-time visualization of training and validation loss:

[Screenshot: Model Training]

Screenshot shows the training process with decreasing loss values, indicating the model is learning effectively.

Next Word Prediction

The prediction interface allows you to enter a context and see the most likely next words:

[Screenshot: Next Word Prediction]

Screenshot shows prediction results for a sample context, with probabilities for each predicted word.

Text Generation

The text generation interface lets you generate coherent text from a starting context:

[Screenshot: Text Generation]

Screenshot shows generated text continuing from a user-provided context.
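
Temperature control in the generation interface typically rescales the model's predicted probability distribution before sampling: lower temperatures make the output more conservative, higher temperatures more varied. A minimal sketch of that idea (illustrative only, not the exact code used by complete_mlp_ui.py):

import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    """Sample a word index from a probability distribution after temperature scaling."""
    # T < 1 sharpens the distribution, T > 1 flattens it
    logits = np.log(np.asarray(probs) + 1e-10) / temperature
    scaled = np.exp(logits - np.max(logits))
    scaled /= scaled.sum()
    return np.random.choice(len(scaled), p=scaled)

# Toy next-word distribution over a 4-word vocabulary
probs = [0.6, 0.2, 0.15, 0.05]
print(sample_with_temperature(probs, temperature=0.5))   # usually index 0
print(sample_with_temperature(probs, temperature=1.5))   # more varied picks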

Model Architecture

The model architecture tab provides visualization of the neural network structure:

[Screenshot: Model Architecture]

Screenshot shows the neural network layers and connections.

Project Structure

singleLayerPerceptron/
├── perceptron.py         # Main perceptron implementation
├── data_utils.py         # Utilities for data generation and visualization
├── tokenizers.py         # BPE and WordPiece tokenization implementations
├── embeddings.py         # Word embeddings implementation
├── simple_language_model.py # Simple language model implementation
├── multi_layer_perceptron.py # Multi-layer perceptron implementation
├── main.py               # Example script to run the perceptron
├── advanced_example.py   # Advanced examples showing perceptron limitations
├── real_world_example.py # Example using the Iris dataset
├── requirements.txt      # Required dependencies
├── setup.py              # Setup script for installation
├── install.sh            # Installation script
└── README.md             # This file

Architectural Overview

The single layer perceptron is a fundamental building block of neural networks. It works by:

  1. Taking input features
  2. Multiplying each by a weight
  3. Summing the weighted inputs and adding a bias
  4. Applying an activation function (in this case, a step function)

Perceptron Architecture

                    ┌───────────────────────────────────────┐
                    │           Single Perceptron           │
                    └───────────────────────────────────────┘
                                      │
                                      ▼
┌───────────┐       ┌───────┐       ┌───┐       ┌──────────┐       ┌───────────┐
│  Inputs   │──────▶│Weights│──────▶│Sum│──────▶│Activation│──────▶│  Output   │
│ x₁,...,xₙ │       │w₁...wₙ│       │   │       │ Function │       │     y     │
└───────────┘       └───────┘       └───┘       └──────────┘       └───────────┘
                        ▲             ▲
                        │             │
                    ┌───────┐     ┌───────┐
                    │       │     │       │
                    │Weight │     │ Bias  │
                    │Updates│     │       │
                    └───────┘     └───────┘

Learning Process

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Initialization │     │   Prediction    │     │ Weight Update   │
│                 │     │                 │     │                 │
│ - Random weights│────▶│ - Calculate net │────▶│ - Compare with  │
│ - Zero bias     │     │   input         │     │   actual label  │
└─────────────────┘     │ - Apply step    │     │ - Update weights│
                        │   function      │     │   and bias      │
                        └─────────────────┘     └─────────────────┘
                                                        │
                                                        │
                        ┌─────────────────┐             │
                        │  Convergence    │             │
                        │                 │◀────────────┘
                        │ - Check errors  │
                        │ - Stop if zero  │
                        │   or max epochs │
                        └─────────────────┘

Training Iteration

For each training iteration:

┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│ Input Sample  │     │  Calculate    │     │   Calculate   │     │  Update       │
│ (x₁, x₂, ...) │────▶│  Prediction   │────▶│    Error      │────▶│  Weights      │
│               │     │  ŷ            │     │  (y - ŷ)      │     │  and Bias     │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘

Mathematical Formulation

  1. Net Input Calculation:

    z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
    

    where w₁, w₂, ..., wₙ are weights, x₁, x₂, ..., xₙ are input features, and b is the bias.

  2. Activation Function (Step Function):

    y = 1 if z ≥ 0
    y = -1 if z < 0
    
  3. Weight Update Rule:

    wᵢ = wᵢ + η(y - ŷ)xᵢ
    

    where η is the learning rate, y is the true label, and ŷ is the predicted label.

  4. Bias Update Rule:

    b = b + η(y - ŷ)
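
These update rules translate almost line for line into NumPy. Below is a compact sketch of the perceptron learning loop (illustrative; the perceptron.py implementation may differ in details such as initialization and stopping criteria):

import numpy as np

class TinyPerceptron:
    def __init__(self, learning_rate=0.1, n_iterations=100):
        self.eta = learning_rate
        self.n_iterations = n_iterations

    def fit(self, X, y):
        # X: (n_samples, n_features), y: labels in {-1, +1}
        self.w = np.zeros(X.shape[1])       # weights w1..wn
        self.b = 0.0                        # bias
        for _ in range(self.n_iterations):
            errors = 0
            for xi, target in zip(X, y):
                y_hat = self.predict(xi)
                update = self.eta * (target - y_hat)   # η(y - ŷ)
                self.w += update * xi                  # wᵢ = wᵢ + η(y - ŷ)xᵢ
                self.b += update                       # b = b + η(y - ŷ)
                errors += int(update != 0.0)
            if errors == 0:                 # converged: no misclassifications
                break
        return self

    def predict(self, x):
        z = np.dot(x, self.w) + self.b      # net input
        return np.where(z >= 0, 1, -1)      # step activation

# AND-like, linearly separable toy data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
print(TinyPerceptron().fit(X, y).predict(X))   # [-1 -1 -1  1]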
    

Usage

Installation

First, install the required dependencies:

# Using the installation script
./install.sh

# Or manually with pip
pip install -r requirements.txt

Basic Example

To run the basic example:

python3 main.py

This will:

  1. Generate linearly separable data
  2. Train a perceptron on this data
  3. Visualize the decision boundary
  4. Evaluate the model's performance

Advanced Example

To run the advanced example that demonstrates the perceptron's limitations:

python3 advanced_example.py

This will:

  1. Train a perceptron on linearly separable data (should work well)
  2. Train a perceptron on XOR data (will fail as it's not linearly separable)
  3. Train a perceptron on moon-shaped data (will fail as it's not linearly separable)

The advanced example demonstrates the fundamental limitation of the single layer perceptron: it can only learn linearly separable patterns.
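
This limitation is easy to verify directly: a perceptron fit on XOR data can never classify all four points correctly, while it handles AND-style data without trouble. A small illustrative check using scikit-learn's Perceptron (advanced_example.py uses this project's own implementation instead):

import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

clf_and = Perceptron(max_iter=1000, tol=None).fit(X, y_and)
clf_xor = Perceptron(max_iter=1000, tol=None).fit(X, y_xor)

print("AND accuracy:", clf_and.score(X, y_and))   # 1.0
print("XOR accuracy:", clf_xor.score(X, y_xor))   # below 1.0: no linear boundary separates XOR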

Real-world Example

To run the example using a real-world dataset (Iris):

python3 real_world_example.py

This example:

  1. Loads the Iris dataset and converts it to a binary classification problem
  2. Trains a perceptron to distinguish between Setosa and non-Setosa flowers
  3. Visualizes the decision boundary
  4. Evaluates the model's performance on training and test sets

Requirements

  • Python 3.6+
  • NumPy
  • Matplotlib
  • scikit-learn

Tokenization Algorithms

This project implements two advanced tokenization algorithms for handling out-of-vocabulary words in language modeling tasks:

Byte Pair Encoding (BPE)

BPE is a subword tokenization algorithm that iteratively merges the most frequent pairs of bytes or characters to form new tokens.

┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Input Text   │     │  Character    │     │  Merge Most   │     │  Final        │
│               │────▶│  Tokenization │────▶│  Frequent     │────▶│  Vocabulary   │
│               │     │               │     │  Pairs        │     │               │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘

BPE Training Process:

  1. Split words into individual characters
  2. Count frequencies of adjacent character pairs
  3. Merge the most frequent pair to create a new token
  4. Repeat until vocabulary size is reached or frequency threshold is met

BPE Tokenization Process:

  1. Start with characters of the word
  2. Apply learned merges iteratively
  3. If a word is unknown, it gets broken down into subword units
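
The merge step at the heart of BPE training can be sketched in a few lines. This is a simplified illustration, not the full BPETokenizer from tokenizers.py:

from collections import Counter

def most_frequent_pair(corpus):
    """corpus: dict mapping a word (as a tuple of symbols) to its frequency."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(corpus, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: words split into characters, with counts
corpus = {('l', 'o', 'w'): 5, ('l', 'o', 'w', 'e', 'r'): 2, ('l', 'o', 'w', 'e', 's', 't'): 3}
for _ in range(3):   # three merge steps
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print(pair, list(corpus))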

WordPiece

WordPiece is a subword tokenization algorithm used in models like BERT. It works by maximizing the language model likelihood of the training data.

┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Input Text   │     │  Generate All │     │  Select Most  │     │  Final        │
│               │────▶│  Possible     │────▶│  Frequent     │────▶│  Vocabulary   │
│               │     │  Subwords     │     │  Subwords     │     │               │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘

WordPiece Training Process:

  1. Initialize vocabulary with individual characters
  2. Generate candidate subwords from the training corpus
  3. Score each candidate (classic WordPiece scores a merge by how much it increases training-data likelihood, roughly the pair's frequency relative to the frequencies of its parts; a simpler variant uses raw frequency)
  4. Add the best-scoring subwords to the vocabulary until the size limit is reached

WordPiece Tokenization Process:

  1. Try to match the longest subword from the beginning of the word
  2. If found, add to output and continue with remainder
  3. If not found, back off to shorter subwords
  4. Use special token (##) to mark non-initial subwords
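
The greedy longest-match-first procedure above is straightforward to sketch. This is a simplified illustration of WordPiece-style tokenization, not the exact WordPieceTokenizer from tokenizers.py:

def wordpiece_tokenize(word, vocab, unk_token="<UNK>"):
    """Greedy longest-match tokenization, marking non-initial subwords with ##."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:                    # try the longest remaining substring first
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece          # non-initial pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:                     # no subword matches: fall back to <UNK>
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens

vocab = {"un", "##believ", "##able", "play", "##ing"}
print(wordpiece_tokenize("unbelievable", vocab))   # ['un', '##believ', '##able']
print(wordpiece_tokenize("playing", vocab))        # ['play', '##ing']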

Language Model Architecture

The language model extends the perceptron concept to predict the next word in a sequence based on context words:

┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Tokenization │     │ Word Encoding │     │  Neural       │     │  Probability  │
│               │     │               │     │  Network      │     │  Distribution │
│ - BPE or      │────▶│ - Word        │────▶│ - Single or   │────▶│ - Softmax     │
│   WordPiece   │     │   Embeddings  │     │   Multi-layer │     │   function    │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘
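
In practice the model is trained on (context, next token) pairs built by sliding a window over the tokenized text. A minimal sketch of that data preparation (illustrative; the actual logic lives inside the model classes):

def build_training_pairs(tokens, context_size=2):
    """Slide a window over the token stream to produce (context, next_token) pairs."""
    pairs = []
    for i in range(len(tokens) - context_size):
        context = tokens[i:i + context_size]
        target = tokens[i + context_size]
        pairs.append((context, target))
    return pairs

tokens = "the cat sat on the mat".split()
for context, target in build_training_pairs(tokens, context_size=2):
    print(context, "->", target)
# ['the', 'cat'] -> sat
# ['cat', 'sat'] -> on
# ...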

Multi-Layer Perceptron Architecture

The multi-layer perceptron extends the single-layer model with hidden layers for more complex representations:

┌───────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌───────────┐
│   Input   │     │   Hidden    │     │   Hidden    │     │   Output    │     │ Predicted │
│   Layer   │────▶│   Layer 1   │────▶│   Layer 2   │────▶│    Layer    │────▶│   Word    │
│           │     │             │     │  (Optional) │     │             │     │           │
└───────────┘     └─────────────┘     └─────────────┘     └─────────────┘     └───────────┘

Attention-Based Architecture

The attention-based model enhances the multi-layer perceptron with self-attention mechanisms for improved language modeling:

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Input   │     │  Word    │     │  Self-   │     │  Hidden  │     │  Output  │
│ Context  │────▶│Embeddings│────▶│ Attention│────▶│  Layers  │────▶│  Layer   │
│          │     │          │     │          │     │          │     │          │
└──────────┘     └──────────┘     └──────────┘     └──────────┘     └──────────┘

Key Components and Code Implementation

1. AttentionPerceptron Class

The AttentionPerceptron class extends MultiLayerPerceptron with attention mechanisms:

# From attention_perceptron.py
class AttentionPerceptron(MultiLayerPerceptron):
    """
    An extension of the MultiLayerPerceptron that incorporates self-attention mechanisms
    for improved language modeling capabilities.
    """
    
    def __init__(self, context_size=2, embedding_dim=50, hidden_layers=[64, 32], 
                 attention_dim=None, num_attention_heads=1, attention_dropout=0.1,
                 learning_rate=0.01, n_iterations=1000, random_state=42, 
                 tokenizer_type='wordpiece', vocab_size=10000, use_pretrained=False):
        # Initialize the parent class
        super().__init__(
            context_size=context_size,
            embedding_dim=embedding_dim,
            hidden_layers=hidden_layers,
            learning_rate=learning_rate,
            n_iterations=n_iterations,
            random_state=random_state,
            tokenizer_type=tokenizer_type,
            vocab_size=vocab_size,
            use_pretrained=use_pretrained
        )
        
        # Additional attention-specific parameters
        self.attention_dim = attention_dim if attention_dim is not None else embedding_dim
        self.num_attention_heads = num_attention_heads
        self.attention_dropout = attention_dropout
        self.attention_layer = None
        
        # Track attention weights for visualization and analysis
        self.attention_weights_history = []

2. WordEmbeddings Class

The WordEmbeddings class converts words to numerical vectors:

# From embeddings.py
def get_embedding(self, word: str) -> np.ndarray:
    """Get the embedding vector for a word."""
    # Check if word is in vocabulary
    if word in self.word_to_idx:
        idx = self.word_to_idx[word]
        # Check if embeddings dictionary exists and has this index
        if hasattr(self, 'embeddings') and self.embeddings and idx in self.embeddings:
            return self.embeddings[idx]
        else:
            # Generate random embedding if not found
            if not hasattr(self, 'embeddings') or not self.embeddings:
                self.embeddings = {}
            self.embeddings[idx] = np.random.randn(self.embedding_dim)
            return self.embeddings[idx]
    
    # Handle OOV words with various fallback mechanisms
    # If all else fails, return unknown token embedding
    unk_idx = self.special_tokens.get('<UNK>', 0)
    if hasattr(self, 'embeddings') and self.embeddings and unk_idx in self.embeddings:
        return self.embeddings[unk_idx]
    else:
        # Generate random embedding for unknown token
        if not hasattr(self, 'embeddings') or not self.embeddings:
            self.embeddings = {}
        self.embeddings[unk_idx] = np.random.randn(self.embedding_dim)
        return self.embeddings[unk_idx]

3. SelfAttention Class

The SelfAttention class implements the scaled dot-product attention mechanism:

# From self_attention.py
def forward(self, X, mask=None, training=False):
    """Forward pass through the self-attention mechanism."""
    batch_size, seq_length, input_dim = X.shape
    
    # Process each attention head
    all_head_outputs = []
    for h in range(self.num_heads):
        # Project inputs to query, key, value
        Q = np.dot(X, self.W_query[h]) + self.b_query[h]
        K = np.dot(X, self.W_key[h]) + self.b_key[h]
        V = np.dot(X, self.W_value[h]) + self.b_value[h]
        
        # Compute attention scores
        scores = np.matmul(Q, K.transpose(0, 2, 1))
        
        # Scale scores
        scores = scores / np.sqrt(self.head_dim)
        
        # Apply softmax to get attention weights
        attention_weights = self._softmax(scores)
        
        # Apply attention weights to values
        head_output = np.matmul(attention_weights, V)
        
        # Store for multi-head concatenation
        all_head_outputs.append(head_output)
        
        # Cache values for backpropagation
        self.cache[f'head_{h}'] = {
            'Q': Q, 'K': K, 'V': V,
            'scores': scores,
            'attention_weights': attention_weights
        }
    
    # Concatenate all head outputs
    concat_output = np.concatenate(all_head_outputs, axis=2)
    
    # Project to output dimension
    output = np.dot(concat_output, self.W_output) + self.b_output
    
    # Cache for backpropagation
    self.cache['concat_output'] = concat_output
    
    return output
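
A quick shape check of this forward pass, using the constructor arguments that appear later in _forward_with_attention (illustrative; the exact output dimension depends on how the projection matrices are configured):

import numpy as np
from self_attention import SelfAttention

attention = SelfAttention(input_dim=50, attention_dim=50, num_heads=2,
                          dropout_rate=0.1, random_state=42)

X = np.random.randn(1, 4, 50)        # (batch_size, context_size, embedding_dim)
output = attention.forward(X)
weights = attention.cache['head_0']['attention_weights']

print(output.shape)                  # projected sequence representation
print(weights.shape)                 # (1, 4, 4): one weight per pair of context positions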

Next Word Prediction Flow: Step-by-Step with Code

1. Input Processing

# From attention_perceptron.py - predict_next_word method
def predict_next_word(self, context):
    """Predict the next word given a context."""
    # Handle string input
    if isinstance(context, str):
        context = context.split()
    
    # Preprocess context
    context = [word.lower() for word in context]
    
    # Handle context length mismatch
    info = {"original_context": context.copy(), "adjusted_context": None, "adjustment_made": False}
    
    if len(context) < self.context_size:
        # If context is too short, pad with common words
        padding_needed = self.context_size - len(context)
        padding = ["the"] * padding_needed  # Use "the" as default padding
        context = padding + context
        info["adjusted_context"] = context
        info["adjustment_made"] = True
        info["adjustment_type"] = "padded_beginning"
        
    elif len(context) > self.context_size:
        # If context is too long, use the most recent words
        context = context[-self.context_size:]
        info["adjusted_context"] = context
        info["adjustment_made"] = True
        info["adjustment_type"] = "truncated_beginning"
    
    # Check if all words are in vocabulary
    unknown_words = []
    for i, word in enumerate(context):
        if word not in self.word_to_idx:
            unknown_words.append((i, word))
    
    # Handle unknown words
    if unknown_words:
        for idx, word in unknown_words:
            # Try to tokenize the unknown word if we have a tokenizer
            if self.tokenizer:
                # Tokenize the word
                subwords = self.tokenizer.tokenize(word)
                
                # If we got valid subwords, use the first one that's in our vocabulary
                found_replacement = False
                for subword in subwords:
                    if subword in self.word_to_idx:
                        context[idx] = subword
                        found_replacement = True
                        break
                
                # If no valid subwords found, use <UNK> token
                if not found_replacement:
                    unk_token = list(self.embeddings.special_tokens.keys())[0]  # <UNK> token
                    context[idx] = unk_token
            else:
                # If no tokenizer, use <UNK> token
                unk_token = list(self.embeddings.special_tokens.keys())[0]  # <UNK> token
                context[idx] = unk_token

2. Word Embedding

# From attention_perceptron.py - predict_next_word method (continued)
# Get embeddings for context words
context_embeddings = []
for word in context:
    word_idx = self.word_to_idx.get(word, self.embeddings.special_tokens['<UNK>'])
    
    # Get embedding with fallback mechanisms
    if hasattr(self.embeddings, 'embeddings') and self.embeddings.embeddings:
        if word_idx in self.embeddings.embeddings:
            embedding = self.embeddings.embeddings[word_idx]
        else:
            # Generate random embedding if not found
            embedding = np.random.randn(self.embedding_dim)
            self.embeddings.embeddings[word_idx] = embedding
    else:
        # Initialize embeddings dictionary if not available
        self.embeddings.embeddings = {}
        embedding = np.random.randn(self.embedding_dim)
        self.embeddings.embeddings[word_idx] = embedding
    
    context_embeddings.append(embedding)

# Convert to numpy array and add batch dimension
context_embeddings = np.array([context_embeddings])  # shape: (1, context_size, embedding_dim)

3. Self-Attention Mechanism

# From attention_perceptron.py - _forward_with_attention method
def _forward_with_attention(self, X):
    """Forward pass with attention mechanism."""
    # Initialize attention layer if not already done
    if self.attention_layer is None:
        self.attention_layer = SelfAttention(
            input_dim=self.embedding_dim,
            attention_dim=self.attention_dim,
            num_heads=self.num_attention_heads,
            dropout_rate=self.attention_dropout,
            random_state=self.random_state
        )
    
    # Apply self-attention to the sequence
    # X shape: (batch_size, context_size, embedding_dim)
    attention_output = self.attention_layer.forward(X)
    
    # Get attention weights from the cache
    # We'll use the weights from the first head for visualization
    attention_weights = self.attention_layer.cache['head_0']['attention_weights']
    
    # Flatten the attention output for the dense layers
    # Shape: (batch_size, context_size * attention_dim)
    batch_size = X.shape[0]
    flattened = attention_output.reshape(batch_size, -1)

4. Multi-Layer Perceptron

# From attention_perceptron.py - _forward_with_attention method (continued)
# Forward pass through dense layers
activations = [flattened]

# Hidden layers with ReLU activation
for i in range(len(self.weights) - 1):
    z = np.dot(activations[-1], self.weights[i]) + self.biases[i]
    a = self._relu(z)
    activations.append(a)

# Output layer with softmax activation
z_out = np.dot(activations[-1], self.weights[-1]) + self.biases[-1]
predictions = self._softmax(z_out)

return predictions, attention_weights

5. Prediction

# From attention_perceptron.py - predict_next_word method (continued)
# Forward pass with attention
y_pred, attention_weights = self._forward_with_attention(context_embeddings)

# Get the word with the highest probability
predicted_idx = np.argmax(y_pred[0])
predicted_word = self.idx_to_word[predicted_idx]

# Add prediction info
info["prediction"] = predicted_word
info["attention_weights"] = attention_weights[0].tolist()

return predicted_word, info

Multi-Word Prediction

# From attention_perceptron.py
def predict_next_n_words(self, initial_context, n=5):
    """Predict the next n words given an initial context."""
    # Handle string input
    if isinstance(initial_context, str):
        initial_context = initial_context.split()
    
    # Get the first prediction and info
    next_word, info = self.predict_next_word(initial_context)
    
    # Use the adjusted context from the info
    context = info["adjusted_context"] if info["adjustment_made"] else info["original_context"]
    
    # Predict n words
    predicted_words = [next_word]
    for i in range(1, n):
        # Update context - remove oldest word and add the predicted word
        context = context[1:] + [next_word]
        
        # Predict next word
        next_word, step_info = self.predict_next_word(context)
        predicted_words.append(next_word)
    
    return predicted_words

Visualization of Attention Weights

# From attention_perceptron.py
def plot_attention_weights(self, context=None, ax=None):
    """Plot attention weights for visualization."""
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 6))
    
    if context is not None:
        # Get attention weights for specific context
        if isinstance(context, str):
            context = context.split()
        
        # Ensure context is the right length
        if len(context) < self.context_size:
            padding_needed = self.context_size - len(context)
            padding = ["<PAD>"] * padding_needed
            context = padding + context
        elif len(context) > self.context_size:
            context = context[-self.context_size:]
        
        # Get embeddings and predict
        _, info = self.predict_next_word(context)
        attention_weights = np.array(info["attention_weights"])
        
        # Plot attention heatmap
        im = ax.imshow(attention_weights, cmap="YlOrRd")
        
        # Set labels
        ax.set_xticks(np.arange(len(context)))
        ax.set_yticks(np.arange(len(context)))
        ax.set_xticklabels(context)
        ax.set_yticklabels(context)
        
        # Add colorbar
        plt.colorbar(im, ax=ax)
        
        ax.set_title(f"Attention Weights for Context: '{' '.join(context)}'")
    
    return ax

Advantages Over Simple Perceptron

  1. Contextual Understanding

    • Simple perceptron treats all context words equally
    • Attention model weighs words based on their relevance, as shown in the attention weights
  2. Capturing Long-Range Dependencies

    • Simple perceptron struggles with longer contexts
    • Attention model can capture relationships regardless of distance through the attention mechanism
  3. Prediction Quality

    • Simple perceptron: "the cat sat on the cat sat on the cat..."
    • Attention model: "the cat sat on the mat while the dog walked by"
  4. Interpretability

    • Attention weights show which words influence the prediction
    • Can be visualized as a heat map for analysis using the plot_attention_weights method

Usage Example

# Load the model
model = AttentionPerceptron()
model.load("model_output/attention_model.pkl")

# Predict next word
context = ["the", "cat", "sat", "on"]
next_word, info = model.predict_next_word(context)
print(f"Predicted next word: {next_word}")

# Generate a sequence
sequence = model.predict_next_n_words(context, n=5)
print(f"Generated sequence: {' '.join(context + sequence)}")

# Visualize attention weights
model.plot_attention_weights(context)

Integration with Language Model

The tokenization algorithms integrate with the language model as follows:

┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Raw Text     │     │  Tokenized    │     │  Neural       │     │  Predicted    │
│  Input        │────▶│  Subword      │────▶│  Network      │────▶│  Next Token   │
│               │     │  Units        │     │  Processing   │     │               │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘
                                │
                                ▼
                      ┌───────────────────┐
                      │ Out-of-Vocabulary │
                      │ Word Handling     │
                      └───────────────────┘

Benefits of Subword Tokenization:

  1. Handles unknown words by breaking them into known subword units
  2. Reduces vocabulary size while maintaining coverage
  3. Captures morphological patterns in the language
  4. Improves model performance on rare words and morphologically rich languages

Using Tokenizers with Language Models

# Import the tokenizer and language model
from tokenizers import BPETokenizer, WordPieceTokenizer
from multi_layer_perceptron import MultiLayerPerceptron

# Create and train a tokenizer
tokenizer = BPETokenizer(vocab_size=5000, min_frequency=2)
tokenizer.fit(training_text)

# Tokenize the training data
tokenized_text = tokenizer.tokenize(training_text)

# Create and train the language model
model = MultiLayerPerceptron(context_size=2, hidden_layers=[64, 32])
model.fit(tokenized_text)

# Generate text
generated_text = model.generate(context=['the', 'cat'], n_words=10)
print(generated_text)

Enhanced Word Embeddings

This project now uses advanced word embeddings with subword tokenization for representing words in the language model. Word embeddings are dense vector representations that capture semantic relationships between words, and subword tokenization helps handle out-of-vocabulary (OOV) words effectively.

Enhanced Embedding Architecture

┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Tokenization │     │  Vocabulary   │     │  Embedding    │     │  Semantic     │
│               │     │  Mapping      │     │  Vectors      │     │  Space        │
│ BPE/WordPiece │────▶│ token → index │────▶│ index → vec   │────▶│ vec → meaning │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘
        ▲                                           ▲
        │                                           │
┌───────────────┐                         ┌───────────────┐
│  OOV Words    │                         │  Pretrained   │
│  Handling     │                         │  Models       │
└───────────────┘                         └───────────────┘

Benefits of Enhanced Word Embeddings

  1. Dimensionality Reduction: Instead of sparse one-hot vectors with vocabulary-size dimensions, embeddings use dense vectors with much lower dimensionality (typically 50-300).

  2. Semantic Relationships: Words with similar meanings have similar vector representations, enabling the model to generalize better.

  3. OOV Word Handling: Subword tokenization (BPE or WordPiece) breaks unknown words into known subword units, allowing the model to handle any word.

  4. Improved Performance: Embeddings capture contextual information, leading to better language model performance.

  5. Efficient Computation: Dense vectors require less memory and computational resources than sparse one-hot vectors.

  6. Transfer Learning: Pre-trained embeddings from open-source models transfer knowledge from large-scale training.

Open-Source Embedding Models

The enhanced implementation supports multiple open-source embedding models (with graceful fallbacks if dependencies are not available):

  1. Word2Vec: Google's word embeddings trained on Google News (300 dimensions)
  2. GloVe: Stanford's Global Vectors for Word Representation
  3. FastText: Facebook's embeddings with subword information
  4. BERT/RoBERTa: Transformer-based contextual embeddings

Subword Tokenization

Two subword tokenization algorithms are implemented (with simple fallbacks if dependencies are not available):

  1. Byte Pair Encoding (BPE): Iteratively merges the most frequent pairs of characters or subwords
  2. WordPiece: Similar to BPE but uses a different merging strategy based on likelihood

Cross-Platform Compatibility

The implementation is designed to work across different platforms:

  • Core functionality works with just NumPy and standard libraries
  • Enhanced features are enabled when optional dependencies are available
  • Graceful fallbacks ensure the code runs even without all dependencies
  • Compatible with Apple Silicon (M1/M2/M3) and other architectures
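
The graceful fallbacks follow a familiar Python pattern: try to import an optional dependency and fall back to the pure-NumPy path if it is missing. A sketch of the idea (illustrative; gensim is just an example of an optional dependency, not a hard requirement of this project):

# Illustrative optional-dependency pattern
try:
    import gensim.downloader as gensim_api
    HAS_GENSIM = True
except ImportError:
    gensim_api = None
    HAS_GENSIM = False

def load_pretrained_vectors(name="glove-wiki-gigaword-50"):
    """Return pretrained vectors if the optional dependency is installed, else None."""
    if not HAS_GENSIM:
        return None   # caller falls back to randomly initialized embeddings
    return gensim_api.load(name)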

Implementation Details

The enhanced embeddings.py module provides a WordEmbeddings class that:

  • Loads embeddings from open-source pretrained models rather than requiring manual downloads of raw embedding files
  • Initializes subword tokenizers (BPE or WordPiece) for handling OOV words
  • Handles special tokens like <UNK>, <PAD>, <BOS>, and <EOS>
  • Provides methods to retrieve and update embeddings during training
  • Supports finding semantically similar words using cosine similarity
  • Generates embeddings for OOV words by combining subword embeddings
  • Integrates with transformer models for contextual embeddings
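
The similar-word lookup mentioned above is typically a cosine-similarity search over the embedding vectors. A minimal sketch of that computation (illustrative, not the exact get_similar_words implementation):

import numpy as np

def most_similar(word, word_to_idx, embedding_matrix, top_n=5):
    """Rank vocabulary words by cosine similarity to the query word's vector."""
    idx_to_word = {i: w for w, i in word_to_idx.items()}
    query = embedding_matrix[word_to_idx[word]]
    # Cosine similarity between the query vector and every embedding row
    norms = np.linalg.norm(embedding_matrix, axis=1) * np.linalg.norm(query)
    sims = embedding_matrix @ query / np.maximum(norms, 1e-10)
    ranked = np.argsort(-sims)
    return [(idx_to_word[i], float(sims[i])) for i in ranked if i != word_to_idx[word]][:top_n]

# Toy example with random vectors
rng = np.random.default_rng(0)
word_to_idx = {"computer": 0, "laptop": 1, "banana": 2, "keyboard": 3}
embedding_matrix = rng.normal(size=(len(word_to_idx), 50))
print(most_similar("computer", word_to_idx, embedding_matrix, top_n=2))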

Usage Example

from embeddings import WordEmbeddings

# Initialize with Word2Vec and BPE tokenization
embeddings = WordEmbeddings(
    embedding_dim=300,
    use_pretrained=True,
    pretrained_source='word2vec',
    tokenizer_type='bpe',
    subword_vocab_size=10000
)

# Build vocabulary from text
embeddings.build_vocabulary(words)

# Get embedding for a word (works even for OOV words)
vector = embeddings.get_embedding("unprecedented")

# Find similar words
similar_words = embeddings.get_similar_words("computer", top_n=5)

Contributing

We welcome contributions to the Single Layer Perceptron project! Here's how you can contribute:

Getting Started

  1. Fork the Repository

    • Click the "Fork" button at the top right of this repository
  2. Clone Your Fork

    git clone https://github.com/YOUR-USERNAME/singleLayerPerceptron.git
    cd singleLayerPerceptron
  3. Set Up Development Environment

    # Create and activate a virtual environment
    python -m venv env
    source env/bin/activate  # On Windows: env\Scripts\activate
    
    # Install dependencies
    ./install.sh  # On Windows: install.bat
    # Or manually: pip install -r requirements.txt

Making Changes

  1. Create a Branch

    git checkout -b feature/your-feature-name
  2. Make Your Changes

    • Write code that follows the project's style
    • Add or update tests as necessary
    • Update documentation to reflect your changes
  3. Run Tests

    python -m unittest discover tests

Submitting Changes

  1. Commit Your Changes

    git add .
    git commit -m "Add a descriptive commit message"
  2. Push to Your Fork

    git push origin feature/your-feature-name
  3. Create a Pull Request

    • Go to the original repository
    • Click "New Pull Request"
    • Select your fork and branch
    • Provide a clear description of your changes

Contribution Guidelines

  • Code Style: Follow PEP 8 guidelines for Python code
  • Documentation: Update docstrings and README.md as needed
  • Tests: Add tests for new features and ensure all tests pass
  • Commit Messages: Write clear, concise commit messages
  • Pull Requests: Keep PRs focused on a single feature or bug fix

Areas for Contribution

  • Implementing new perceptron variants
  • Enhancing tokenization algorithms
  • Improving UI applications
  • Optimizing performance
  • Adding new examples or datasets
  • Fixing bugs
  • Improving documentation

Code of Conduct

  • Be respectful and inclusive
  • Provide constructive feedback
  • Focus on the best outcome for the project

We appreciate your interest in improving the Single Layer Perceptron project!

Limitations

Perceptron Limitations

  • The single layer perceptron can only learn linearly separable patterns
  • It cannot solve problems like XOR without additional layers (which would make it a multi-layer perceptron)

Tokenization Limitations

  • BPE Limitations:

    • May create subword units that don't align with linguistic morphemes
    • Performance depends on the quality and size of the training corpus
    • Requires careful tuning of vocabulary size and merge frequency thresholds
  • WordPiece Limitations:

    • Similar to BPE, may not create linguistically meaningful subwords
    • The greedy longest-match-first approach may not always be optimal
    • Special token handling (## prefix) adds complexity to the tokenization process

License

This project is open source and available under the MIT License.
