A versatile, high-performance Natural Language Processing (NLP) toolkit written entirely in Go (Golang). The project provides a command-line utility for training and utilizing foundational NLP models, including Word2Vec embeddings, a sophisticated Mixture of Experts (MoE) model, and a practical Intent Classifier.
Note: This project is currently in a beta stage and is under active development. The API and functionality are subject to change. Accuracy is not the primary focus at this stage, as the main goal is to explore and implement these NLP models in Go.
- Project Website
- Key Features
- Getting Started
- Usage
- Project Structure
- Data & Configuration
- Roadmap
- Future Direction: Semantic Parsing and Reasoning
- Contributing
- License
- Special Thanks
- Why Go?
The application is structured as a dispatcher that runs specialized modules for various NLP tasks:
- Word2Vec Training: Generate high-quality distributed word representations (embeddings) from a text corpus.
- Mixture of Experts (MoE) Architecture: Train a powerful MoE model, designed for improved performance, scalability, and handling of complex sequential or structural data.
- Intent Classification: Develop a model for accurately categorizing user queries into predefined semantic intents.
- Efficient Execution: Built in Go, leveraging its performance and concurrency features for fast training and inference.
You need a working Go environment (version 1.25 or higher is recommended) installed on your system.
- Clone the repository:

  ```sh
  git clone https://github.com/golangast/nlptagger.git
  cd nlptagger
  ```

- Build the executable from the root of the project directory:

  ```sh
  go build .
  ```

  This creates an `nlptagger` executable in the current directory.
The main executable (`nlptagger`, or `main.go` via `go run`) controls all operations using specific command-line flags. All commands should be run from the root directory of the project.
Use the respective flags to initiate the training process. Each flag executes a separate module located in the `cmd/` directory.
| Model | Flag | Command |
|---|---|---|
| Word2Vec | `--train-word2vec` | `go run main.go --train-word2vec` |
| Mixture of Experts (MoE) | `--train-moe` | `go run main.go --train-moe` |
| Intent Classifier | `--train-intent-classifier` | `go run main.go --train-intent-classifier` |
To run predictions using a previously trained MoE model, use the `--moe_inference` flag and pass the input query string.
| Action | Flag | Command Example |
|---|---|---|
| MoE Inference | `--moe_inference` | `go run main.go --moe_inference "schedule a meeting with John for tomorrow at 2pm"` |
The `example/main.go` program demonstrates how to parse a natural language query, generate a workflow, and execute it. This showcases the core capabilities of `nlptagger` for understanding and acting upon user commands.
To run the example, use the following command with a query:
```sh
go run ./example/main.go -query "create folder jack with a go webserver jill"
```
You can also run it interactively:
```sh
go run ./example/main.go
```
Then, enter queries at the prompt.
Expected output (for the query "create folder jack with a go webserver jill"):

```text
Processing query: "create folder jack with a go webserver jill"

--- Generated Workflow (after inference and validation) ---
Node ID: Filesystem::Folder-jack-0, Operation: CREATE, Resource Type: Filesystem::Folder, Resource Name: jack, Properties: map[permissions:493], Command: , Dependencies: []
Node ID: Filesystem::File-jill-0, Operation: CREATE, Resource Type: Filesystem::File, Resource Name: jill, Properties: map[permissions:493], Command: , Dependencies: [Filesystem::Folder-jack-0]
Node ID: file-createfile-0, Operation: WRITE_FILE, Resource Type: , Resource Name: , Properties: map[], Command: , Dependencies: [Filesystem::File-jill-0]
```
This project is more than a set of command-line tools; it is also a collection of Go packages that you can import into your own projects. Example usage lives in the `/example` folder.
```go
package main

import (
	"bufio"
	"flag"
	"fmt"
	"log"
	"os"
	"strings"

	"nlptagger/neural/parser"
	"nlptagger/neural/workflow"
)

var (
	query = flag.String("query", "", "Natural language query for the parser")
)

func main() {
	flag.Parse()

	// Create parser and executor instances.
	p := parser.NewParser()
	executor := workflow.NewExecutor()

	// Process the initial query from the flag, if provided.
	if *query != "" {
		processAndExecuteQuery(*query, p, executor)
	}

	// Start the interactive loop.
	reader := bufio.NewReader(os.Stdin)
	for {
		fmt.Print("\nEnter a query (e.g., \"create folder jack with a go webserver jill\"): ")
		input, err := reader.ReadString('\n')
		if err != nil {
			break // EOF or a read error ends the interactive loop.
		}
		input = strings.TrimSpace(input)
		if input == "exit" || input == "quit" {
			break
		}
		if input != "" {
			processAndExecuteQuery(input, p, executor)
		}
	}
}

func processAndExecuteQuery(q string, p *parser.Parser, executor *workflow.Executor) {
	log.Printf("Processing query: \"%s\"", q)

	// Parse the query into a workflow.
	// The parser handles semantic validation and inference internally.
	wf, err := p.Parse(q)
	if err != nil {
		log.Printf("Error parsing query: %v", err)
		return
	}

	fmt.Println("\n--- Generated Workflow (after inference and validation) ---")
	for _, node := range wf.Nodes {
		fmt.Printf("Node ID: %s, Operation: %s, Resource Type: %s, Resource Name: %s, Properties: %v, Command: %s, Dependencies: %v\n",
			node.ID, node.Operation, node.Resource.Type, node.Resource.Name, node.Resource.Properties, node.Command, node.Dependencies)
	}

	// Execute the generated workflow.
	if err := executor.ExecuteWorkflow(wf); err != nil {
		log.Printf("Error executing workflow: %v", err)
		return
	}
}
```
The `neural/` and `tagger/` directories contain the reusable components. Import them as needed.
The project is a collection of tools. Its structure reflects this.
```text
nlptagger/
├── main.go          # Dispatches to common tools.
├── go.mod           # Go module definition.
├── cmd/             # Each subdirectory is a command-line tool.
│   ├── train_word2vec/  # Example: Word2Vec training.
│   └── moe_inference/   # Example: MoE inference.
├── neural/          # Core neural network code.
├── tagger/          # NLP tagging components.
├── trainingdata/    # Sample data for training.
└── gob_models/      # Saved models.
```
- Data Structure: Training modules look for data files in the `trainingdata/` directory. For example, `intent_data.json` is used for intent classification training.
- Configuration: Model hyperparameters (learning rate, epochs, vector size, etc.) are currently hardcoded within their respective training modules in the `cmd/` directory. This is an area for future improvement.
- Model Output: Trained models are saved as `.gob` files to the `gob_models/` directory by default.
This project is under active development. Here are some of the planned features and improvements:
- Implement comprehensive unit and integration tests.
- Add more NLP tasks (e.g., Named Entity Recognition, Part-of-Speech tagging).
- Externalize model configurations from code into files (e.g., YAML, JSON).
- Improve model accuracy and performance.
- Enhance documentation with more examples and API references.
- Create a more user-friendly command-line interface.
Instead of merely tagging words, the NLP layer would generate an Abstract Semantic Graph (ASG) or Structured Object that represents the complete meaning, including implicit details, constraints, and dependencies.
Current NLP Output (Intent Recognition):
| Identified Elements | Values in Query |
|---|---|
| Parent Intent | webserver_creation |
| Child Intent | create |
| Object Types | folder, webserver |
| Names | jack, jill |
This abstraction provides the foundation for truly intelligent command generation:
A. Reasoning and Inference

The new layer can handle implicit and contextual details (Reasoning).

Example Query: "Make 'jill' in 'jack' and expose the service publicly."

Inference: The system automatically infers that a "publicly exposed service" implies setting the webserver's port to be publicly accessible and potentially generating an extra LoadBalancer resource (if using a cloud execution backend).
B. Dependency Resolution
The NLP can identify causal and temporal relationships (Dependency).
Example Query: "Set up my Go server, but only after you create the database."
Semantic Output: The output graph would establish a `depends_on` relationship between the `Deployment::GoWebserver` and the `Data::PostgreSQL` resource, ensuring the command executor runs them in the correct sequence.
We welcome contributions! Please feel free to open issues for bug reports or feature requests, or submit pull requests for any enhancements.
1. Fork the repository.
2. Create a new branch (`git checkout -b feature/AmazingFeature`).
3. Commit your changes (`git commit -m 'Add AmazingFeature'`).
4. Push to the branch (`git push origin feature/AmazingFeature`).
5. Open a Pull Request.
Note on Tests: There is currently a lack of automated tests. Contributions in this area are highly encouraged and appreciated!
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
- The Go Team and contributors for creating and maintaining Go.
Go is a great choice for this project for several reasons:
- Stability: The language has a strong compatibility promise. What you learn now will be useful for a long time. (Go 1 Compatibility Promise)
- Simplicity and Readability: Go's simple syntax makes it easy to read and maintain code.
- Performance: Go is a compiled language with excellent performance, which is crucial for NLP tasks.
- Concurrency: Go's built-in concurrency features make it easy to write concurrent code for data processing and model training.
- Strong Community and Ecosystem: Go has a growing community and a rich ecosystem of libraries and tools. (Go User Community)