Bharat MiniGPT 350M: A Custom GPT-Style LLM Built from Scratch in India
The AI industry today is dominated by massive language models from companies like OpenAI, Google, and Meta. Most public AI projects are either fine-tuned versions of existing models or lightweight wrappers built on top of already available architectures.
However, some developers are taking a far more challenging route — building transformer architectures and training pipelines from scratch.
One such project is Bharat MiniGPT 350M, developed by HVM Smart Solutions (Harshvardhan Mishra).
Unlike many “custom AI models” available online, Bharat MiniGPT 350M is not simply a fine-tuned GPT-2 or LLaMA derivative. Its transformer architecture, training logic, attention system, normalization layers, and dataset streaming pipeline were manually implemented in PyTorch before later being integrated into the Hugging Face ecosystem.
What is Bharat MiniGPT 350M?
Bharat MiniGPT 350M is a custom decoder-only Transformer-based causal language model designed for foundational LLM experimentation, architecture research, and large-scale language model training.
The project focuses on understanding and implementing the core mechanics behind modern GPT-style systems, including:
- Transformer architecture engineering
- Attention optimization
- Language model pretraining
- Training stability
- Efficient inference systems
- KV-cache compatible generation
- Gradient checkpointing
- Streaming datasets
- Hugging Face compatibility
Rather than being a production chatbot, the project is currently an evolving base pretrained model focused on foundational AI engineering.
Model Specifications
| Feature | Details |
|---|---|
| Model Name | Bharat MiniGPT 350M |
| Parameters | ~350 Million |
| Architecture | Custom Decoder-only Transformer |
| Training Tokens | 3 Billion |
| Framework | PyTorch |
| HF Compatibility | Added Later |
| Developer | Harshvardhan Mishra |
| Organization | HVM Smart Solutions |
Architecture Overview
The model uses several modern transformer design concepts commonly seen in advanced LLM architectures.
| Component | Details |
|---|---|
| Transformer Layers | 24 |
| Attention Heads | 16 |
| Embedding Size | 1024 |
| Context Length | 768 Tokens |
| Vocabulary Size | 50,257 |
| Positional Encoding | RoPE |
| Normalization | RMSNorm |
| Feed Forward Network | SwiGLU |
| Attention | SDPA / Flash Attention Compatible |
| Weight Tying | Yes |
| Precision | FP16 |
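For reference, these hyperparameters can be collected into a small configuration object. The sketch below simply mirrors the table; the field names are illustrative and are not taken from the project's code.

```python
from dataclasses import dataclass

# Illustrative configuration mirroring the published specs;
# field names are assumptions, not the project's actual code.
@dataclass
class MiniGPTConfig:
    n_layers: int = 24          # transformer decoder blocks
    n_heads: int = 16           # attention heads
    d_model: int = 1024         # embedding / hidden size
    max_seq_len: int = 768      # context length in tokens
    vocab_size: int = 50257     # GPT-2 style vocabulary
    tie_weights: bool = True    # share input embedding and LM head weights
```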
A Truly Custom Transformer Implementation
One of the most important aspects of Bharat MiniGPT 350M is that many core transformer systems were implemented manually instead of relying entirely on prebuilt abstractions.
This includes:
- Custom RMSNorm implementation
- Manual RoPE positional embedding logic
- Custom SwiGLU feed-forward blocks
- Self-written attention modules
- Decoder transformer blocks
- Custom token generation pipeline
- Streaming dataset architecture
- Manual cosine LR scheduler
- Gradient checkpointing integration
The project later added Hugging Face compatibility for easier deployment and ecosystem support.
This distinction is important because many public “custom models” are actually fine-tunes of existing transformer implementations, while Bharat MiniGPT involved architecture-level engineering from the ground up.
Custom RMSNorm Implementation
The project includes a manually implemented RMSNorm layer instead of relying solely on built-in transformer utilities.
RMSNorm has become increasingly popular in modern LLMs because it is computationally lightweight and can improve training stability.
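The project's own layer is not reproduced here, but a typical RMSNorm in PyTorch looks roughly like this minimal sketch:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales activations by their RMS,
    with a learnable gain but no mean subtraction and no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # rms(x) = sqrt(mean(x^2)); normalize, then apply the learned gain
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

Because there is no mean statistic to track, the layer is cheaper than LayerNorm while behaving similarly in practice.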
Manual RoPE Positional Embeddings
Rotary Position Embedding (RoPE) was also manually implemented inside the project.
RoPE is widely used in modern transformer architectures because it helps models better capture positional relationships within sequences and improves long-context behavior.
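A standard RoPE formulation, shown here as a generic sketch rather than the project's exact code, precomputes per-position rotation angles and applies them to the query and key vectors:

```python
import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables for rotary position embeddings."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    freqs = torch.outer(positions, inv_freq)          # (seq_len, head_dim / 2)
    return freqs.cos(), freqs.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs of q or k; x has shape (..., seq_len, head_dim)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)
```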
SwiGLU Feed Forward Layers
The feed-forward network uses SwiGLU activation logic implemented directly in PyTorch.
SwiGLU-based architectures are commonly used in newer generation language models because they improve expressiveness and learning efficiency.
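A minimal SwiGLU block in PyTorch, using the common gate/up/down projection layout (the hidden width and exact layout in the project may differ), looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x W_gate) * (x W_up), projected back to d_model."""
    def __init__(self, d_model: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, hidden, bias=False)
        self.w_up = nn.Linear(d_model, hidden, bias=False)
        self.w_down = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```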
Attention System and Flash Attention Compatibility
The attention module was manually implemented using scaled dot-product attention combined with RoPE integration.
The architecture is also compatible with Flash Attention-style optimizations, which can significantly improve inference and training efficiency.
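PyTorch's `F.scaled_dot_product_attention` dispatches to Flash Attention-style kernels when they are available, which is what makes an SDPA-based module "Flash Attention compatible". A generic causal attention sketch built on it (not the project's module) looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention built on F.scaled_dot_product_attention."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, head_dim); RoPE would be applied to q and k here
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, t, d))
```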
Hugging Face Compatibility Was Added Later
A key technical detail of the project is that Bharat MiniGPT was initially built as a standalone PyTorch transformer system.
Later, Hugging Face compatibility was integrated to support:
- Easier inference
- Standardized loading
- generate() support
- Deployment workflows
- Community model sharing
- Integration with HF tooling
This means the original focus of the project was architecture engineering and training infrastructure rather than simply wrapping an existing HF model.
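With HF compatibility in place, loading and sampling follow the usual transformers workflow. The repository ID below is a placeholder rather than the confirmed path, and trust_remote_code may or may not be required depending on how the custom architecture is packaged:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID; substitute the model ID actually published by the project.
repo_id = "hvmsmartsolutions/bharat-minigpt-350m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The history of Indian railways", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```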
Training Data
The model was trained using a weighted mixture of large-scale datasets:
| Dataset | Weight |
|---|---|
| HuggingFaceFW/fineweb (sample-10BT) | 40% |
| HuggingFaceFW/fineweb-edu (sample-10BT) | 30% |
| Wikimedia Wikipedia | 30% |
| TinyStories and some book corpus | 5-10% (used only briefly during training) |
The project also includes a custom streaming dataset pipeline for handling large-scale token generation efficiently.
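A common way to build such a pipeline, shown here only as a plausible sketch rather than the project's actual code, is Hugging Face datasets in streaming mode combined with weighted interleaving:

```python
from datasets import load_dataset, interleave_datasets

# Streamed datasets are iterated lazily, so the full corpora never need to fit on disk.
fineweb = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT", split="train", streaming=True)
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True)
wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)

# Weighted mixture roughly matching the published 40/30/30 split.
mixed = interleave_datasets([fineweb, fineweb_edu, wiki], probabilities=[0.4, 0.3, 0.3], seed=42)

for example in mixed.take(3):
    print(example["text"][:200])
```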
Training Configuration
| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 3e-4 |
| Minimum LR | 3e-5 |
| Warmup Steps | 51,200 |
| LR Scheduler | Cosine Decay |
| Gradient Accumulation | 128 |
| Mixed Precision | FP16 |
| Gradient Clipping | 1.0 |
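The warmup-plus-cosine schedule in the table is simple to implement by hand. In the sketch below, the warmup length and learning-rate bounds follow the table, while the total decay horizon is an assumed placeholder:

```python
import math

def lr_at_step(step: int, max_lr: float = 3e-4, min_lr: float = 3e-5,
               warmup_steps: int = 51_200, total_steps: int = 500_000) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr.
    total_steps is an illustrative value, not the project's setting."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Applied to an AdamW optimizer before each step, e.g.:
# for group in optimizer.param_groups:
#     group["lr"] = lr_at_step(step)
```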
Engineering Features
The project includes several advanced engineering features:
- Custom GPT architecture
- RoPE positional embeddings
- RMSNorm normalization
- SwiGLU feed-forward layers
- Flash Attention compatible SDPA
- Gradient checkpointing
- Weight tying
- Streaming datasets
- KV-cache compatible generation
- Mixed precision FP16 training
- Manual checkpoint recovery system
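Two of these features are easy to illustrate in isolation: weight tying shares a single matrix between the token embedding and the output head, and gradient checkpointing recomputes block activations during the backward pass instead of storing them. The sketch below uses a generic stand-in block, not the project's custom decoder layer:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyDecoderLM(nn.Module):
    def __init__(self, vocab_size: int = 50257, d_model: int = 1024, n_layers: int = 24):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead=16, batch_first=True)  # stand-in for the custom block
            for _ in range(n_layers)
        ])
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # weight tying: one shared matrix

    def forward(self, idx: torch.Tensor, use_checkpointing: bool = True) -> torch.Tensor:
        x = self.tok_emb(idx)
        for block in self.blocks:
            # recompute this block's activations in backward instead of caching them
            x = checkpoint(block, x, use_reentrant=False) if use_checkpointing else block(x)
        return self.lm_head(x)
```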
Current Stage: Base Pretrained Model
It is important to understand that Bharat MiniGPT 350M is currently:
- A base pretrained model
- Trained on 3B tokens
- Not instruction-tuned yet
- Not RLHF aligned
- Still under active experimentation
This means the model is not intended to directly compete with systems like ChatGPT, Gemini, or Claude at its current stage.
The focus right now is foundational language learning and transformer experimentation.
Benchmark Results
The project was evaluated using the EleutherAI LM Evaluation Harness.
| Task | Metric | Score |
|---|---|---|
| ARC Easy | acc | 0.3312 |
| HellaSwag | acc | 0.2650 |
| PIQA | acc | 0.5631 |
These results represent the current 3B-token pretrained checkpoint.
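For context, comparable scores can be reproduced with the harness's Python entry point; the checkpoint path below is a placeholder for the local or hub model:

```python
import lm_eval

# "path/to/bharat-minigpt-350m" is a placeholder, not the published checkpoint path.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/bharat-minigpt-350m",
    tasks=["arc_easy", "hellaswag", "piqa"],
    batch_size=8,
)
print(results["results"])
```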
Why Projects Like This Matter
Building a transformer model from scratch is significantly more difficult than simply fine-tuning an existing model.
It requires solving multiple engineering challenges, including:
- Training stability
- Memory optimization
- Gradient scaling
- Precision handling
- Attention efficiency
- Dataset streaming
- Checkpoint recovery
- Generation stability
- GPU memory management
Independent projects like Bharat MiniGPT help expand practical AI engineering knowledge and experimentation.
Future Improvements Planned
Several future improvements are planned for the project:
Better Tokenizer Strategy
Tokenizer quality directly affects language understanding and output coherence.
Larger Training Token Count
Additional pretraining beyond 3B tokens could significantly improve model capability.
Instruction Tuning
Future conversational fine-tuning may improve assistant-like behavior.
Better Inference Optimization
Future ONNX, quantization, and KV-cache optimizations are possible.
Indian Language Expansion
Support for Hindi and other Indian languages may improve over time.
Lightweight Models Still Matter
While the AI industry is focused heavily on massive multi-billion parameter models, smaller models still offer important advantages:
- Lower hardware requirements
- Faster experimentation
- Easier debugging
- Edge AI deployment potential
- Browser inference possibilities
- Lower inference costs
This is one reason compact transformer research remains valuable.
Building a 350M Parameter LLM on Free Kaggle T4 GPUs
One of the most impressive aspects of Bharat MiniGPT 350M is that the model was trained without expensive AI supercomputers or large enterprise GPU clusters. Instead, the project was developed primarily using the free-tier environment on Kaggle with NVIDIA T4 GPUs. Despite limited hardware resources, the project achieved 3 billion token pretraining through heavy optimization techniques such as gradient accumulation, FP16 mixed precision training, streaming datasets, checkpoint recovery systems, and memory-efficient transformer engineering. This demonstrates that modern LLM experimentation is no longer limited only to large corporations with massive budgets — independent developers can still build meaningful AI systems by combining efficient engineering with persistence and smart optimization strategies.
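To make that optimization recipe concrete, the sketch below shows a generic FP16 training step with 128-step gradient accumulation and gradient clipping at 1.0, matching the published configuration. It uses a toy stand-in model, assumes a CUDA device such as a T4, and is not the project's actual training loop:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in model so the sketch is self-contained; the real project uses its custom GPT.
model = nn.Sequential(nn.Embedding(50257, 256), nn.Linear(256, 50257)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()   # FP16 loss scaling to avoid gradient underflow
accum_steps = 128                      # matches the published gradient accumulation setting

for step in range(accum_steps):
    tokens = torch.randint(0, 50257, (2, 128), device="cuda")   # synthetic micro-batch
    with torch.cuda.amp.autocast(dtype=torch.float16):
        logits = model(tokens)
        loss = F.cross_entropy(logits.view(-1, 50257), tokens.view(-1)) / accum_steps

    scaler.scale(loss).backward()      # gradients accumulate across micro-batches

    if (step + 1) % accum_steps == 0:  # one optimizer step per 128 micro-batches
        scaler.unscale_(optimizer)
        nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # clip at 1.0, as configured
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```

Dividing the loss by the accumulation count keeps the effective gradient equivalent to one large batch, which is how a small GPU can emulate the batch sizes normally reserved for much larger hardware.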
Final Thoughts
Bharat MiniGPT 350M represents an interesting independent AI engineering effort coming from India.
Its transformer architecture, RoPE implementation, RMSNorm layers, attention system, and training pipeline were manually developed in PyTorch before later being adapted for Hugging Face compatibility.
Although the model is still in its early pretrained stage and requires further refinement, it demonstrates how independent developers can explore foundational LLM engineering beyond simple fine-tuning workflows.

