Tuesday, May 12, 2026

Bharat MiniGPT 350M: A Custom GPT-Style LLM Built from Scratch in India

The AI industry today is dominated by massive language models from companies like OpenAI, Google, and Meta. Most public AI projects are either fine-tuned versions of existing models or lightweight wrappers built on top of already available architectures.

However, some developers are taking a far more challenging route — building transformer architectures and training pipelines from scratch.

One such project is Bharat MiniGPT 350M, developed by HVM Smart Solutions (Harshvardhan Mishra).

Unlike many “custom AI models” available online, Bharat MiniGPT 350M is not simply a fine-tuned GPT-2 or LLaMA derivative. Its transformer architecture, training logic, attention system, normalization layers, and dataset streaming pipeline were manually implemented in PyTorch before later being integrated into the Hugging Face ecosystem.


What is Bharat MiniGPT 350M?

Bharat MiniGPT 350M is a custom decoder-only Transformer-based causal language model designed for foundational LLM experimentation, architecture research, and large-scale language model training.

The project focuses on understanding and implementing the core mechanics behind modern GPT-style systems, including:

  • Transformer architecture engineering
  • Attention optimization
  • Language model pretraining
  • Training stability
  • Efficient inference systems
  • KV-cache compatible generation
  • Gradient checkpointing
  • Streaming datasets
  • Hugging Face compatibility

Rather than a production chatbot, the project currently provides an evolving pretrained base model focused on foundational AI engineering.


Model Specifications

| Feature | Details |
| --- | --- |
| Model Name | Bharat MiniGPT 350M |
| Parameters | ~350 million |
| Architecture | Custom decoder-only Transformer |
| Training Tokens | 3 billion |
| Framework | PyTorch |
| HF Compatibility | Added later |
| Developer | Harshvardhan Mishra |
| Organization | HVM Smart Solutions |

Architecture Overview

The model uses several modern transformer design concepts commonly seen in advanced LLM architectures.

| Component | Details |
| --- | --- |
| Transformer Layers | 24 |
| Attention Heads | 16 |
| Embedding Size | 1024 |
| Context Length | 768 tokens |
| Vocabulary Size | 50,257 |
| Positional Encoding | RoPE |
| Normalization | RMSNorm |
| Feed-Forward Network | SwiGLU |
| Attention | SDPA / Flash Attention compatible |
| Weight Tying | Yes |
| Precision | FP16 |

A Truly Custom Transformer Implementation

One of the most important aspects of Bharat MiniGPT 350M is that many core transformer systems were implemented manually instead of relying entirely on prebuilt abstractions.

This includes:

  • Custom RMSNorm implementation
  • Manual RoPE positional embedding logic
  • Custom SwiGLU feed-forward blocks
  • Self-written attention modules
  • Decoder transformer blocks
  • Custom token generation pipeline
  • Streaming dataset architecture
  • Manual cosine LR scheduler
  • Gradient checkpointing integration

The project later added Hugging Face compatibility for easier deployment and ecosystem support.

This distinction is important because many public “custom models” are actually fine-tunes of existing transformer implementations, while Bharat MiniGPT involved architecture-level engineering from the ground up.


Custom RMSNorm Implementation

The project includes a manually implemented RMSNorm layer instead of relying solely on built-in transformer utilities.

RMSNorm has become increasingly popular in modern LLMs because it is computationally lightweight and can improve training stability.
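The project's exact code is not shown here, but a typical from-scratch RMSNorm in PyTorch looks like the following minimal sketch (the `eps` value is an assumption, not a published detail):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: scales by the RMS of the features,
    with no mean subtraction and no bias (unlike LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-feature scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS over the last (feature) dimension
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```

Skipping the mean subtraction and bias is exactly what makes RMSNorm cheaper than LayerNorm while behaving similarly in practice.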


Manual RoPE Positional Embeddings

Rotary Position Embedding (RoPE) was also manually implemented inside the project.

RoPE is widely used in modern transformer architectures because it helps models better capture positional relationships within sequences and improves long-context behavior.
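The core of a hand-rolled RoPE implementation is small: precompute per-position rotation angles, then rotate each even/odd pair of head dimensions. This sketch follows the standard RoFormer formulation and is not the project's literal code:

```python
import torch

def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables of shape (seq_len, head_dim // 2)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)
    return freqs.cos(), freqs.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor):
    """x: (batch, heads, seq_len, head_dim); rotates each (even, odd) pair."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because RoPE is a pure rotation, it preserves vector norms and leaves position 0 unchanged, which makes it easy to sanity-check during debugging.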


SwiGLU Feed Forward Layers

The feed-forward network uses SwiGLU activation logic implemented directly in PyTorch.

SwiGLU-based architectures are commonly used in newer generation language models because they improve expressiveness and learning efficiency.
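A SwiGLU block combines a SiLU-gated projection with a separate "up" projection, as popularized by LLaMA-style models. The hidden width below is illustrative; the project's actual FFN size is not published here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the SiLU-activated gate modulates the linear "up" path elementwise
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```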


Attention System and Flash Attention Compatibility

The attention module was manually implemented using scaled dot-product attention combined with RoPE integration.

The architecture is also compatible with Flash Attention-style optimizations, which can significantly improve inference and training efficiency.
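In PyTorch, SDPA-based causal attention can be written compactly with `torch.nn.functional.scaled_dot_product_attention`, which dispatches to Flash or memory-efficient kernels automatically when available. This is a generic sketch of such a module (RoPE application is noted but omitted), not the project's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, dim: int = 1024, n_heads: int = 16):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, s, d) -> (b, heads, s, head_dim); RoPE would rotate q and k here
        q, k, v = (t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        # is_causal=True applies the autoregressive mask inside the fused kernel
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, s, d))
```

The `dim=1024, n_heads=16` defaults match the published architecture table.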


Hugging Face Compatibility Was Added Later

A key technical detail of the project is that Bharat MiniGPT was initially built as a standalone PyTorch transformer system.

Later, Hugging Face compatibility was integrated to support:

  • Easier inference
  • Standardized loading
  • generate() support
  • Deployment workflows
  • Community model sharing
  • Integration with HF tooling

This means the original focus of the project was architecture engineering and training infrastructure rather than simply wrapping an existing HF model.


Training Data

The model was trained using a weighted mixture of large-scale datasets:

| Dataset | Weight |
| --- | --- |
| HuggingFaceFW/fineweb (sample-10BT) | 40% |
| HuggingFaceFW/fineweb-edu (sample-10BT) | 30% |
| Wikimedia Wikipedia | 30% |
| TinyStories and a book corpus | 5-10% (used for a short period) |

The project also includes a custom streaming dataset pipeline for handling large-scale token generation efficiently.
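The core idea behind such a pipeline, sampling from several streams according to mixture weights without materializing the full corpora, can be sketched as a small `IterableDataset`. This is an illustration of the technique, not the project's actual pipeline code:

```python
import random
from torch.utils.data import IterableDataset

class WeightedStreamMix(IterableDataset):
    """Interleaves several lazy streams by sampling weight. Each entry in
    `streams` is a zero-argument callable returning a fresh iterator, so
    the mixture can be re-iterated across epochs."""
    def __init__(self, streams, weights, seed: int = 0):
        self.streams, self.weights, self.seed = streams, weights, seed

    def __iter__(self):
        rng = random.Random(self.seed)
        iters = [iter(make()) for make in self.streams]
        weights = list(self.weights)
        while iters:
            i = rng.choices(range(len(iters)), weights=weights)[0]
            try:
                yield next(iters[i])
            except StopIteration:
                # drop exhausted streams and keep weights aligned
                del iters[i], weights[i]
```

In the real setup the streams would wrap Hugging Face streaming datasets (e.g. `load_dataset(..., streaming=True)`), with weights mirroring the table above.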


Training Configuration

| Setting | Value |
| --- | --- |
| Optimizer | AdamW |
| Learning Rate | 3e-4 |
| Minimum LR | 3e-5 |
| Warmup Steps | 51,200 |
| LR Scheduler | Cosine decay |
| Gradient Accumulation | 128 |
| Mixed Precision | FP16 |
| Gradient Clipping | 1.0 |

Engineering Features

The project includes several advanced engineering features:

  • Custom GPT architecture
  • RoPE positional embeddings
  • RMSNorm normalization
  • SwiGLU feed-forward layers
  • Flash Attention compatible SDPA
  • Gradient checkpointing
  • Weight tying
  • Streaming datasets
  • KV-cache compatible generation
  • Mixed precision FP16 training
  • Manual checkpoint recovery system
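Of these, weight tying is the simplest to show concretely: the output projection shares the token-embedding matrix, saving roughly vocab_size × d_model parameters (about 51M at 50,257 × 1024). A minimal sketch of the pattern:

```python
import torch.nn as nn

vocab_size, dim = 50_257, 1024          # values from the architecture table
embed = nn.Embedding(vocab_size, dim)   # input token embeddings
lm_head = nn.Linear(dim, vocab_size, bias=False)  # output logits projection

# Weight tying: both modules point at the same parameter tensor, so the
# embedding matrix is learned once and reused for the output projection.
lm_head.weight = embed.weight
```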

Current Stage: Base Pretrained Model

It is important to understand that Bharat MiniGPT 350M is currently:

  • A base pretrained model
  • Trained on 3B tokens
  • Not instruction-tuned yet
  • Not RLHF aligned
  • Still under active experimentation

This means the model is not intended to directly compete with systems like ChatGPT, Gemini, or Claude at its current stage.

The focus right now is foundational language learning and transformer experimentation.


Benchmark Results

The project was evaluated using the EleutherAI LM Evaluation Harness.

| Task | Metric | Score |
| --- | --- | --- |
| ARC Easy | acc | 0.3312 |
| HellaSwag | acc | 0.2650 |
| PIQA | acc | 0.5631 |

These results represent the current 3B-token pretrained checkpoint.


Why Projects Like This Matter

Building a transformer model from scratch is significantly more difficult than simply fine-tuning an existing model.

It requires solving multiple engineering challenges, including:

  • Training stability
  • Memory optimization
  • Gradient scaling
  • Precision handling
  • Attention efficiency
  • Dataset streaming
  • Checkpoint recovery
  • Generation stability
  • GPU memory management

Independent projects like Bharat MiniGPT help expand practical AI engineering knowledge and experimentation.


Future Improvements Planned

Several future improvements are planned for the project:

Better Tokenizer Strategy

Tokenizer quality directly affects language understanding and output coherence.

Larger Training Token Count

Additional pretraining beyond 3B tokens could significantly improve model capability.

Instruction Tuning

Future conversational fine-tuning may improve assistant-like behavior.

Better Inference Optimization

Future ONNX, quantization, and KV-cache optimizations are possible.

Indian Language Expansion

Support for Hindi and other Indian languages may improve over time.


Lightweight Models Still Matter

While the AI industry is focused heavily on massive multi-billion parameter models, smaller models still offer important advantages:

  • Lower hardware requirements
  • Faster experimentation
  • Easier debugging
  • Edge AI deployment potential
  • Browser inference possibilities
  • Lower inference costs

This is one reason compact transformer research remains valuable.

Building a 350M Parameter LLM on Free Kaggle T4 GPUs

One of the most impressive aspects of Bharat MiniGPT 350M is that it was trained without expensive AI supercomputers or enterprise GPU clusters. The project was developed primarily in Kaggle's free-tier environment on NVIDIA T4 GPUs.

Despite the limited hardware, it completed 3 billion tokens of pretraining through heavy optimization: gradient accumulation, FP16 mixed-precision training, streaming datasets, a checkpoint recovery system, and memory-efficient transformer engineering. This demonstrates that modern LLM experimentation is no longer limited to large corporations with massive budgets; independent developers can still build meaningful AI systems by combining efficient engineering with persistence and smart optimization strategies.


Final Thoughts

Bharat MiniGPT 350M represents an interesting independent AI engineering effort coming from India.

Its transformer architecture, RoPE implementation, RMSNorm layers, attention system, and training pipeline were manually developed in PyTorch before later being adapted for Hugging Face compatibility.

Although the model is still in its early pretrained stage and requires further refinement, it demonstrates how independent developers can explore foundational LLM engineering beyond simple fine-tuning workflows.

Harshvardhan Mishra

Hi, I'm Harshvardhan Mishra. Tech enthusiast and IT professional with a B.Tech in IT, PG Diploma in IoT from CDAC, and 6 years of industry experience. Founder of HVM Smart Solutions, blending technology for real-world solutions. As a passionate technical author, I simplify complex concepts for diverse audiences. Let's connect and explore the tech world together! If you want to help support me on my journey, consider sharing my articles, or Buy me a Coffee! Thank you for reading my blog! Happy learning! Linkedin
