The Resource Archive

A curated library of datasets, models, and research for Nepali language technology. Search, filter, and explore.

Nepali Text Corpus (IRIISNEPAL)

A comprehensive collection of approximately 6.4 million articles (27.5 GB) from news sites, blogs, and other online platforms. It is described as the largest text dataset for the Nepali language.

By: Unknown

Dataset · 2024
View Source
License: MIT

np20ng

A multi-class Nepali text classification dataset consisting of over 200,000 news documents categorized into 20 different Nepali news groups.

By: Unknown

Dataset
View Source
License: Apache-2.0

Nepali Corpus (C4 Multilingual)

A massive, cleaned subset of the Common Crawl containing approximately 3.2 billion tokens (13 GB) of Nepali web text.

By: dirkgr, adarob

Dataset · 2021
View Source
License: ODC-BY

16NepaliNews Corpus

A collection of 14,364 news documents partitioned across 16 different categories, inspired by the 20 Newsgroups dataset.

By: sndsabin

Dataset · 2017
View Source
License: GPL-3.0

39K Nepali Wikipedia Articles

A cleaned dataset of 39,000 articles from Nepali Wikipedia, split into training and test sets, providing a source of formal, encyclopedic text.

By: Gaurav

Dataset · 2018
View Source
License: Not Specified

A Large-Scale Nepali Text Corpus

A large-scale text corpus for the Nepali language, available via IEEE Dataport.

By: Community

Dataset · 2019
View Source
License: Not Specified

CC100 Nepali

A monolingual dataset from the Common Crawl, part of a larger collection covering 100 languages.

By: Community

Dataset · 2020
View Source
License: Not Specified

350K Nepali Sentences

A collection of 350,000 Nepali sentences from various sources.

By: Unknown

Dataset · 2021
View Source
License: Not Specified

Nepali Abstractive Summarization Corpus

A corpus of 286,000 article-title pairs from news sources, suitable for training abstractive summarization models.

By: Community

Dataset
View Source
License: Not Specified

Nepali NER (EBIQUITY)

A dataset for Named Entity Recognition, released in two versions with IO and BIO tagging schemes. Version 2 is recommended.

By: Unknown

Dataset · 2019
View Source
License: MIT
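The entry mentions IO and BIO tagging. Under IO, every token of an entity is tagged `I-<TYPE>`; BIO also marks the first token of each entity with `B-<TYPE>`, so adjacent entities can be told apart. A minimal sketch of converting IO tags to BIO, assuming tags of the form `I-<TYPE>` and `O`; the labels PER, LOC, and ORG are illustrative and may not match the dataset's actual tag set.

```python
def io_to_bio(tags: list[str]) -> list[str]:
    """Convert IO tags ("I-<TYPE>" or "O") to BIO tags.

    IO cannot separate two adjacent entities of the same type, so each
    run of identical I-<TYPE> tags becomes one entity here.
    """
    bio = []
    prev = "O"  # the previous IO tag
    for tag in tags:
        if tag == "O":
            bio.append("O")
        else:
            ent_type = tag.split("-", 1)[1]
            # A new entity starts after O or after a different entity type.
            if prev == "O" or prev.split("-", 1)[1] != ent_type:
                bio.append(f"B-{ent_type}")
            else:
                bio.append(f"I-{ent_type}")
        prev = tag
    return bio


# Hypothetical labels, for illustration only.
print(io_to_bio(["I-PER", "I-PER", "O", "I-LOC", "I-ORG"]))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'B-ORG']
```

This is also why BIO (Version 2) is generally preferable: the conversion above cannot recover boundaries that IO never recorded.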

Large Nepali ASR training data set

A large dataset for Automatic Speech Recognition containing approximately 157,000 transcribed utterances collected by Google.

By: Unknown

Dataset · 2018
View Source
License: CC BY-SA 4.0

High quality TTS data for Nepali

A multi-speaker dataset for Text-to-Speech synthesis containing around 2,000 high-quality transcribed sentences.

By: Unknown

Dataset · 2018
View Source
License: CC BY-SA 4.0

FLoRes Evaluation Datasets for Low-Resource Machine Translation

Standardized evaluation datasets for low-resource Nepali-English machine translation, based on Wikipedia.

By: Unknown

Dataset · 2019
View Source
License: CC BY-SA 4.0

nepal-brihat-sabdakosh-json

A structured JSON dump of all 122,000 words from the Nepali Brihat Sabdakosh (a comprehensive dictionary).

By: bikashpadhikari

Dataset
View Source
License: Not Specified

LINCE: Nepali-English Code Switching

A dataset containing Nepali-English code-switched language, valuable for studying language mixing phenomena.

By: Community

Dataset · 2020
View Source
License: Not Specified

DHCD dataset

A dataset of Devanagari (Nepali) handwritten characters for handwritten character recognition tasks.

By: Prasanna1991

Dataset
View Source
License: Not Specified

Nepali Characters Dataset (NCD)

A dataset containing images of Nepali characters.

By: InspiringLab

Dataset
View Source
License: Not Specified

Nepali Fonts OCR Dataset

A dataset for Optical Character Recognition (OCR) of various Nepali fonts.

By: Unknown

Dataset
View Source
License: Not Specified

Nepali Handwritten Digits

A dataset containing images of Nepali handwritten digits.

By: kcnishan

Dataset
View Source
License: Not Specified

Nepali Stopwords

A list of common stop words in the Nepali language.

By: sanjaalcorps

Dataset
View Source
License: Not Specified
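A stop-word list is typically used to drop high-frequency function words before tasks such as classification or keyword extraction. The sketch below assumes a UTF-8 file with one stop word per line (the repository's actual layout may differ), whitespace tokenization, and a few illustrative Nepali function words that are not taken from this list.

```python
def load_stopwords(path: str) -> set[str]:
    """Read stop words from a UTF-8 file, one per line (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stopwords]


# Illustrative stop words (not taken from the sanjaalcorps list).
example_stopwords = {"र", "पनि", "को"}
tokens = "राम र सीता पनि आए".split()
print(remove_stopwords(tokens, example_stopwords))  # ['राम', 'सीता', 'आए']
```

Using a `set` keeps each membership check constant-time, which matters when filtering large corpora.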

IRIIS-RESEARCH/RoBERTa_Nepali_125M

A 110-million-parameter RoBERTa-based model trained on a 27.5 GB Nepali corpus. Designed for NLU tasks like classification and NER.

By: Unknown

Model · 2024
View Source
License: MIT

NepBERTa

A BERT-based NLU model trained on an extensive monolingual corpus of 0.8B words. Released with the Nep-gLUE benchmark for evaluation.

By: Unknown

Model · 2022
View Source
License: Not Specified

NepaliGPT: A Generative Language Model for the Nepali Language

A generative large language model (GPT) for Nepali, trained on a large custom corpus called the Devanagari Corpus.

By: Unknown

Model · 2025
View Source
License: BSD-3-Clause-Clear

patrakar (Nepali News Classifier)

A DistilBERT model fine-tuned for classifying Nepali news into 9 categories.

By: sahajrajmalla

Model · 2022
View Source
License: MIT

Nepali-DistilBERT

A DistilBERT language model trained on the OSCAR Nepali corpus and fine-tuned for sentiment analysis.

By: dexhrestha

License: Not Specified

Transformer-Based Nepali Language Model

A text generation model for Nepali, trained on the OSCAR corpus; its stated uses include spelling correction and feature extraction.

By: Unknown

License: MIT

fastText Embeddings

300-dimensional word vectors for 157 languages, including Nepali, trained on Common Crawl and Wikipedia using the CBOW method.

By: Unknown

Model · 2018
View Source
License: CC BY-SA 3.0
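fastText distributes its vectors in a plain-text `.vec` format: the first line gives the vocabulary size and the dimension, and each following line is a word followed by its values. A dependency-free sketch of a parser, using a toy 3-dimensional in-memory file; the real Nepali file is 300-dimensional and very large, so the optional `limit` parameter (an illustrative addition) caps how many words are read.

```python
import io


def load_vec(stream, limit=None):
    """Parse word vectors in the fastText text (.vec) format.

    Returns (vocabulary size from the header, dimension, {word: vector}).
    """
    n_words, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[word] = [float(v) for v in values]
    return n_words, dim, vectors


# Toy file contents with made-up values, for illustration only.
sample = io.StringIO("2 3\nनेपाल 0.1 0.2 0.3\nभाषा 0.4 0.5 0.6\n")
n_words, dim, vectors = load_vec(sample)
print(n_words, dim, vectors["नेपाल"])  # 2 3 [0.1, 0.2, 0.3]
```

For a downloaded file, the same function can be called on `open(path, encoding="utf-8")`; the local filename is up to the user.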

NPVec1

A suite of 25 state-of-the-art word embeddings for Nepali, derived from a large corpus using GloVe, Word2Vec, fastText, and BERT.

By: Unknown

Model · 2021
View Source
License: Not Specified

300-D Word Embeddings (Word2Vec) for Nepali Language

A pre-trained Word2Vec model with 300-dimensional vectors for over 0.5 million Nepali words, trained on a 90M-word news corpus.

By: rabindralamsal

Model · 2019
View Source
License: MIT
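Pre-trained word embeddings such as these are commonly queried for nearest neighbours by cosine similarity. A dependency-free sketch, using a hypothetical toy dictionary of 3-dimensional vectors in place of the actual 300-dimensional model, whose loading API is not described here.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(word: str, vectors: dict[str, list[float]], topn: int = 3):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vectors; real ones come from the pre-trained model.
toy_vectors = {
    "राजा": [1.0, 0.0, 0.0],
    "रानी": [0.9, 0.1, 0.0],
    "किताब": [0.0, 1.0, 0.0],
}
print(most_similar("राजा", toy_vectors, topn=2)[0][0])  # रानी
```

A brute-force scan like this is fine for small vocabularies; for 0.5 million words, normalizing vectors once and using matrix operations is much faster.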

ELMo Embeddings

Contextualized word embeddings for many South Asian languages, including Nepali.

By: Unknown

License: Not Specified

Byte Pair Embeddings (BPEmb)

Subword embeddings for 275 languages, including Nepali, trained on Wikipedia.

By: Community

License: Not Specified
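BPEmb's subword units come from byte-pair encoding: starting from single characters, the most frequent adjacent pair of symbols is repeatedly merged into a new symbol. BPEmb itself was trained with SentencePiece, so the sketch below shows only the classic merge-learning loop on a toy word-frequency table (Latin-script words, so the merges are easy to follow); it is not BPEmb's training code. On Devanagari text the same loop runs over Unicode code points, so vowel signs and the virama start out as separate symbols.

```python
from collections import Counter


def merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple]:
    """Learn BPE merges from a {word: frequency} table."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # ties: first seen wins
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def segment(word: str, merges: list[tuple]) -> tuple:
    """Apply learned merges, in order, to split a word into subwords."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 2)
print(merges)                     # [('e', 's'), ('es', 't')]
print(segment("newest", merges))  # ('n', 'e', 'w', 'est')
```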

wav2vec2-nepali

A fine-tuned wav2vec2 model for Nepali Automatic Speech Recognition.

By: Unknown

License: Not Specified

Nepali NLP Toolkit

A comprehensive Python library for various NLP tasks including embeddings, tokenization, stemming, summarization, OCR, and translation.

By: sushil79g

License: MIT

Indic NLP Library

A library providing common NLP utilities for various Indic languages, including Nepali.

By: anoopkunchukuttan

License: MIT

Nepali Lemmatizer

A tool specifically for lemmatization of Nepali words.

By: dpakpdl

License: Not Specified

nepali-spell

A spell corrector for Nepali that uses edit distance to suggest the correct spelling of a misspelled word.

By: nepali-bhasa

License: GPL-3.0
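Edit-distance spell correction can be sketched as follows: compute the Levenshtein distance from the input to each vocabulary word and suggest the closest one within a threshold. This is a generic sketch, not nepali-spell's code; the vocabulary, the threshold of 2, and the alphabetical tie-break are assumptions. Distances are counted in Unicode code points, so a missing vowel sign such as ा counts as one edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: set[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if none is close enough."""
    if word in vocabulary or not vocabulary:
        return word
    # Ties on distance are broken alphabetically (an arbitrary choice).
    best = min(vocabulary, key=lambda w: (levenshtein(word, w), w))
    return best if levenshtein(word, best) <= max_distance else word


vocab = {"नेपाल", "नेपाली", "भाषा"}  # hypothetical vocabulary
print(correct("नेपल", vocab))  # नेपाल
```

Scanning the whole vocabulary is O(vocabulary size) per query; practical correctors usually pre-generate candidates or index words to avoid this.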

NepaliLipi

An application for text prediction and transliteration from Roman script to Devanagari.

By: AchillesKarki

Application
View Source
License: Not Specified

Nepdict

An English-Nepali dictionary application built in Python for the terminal.

By: Unknown

Application
View Source
License: GPL-3.0

Improving Nepali Document Classification by Neural Network

This paper compares different text classification methods for Nepali and demonstrates that using word2vec with a neural network improves performance.

By: Unknown

Paper · 2016
View Source
License: Not Specified

A Deep Learning Approach for Part-of-Speech Tagging in Nepali Language

This paper proposes a deep learning-based Part-of-Speech (POS) tagger for Nepali text, achieving over 99% accuracy.

By: Unknown

Paper · 2018
View Source
License: Not Specified

A Computational Analysis of Nepali Morphology: A Model For Natural Language Processing

A dissertation on the computational analysis of Nepali morphology using a finite-state approach to create a morphological analyzer.

By: Unknown

Paper · 2011
View Source
License: Not Specified

A Morphological Analyzer and a Stemmer for Nepali

This paper discusses the design, implementation, and linguistic aspects of a Morphological Analyzer and a stemmer for Nepali.

By: Unknown

Paper · 2007
View Source
License: Not Specified
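Stemmers for Nepali often work by stripping inflectional suffixes and postpositions attached to the word. A minimal suffix-stripping sketch, not the method of the paper above (whose analyzer is more elaborate): the suffix list is a small illustrative set (plural हरू, dative लाई, genitive को/का/की, locative मा, ergative ले, ablative बाट), suffixes are removed repeatedly with the longest match first, and a minimum stem length of two code points guards against over-stripping short words.

```python
# Illustrative suffix list; a real stemmer needs a much fuller inventory.
SUFFIXES = ["हरू", "लाई", "बाट", "को", "का", "की", "मा", "ले"]


def stem(word: str, suffixes=SUFFIXES, min_stem_len: int = 2) -> str:
    """Repeatedly strip the longest matching suffix while the stem stays long enough."""
    ordered = sorted(suffixes, key=len, reverse=True)  # longest match first
    changed = True
    while changed:
        changed = False
        for suffix in ordered:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


print(stem("केटाहरूलाई"))  # केटा  (लाई, then हरू removed)
print(stem("घरमा"))        # घर
print(stem("आमा"))         # आमा  (stem would be too short)
```

Pure suffix stripping cannot tell a real suffix from word-final letters that merely look like one, which is why morphological analyzers such as the one described above rely on a lexicon as well.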