Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of population figures for three years in Tamil Nadu.
This file consists of information about the places, their population, district, and position.
This was done during an internship at Tact Labs. Thanks to Aishwarya, who helped me collect the dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population: Tamil Nadu data was reported at 77.222 million persons in 2025, an increase from the previous figure of 76.993 million persons for 2024. The data is updated yearly, with a median of 66.611 million persons from Mar 1994 to 2025 across 32 observations. It reached an all-time high of 77.222 million persons in 2025 and a record low of 57.670 million persons in 1994. The series remains in active status in CEIC and is reported by the Ministry of Statistics and Programme Implementation. It is categorized under Global Database’s India – Table IN.GBG001: Population. [COVID-19-IMPACT]
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset file (Tamil_nadu_taxi_trips_cleaned.csv) contains the columns listed below. The meaning of each column is inferred from its name and from the structure of the file.
Date_Time:
Pickup_Location:
Drop_Location: as with Pickup_Location, this could be represented as a location name or area code.
Distance_km:
Fare_INR:
No_of_Passengers:
Travel_Time_hrs:
Tips_INR:
Tourist_Place_Nearby:
Weather_Condition:
Vehicle_Type:
Date_Time, Pickup_Location, Drop_Location, Distance_km, etc.). Filling these appropriate...
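As a rough illustration of how the columns listed above might be used, the sketch below reads rows with Python's standard csv module and derives a fare per kilometre. The two sample rows are invented for illustration only, and the units (kilometres, rupees) are assumed from the column names rather than confirmed by the dataset.

```python
import csv
import io

# Hypothetical sample rows: only the header names come from the listing,
# the values are made up for illustration.
SAMPLE = """Date_Time,Pickup_Location,Drop_Location,Distance_km,Fare_INR,No_of_Passengers
2024-01-05 08:30,Chennai,Mahabalipuram,56.0,1400.0,3
2024-01-06 17:45,Madurai,Kodaikanal,116.0,3480.0,2
"""


def fare_per_km(rows):
    """Map (pickup, drop) to fare per km, skipping rows with no distance."""
    result = {}
    for row in rows:
        distance = float(row["Distance_km"])
        if distance <= 0:
            continue  # avoid dividing by zero on bad records
        route = (row["Pickup_Location"], row["Drop_Location"])
        result[route] = float(row["Fare_INR"]) / distance
    return result


rates = fare_per_km(csv.DictReader(io.StringIO(SAMPLE)))
for route, rate in rates.items():
    print(route, round(rate, 2))
```

For the real file, the same function could be fed `csv.DictReader(open(path, newline=""))`, provided the header names match those shown in the listing.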
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vital Statistics: Natural Growth Rate: per 1000 Population: Tamil Nadu data was reported at 7.700 per 1000 population in 2020, a decrease from the previous figure of 8.100 for 2019. The data is updated yearly, with a median of 8.600 from Dec 1997 to 2020 across 23 observations. It reached an all-time high of 11.400 in 2001 and a record low of 7.700 in 2020. The series remains in active status in CEIC and is reported by the Office of the Registrar General & Census Commissioner, India. It is categorized under India Premium Database’s Demographic – Table IN.GAH004: Vital Statistics: Natural Growth Rate: by States.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vital Statistics: Birth Rate: per 1000 Population: Tamil Nadu data was reported at 13.800 per 1000 population in 2020, a decrease from the previous figure of 14.200 for 2019. The data is updated yearly, with a median of 15.900 from Dec 1997 to 2020 across 23 observations. It reached an all-time high of 19.300 in 2000 and a record low of 13.800 in 2020. The series remains in active status in CEIC and is reported by the Office of the Registrar General & Census Commissioner, India. It is categorized under India Premium Database’s Demographic – Table IN.GAH002: Vital Statistics: Birth Rate: by States.
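The birth rate here and the natural growth rate in the preceding entry are related by simple arithmetic if one assumes the usual definition, natural growth rate = birth rate minus death rate. The sketch below applies that assumption to the 2019 and 2020 figures quoted in the two entries. The death rate itself is not reported in this listing, so the result is a derived value, not a published one.

```python
def implied_death_rate(birth_rate, natural_growth_rate):
    """Death rate per 1000 implied by birth rate minus natural growth rate."""
    # Rounded to one decimal, matching the precision of the quoted inputs.
    return round(birth_rate - natural_growth_rate, 1)


# Figures per 1000 population, as quoted in the two CEIC entries.
death_rate_2020 = implied_death_rate(13.8, 7.7)
death_rate_2019 = implied_death_rate(14.2, 8.1)
print(death_rate_2019, death_rate_2020)
```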
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This comprehensive dataset provides detailed information on Tamil Nadu state elections spanning from 1971 to 2021. It encompasses data from multiple legislative assembly elections, capturing a wide range of variables essential for political and social analysis.
Applications
Column Description
- ac_no: Assembly Constituency Number
- ac_name: Assembly Constituency Name
- winning_cand: Name of Winning Candidate
- party: Name of Party
- totelectors: Total number of electors in the constituency
- tot votes: Total votes secured by the winning candidate
- poll_percentage: Percentage of votes polled
- margin: Vote margin between the winner and the runner-up
- winning_percentage: Winning margin as a percentage
- district: Name of district
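A minimal sketch of how these columns might be queried, using only the standard library. The three rows are placeholders invented for illustration (they are not real results), and the assumption is that `margin` holds an integer vote count.

```python
import csv
import io
from collections import Counter

# Placeholder rows: header names follow the column description,
# the values are invented and do not reflect any real election.
SAMPLE = """ac_no,ac_name,winning_cand,party,margin,district
1,Constituency A,Candidate One,Party X,5200,District P
2,Constituency B,Candidate Two,Party Y,310,District P
3,Constituency C,Candidate Three,Party X,12050,District Q
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Seats won per party.
seats_by_party = Counter(row["party"] for row in rows)

# Closest contest, judged by the smallest winning margin.
closest = min(rows, key=lambda row: int(row["margin"]))

print(seats_by_party.most_common())
print(closest["ac_name"], closest["margin"])
```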
Acknowledgement
The data has been extracted from the official ECI website and cross-checked against other sites for validation. If you use this work or want to show your appreciation, you can say hi at linkedIn.com/in/srinrealyf
Keywords
Tamil Nadu, India, Election, Election Data, Legislative Election, MLA Election
If you find this dataset helpful, consider upvoting it and leaving any feedback in the comments.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains statistical information on schools in the different districts of the state of Tamil Nadu, India.
The dataset contains the following information:
The data was collected from the open data portal of the Tamil Nadu government from the following locations:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Census: Population: Tamil Nadu data was reported at 72,147,030 persons on 03-01-2011, an increase from the previous figure of 62,405,679 persons for 03-01-2001. The data is updated every ten years, with a median of 31,903,000 persons from Mar 1901 to 03-01-2011 across 12 observations. It reached an all-time high of 72,147,030 persons on 03-01-2011 and a record low of 19,252,630 persons on 03-01-1901. The series remains in active status in CEIC and is reported by the Office of the Registrar General & Census Commissioner, India. It is categorized under India Premium Database’s Demographic – Table IN.GAB002: Census: Population: by States.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Tamil Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of Tamil language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.
This dataset includes over 6,000 high-quality scripted audio prompts recorded in Tamil, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.
The prompts span a broad range of healthcare-specific interactions, such as:
To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:
These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.
Every audio recording is accompanied by a verbatim, manually verified transcription.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset is part of a larger mission to transform Tamil into a high-resource language in the field of Natural Language Processing (NLP). As one of the oldest and most culturally rich languages, Tamil has a unique linguistic structure, yet it remains underrepresented in the NLP landscape. This dataset, extracted from Tamil Wikipedia, serves as a foundational resource to support Tamil language processing, text mining, and machine learning applications.
- Text Data: This dataset contains over 569,000 articles in raw text form, extracted from Tamil Wikipedia. The collection is ideal for language model training, word frequency analysis, and text mining.
- Scripts and Processing Tools: Code snippets are provided for processing .bz2 compressed files, generating word counts, and handling data for NLP applications.
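The dataset's own scripts are not reproduced in this listing. As a hedged sketch of the kind of processing described above, the function below streams a .bz2 compressed text file and counts whitespace-separated tokens. The file name in the usage comment is hypothetical, and whitespace splitting is a simplification chosen here because Python's `\w` pattern does not match Tamil vowel signs and would split words in the wrong places.

```python
import bz2
from collections import Counter


def count_words(path, top_n=10):
    """Return the top_n most frequent whitespace-separated tokens in a .bz2 text file."""
    counts = Counter()
    # Text mode decompresses and decodes line by line, so the whole
    # file never has to fit in memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            counts.update(line.split())
    return counts.most_common(top_n)


# Usage (hypothetical file name):
# print(count_words("tamil_wikipedia_articles.txt.bz2", top_n=20))
```

Punctuation stays attached to tokens with this approach, so a real pipeline would likely add a normalisation step before counting.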
Although Tamil has a documented lexicon of over 100,000 words, only a fraction of these are actively used in everyday communication. The largest available Tamil treebank currently holds only 10,000 words, limiting the scope for training accurate language models. This dataset aims to bridge that gap by providing a robust, open-source corpus for researchers, developers, and linguists who want to work on Tamil language technologies.
- Language Modeling: Train or fine-tune models like BERT, GPT, or LSTM-based language models for Tamil.
- Linguistic Research: Analyze Tamil morphology, syntax, and vocabulary usage.
- Data Augmentation: Use the raw text to generate augmented data for multilingual NLP applications.
- Word Embeddings and Semantic Analysis: Create embeddings for Tamil words, useful in multilingual setups or standalone applications.
I believe that advancing Tamil in NLP cannot be a solo effort. Contributions in the form of additional data, annotations, or even new tools for Tamil language processing are welcome! By working together, we can make Tamil a truly high-resource language in NLP.
This dataset is based on content from Tamil Wikipedia and is shared under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA 3.0). Proper attribution to Wikipedia is required when using this data.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Tamil Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Tamil language, advancing the field of artificial intelligence.
This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Tamil. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Tamil speakers, with references drawn from diverse sources such as books, news articles, websites, and other reliable material.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
To ensure diversity, this Q&A dataset includes questions of varying complexity, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Questions are further classified into fact-based and opinion-based categories. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.
To accommodate varied learning experiences, the dataset incorporates different answer formats: single words, short phrases, single sentences, and paragraphs. Answers include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
This fully labeled Tamil Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
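As a sketch of how the annotation fields listed above could be used for filtering, the snippet below selects records by field value. The three records are hypothetical, and the exact value spellings (for example "hard" or "direct") and the top-level JSON layout are assumptions, since the listing names the fields but does not show a sample file.

```python
import json

# Hypothetical records: field names come from the listing, values are assumed.
SAMPLE = """[
  {"id": "q1", "language": "Tamil", "complexity": "easy", "question_type": "direct"},
  {"id": "q2", "language": "Tamil", "complexity": "hard", "question_type": "multiple-choice"},
  {"id": "q3", "language": "Tamil", "complexity": "hard", "question_type": "direct"}
]"""


def filter_records(records, **criteria):
    """Keep records whose fields equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


records = json.loads(SAMPLE)
hard_direct = filter_records(records, complexity="hard", question_type="direct")
print([record["id"] for record in hard_direct])
```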
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
Both the questions and the answers are grammatically accurate Tamil, free of spelling and grammatical errors. No copyrighted, toxic, or harmful content was used in building this dataset.
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can use this fully labeled and ready-to-deploy Tamil Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
According to the 76th round of the NSO survey, conducted between July and December 2018, a higher percentage of men than women had disabilities in India. In Tamil Nadu specifically, 2 percent of men had multiple disabilities, compared with 1.9 percent of women. The National Statistical Office (NSO) is the statistical wing of the Ministry of Statistics and Programme Implementation (MOSPI), mainly responsible for laying down standards for statistical analysis, data collection, and implementation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Census: Population: Tamil Nadu: Female data was reported at 36,009,055 persons on 03-01-2011, an increase from the previous figure of 31,004,770 persons for 03-01-2001. The data is updated every ten years, with a median of 15,945,649 persons from Mar 1901 to 03-01-2011 across 12 observations. It reached an all-time high of 36,009,055 persons on 03-01-2011 and a record low of 9,833,232 persons on 03-01-1901. The series remains in active status in CEIC and is reported by the Office of the Registrar General & Census Commissioner, India. It is categorized under India Premium Database’s Demographic – Table IN.GAB002: Census: Population: by States.
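Combining this female count with the 2011 total of 72,147,030 quoted in the census population entry earlier in this listing gives the male count, the female share, and a females-per-1000-males ratio. The sketch below is plain arithmetic on those two quoted figures. Treating the difference as the male population is an assumption, since the listing does not quote a male series.

```python
# 2011 census figures as quoted in this listing.
TOTAL_2011 = 72_147_030
FEMALE_2011 = 36_009_055

# Assumption: everyone not counted as female is counted as male.
male_2011 = TOTAL_2011 - FEMALE_2011

female_share_pct = FEMALE_2011 / TOTAL_2011 * 100
females_per_1000_males = FEMALE_2011 / male_2011 * 1000

print(male_2011)
print(round(female_share_pct, 2))
print(round(females_per_1000_males))
```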
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Tamil Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of Tamil language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic Tamil speech data.
This dataset features over 6,000 high-quality scripted monologue recordings in Tamil. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.
The dataset covers a wide variety of general conversation scenarios, including:
To enhance authenticity, the prompts include:
Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.
Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.
Rich metadata is included for detailed filtering and analysis:
This dataset can power a variety of Tamil language AI technologies, including:
https://creativecommons.org/publicdomain/zero/1.0/
Delve into the agriculture of Tamil Nadu with this dataset, offering comprehensive insights into crop production. Covering various crops, the dataset provides valuable information on yields, trends, and patterns, serving as a crucial resource for researchers, policymakers, and stakeholders interested in understanding the agricultural landscape of Tamil Nadu.
The non-working population of Tamil Nadu rose by 13.71%, from 34,527,397 persons in 2001 to 39,262,349 persons in 2011. Since that increase, the source reports no further change (0.00%) as of 2011.
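The quoted percentage can be checked directly from the two population figures. The short sketch below recomputes the change between 2001 and 2011; the function name is just an illustrative choice.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Non-working population figures as quoted above.
change = percent_change(34_527_397, 39_262_349)
print(round(change, 2))
```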
https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of population statistics by census year for cities and towns in Tamil Nadu, obtained from various sources.
This dataset consists of 6 columns: the name of the city or town, its status, its district, and 3 columns of population statistics by census year (1991-03-01, 2001-03-01, 2011-03-01).
This was done during an internship at Tact Labs. Thanks to Aishwarya, who helped me collect the dataset.
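A possible way to work with this six-column layout is to total the population per district for a given census year. In the sketch below the header names (`Name`, `Status`, `District`, and the three census dates) are assumptions based on the description, and the rows are invented placeholders. Blank cells are skipped on the assumption that some places lack a figure for some census years.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows: header names are assumed from the description,
# values are invented for illustration.
SAMPLE = """Name,Status,District,1991-03-01,2001-03-01,2011-03-01
Town A,Municipality,District P,10000,12000,15000
Town B,Town Panchayat,District P,,4000,5000
Town C,Municipality,District Q,20000,21000,23000
"""


def population_by_district(rows, census_column):
    """Sum one census column per district, ignoring blank cells."""
    totals = defaultdict(int)
    for row in rows:
        value = row[census_column].strip()
        if value:  # skip places with no figure for that census
            totals[row["District"]] += int(value)
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals_1991 = population_by_district(rows, "1991-03-01")
totals_2011 = population_by_district(rows, "2011-03-01")
print(totals_1991)
print(totals_2011)
```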
https://creativecommons.org/publicdomain/zero/1.0/
There were no large datasets in the Tamil language apart from the Tamil wiki dataset (120k articles), so I decided to create my own. This dataset is the result!
The data was acquired by scraping the publicly available articles published on Dinamalar.com, a well-known newspaper in Tamil Nadu, India. The dataset contains articles from 2009 to 2019.
This dataset exists because of Dinamalar.com. All thanks to them.
https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease 2019 (COVID-19), also known as the coronavirus, or COVID, is a contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first known case was identified in Wuhan, China, in December 2019. The disease has since spread worldwide, leading to an ongoing pandemic. This dataset covers COVID-19 vaccination coverage by Health Unit District in Tamil Nadu.
Data columns (total 39 columns): Column
0 S.No
1 Health Unit District
2 Achievement towards vaccination of 1st Dosage Covishield to HCW
3 Achievement towards vaccination of 2nd Dosage Covishield to HCW
4 Achievement towards vaccination of 1st Dosage Covishield to FLW
5 Achievement towards vaccination of 2nd Dosage Covishield to FLW
6 Achievement towards vaccination of 1st Dosage Covishield to beneficiaries of 18 years and less than 44 years age group
7 Achievement towards vaccination of 2nd Dosage Covishield to beneficiaries of 18 years and less than 44 years age group
8 Achievement towards vaccination of 1st Dosage Covishield to beneficiaries of 45 years and less than 60 years age group with Comorbidities
9 Achievement towards vaccination of 2nd Dosage Covishield to beneficiaries of 45 years and less than 60 years age group with Comorbidities
10 Achievement towards vaccination of 1st Dosage Covishield to 60+ years beneficiaries with Comorbidities
11 Achievement towards vaccination of 2nd Dosage Covishield to 60+ years beneficiaries with Comorbidities
12 Total Achievement of vaccination to beneficiaries under 1st Dose of Covishield
13 Total Achievement of vaccination to beneficiaries under 2nd Dose of Covishield
14 Achievement towards vaccination of 1st Dosage Covaxin to HCW
15 Achievement towards vaccination of 2nd Dosage Covaxin to HCW
16 Achievement towards vaccination of 1st Dosage Covaxin to FLW
17 Achievement towards vaccination of 2nd Dosage Covaxin to FLW
18 Achievement towards vaccination of 1st Dosage Covaxin to beneficiaries of 18 years and less than 44 years age group
19 Achievement towards vaccination of 2nd Dosage Covaxin to beneficiaries of 18 years and less than 44 years age group
20 Achievement towards vaccination of 1st Dosage Covaxin to beneficiaries of 45 years and less than 60 years age group with Comorbidities
21 Achievement towards vaccination of 2nd Dosage Covaxin to beneficiaries of 45 years and less than 60 years age group with Comorbidities
22 Achievement towards vaccination of 1st Dosage Covaxin to 60+ years beneficiaries with Comorbidities
23 Achievement towards vaccination of 2nd Dosage Covaxin to 60+ years beneficiaries with Comorbidities
24 Total Achievement of vaccination to beneficiaries under 1st Dose of Covaxin
25 Total Achievement of vaccination to beneficiaries under 2nd Dose of Covaxin ...
This dataset shows the male population of Arani village, which is in the south-eastern part of Tamil Nadu state.