100+ datasets found
  1. Fruits Dataset for Classification

    • data.mendeley.com
    Updated Feb 11, 2025
    Cite
    GTS GTS (2025). Fruits Dataset for Classification [Dataset]. http://doi.org/10.17632/rg254yr63x.1
    Dataset updated
    Feb 11, 2025
    Authors
    GTS GTS
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    About Dataset: strawberries, peaches, and pomegranates. Photo requirements: (1) white background, (2) .jpg format, (3) image size 300*300. The dataset calls for 250 photos of each fruit when it is fresh and 250 photos of each fruit when it is rotten, for a total of 1500 images.
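
    The stated total follows from three fruits, two conditions, and 250 photos per combination. A minimal sketch of that arithmetic is below; the `<state>_<fruit>` label format is an assumption made for illustration, not something the dataset page specifies.

```python
from itertools import product

FRUITS = ["strawberry", "peach", "pomegranate"]
STATES = ["fresh", "rotten"]
PHOTOS_PER_CLASS = 250  # figure taken from the description above


def expected_class_counts(fruits, states, per_class):
    """Map every '<state>_<fruit>' label (assumed format) to its required photo count."""
    return {f"{state}_{fruit}": per_class for fruit, state in product(fruits, states)}


counts = expected_class_counts(FRUITS, STATES, PHOTOS_PER_CLASS)
total_images = sum(counts.values())
print(len(counts), "classes,", total_images, "images")
```

    A dictionary like this could be compared against the actual per-folder file counts after download to spot missing or extra images.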

    Diverse Collection: With a diverse collection of product images, the files provide an excellent foundation for developing and testing machine learning models designed for image recognition and classification. Each image is captured under different lighting conditions and backgrounds, offering a realistic challenge for algorithms to overcome.

    Real-World Applications: The variability in the dataset helps models trained on it generalize to real-world scenarios, making them more robust and reliable. The dataset includes common fruits such as apples, bananas, oranges, and strawberries, among others, allowing for comprehensive training and evaluation.

    Industry Use Cases: One of the significant advantages of the Fruits Dataset for Classification is its applicability in fields such as agriculture, retail, and the food industry. In agriculture, it can help automate fruit sorting and grading, enhancing efficiency and reducing labor costs. In retail, it can be used to develop automated checkout systems that accurately identify fruits, streamlining the purchasing process.

    Educational Value: The dataset is also valuable for educational purposes, providing students and educators with a practical tool to learn and teach machine learning concepts. By working with this dataset, learners can gain hands-on experience in data preprocessing, model training, and evaluation.

    Conclusion: The Fruits Dataset for Classification is a versatile resource for image classification work. Its diverse images, coupled with practical applications, make it a useful dataset for researchers, developers, and educators working in machine learning and computer vision.

    This dataset is sourced from Kaggle.

  2. ECE657AW20-ASG4-Coronavirus

    • kaggle.com
    Updated Jul 9, 2025
    Cite
    MarkCrowley (2025). ECE657AW20-ASG4-Coronavirus [Dataset]. https://www.kaggle.com/markcrowley/ece657aw20asg4coronavirus/code
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Kaggle
    Authors
    MarkCrowley
    Description

    COVID-19 Data for Analysis and Machine Learning

    There are lots of datasets online, with more appearing every day, to help us all get a handle on this pandemic. Here are just a few links to data we've found that students in ECE 657A, and anyone else who finds their way here, can play with to practice their machine learning skills. The main dataset is the COVID-19 dataset from Johns Hopkins University. This data is perfect for time series analysis and Recurrent Neural Networks, the final topic in the course. This dataset will be left public so anyone can see it, but to join you must request the link from Prof. Crowley or be in the ECE 657A W20 course at the University of Waterloo.

    For ECE 657A W20 Students

    Your bonus grade for assignment 4 comes from creating a kernel from this dataset, writing up some useful analysis, and publishing that notebook. You can do any kind of analysis you like, but some good places to start are:

    • Analysis: feature extraction and analysis of the data to look for patterns that aren't evident from the original features (this is hard for the simple spread/infection/death data since there aren't that many features).
    • Other Data: utilize any other datasets in your kernels by loading data about the countries themselves (population, density, wealth, etc.) or their responses to the situation. Tip: If you open a New Notebook related to this dataset, you can easily add new data available on Kaggle and link that to your analysis.
    • HOW'S MY FLATTENING COVID19 DATASET: This dataset has a lot more files and includes a lot of what I was talking about, so if you produce good kernels there you can also count them for your asg4 grade. https://www.kaggle.com/howsmyflattening/covid19-challenges
    • Predict: make predictions about confirmed cases, deaths, recoveries, or other metrics for the future. You can test your models by training on the past and predicting on the following days, then post a prediction for tomorrow or the next few days given ALL the data up to this point. Hopefully the datasets we've linked here will update automatically so your kernels update as well.
    • Create Tasks: you can make your own "Tasks" as part of this Kaggle and propose your own solution to them. Then others can try solving them as well.
    • Groups: students can do this assignment either in the same groups they had for assignment 3 or individually.
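
    The "train on the past, predict the following days" idea from the Predict suggestion can be sketched as a chronological split plus a simple baseline. This is only an illustration: the case counts are made up, and the naive forecast (repeating the last daily increase) is a stand-in baseline, not a method the course prescribes.

```python
def chronological_split(series, n_test):
    """Split a time-ordered list so the last n_test points form the test set."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


def naive_forecast(train, horizon):
    """Baseline: repeat the last observed daily increase for each future day."""
    last, step = train[-1], train[-1] - train[-2]
    return [last + step * (i + 1) for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up cumulative confirmed-case counts, one value per day.
cases = [10, 14, 20, 28, 40, 55, 75, 100]
train, test = chronological_split(cases, n_test=2)
forecast = naive_forecast(train, horizon=len(test))
error = mean_absolute_error(test, forecast)
```

    Splitting by time rather than at random keeps future observations out of the training set, which is what makes the held-out error a fair stand-in for a real next-day prediction.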

    Suggest other datasets

    We're happy to add other relevant data to this Kaggle. In particular, it would be great to integrate live data on the following:

    • Progression of each country/region/city in "days since X Level", such as days since 100 confirmed cases; see the link for a great example of such a dataset being plotted. I haven't seen a live link to a csv of that data, but we could generate it.
    • Mitigation Policies enacted by local governments in each city/region/country. These are dates when that region first enacted Level 1, 2, 3, 4 containment, started encouraging social distancing, or closed different levels of schools, pubs, restaurants, etc.
    • The hidden positives: this would be a dataset, or method for estimating, as described by Emtiyaz Khan in this Twitter thread. The idea is: how many unreported or unconfirmed cases are there in any region, and can we build an estimate of that number using other regions with widespread testing as a baseline, together with the death rates, which are like an observation of a process with a hidden variable or true infection rate?
    • Paper discussing one way to compute this: https://cmmid.github.io/topics/covid19/severity/global_cfr_estimates.html
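
    The "days since X Level" progression mentioned above could be generated from cumulative counts along these lines. This is a sketch with made-up numbers and hypothetical region names; the threshold of 100 matches the example in the text.

```python
def days_since_threshold(cumulative_cases, threshold=100):
    """Return the series re-indexed so day 0 is the first day at or above threshold.

    Returns an empty list if the threshold is never reached.
    """
    for day, count in enumerate(cumulative_cases):
        if count >= threshold:
            return cumulative_cases[day:]
    return []


# Made-up cumulative counts for two hypothetical regions.
regions = {
    "region_a": [20, 60, 110, 180, 300],
    "region_b": [5, 9, 30, 95, 130],
}
aligned = {name: days_since_threshold(series) for name, series in regions.items()}
```

    After alignment, index 0 of every series means "first day with at least 100 cases", so regions whose outbreaks started on different calendar dates can be plotted on a shared axis.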

  3. College Springs, IA Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Cite
    Neilsberg Research (2025). College Springs, IA Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b2297cea-f25d-11ef-8c1b-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Iowa, College Springs
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of College Springs by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of College Springs across both sexes and to determine which sex constitutes the majority.

    Key observations

    Males form the majority, making up 56.68% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: This column shows the population of each gender in College Springs.
    • % of Total Population: This column displays each gender's share of College Springs' total population as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.
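
    The "% of Total Population" column can be derived from the Population column as sketched below. The counts are placeholders for illustration only, not the actual ACS estimates, and rounding to two decimals is an assumption based on the precision of the figure quoted earlier.

```python
def percent_of_total(population_by_sex):
    """Return each group's share of the total population, rounded to 2 decimals."""
    total = sum(population_by_sex.values())
    return {sex: round(100 * count / total, 2) for sex, count in population_by_sex.items()}


# Placeholder counts for illustration only, not the actual ACS estimates.
shares = percent_of_total({"Male": 120, "Female": 80})
majority = max(shares, key=shares.get)
```

    Because each share is rounded independently, the rounded values can differ slightly from 100 when summed, which is the rounding caveat noted above.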

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for College Springs Population by Race & Ethnicity, which you can refer to for further detail.

  4. Performance vs. Predicted Performance

    • kaggle.com
    Updated Dec 21, 2022
    Cite
    Calathea21 (2022). Performance vs. Predicted Performance [Dataset]. http://doi.org/10.34740/kaggle/dsv/4752670
    Dataset updated
    Dec 21, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Calathea21
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    This dataset contains information about high school students and their actual and predicted performance on an exam. Most of the information, including some general information about the students and their grade for an exam, was based on an already existing dataset, while the predicted exam performance was based on a human experiment. In this experiment, participants were shown short descriptions of the students (based on the information in the original data) and had to rank and grade them according to their expected performance. Prior to this task, some participants were exposed to "Stereotype Activation" suggesting that boys perform less well in school than girls.

    Description of *original_data.csv*

    Based on this dataset (which is also available on Kaggle), we extracted a number of student profiles that participants had to make grade predictions for. For more information about this dataset, we refer to the corresponding Kaggle page: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption

    Note that we performed some preprocessing on the original data:

    • The original data consisted of two parts: the information about students following a Maths course and the information about students following a Portuguese course. Since the same type of information was recorded in both datasets, we merged them and added a column "subject" to show which course each student belongs to.

    • We excluded all data where G3 = 0 (i.e. the grade for the last exam = 0)

    • From original_data.csv we randomly sampled 856 students that participants in our study had to make grade predictions for.
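
    The three preprocessing steps above (merge with a "subject" column, drop rows with G3 = 0, random sampling) can be sketched with plain Python dictionaries. The rows, the subject labels "Maths" and "Portuguese", and the random seed are assumptions for illustration; only the column names G3 and subject come from the description.

```python
import random


def merge_with_subject(maths_rows, portuguese_rows):
    """Combine both course tables, tagging each row with its course."""
    merged = [dict(row, subject="Maths") for row in maths_rows]
    merged += [dict(row, subject="Portuguese") for row in portuguese_rows]
    return merged


def drop_zero_final_grades(rows):
    """Exclude rows whose final exam grade G3 equals 0."""
    return [row for row in rows if row["G3"] != 0]


def sample_profiles(rows, n, seed=0):
    """Randomly sample n rows without replacement (the seed is an assumption)."""
    return random.Random(seed).sample(rows, n)


# Tiny made-up rows; the real files have many more columns.
maths = [{"sex": "F", "G3": 12}, {"sex": "M", "G3": 0}]
portuguese = [{"sex": "M", "G3": 9}, {"sex": "F", "G3": 15}]
cleaned = drop_zero_final_grades(merge_with_subject(maths, portuguese))
profiles = sample_profiles(cleaned, n=2)  # the study sampled 856 students
```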

    Description of *CompleteDataAndBiases.csv*

    index - this column corresponds to the indices in the file "original_data.csv". Through these indices, it is possible to add columns from the original data to the dataset with the grade predictions.

    ParticipantID - the ID of the participant who made the performance predictions for the corresponding student. Predictions needed to be made for 856 students, and each participant made 8 predictions total. Thus there are 107 different participant IDs

    name - to make the prediction task more engaging for participants, each of the 8 student profiles that participants had to grade and rank was randomly matched to one of four boys' or girls' names (depending on the sex of the student).

    sex - the sex of each student, either female (F) or male (M). For benchmarking fair ML algorithms, this can be used as the sensitive attribute. We assume that in the fair version of the decision variable ("Pass"), no sex discrimination occurs. The biased versions of the variable ("Predicted Pass") are mostly discriminatory towards male students.

    studytime - this variable is taken from the original dataset and denotes how long a student studied for their exam. In the original data this variable consisted of four levels (less than 2 hours vs. 2-5 hours vs. 5-10 hours vs. more than 10 hours). We binned the latter two levels together and encoded this column numerically from 1-3.

    freetime - Originally, this variable ranged from 1 (very low) to 5 (very high). We binned this variable into three categories, where level 1 and 2 are binned, as well as level 4 and 5.

    romantic - Binary variable, denoting whether the student is in a romantic relationship or not.

    Walc - This variable shows how much alcohol each student consumes on the weekend. Originally it ranged from 1 to 5 (5 corresponding to the highest alcohol consumption), but we binned the last two levels together.

    goout - This variable shows how often a student goes out in a week. Originally it ranged from 1 to 5 (5 corresponding to going out very often), but we binned the last two levels together.

    Parents_edu - This variable was not present in the original dataset. Instead, the original dataset contained two variables, "mum_edu" and "dad_edu". We obtained "Parents_edu" by taking the higher of the two. The variable consists of 4 levels, where 4 is the highest level of education.

    absences - This variable shows the number of absences per student. Originally it ranged from 0 to 93, but because large numbers of absences were infrequent, we binned all absences of >=7 into one level.

    reason - The reason why a student chose to go to the school in question. The levels are close to home, school's reputation, school's curriculum, and other.

    G3 - The actual grade each student received for the final exam of the course, ranging from 0-20.

    Pass - A binary variable showing whether G3 is a passing grade (i.e. >=10) or not.

    Predicted Grade - The grade the student was predicted to receive in our experiment

    Predicted Rank - In our ex...
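
    The binning rules given for studytime, freetime, Walc, goout, Parents_edu, absences, and Pass can be sketched as a single function. Two details are assumptions: that the raw studytime levels are coded 1-4, and that the three freetime categories are coded 1-3 (the description does not state the output codes).

```python
def bin_student(row):
    """Apply the binning rules described above to one raw student record."""
    freetime_bins = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}  # output codes 1-3 are assumed
    return {
        "studytime": min(row["studytime"], 3),   # levels 3 and 4 merged
        "freetime": freetime_bins[row["freetime"]],
        "Walc": min(row["Walc"], 4),             # levels 4 and 5 merged
        "goout": min(row["goout"], 4),           # levels 4 and 5 merged
        "Parents_edu": max(row["mum_edu"], row["dad_edu"]),
        "absences": min(row["absences"], 7),     # 7 stands for ">= 7"
        "G3": row["G3"],
        "Pass": row["G3"] >= 10,
    }


# Made-up raw record using the column names mentioned in the description.
raw = {"studytime": 4, "freetime": 5, "Walc": 5, "goout": 2,
       "mum_edu": 2, "dad_edu": 4, "absences": 12, "G3": 11}
binned = bin_student(raw)
```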

  5. The Simpsons character classification, an example

    • kaggle.com
    zip
    Updated Apr 27, 2020
    Cite
    Juan (2020). The Simpsons character classification, an example [Dataset]. https://www.kaggle.com/jfgm2018/the-simpsons-dataset-compilation-49-characters
    Available download formats: zip (1531502158 bytes)
    Dataset updated
    Apr 27, 2020
    Authors
    Juan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    After reading the inspiring work from Alexandre Attia (https://medium.com/alex-attia-blog/the-simpsons-character-recognition-using-keras-d8e1796eae36), and with the goal to learn how to benefit from the machine learning potential, I made my own approach to "The Simpsons" character recognition and detection problem.

    Most of the work of collecting and labeling the images was already done by Alexandre (https://www.kaggle.com/alexattia/the-simpsons-characters-dataset). His approach was also clearly explained and is, in most senses, the best I found. My approach is far simpler, and also less efficient, as I was taking a rough approach to the problem from my point of view as a complete beginner.

    At the time I began this project (2019), the set from Alexandre was not as complete as it is nowadays. For this reason, I extended the character set from the original 22 characters available when I first downloaded it to 49 characters.

    This approach is based on Haar Cascades for character detection and a CNN for classification.

    Content

    The CNN that recognizes the characters needs the help of a Haar Cascade. Haar Cascades are not the best approach to this problem, but as they are easy to train and the same dataset can be used for both the CNN and the Haar Cascade, they were adopted for this version. To make the Haar Cascades more efficient, the datasets were processed so that only the faces of the characters were used for training. This also means that our CNN will recognize the characters only by their faces and not by the shape of their bodies.

    Haar Cascades were trained in the usual way, using OpenCV and the procedure described elsewhere (there are several works about Haar Cascades). To briefly review the process:

    1. Create the set of images that will be used as "positive" images, i.e., those that contain the face of the character. These are usually stored in a folder called "info".
    2. Create a file ("info.lst") listing all the images that will be used for the Haar Cascade training. In this step, the data were processed so that only the face was in the image. Because "info.lst" must give the file name and the coordinates of the region of interest, the whole image is our region of interest. As we have enough images in this case, positive images were not generated from negative images (those that do not contain the face).
    3. Create a folder with the set of all negative images (usually "neg").
    4. Run the OpenCV command with the desired options (all these options are explained at https://docs.opencv.org/master/dc/d88/tutorial_traincascade.html). For example:

    opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 6000 -numNeg 3000 -numStages 50 -w 64 -h 64 -precalcValBufSize 16384 -precalcIdxBufSize 16384
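
    The "info.lst" file from step 2 can be sketched as follows, using the usual OpenCV description-file layout of file name, object count, then x, y, width, and height of the bounding box. The file names are hypothetical, and the 64x64 image size is an assumption chosen to match the -w 64 -h 64 options in the command above.

```python
def info_line(filename, width, height):
    """One description line: file name, object count, then x y w h of the box.

    The box covers the full image because each positive is already a cropped face.
    """
    return f"{filename} 1 0 0 {width} {height}"


def build_info_lst(images):
    """Build the file contents from an iterable of (filename, width, height) tuples."""
    return "\n".join(info_line(*image) for image in images) + "\n"


# Hypothetical file names and an assumed 64x64 image size.
listing = build_info_lst([("homer_0001.jpg", 64, 64), ("homer_0002.jpg", 64, 64)])
print(listing, end="")
```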

    Once trained, the Haar Cascade will give a "cascade.xml" file as a result. To test the CNN, we use the trained CNN together with the trained cascade in a Python program. The Haar Cascade takes the image and detects that there is a face. The face is "cut" from the image and sent to the CNN model, which recognizes the character.
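
    The detect, crop, and classify flow described above can be sketched independently of any library. Here the detector and classifier are stand-in functions so the example runs on its own; in a real program the detector would presumably wrap the trained cascade (for example via OpenCV's CascadeClassifier loaded from "cascade.xml") and the classifier would wrap the trained CNN. The author's actual program is not shown on this page.

```python
def crop(image, box):
    """Cut the region (x, y, w, h) out of an image stored as a list of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def recognize_characters(image, detect_faces, classify_face):
    """Run the detector, crop each detected face, and classify every crop."""
    return [(box, classify_face(crop(image, box))) for box in detect_faces(image)]


# Stand-ins so the sketch runs without OpenCV or a trained model.
def fake_detector(image):
    return [(1, 1, 2, 2)]


def fake_classifier(face):
    return "homer" if sum(map(sum, face)) > 0 else "unknown"


image = [[0, 0, 0, 0],
         [0, 5, 6, 0],
         [0, 7, 8, 0],
         [0, 0, 0, 0]]
results = recognize_characters(image, fake_detector, fake_classifier)
```

    Passing the detector and classifier in as functions keeps the two trained models independent, so either one can be retrained or swapped without touching the pipeline.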

    Acknowledgements

    I would like to acknowledge the really good approach and motivation from Alexandre Attia and the help of Javier Sotres, who made the training for this test possible.

    Inspiration

    The main inspiration for this test was to make a quick approach to the problem of feature recognition while using a more fun dataset. The best conclusion we could draw from this effort is that machine learning could easily be implemented for quick routines in a laboratory, where images are one of the most important sources of data.

  6. Job Offers Web Scraping Search

    • kaggle.com
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search
    Dataset updated
    Feb 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Offers Web Scraping Search

    Targeted Results to Find the Optimal Work Solution


    About this dataset

    This dataset collects job offers obtained by web scraping and filtered according to specific keywords, locations, and times. The data lets users search precisely for the working solution that suits them best. With the information collected, users can explore options that match their personal situation, skillset, and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations, and time frames, as well as other necessary parameters, so you can make a smart choice for your next career opportunity.

    How to use the dataset

    This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:

    • Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.

    • Next, consider where the job is located. The Location column tells you where in the world each posting is from, so make sure it's somewhere that suits your needs.

    • Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether it's a full-time, part-time, or casual/temporary role, so check that it meets your requirements before applying.

    • Additionally, if details such as hours per week or further schedule information are important criteria, that information is provided in the Horari and Temps Oferta columns. Once all three criteria (keywords, location, and time frame) have been checked, look at the Empresa (Company Name) and Nom_Oferta (Post Name) columns to get an idea of who would be employing you.

      Together, these pieces of data should give any motivated individual what they need to seek out an optimal work solution.
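
    The keyword and location filtering described in the tips above can be sketched with the standard csv module. The rows below are made up; only the column names (Nom_Oferta, Empresa, Ubicació, Temps_Oferta, Horari) come from the dataset's column listing, and matching by case-insensitive substring is an assumption.

```python
import csv
import io


def filter_offers(rows, keyword=None, location=None):
    """Keep offers whose title contains keyword and whose location contains location."""
    def is_match(row):
        title_ok = keyword is None or keyword.lower() in row["Nom_Oferta"].lower()
        place_ok = location is None or location.lower() in row["Ubicació"].lower()
        return title_ok and place_ok
    return [row for row in rows if is_match(row)]


# Made-up rows using the column names from web_scraping_information_offers.csv.
sample_csv = io.StringIO(
    "Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari\n"
    "Data Analyst,Acme,Barcelona,2 days ago,Full time\n"
    "Warehouse Assistant,Globex,Girona,1 week ago,Part time\n"
    "Junior Data Engineer,Initech,Barcelona,3 days ago,Full time\n"
)
offers = list(csv.DictReader(sample_csv))
selected = filter_offers(offers, keyword="data", location="barcelona")
```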

    Research Ideas

    • Machine learning can be used to group job offers in order to identify similarities and differences between them. This could allow users to target their search for a work solution more specifically.
    • The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better-informed decisions about their career options and goals.
    • It may also provide insight into the local job market, enabling companies and employers to identify potential new opportunities or trends that may previously have gone unnoticed.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: web_scraping_information_offers.csv

    | Column name  | Description                         |
    |:-------------|:------------------------------------|
    | Nom_Oferta   | Name of the job offer. (String)     |
    | Empresa      | Company offering the job. (String)  |
    | Ubicació     | Location of the job offer. (String) |
    | Temps_Oferta | Time of the job offer. (String)     |
    | Horari       | Schedule of the job offer. (String) |


  7. Italian Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Italian Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/italian-closed-ended-question-answer-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Italian Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Italian language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Italian. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Italian speakers, and references were taken from diverse sources such as books, news articles, websites, web forums, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different answer formats, including single words, short phrases, single sentences, and paragraphs. The answers contain text strings, numerical values, and date and time formats as well. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Italian Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
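
    One way to represent a record with the annotation details listed above is a small dataclass. The attribute names are hypothetical (the actual JSON/CSV keys are not given on this page), and the example row is made up and written in English for readability.

```python
from dataclasses import dataclass, fields


@dataclass
class QARecord:
    """Field names are hypothetical; they mirror the annotation details listed above."""
    id: str
    context: str
    context_reference_link: str
    question: str
    question_type: str
    question_complexity: str
    question_category: str
    domain: str
    prompt_type: str
    answer: str
    answer_type: str
    rich_text_presence: bool


def from_dict(raw):
    """Build a QARecord, raising KeyError if an annotation field is missing."""
    return QARecord(**{f.name: raw[f.name] for f in fields(QARecord)})


# Made-up example row.
example = from_dict({
    "id": "q-0001", "context": "Rome is the capital of Italy.",
    "context_reference_link": "https://example.com/source",
    "question": "What is the capital of Italy?", "question_type": "direct",
    "question_complexity": "easy", "question_category": "fact",
    "domain": "geography", "prompt_type": "instruction",
    "answer": "Rome", "answer_type": "single-word", "rich_text_presence": False,
})
```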

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The Italian text is grammatically accurate, with no spelling or grammatical errors. No toxic or harmful content was used in building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Italian Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  8. Bahasa Open Ended Classification Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Bahasa Open Ended Classification Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/bahasa-open-ended-classification-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the Bahasa Open Ended Classification Prompt-Response Dataset—an extensive collection of 3000 meticulously curated prompt and response pairs. This dataset is a valuable resource for training Language Models (LMs) to classify input text accurately, a crucial aspect in advancing generative AI.

    Dataset Content:

    This open-ended classification dataset comprises a diverse set of prompts and responses. The prompt contains the input text to be classified and may also contain task instructions, context, constraints, and restrictions, while the completion contains the best classification category as the response. Both prompts and completions are in the Bahasa language. As this is an open-ended dataset, the prompt does not include a list of options from which to choose the classification category.

    These prompt and completion pairs cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more. Each prompt is accompanied by a response, providing valuable information and insights to enhance the language model training process. Both the prompts and responses were manually curated by native Bahasa speakers, and references were taken from diverse sources such as books, news articles, websites, and other reliable references.

    This open-ended classification prompt and completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains prompts and responses with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Prompt Diversity:

    To ensure diversity, this open-ended classification dataset includes prompts with varying complexity levels, ranging from easy to medium and hard. Additionally, prompts are diverse in terms of length from short to medium and long, creating a comprehensive variety. The classification dataset also contains prompts with constraints and persona restrictions, which makes it even more useful for LLM training.

    Response Formats:

    To accommodate diverse learning experiences, our dataset incorporates different types of responses depending on the prompt. These formats include single-word, short phrase, and single sentence type of response. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Bahasa Open Ended Classification Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt length, prompt complexity, domain, response, response type, and rich text presence.

    Quality and Accuracy:

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Bahasa version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom open-ended classification prompt and completion data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Bahasa Open Ended Classification Prompt-Completion Dataset to enhance the classification abilities and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  9. WikiTableQuestions (Semi-structured Tables Q&A)

    • kaggle.com
    Updated Nov 27, 2022
    Cite
    The Devastator (2022). WikiTableQuestions (Semi-structured Tables Q&A) [Dataset]. https://www.kaggle.com/datasets/thedevastator/investigation-of-semi-structured-tables-wikitabl
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 27, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Investigation of Semi-Structured Tables: WikiTableQuestions

    A Dataset of Complex Questions on Semi-Structured Wikipedia Tables

    By [source]

    About this dataset

    The WikiTableQuestions dataset poses complex questions about the contents of semi-structured Wikipedia tables. Beyond merely testing a model's knowledge retrieval capabilities, these questions require an understanding of both the natural language used and the structure of the table itself in order to provide a correct answer. This makes the dataset an excellent testing ground for AI models that aim to replicate or exceed human-level intelligence.

    How to use the dataset

    To use the WikiTableQuestions dataset, you first need to understand its structure. The dataset comprises two types of files: questions and answers. The questions are in natural language and are designed to test a model's ability to understand the table structure, understand the natural language question, and reason about the answer. The answers are in a list format and provide additional information about each table that can be used to answer the questions.

    To start working with the WikiTableQuestions dataset, you will need to download both the questions and answers files. Once you have downloaded both files, you can begin working with the dataset by loading it into a pandas dataframe. From there, you can begin exploring the data and developing your own models for answering the questions.

    Happy Kaggling!

    Research Ideas

    • The WikiTableQuestions dataset can be used to train a model to answer complex questions about semi-structured Wikipedia tables.

    • The WikiTableQuestions dataset can be used to train a model to understand the structure of semi-structured Wikipedia tables.

    • The WikiTableQuestions dataset can be used to train a model to understand the natural language questions and reason about the answers.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    Files: 0.csv, 1.csv, 10.csv, 11.csv, 12.csv, 14.csv, 15.csv, 17.csv, 18.csv

  10. Dataset of books called Eat in : the best food is made at home

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Eat in : the best food is made at home [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Eat+in+%3A+the+best+food+is+made+at+home
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Eat in : the best food is made at home. It features 7 columns including author, publication date, language, and book publisher.

  11. Dataset for: The Evolution of the Manosphere Across the Web

    • data.niaid.nih.gov
    Updated Aug 30, 2020
    Cite
    Emiliano De Cristofaro (2020). Dataset for: The Evolution of the Manosphere Across the Web [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4007912
    Explore at:
    Dataset updated
    Aug 30, 2020
    Dataset provided by
    Manoel Horta Ribeiro
    Gianluca Stringhini
    Jeremy Blackburn
    Summer Long
    Stephanie Greenberg
    Savvas Zannettou
    Barry Bradlyn
    Emiliano De Cristofaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Evolution of the Manosphere Across the Web

    We make available data related to subreddit and standalone forums from the manosphere.

    We also make available Perspective API annotations for all posts.

    You can find the code in GitHub.

    Please cite this paper if you use this data:

    @article{ribeiroevolution2021,
      title={The Evolution of the Manosphere Across the Web},
      author={Ribeiro, Manoel Horta and Blackburn, Jeremy and Bradlyn, Barry and De Cristofaro, Emiliano and Stringhini, Gianluca and Long, Summer and Greenberg, Stephanie and Zannettou, Savvas},
      booktitle={{Proceedings of the 15th International AAAI Conference on Weblogs and Social Media (ICWSM'21)}},
      year={2021}
    }

    1. Reddit data

    We make available data for forums and for relevant subreddits (56 of them, as described in subreddit_descriptions.csv). The Reddit data are available, one line per post, in /ndjson/reddit.ndjson. A sample is:

    { "author": "Handheld_Gaming", "date_post": 1546300852, "id_post": "abcusl", "number_post": 9.0, "subreddit": "Braincels", "text_post": "Its been 2019 for almost 1 hour And I am at a party with 120 people, half of them being foids. The last year had been the best in my life. I actually was happy living hope because I was redpilled to the death.

    Now that I am blackpilled I see that I am the shortest of all men and that I am the only one with a recessed jaw.

    Its over. Its only thanks to my age old friendship with chads and my social skills I had developed in the past year that a lot of men like me a lot as a friend.

    No leg lengthening syrgery is gonna save me. Ignorance was a bliss. Its just horror now seeing that everyone can make out wirth some slin hoe at the party.

    I actually feel so unbelivably bad for turbomanlets. Life as an unattractive manlet is a pain, I cant imagine the hell being an ugly turbomanlet is like. I would have roped instsntly if I were one. Its so unfair.

    Tallcels are fakecels and they all can (and should) suck my cock.

    If I were 17cm taller my life would be a heaven and I would be the happiest man alive.

    Just cope and wait for affordable body tranpslants.", "thread": "t3_abcusl" }
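
    As a minimal sketch of how a file in this one-JSON-object-per-line layout could be read, the snippet below parses newline-delimited JSON and counts posts per subreddit. The field names come from the sample above; the helper names (`iter_ndjson`, `posts_per_subreddit`) and the two in-memory records are hypothetical and only stand in for /ndjson/reddit.ndjson.

```python
import io
import json
from collections import Counter


def iter_ndjson(lines):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def posts_per_subreddit(lines):
    """Count posts per subreddit using the "subreddit" field from the sample."""
    return Counter(post["subreddit"] for post in iter_ndjson(lines))


# Hypothetical in-memory stand-in for /ndjson/reddit.ndjson (text shortened).
# With the real file, pass an open text file handle instead of StringIO.
sample = io.StringIO(
    '{"author": "user_a", "date_post": 1546300852, "id_post": "abc001", '
    '"number_post": 1.0, "subreddit": "Braincels", "text_post": "...", '
    '"thread": "t3_abc001"}\n'
    '{"author": "user_b", "date_post": 1546300900, "id_post": "abc002", '
    '"number_post": 2.0, "subreddit": "Braincels", "text_post": "...", '
    '"thread": "t3_abc001"}\n'
)
counts = posts_per_subreddit(sample)
print(counts)
```

    Reading line by line keeps memory use flat, which matters if the full file is large.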

    2. Forums

    Here we describe the .sqlite and .ndjson files that contain the data from the following forums.

    (avfm) --- https://d2ec906f9aea-003845.vbulletin.net
    (incels) --- https://incels.co/
    (love_shy) --- http://love-shy.com/lsbb/
    (redpilltalk) --- https://redpilltalk.com/
    (mgtow) --- https://www.mgtow.com/forums/
    (rooshv) --- https://www.rooshvforum.com/
    (pua_forum) --- https://www.pick-up-artist-forum.com/
    (the_attraction) --- http://www.theattractionforums.com/

    The files are in folders /sqlite/ and /ndjson/.

    2.1 .sqlite

    All the tables in the .sqlite datasets follow a very simple {key:value} format. Each key is a thread name (for example /threads/housewife-is-like-a-job.123835/) and each value is a python dictionary or a list. Each file contains three tables:

    idx: each key is the relative address to a thread and maps to a post. Each post is represented by a dict:

    "type": (list) in some forums you can add a descriptor such as [RageFuel] to each topic, and you may also have special types of posts, like sticky/pool/locked posts;
    "title": (str) title of the thread;
    "link": (str) link to the thread;
    "author_topic": (str) username that created the thread;
    "replies": (int) number of replies, may differ from number of posts due to difference in crawling date;
    "views": (int) number of views;
    "subforum": (str) name of the subforum;
    "collected": (bool) indicates if raw posts have been collected;
    "crawled_idx_at": (str) datetime of the collection.

    processed_posts: each key is the relative address to a thread and maps to a list with posts (in order). Each post is represented by a dict:

    "author": (str) author's username;
    "resume_author": (str) author's little description;
    "joined_author": (str) date author joined;
    "messages_author": (int) number of messages the author has;
    "text_post": (str) text of the main post;
    "number_post": (int) number of the post in the thread;
    "id_post": (str) unique post identifier (depends), for sure unique within thread;
    "id_post_interaction": (list) list with other posts ids this post quoted;
    "date_post": (str) datetime of the post;
    "links": (tuple) nice tuple with the url parsed, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw');
    "thread": (str) same as key;
    "crawled_at": (str) datetime of the collection.

    raw_posts: each key is the relative address to a thread and maps to a list with unprocessed posts (in order). Each post is represented by a dict:

    "post_raw": (binary) raw html binary;
    "crawled_at": (str) datetime of the collection.

    2.2 .ndjson

    Each line consists of a json object representing a different comment with the following fields:

    "author": (str) author's username;
    "resume_author": (str) author's little description;
    "joined_author": (str) date author joined;
    "messages_author": (int) number of messages the author has;
    "text_post": (str) text of the main post;
    "number_post": (int) number of the post in the thread;
    "id_post": (str) unique post identifier (depends), for sure unique within thread;
    "id_post_interaction": (list) list with other posts ids this post quoted;
    "date_post": (str) datetime of the post;
    "links": (tuple) nice tuple with the url parsed, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw');
    "thread": (str) same as key;
    "crawled_at": (str) datetime of the collection.

    3. Perspective

    We also ran each forum post and Reddit post through the Perspective API. The files are located in the /perspective/ folder and are compressed with gzip. One example output:

    { "id_post": 5200, "hate_output": { "text": "I still can\u2019t wrap my mind around both of those articles about these c~~~s sleeping with poor Haitian Men. Where\u2019s the uproar?, where the hell is the outcry?, the \u201cpig\u201d comments or the \u201ccreeper comments\u201d. F~~~ing hell, if roles were reversed and it was an article about Men going to Europe where under 18 sex in legal, you better believe they would crucify the writer of that article and DEMAND an apology by the paper that wrote it.. This is exactly what I try and explain to people about the double standards within our modern society. A bunch of older women, wanna get their kicks off by sleeping with poor Men, just before they either hit or are at menopause age. F~~~ing unreal, I\u2019ll never forget going to Sweden and Norway a few years ago with one of my buddies and his girlfriend who was from there, the legal age of consent in Norway is 16 and in Sweden it\u2019s 15. I couldn\u2019t believe it, but my friend told me \u201c hey, it\u2019s normal here\u201d . Not only that but the age wasn\u2019t a big different in other European countries as well. One thing i learned very quickly was how very Misandric Sweden as well as Denmark were.", "TOXICITY": 0.6079781, "SEVERE_TOXICITY": 0.53744453, "INFLAMMATORY": 0.7279288, "PROFANITY": 0.58842486, "INSULT": 0.5511079, "OBSCENE": 0.9830818, "SPAM": 0.17009115 } }
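
    A minimal sketch of reading gzip-compressed annotations like the one above, using only the standard library. It assumes each compressed file holds one JSON object per line, which the description does not state explicitly. The helper names, the 0.5 threshold, and the in-memory records are hypothetical.

```python
import gzip
import io
import json


def iter_gzip_json_lines(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def toxic_post_ids(records, threshold=0.5):
    """Return id_post values whose TOXICITY score is at or above the threshold."""
    return [
        record["id_post"]
        for record in records
        if record["hate_output"]["TOXICITY"] >= threshold
    ]


# Hypothetical records that mimic the structure of the example output.
records_in = [
    {"id_post": 1, "hate_output": {"text": "...", "TOXICITY": 0.61}},
    {"id_post": 2, "hate_output": {"text": "...", "TOXICITY": 0.12}},
]
payload = gzip.compress(
    "\n".join(json.dumps(record) for record in records_in).encode("utf-8")
)

# With a real file, pass its path (a file in /perspective/) instead of BytesIO.
flagged = toxic_post_ids(iter_gzip_json_lines(io.BytesIO(payload)))
print(flagged)
```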

    4. Working with sqlite

    A nice way to read some of the files of the dataset is using SqliteDict, for example:

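    from sqlitedict import SqliteDict

    # Open the "processed_posts" table of one forum's .sqlite file
    processed_posts = SqliteDict("./data/forums/incels.sqlite", tablename="processed_posts")

    # Each key is a thread; each value is the ordered list of posts in it
    for key, posts in processed_posts.items():
        for post in posts:
            # here you could do something with each post in the dataset
            pass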
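
    As one illustration of what the loop body might do, the sketch below counts posts per author over a thread-to-posts mapping. A plain dict with hypothetical records stands in for the SqliteDict table so the snippet runs without the dataset; `posts_per_author` is an illustrative helper, not part of the released code.

```python
from collections import Counter


def posts_per_author(processed_posts):
    """Count posts per author across all threads in a thread -> posts mapping."""
    counts = Counter()
    for _thread, posts in processed_posts.items():
        for post in posts:
            counts[post["author"]] += 1
    return counts


# Plain dict standing in for the "processed_posts" table (hypothetical records).
example_table = {
    "/threads/example-thread.1/": [
        {"author": "user_a", "number_post": 1, "text_post": "..."},
        {"author": "user_b", "number_post": 2, "text_post": "..."},
    ],
    "/threads/another-thread.2/": [
        {"author": "user_a", "number_post": 1, "text_post": "..."},
    ],
}
author_counts = posts_per_author(example_table)
print(author_counts.most_common(1))
```

    Because the function only calls `.items()`, the same code should work when handed the SqliteDict object from the example above.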

    5. Helpers

    Additionally, we provide two .sqlite files that are helpers used in the analyses. These are related to reddit, and not to the forums! They are:

    channel_dict.sqlite: a sqlite where each key corresponds to a subreddit and values are lists of dictionaries of the users who posted on it, along with timestamps.

    author_dict.sqlite: a sqlite where each key corresponds to an author and values are lists of dictionaries of the subreddits they posted on, along with timestamps.

    These are used in the paper for the migration analyses.

    6. Examples and particularities for forums

    Although we did our best to clean the data and be consistent across forums, this is not always possible. In the following subsections we discuss the particularities of each forum, note directions for improving the parsing that were not pursued, and give some examples of how things work in each forum.

    6.1 incels

    Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.

    types: for the incel forums the special types associated with each thread in the idx table are “Sticky”, “Pool”, “Closed”, and the custom types added by users, such as [LifeFuel]. These last ones are all in brackets. You can see some examples of these on the example thread page.

    quotes: quotes in this forum were quite nice and thus, all quotations are deterministic.

    6.2 LoveShy

    Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.

    types: no types were parsed. There are some rules in the forum, but not significant.

    quotes: quotes were obtained from exact text+author match, or author match + a jaccard

  12. Great Bend, ND Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    Cite
    Neilsberg Research (2025). Great Bend, ND Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b235eaca-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    North Dakota, Great Bend
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Great Bend by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Great Bend across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a majority of male population, with 64.29% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of each gender in Great Bend is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Great Bend total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to check the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Great Bend Population by Race & Ethnicity. You can refer to the same here

  13. ‘Jiffs house price prediction dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Jiffs house price prediction dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-jiffs-house-price-prediction-dataset-458f/1a7ff5ac/?iid=048-724&v=presentation
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Jiffs house price prediction dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/elakiricoder/jiffs-house-price-prediction-dataset on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    I previously shared a classification dataset for classifying gender, which is liked by those who are new to machine learning because it gives pretty good accuracy. That encouraged me to create a regression dataset for predicting continuous values. I have tried many real-world datasets for regression problems, and they predict with lower accuracy and a high error rate. As a beginner, I struggled and worried about why and how a dataset performs poorly. This is another main reason why I created this dataset. Although this is a made-up dataset, I considered all the features when deciding the price of the property. If you are a beginner, you will love trying this, as the results are stunning.

    Content

    Since this is populated data, I will explain the features and the label straightaway.

    FEATURES

    1. land_size_sqm - This is the total size of the land in square meters.
    2. house_size_sqm - This is the area in which the house is located within the land. This is measured in square meters.
    3. no_of_rooms - This indicates the number of rooms available in the house.
    4. no_of_bathrooms - This shows the total number of bathrooms in the house.
    5. large_living_room - This indicates whether the house includes a larger living room or not. The assumption is that all the houses contain a living room. This feature attempts to classify whether it's large or small, where '1' means large and '0' means small. However, in the categorical dataset, 1 and 0 are represented with 'yes' and 'No' respectively.
    6. parking_space - This indicates whether there is a parking space or not. '1' means parking is available while '0' means no parking space is available. However, in the categorical dataset, 1 and 0 are represented with 'yes' and 'No' respectively.
    7. front_garden - This shows whether there is a garden in front of the house. '1' means a garden is available and '0' means no garden is available. However, in the categorical dataset, 1 and 0 are represented with 'yes' and 'No' respectively.
    8. swimming_pool - This shows the availability of a swimming pool at the house. 1 represents the availability of a swimming pool while 0 represents its non-availability. However, in the categorical dataset, 1 and 0 are represented with 'yes' and 'No' respectively.
    9. distance_to_school_km - This shows the distance from the house to the nearest school in kilometers.
    10. wall_fence - This shows whether there is a wall fence or not. '1' means there is a wall fence and '0' means no wall fence. However, in the categorical dataset, 1 and 0 are represented with 'yes' and 'No' respectively.
    11. house_age_or_renovated - This is either the age of the house in years or the period since the date of renovation.
    12. water_front - This indicates whether the house is located in front of the water or not. 1 means waterfront and 0 means it is not located near the water. However, in the categorical dataset, 1 and 0 are represented with 'yes' and 'No' respectively.
    13. distance_to_supermarket_km - The distance to the nearest supermarket in kilometers.

    LABEL property_value - This is the price of the property

    The following features are only available in the "house price dataset original v2 cleaned" and "house price dataset original v2 with categorical features" data.

    14. crime_rate - A float that falls between 0 and 7; lower is better.
    15. room_size - As the name suggests, this is the size of the room. 0 is 'small', 1 is 'medium', 2 is 'large' and 3 is 'Extra large'. However, in the categorical dataset, these values are categorical and self-explanatory.
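
    The feature list above says the categorical files store the binary features as 'yes' / 'No' instead of 1 / 0. A minimal sketch of mapping them back to numbers before fitting a regression model is shown below. The column names come from the list; the helper name and the example row values are hypothetical, and the comparison is case-insensitive because the description mixes 'yes' and 'No'.

```python
# Binary features that the description says appear as 'yes'/'No'
# in the categorical version of the dataset.
BINARY_COLUMNS = (
    "large_living_room",
    "parking_space",
    "front_garden",
    "swimming_pool",
    "wall_fence",
    "water_front",
)


def encode_yes_no(row):
    """Return a copy of row with 'yes'/'No' binary features mapped to 1/0."""
    encoded = dict(row)
    for column in BINARY_COLUMNS:
        if column not in encoded:
            continue
        value = str(encoded[column]).strip().lower()
        if value not in ("yes", "no"):
            raise ValueError(f"unexpected value for {column}: {encoded[column]!r}")
        encoded[column] = 1 if value == "yes" else 0
    return encoded


# Hypothetical row; the values are illustrative, not taken from the dataset.
row = {"land_size_sqm": 201, "parking_space": "yes", "front_garden": "No"}
encoded_row = encode_yes_no(row)
print(encoded_row)
```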

    Acknowledgements

    I spent around 3 hours creating this dataset. Enjoy..

    Inspiration

    Share your notebooks to see which algorithm predicts the house price precisely.

    --- Original source retains full ownership of the source dataset ---

  14. Median Household Income by Racial Categories in Good Thunder, MN (, in 2023...

    • neilsberg.com
    csv, json
    Updated Mar 1, 2025
    Cite
    Neilsberg Research (2025). Median Household Income by Racial Categories in Good Thunder, MN (, in 2023 inflation-adjusted dollars) [Dataset]. https://www.neilsberg.com/research/datasets/e0a3edf5-f665-11ef-a994-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Mar 1, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Good Thunder, Minnesota
    Variables measured
    Median Household Income for Asian Population, Median Household Income for Black Population, Median Household Income for White Population, Median Household Income for Some other race Population, Median Household Income for Two or more races Population, Median Household Income for American Indian and Alaska Native Population, Median Household Income for Native Hawaiian and Other Pacific Islander Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the median household income within each racial category identified by the US Census Bureau, we conducted an initial analysis and categorization of the data. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). It is important to note that the median household income estimates exclusively represent the identified racial categories and do not incorporate any ethnicity classifications. Households are categorized, and median incomes are reported based on the self-identified race of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the median household income across different racial categories in Good Thunder. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to gain insights into economic disparities and trends and explore the variations in median household income for diverse racial categories.

    Key observations

    Based on our analysis of the distribution of Good Thunder population by race & ethnicity, the population is predominantly White. This racial category constitutes the majority, accounting for 96.45% of the total residents in Good Thunder. White is both the largest group and the one with the highest median household income, which stands at $72,045.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race of the head of household: This column presents the self-identified race of the household head, encompassing all relevant racial categories (excluding ethnicity) applicable in Good Thunder.
    • Median household income: Median household income, adjusted for inflation and presented in 2023 inflation-adjusted dollars

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to check the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Good Thunder median household income by race. You can refer to the same here

  15. N

    Good Hope township, Itasca County, Minnesota Population Breakdown by Gender

    • neilsberg.com
    csv, json
    Updated Sep 14, 2023
    + more versions
    Cite
    Neilsberg Research (2023). Good Hope township, Itasca County, Minnesota Population Breakdown by Gender [Dataset]. https://www.neilsberg.com/research/datasets/648c95b8-3d85-11ee-9abe-0aa64bf2eeb2/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Sep 14, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Minnesota, Itasca County, Good Hope Township
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Good Hope township by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Good Hope township across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is majority male: 60.0% of the total population is male. Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: This column shows the population of each gender in Good Hope township.
    • % of Total Population: This column displays each gender's share of the total population of Good Hope township, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.
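    As a rough illustration of how a percent-of-total column like the one described above can be derived from raw counts, here is a minimal Python sketch. The function name `percent_of_total`, the two-decimal rounding, and all counts are assumptions made for illustration; they are not taken from the dataset or from Neilsberg's own processing.

```python
def percent_of_total(counts, ndigits=2):
    """Return each category's share of the total as a rounded percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total population is zero")
    return {name: round(100 * n / total, ndigits) for name, n in counts.items()}


# Hypothetical counts, not taken from the dataset.
counts = {"Male": 3, "Female": 2}
shares = percent_of_total(counts)

# The category with the largest share is the majority group.
majority = max(shares, key=shares.get)
print(shares, majority)

# Three equal hypothetical categories show the rounding effect:
# each share rounds to 33.33, so the rounded shares fall short of 100.
thirds = percent_of_total({"a": 1, "b": 1, "c": 1})
print(sum(thirds.values()))
```

    The second call shows why rounded shares need not add up to the full total: rounding is applied per category, so small discrepancies can accumulate.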

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Good Hope township Population by Gender, which you can refer to for further detail.

  16. N

    Great Falls, MT Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 19, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Great Falls, MT Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/d062c807-c980-11ee-9145-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 19, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montana, Great Falls
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Great Falls by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Great Falls across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is slightly majority female: 50.51% of the total population is female. Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: This column shows the population of each gender in Great Falls.
    • % of Total Population: This column displays each gender's share of the total population of Great Falls, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Great Falls Population by Race & Ethnicity, which you can refer to for further detail.

  17. N

    Portland, IN Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Portland, IN Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b24d64e4-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Portland, IN
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Portland by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Portland across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is slightly majority male: 51.65% of the total population is male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: This column shows the population of each gender in Portland.
    • % of Total Population: This column displays each gender's share of the total population of Portland, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Portland Population by Race & Ethnicity, which you can refer to for further detail.

  18. N

    Zanesville, IN Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Zanesville, IN Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b25ec9e2-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Zanesville
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Zanesville by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Zanesville across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is majority female: 54.97% of the total population is female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: This column shows the population of each gender in Zanesville.
    • % of Total Population: This column displays each gender's share of the total population of Zanesville, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Zanesville Population by Race & Ethnicity, which you can refer to for further detail.

  19. N

    Hope, ID Population Breakdown by Gender Dataset: Male and Female Population...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Hope, ID Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b239ba8f-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Hope
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Hope by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Hope across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is majority female: 54.88% of the total population is female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: This column shows the population of each gender in Hope.
    • % of Total Population: This column displays each gender's share of the total population of Hope, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Hope Population by Race & Ethnicity, which you can refer to for further detail.

  20. N

    Rosedale, IN Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Rosedale, IN Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b25073ca-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rosedale
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Rosedale by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Rosedale across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population is slightly majority male: 51.11% of the total population is male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: This column shows the population of each gender in Rosedale.
    • % of Total Population: This column displays each gender's share of the total population of Rosedale, as a percentage. Please note that the percentages may not sum to exactly 100% due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Rosedale Population by Race & Ethnicity, which you can refer to for further detail.

Cite
GTS GTS (2025). Fruits Dataset for Classification [Dataset]. http://doi.org/10.17632/rg254yr63x.1

Fruits Dataset for Classification

Explore at:
Dataset updated
Feb 11, 2025
Authors
GTS GTS
License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

About Dataset (strawberries, peaches, pomegranates). Photo requirements: (1) white background; (2) .jpg format; (3) image size 300*300. The number of photos required is 250 of each fruit when it is fresh and 250 of each fruit when it is rotten, for a total of 1,500 images.
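The stated total follows from the per-class requirement: three fruits, two conditions each, 250 photos per combination. A minimal Python sketch of that arithmetic follows; the variable names are illustrative assumptions, and nothing here reflects the actual file or folder layout of the dataset, which the description does not specify.

```python
# Classes and per-class quota as stated in the description.
fruits = ["strawberries", "peaches", "pomegranates"]
conditions = ["fresh", "rotten"]
photos_per_class = 250

# One entry per (fruit, condition) pair, each expected to hold 250 photos.
expected = {(fruit, cond): photos_per_class for fruit in fruits for cond in conditions}

total_images = sum(expected.values())
print(len(expected), total_images)
```

A table like `expected` could be compared against actual per-class counts to spot missing or surplus images before training.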

Diverse Collection

With a diverse collection of product images, the files provide an excellent foundation for developing and testing machine learning models designed for image recognition and allocation. Each image is captured under different lighting conditions and backgrounds, offering a realistic challenge for algorithms to overcome.

Real-World Applications

The variability in the dataset ensures that models trained on it can generalize well to real-world scenarios, making them robust and reliable. The dataset includes common fruits such as apples, bananas, oranges, and strawberries, among others, allowing for comprehensive training and evaluation.

Industry Use Cases

One of the significant advantages of using the Fruits Dataset for Classification is its applicability in various fields such as agriculture, retail, and the food industry. In agriculture, it can help automate the process of fruit sorting and grading, enhancing efficiency and reducing labor costs. In retail, it can be used to develop automated checkout systems that accurately identify fruits, streamlining the purchasing process.

Educational Value

The dataset is also valuable for educational purposes, providing students and educators with a practical tool to learn and teach machine learning concepts. By working with this dataset, learners can gain hands-on experience in data preprocessing, model training, and evaluation.

Conclusion

The Fruits Dataset for Classification is a versatile resource for advancing the field of image classification. Its diverse and high-quality images, coupled with practical applications, make it a go-to dataset for researchers, developers, and educators aiming to improve and innovate in machine learning and computer vision.

This dataset is sourced from Kaggle.
