100+ datasets found

Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

zenodo.org
explore.openaire.eu
+1more

csv, zip

Updated Jan 27, 2022

Facebook

Twitter

Click to copy link

Link copied

Cite

Hossein Keshavarz; Hossein Keshavarz; Meiyappan Nagappan; Meiyappan Nagappan (2022). ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction [Dataset]. http://doi.org/10.5281/zenodo.5907002

Explore at:

zip, csvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.5907002

Dataset updated

Jan 27, 2022

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Hossein Keshavarz; Hossein Keshavarz; Meiyappan Nagappan; Meiyappan Nagappan

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

This archive contains the ApacheJIT dataset presented in the paper "ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction" as well as the replication package. The paper is submitted to MSR 2022 Data Showcase Track.

The datasets are available under directory dataset. There are 4 datasets in this directory.

1. apachejit_total.csv: This file contains the entire dataset. Commits are specified by their identifier and a set of commit metrics that are explained in the paper are provided as features. Column buggy specifies whether or not the commit introduced any bug into the system.
2. apachejit_train.csv: This file is a subset of the entire dataset. It provides a balanced set that we recommend for models that are sensitive to class imbalance. This set is obtained from the first 14 years of data (2003 to 2016).
3. apachejit_test_large.csv: This file is a subset of the entire dataset. The commits in this file are the commits from the last 3 years of data. This set is not balanced to represent a real-life scenario in a JIT model evaluation where the model is trained on historical data to be applied on future data without any modification.
4. apachejit_test_small.csv: This file is a subset of the test file explained above. Since the test file has more than 30,000 commits, we also provide a smaller test set which is still unbalanced and from the last 3 years of data.

In addition to the dataset, we also provide the scripts using which we built the dataset. These scripts are written in Python 3.8. Therefore, Python 3.8 or above is required. To set up the environment, we have provided a list of required packages in file requirements.txt. Additionally, one filtering step requires GumTree [1]. For Java, GumTree requires Java 11. For other languages, external tools are needed. Installation guide and more details can be found here.

The scripts are comprised of Python scripts under directory src and Python notebooks under directory notebooks. The Python scripts are mainly responsible for conducting GitHub search via GitHub search API and collecting commits through PyDriller Package [2]. The notebooks link the fixed issue reports with their corresponding fixing commits and apply some filtering steps. The bug-inducing candidates then are filtered again using gumtree.py script that utilizes the GumTree package. Finally, the remaining bug-inducing candidates are combined with the clean commits in the dataset_construction notebook to form the entire dataset.

More specifically, git_token.py handles GitHub API token that is necessary for requests to GitHub API. Script collector.py performs GitHub search. Tracing changed lines and git annotate is done in gitminer.py using PyDriller. Finally, gumtree.py applies 4 filtering steps (number of lines, number of files, language, and change significance).

References:

1. GumTree

* https://github.com/GumTreeDiff/gumtree

Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In ACM/IEEE International Conference on Automated Software Engineering, ASE ’14,Vasteras, Sweden - September 15 - 19, 2014. 313–324

2. PyDriller

* https://pydriller.readthedocs.io/en/latest/

* Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Lake Buena Vista, FL, USA)(ESEC/FSE2018). Association for Computing Machinery, New York, NY, USA, 908–911

P
Meta-Dataset Dataset
paperswithcode.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eleni Triantafillou; Tyler Zhu; Vincent Dumoulin; Pascal Lamblin; Utku Evci; Kelvin Xu; Ross Goroshin; Carles Gelada; Kevin Swersky; Pierre-Antoine Manzagol; Hugo Larochelle, Meta-Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/meta-dataset
Explore at:
Authors
Eleni Triantafillou; Tyler Zhu; Vincent Dumoulin; Pascal Lamblin; Utku Evci; Kelvin Xu; Ross Goroshin; Carles Gelada; Kevin Swersky; Pierre-Antoine Manzagol; Hugo Larochelle
Description
The Meta-Dataset benchmark is a large few-shot learning benchmark and consists of multiple datasets of different data distributions. It does not restrict few-shot tasks to have fixed ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:

ILSVRC-2012 (the ImageNet dataset, consisting of natural images with 1000 categories) Omniglot (hand-written characters, 1623 classes) Aircraft (dataset of aircraft images, 100 classes) CUB-200-2011 (dataset of Birds, 200 classes) Describable Textures (different kinds of texture images with 43 categories) Quick Draw (black and white sketches of 345 different categories) Fungi (a large dataset of mushrooms with 1500 categories) VGG Flower (dataset of flower images with 102 categories), Traffic Signs (German traffic sign images with 43 classes) MSCOCO (images collected from Flickr, 80 classes).

All datasets except Traffic signs and MSCOCO have a training, validation and test split (proportioned roughly into 70%, 15%, 15%). The datasets Traffic Signs and MSCOCO are reserved for testing only.
Z
Data from: #PraCegoVer dataset
data.niaid.nih.gov
Updated Jan 19, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
Explore at:
Dataset updated
Jan 19, 2023
Dataset provided by
Sandra Avila
Esther Luna Colombini
Gabriel Oliveira dos Santos
Description
Automatically describing images using natural sentences is an essential task to visually impaired people's inclusion on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

Dataset Structure

PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX

containing the images. The file dataset.json comprehends a list of json objects with the attributes:

user: anonymized user that made the post;

filename: image file name;

raw_caption: raw caption;

caption: clean caption;

date: post date.

Each instance in dataset.json is associated with exactly one image in the images directory whose filename is pointed by the attribute filename. Also, we provide a sample with five instances, so the users can download the sample to get an overview of the dataset before downloading it completely.

Download Instructions

If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

cat images.tar.gz.part* > images.tar.gz tar -xzvf images.tar.gz

Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

python download_dataset.py --access_token=
TMDb Top 10,000 Popular Movies Dataset
kaggle.com
Updated Apr 7, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Balaka Biswas (2020). TMDb Top 10,000 Popular Movies Dataset [Dataset]. https://www.kaggle.com/balaka18/tmdb-top-10000-popular-movies-dataset/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 7, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Balaka Biswas
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Introduction

This is dataset of the 10,000 most popular movies across the world, irrespective of language and recency. These have been extracted using TMDb API.

About the Dataset

What is TMDB's API? The closed-source API service is for those people interested in using their movies, TV shows or actor images and/or data in their application. TMDb's API is a system that they provide for developers and their team to programmatically fetch and use TMDb's data and/or images. Their API is free to use as long as you attribute TMDb as the source of the data and/or images. Also, they update their API from time to time.

This dataset lists 10,000 most popular movies across the globe. Information held inside the dataset - A. Dataset 1 : Movies dataset - 1. title - Title of the Movie in English. 2. overview - A small summary of the plot. 3. original_lang - Original language it was shot in. 4. rel_date - Date of release. 5. popularity - Popularity. 6. vote_count - Votes received. 7. vote_average - Average of all votes received.

B. Dataset 2 : Genres dataset 1. id 2. Movie ID 3. Genre
LinkedIn Datasets
brightdata.com
.json, .csv, .xlsx
Updated Dec 17, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
Explore at:
.json, .csv, .xlsxAvailable download formats
Dataset updated
Dec 17, 2021
Dataset authored and provided by
Bright Datahttps://brightdata.com/
License
https://brightdata.com/licensehttps://brightdata.com/license
Area covered
Worldwide
Description
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions. Dataset Features

Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month. Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records. Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.

Customizable Subsets for Specific Needs Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications. Popular Use Cases

Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data. Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities. Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies. Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis. AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
P
LAS&T: Large Shape & Texture Dataset Dataset
paperswithcode.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
LAS&T: Large Shape & Texture Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/las-t-large-shape-texture-dataset
Explore at:
Description
Large Shape and Texture dataset (LAS&T) is a giant dataset of shapes and textures for tasks of visual shapes and textures identification and retrieval from single image.

LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D with 650,000 images, based on real world shapes and textures

Overview The LAS&T Dataset aims to test the most basic aspect of vision in the most general way. Mainly the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when appearing on different objects, environments, and light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes, and textures), contains 3D parts that rely on physics-based scenes with realistic light materials and object simulation and abstract 2D parts. In addition, the real-world benchmark for 3D shapes.

The dataset divided to several parts 3D shape recognition and retrieval.

2D shape recognition and retrieval.

3D Materials recognition and retrieval.

2D Texture recognition and retrieval.

Each can be used independently for training and testing.

Additional assets are a set of 350,000 natural 2D shapes extracted from real-world images

3D shape recognition real-world images benchmark
c
Fox News dataset is for analyzing media trends and narratives
crawlfeeds.com
csv, zip
Updated May 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset
Explore at:
zip, csvAvailable download formats
Dataset updated
May 19, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.

Key Features of the Fox News Dataset

Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.

Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.

Format: Provided in CSV format for seamless integration into analytical and research tools.

Why Use This Dataset?

This large dataset is ideal for:

Text Classification: Develop machine learning models to classify and categorize news content.

Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.

Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.

Trend Analysis: Identify shifts in public discourse and media focus over time.

Explore More News Datasets

Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.

The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
T
imdb_reviews
tensorflow.org
kaggle.com
Updated Sep 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
Explore at:
Dataset updated
Sep 20, 2024
Description
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('imdb_reviews', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
w
Dataset of highest price of stocks over time for THK.MC
workwithdata.com
Updated May 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of highest price of stocks over time for THK.MC [Dataset]. https://www.workwithdata.com/datasets/stocks-daily?col=date%2Chighest_price%2Cstock&f=1&fcol0=stock&fop0=%3D&fval0=THK.MC
Explore at:
Dataset updated
May 6, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about stocks per day. It has 245 rows and is filtered where the stock is THK.MC. It features 3 columns: stock, and highest price.
"9,565 Top-Rated Movies Dataset"
kaggle.com
Updated Aug 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harshit@85 (2024). "9,565 Top-Rated Movies Dataset" [Dataset]. https://www.kaggle.com/datasets/harshit85/9565-top-rated-movies-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 19, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Harshit@85
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
About the Dataset

Title: 9,565 Top-Rated Movies Dataset

Description:
This dataset offers a comprehensive collection of 9,565 of the highest-rated movies according to audience ratings on the Movie Database (TMDb). The dataset includes detailed information about each movie, such as its title, overview, release date, popularity score, average vote, and vote count. It is designed to be a valuable resource for anyone interested in exploring trends in popular cinema, analyzing factors that contribute to a movie’s success, or building recommendation engines.

Key Features: - Title: The official title of each movie. - Overview: A brief synopsis or description of the movie's plot. - Release Date: The release date of the movie, formatted as YYYY-MM-DD. - Popularity: A score indicating the current popularity of the movie on TMDb, which can be used to gauge current interest. - Vote Average: The average rating of the movie, based on user votes. - Vote Count: The total number of votes the movie has received.

Data Source: The data was sourced from the TMDb API, a well-regarded platform for movie information, using the /movie/top_rated endpoint. The dataset represents a snapshot of the highest-rated movies as of the time of data collection.

Data Collection Process: - API Access: Data was retrieved programmatically using TMDb’s API. - Pagination Handling: Multiple API requests were made to cover all pages of top-rated movies, ensuring the dataset’s comprehensiveness. - Data Aggregation: Collected data was aggregated into a single, unified dataset using the pandas library. - Cleaning: Basic data cleaning was performed to remove duplicates and handle missing or malformed data entries.

Potential Uses: - Trend Analysis: Analyze trends in movie ratings over time or compare ratings across different genres. - Recommendation Systems: Build and train models to recommend movies based on user preferences. - Sentiment Analysis: Perform text analysis on movie overviews to understand common themes and sentiments. - Statistical Analysis: Explore the relationship between popularity, vote count, and average ratings.

Data Format: The dataset is provided in a structured tabular format (e.g., CSV), making it easy to load into data analysis tools like Python, R, or Excel.

Usage License: The dataset is shared under [appropriate license], ensuring that it can be used for educational, research, or commercial purposes, with proper attribution to the data source (TMDb).

This description provides a clear and detailed overview, helping potential users understand the dataset's content, origin, and potential applications.
w
Dataset of highest price of stocks over time for GGX.AX
workwithdata.com
Updated May 6, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of highest price of stocks over time for GGX.AX [Dataset]. https://www.workwithdata.com/datasets/stocks-daily?col=date%2Chighest_price%2Cstock&f=1&fcol0=stock&fop0=%3D&fval0=GGX.AX
Explore at:
Dataset updated
May 6, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about stocks per day. It has 5,065 rows and is filtered where the stock is GGX.AX. It features 3 columns: stock, and highest price.
o
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+2more
csv, excel, geojson +1
Updated Mar 10, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
csv, json, geojson, excelAvailable download formats
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of adm div (ca 80.000)Sources and ContributionsSources : GeoNames is aggregating over hundred different data sources. Ambassadors : GeoNames Ambassadors help in many countries. Wiki : A wiki allows to view the data and quickly fix error and add missing places. Donations and Sponsoring : Costs for running GeoNames are covered by donations and sponsoring.Enrichment:add country name
Z
Data from: Caravan - A global community dataset for large-sample hydrology
data.niaid.nih.gov
zenodo.org
Updated Jan 16, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erickson, Tyler (2025). Caravan - A global community dataset for large-sample hydrology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6522634
Explore at:
Dataset updated
Jan 16, 2025
Dataset provided by
Gilon, Oren
Gudmundsson, Lukas
Erickson, Tyler
Hassidim, Avinatan
Kratzert, Frederik
Nearing, Grey
Klotz, Daniel
Nevo, Sella
Addor, Nans
Matias, Yossi
Gauch, Martin
Shalev, Guy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w

Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge daat for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

If you use Caravan in your research, it would be appreciated to not only cite Caravan itself, but also the source datasets, to pay respect to the amount of work that was put into the creation of these datasets and that made Caravan possible in the first place.

All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

Channel Log:

23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).

1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling to 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more meta data (station name and country).

16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan

10 May 2023: Version 1.1 - No data change, just update data description.

17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.

16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially not included because of the area thresholds (i.e. basins smaller than 100sqkm or larger than 2000sqkm). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two different download options that include timeseries data only as either csv files (Caravan-csv.tar.xz) or netcdf files (Caravan-nc.tar.xz). Including the large basins also required an update in the earth engine code

16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climated indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
N
Median Household Income Variation by Family Size in Big Bend Township,...
neilsberg.com
csv, json
Updated Jan 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Median Household Income Variation by Family Size in Big Bend Township, Minnesota: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1aafb9d9-73fd-11ee-949f-3860777c1fe6/
Explore at:
json, csvAvailable download formats
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Big Bend Township, Minnesota
Variables measured
Household size, Median Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median household incomes for various household sizes in Big Bend Township, Minnesota, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

Key observations

Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, Big Bend township did not include 6, or 7-person households. Across the different household sizes in Big Bend township the mean income is $113,198, and the standard deviation is $40,645. The coefficient of variation (CV) is 35.91%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households.

In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $71,610. It then further increased to $134,438 for 5-person households, the largest household size for which the bureau reported a median household income.

https://i.neilsberg.com/ch/big-bend-township-mn-median-household-income-by-household-size.jpeg" alt="Big Bend Township, Minnesota median household income, by household size (in 2022 inflation-adjusted dollars)">

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Household Sizes:

1-person households

2-person households

3-person households

4-person households

5-person households

6-person households

7-or-more-person households

Variables / Data Columns

Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).

Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Big Bend township median household income. You can refer the same here
N
Median Household Income Variation by Family Size in Big Sandy, TX:...
neilsberg.com
csv, json
Updated Jan 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Median Household Income Variation by Family Size in Big Sandy, TX: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1aaff022-73fd-11ee-949f-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Texas, Big Sandy
Variables measured
Household size, Median Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median household incomes for various household sizes in Big Sandy, TX, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

Key observations

Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, Big Sandy did not include 5, 6, or 7-person households. Across the different household sizes in Big Sandy the mean income is $53,721, and the standard deviation is $16,288. The coefficient of variation (CV) is 30.32%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households.

In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $30,099. It then further increased to $66,110 for 4-person households, the largest household size for which the bureau reported a median household income.

https://i.neilsberg.com/ch/big-sandy-tx-median-household-income-by-household-size.jpeg" alt="Big Sandy, TX median household income, by household size (in 2022 inflation-adjusted dollars)">

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Household Sizes:

1-person households

2-person households

3-person households

4-person households

5-person households

6-person households

7-or-more-person households

Variables / Data Columns

Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).

Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Big Sandy median household income. You can refer the same here
N
Income Distribution by Quintile: Mean Household Income in Grant township,...
neilsberg.com
csv, json
Updated Jan 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Grant township, Grand Traverse County, Michigan [Dataset]. https://www.neilsberg.com/research/datasets/949a4a4c-7479-11ee-949f-3860777c1fe6/
Explore at:
json, csvAvailable download formats
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan, Grand Traverse County, Grant Township
Variables measured
Income Level, Mean Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents the mean household income for each of the five quintiles in Grant township, Grand Traverse County, Michigan, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

Key observations

Income disparities: The mean income of the lowest quintile (20% of households with the lowest income) is 22,447, while the mean income for the highest quintile (20% of households with the highest income) is 203,020. This indicates that the top earners earn 9 times compared to the lowest earners.

*Top 5%: * The mean household income for the wealthiest population (top 5%) is 313,194, which is 154.27% higher compared to the highest quintile, and 1395.26% higher compared to the lowest quintile.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Income Levels:

Lowest Quintile

Second Quintile

Third Quintile

Fourth Quintile

Highest Quintile

Top 5 Percent

Variables / Data Columns

Income Level: This column showcases the income levels (As mentioned above).

Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Grant township median household income. You can refer the same here
Efficient Keyword-Based Search for Top-K Cells in Text Cube - Dataset - NASA...
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Mar 31, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Efficient Keyword-Based Search for Top-K Cells in Text Cube - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/efficient-keyword-based-search-for-top-k-cells-in-text-cube
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked-structures (e.g.,a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for coring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches, inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches. Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, “Efficient Keyword-Based Search for Top-K Cells in Text Cube,” IEEE Transactions on Knowledge and Data Engineering, 2011.
N
Median Household Income by Racial Categories in Big Spring, TX (, in 2023...
neilsberg.com
csv, json
Updated Mar 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). Median Household Income by Racial Categories in Big Spring, TX (, in 2023 inflation-adjusted dollars) [Dataset]. https://www.neilsberg.com/research/datasets/e0909ab4-f665-11ef-a994-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Mar 1, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Big Spring, Texas
Variables measured
Median Household Income for Asian Population, Median Household Income for Black Population, Median Household Income for White Population, Median Household Income for Some other race Population, Median Household Income for Two or more races Population, Median Household Income for American Indian and Alaska Native Population, Median Household Income for Native Hawaiian and Other Pacific Islander Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the median household income within each racial category idetified by the US Census Bureau, we conducted an initial analysis and categorization of the data. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). It is important to note that the median household income estimates exclusively represent the identified racial categories and do not incorporate any ethnicity classifications. Households are categorized, and median incomes are reported based on the self-identified race of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents the median household income across different racial categories in Big Spring. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to gain insights into economic disparities and trends and explore the variations in median houshold income for diverse racial categories.

Key observations

Based on our analysis of the distribution of Big Spring population by race & ethnicity, the population is predominantly White. This particular racial category constitutes the majority, accounting for 64.12% of the total residents in Big Spring. Notably, the median household income for White households is $70,684. Interestingly, despite the White population being the most populous, it is worth noting that Asian households actually reports the highest median household income, with a median income of $113,125. This reveals that, while Whites may be the most numerous in Big Spring, Asian households experience greater economic prosperity in terms of median household income.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race of the head of household: This column presents the self-identified race of the household head, encompassing all relevant racial categories (excluding ethnicity) applicable in Big Spring.

Median household income: Median household income, adjusting for inflation, presented in 2023-inflation-adjusted dollars

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Big Spring median household income by race. You can refer the same here
N
Median Household Income Variation by Family Size in Big Springs, NE:...
neilsberg.com
csv, json
Updated Jan 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Median Household Income Variation by Family Size in Big Springs, NE: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1aaff48c-73fd-11ee-949f-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Big Springs, Nebraska
Variables measured
Household size, Median Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median household incomes for various household sizes in Big Springs, NE, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

Key observations

Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, Big Springs did not include 3, 6, or 7-person households. Across the different household sizes in Big Springs the mean income is $57,424, and the standard deviation is $22,664. The coefficient of variation (CV) is 39.47%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households.

In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $26,482. It then further increased to $80,056 for 5-person households, the largest household size for which the bureau reported a median household income.

https://i.neilsberg.com/ch/big-springs-ne-median-household-income-by-household-size.jpeg" alt="Big Springs, NE median household income, by household size (in 2022 inflation-adjusted dollars)">

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Household Sizes:

1-person households

2-person households

3-person households

4-person households

5-person households

6-person households

7-or-more-person households

Variables / Data Columns

Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).

Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Big Springs median household income. You can refer the same here
N
Median Household Income Variation by Family Size in Big Rock, IL:...
neilsberg.com
csv, json
Updated Jan 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Median Household Income Variation by Family Size in Big Rock, IL: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1aafe838-73fd-11ee-949f-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Big Rock, Illinois
Variables measured
Household size, Median Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median household incomes for various household sizes in Big Rock, IL, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

Key observations

Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, Big Rock did not include 6, or 7-person households. Across the different household sizes in Big Rock the mean income is $118,294, and the standard deviation is $44,511. The coefficient of variation (CV) is 37.63%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households.

In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $41,435. It then further increased to $150,215 for 5-person households, the largest household size for which the bureau reported a median household income.

https://i.neilsberg.com/ch/big-rock-il-median-household-income-by-household-size.jpeg" alt="Big Rock, IL median household income, by household size (in 2022 inflation-adjusted dollars)">

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Household Sizes:

1-person households

2-person households

3-person households

4-person households

5-person households

6-person households

7-or-more-person households

Variables / Data Columns

Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).

Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Big Rock median household income. You can refer the same here

Facebook

Twitter

Click to copy link

Link copied

Cite

Hossein Keshavarz; Hossein Keshavarz; Meiyappan Nagappan; Meiyappan Nagappan (2022). ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction [Dataset]. http://doi.org/10.5281/zenodo.5907002

Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

Explore at:

zip, csvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.5907002

Dataset updated

Jan 27, 2022

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Hossein Keshavarz; Hossein Keshavarz; Meiyappan Nagappan; Meiyappan Nagappan

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

The datasets are available under directory dataset. There are 4 datasets in this directory.

References:

1. GumTree

* https://github.com/GumTreeDiff/gumtree

2. PyDriller

* https://pydriller.readthedocs.io/en/latest/

Clear search

Close search

Google apps

Main menu

Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

Meta-Dataset Dataset

Data from: #PraCegoVer dataset

PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX

TMDb Top 10,000 Popular Movies Dataset

Introduction

About the Dataset

LinkedIn Datasets

LAS&T: Large Shape & Texture Dataset Dataset

Fox News dataset is for analyzing media trends and narratives

Key Features of the Fox News Dataset

Why Use This Dataset?

Explore More News Datasets

imdb_reviews

Dataset of highest price of stocks over time for THK.MC

"9,565 Top-Rated Movies Dataset"

About the Dataset

Dataset of highest price of stocks over time for GGX.AX

Geonames - All Cities with a population > 1000

Data from: Caravan - A global community dataset for large-sample hydrology

Median Household Income Variation by Family Size in Big Bend Township,...

About this dataset

Content

Inspiration

Recommended for further research

Median Household Income Variation by Family Size in Big Sandy, TX:...

About this dataset

Content

Inspiration

Recommended for further research

Income Distribution by Quintile: Mean Household Income in Grant township,...

About this dataset

Content

Inspiration

Recommended for further research

Efficient Keyword-Based Search for Top-K Cells in Text Cube - Dataset - NASA...

Median Household Income by Racial Categories in Big Spring, TX (, in 2023...

About this dataset

Content

Inspiration

Recommended for further research

Median Household Income Variation by Family Size in Big Springs, NE:...

About this dataset

Content

Inspiration

Recommended for further research

Median Household Income Variation by Family Size in Big Rock, IL:...

About this dataset

Content

Inspiration

Recommended for further research

Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction