This dataset presents a rich collection of physicochemical parameters from 147 reservoirs distributed across the conterminous U.S. One hundred and eight of the reservoirs were selected using a statistical survey design and can provide unbiased inferences to the condition of all U.S. reservoirs. These data could be of interest to local water management specialists or those assessing the ecological condition of reservoirs at the national scale. These data have been reviewed in accordance with U.S. Environmental Protection Agency policy and approved for publication. This dataset is not publicly accessible because: It is too large. It can be accessed through the following means: https://portal-s.edirepository.org/nis/mapbrowse?scope=edi&identifier=2033&revision=1. Format: This dataset presents water quality and related variables for 147 reservoirs distributed across the U.S. Water quality parameters were measured during the summers of 2016, 2018, and 2020 – 2023. Measurements include nutrient concentration, algae abundance, dissolved oxygen concentration, and water temperature, among many others. Dataset includes links to other national and global scale data sets that provide additional variables.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The Global Retail Sales Data provided here is a self-generated synthetic dataset created using random sampling techniques provided by the NumPy package. The dataset emulates information regarding merchandise sales through a retail website set up by a popular fictional influencer based in the US during the 2023-2024 period. The influencer sells clothing, ornaments and other products at variable rates through the retail website to followers across the world. Imagine that the influencer heavily promotes the products they sell, prompting more ratings and reviews from their followers and driving more user engagement.
This dataset is intended to help with practicing sentiment analysis and/or time series analysis of sales, both important topics for prospective data analysts. The column descriptions are as follows:
Order ID: Serves as an identifier for each order made.
Order Date: The date when the order was made.
Product ID: Serves as an identifier for the product that was ordered.
Product Category: Category of the product sold (Clothing, Ornaments, Other).
Buyer Gender: Genders of people that have ordered from the website (Male, Female).
Buyer Age: Ages of the buyers.
Order Location: The city where the order was made from.
International Shipping: Whether the product was shipped internationally or not. (Yes/No)
Sales Price: Price tag for the product.
Shipping Charges: Extra charges for international shipments.
Sales per Unit: Sales cost while including international shipping charges.
Quantity: Quantity of the product bought.
Total Sales: Total sales made through the purchase.
Rating: User rating given for the order.
Review: User review given for the order.
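The relationship between the price columns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the description says Sales per Unit includes international shipping charges, so the sketch assumes it equals Sales Price plus Shipping Charges, and it further assumes Total Sales equals Sales per Unit multiplied by Quantity. The `Order` class and function names are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical container for three of the described columns."""
    sales_price: float       # "Sales Price" column
    shipping_charges: float  # "Shipping Charges" column (assumed 0 for domestic orders)
    quantity: int            # "Quantity" column


def sales_per_unit(order: Order) -> float:
    # Assumption: per-unit sales = price tag + international shipping charge.
    return order.sales_price + order.shipping_charges


def total_sales(order: Order) -> float:
    # Assumption: total sales = per-unit sales * quantity bought.
    return sales_per_unit(order) * order.quantity


order = Order(sales_price=100.0, shipping_charges=25.0, quantity=3)
print(sales_per_unit(order))  # 125.0
print(total_sales(order))     # 375.0
```

If the actual file does not follow these relationships, comparing the recomputed values against the stored columns is a quick way to find out.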
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Context
The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight female majority, with 50.0% of the total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Gratis Population by Race & Ethnicity.
Hydrographic and Impairment Statistics (HIS) is a National Park Service (NPS) Water Resources Division (WRD) project established to track certain goals created in response to the Government Performance and Results Act of 1993 (GPRA). One water resources management goal established by the Department of the Interior (DOI) under GPRA requires NPS to track the percent of its managed surface waters that are meeting Clean Water Act (CWA) water quality standards. This goal requires an accurate inventory that spatially quantifies the surface water hydrography that each bureau manages and a procedure to determine and track which waterbodies are or are not meeting water quality standards as outlined by Section 303(d) of the CWA. This project helps meet this DOI GPRA goal by inventorying and monitoring in a geographic information system for the NPS: (1) CWA 303(d) quality impaired waters and causes; and (2) hydrographic statistics based on the United States Geological Survey (USGS) National Hydrography Dataset (NHD). Hydrographic and 303(d) impairment statistics were evaluated based on a combination of 1:24,000 (NHD) and finer scale data (frequently provided by state GIS layers).
The North American Dataset contains sets of Maximum, Minimum and Average Temperature data and Precipitation data that are either (1) raw (non-adjusted though flagged for possible quality issues), (2) adjusted due to time of observation bias (TOB) or (3) put through the Pairwise Homogenization Algorithm (PHA). These files contain North American stations, and their data are measured in hundredths of degrees Celsius (without decimal place) for temperature and tenths of millimeters (without decimal place) for precipitation. Each file includes the entire available Period of Record.
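Because the values are stored as integers without a decimal place, they need rescaling before use. A minimal sketch of that conversion follows; the function names are hypothetical, and the sketch does not handle missing-value sentinels or quality flags, whose encoding is not described here.

```python
def temperature_to_celsius(raw: int) -> float:
    """Convert a stored temperature (hundredths of a degree Celsius) to degrees Celsius."""
    return raw / 100.0


def precipitation_to_mm(raw: int) -> float:
    """Convert a stored precipitation value (tenths of a millimeter) to millimeters."""
    return raw / 10.0


print(temperature_to_celsius(2345))  # 23.45
print(temperature_to_celsius(-550))  # -5.5
print(precipitation_to_mm(127))      # 12.7
```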
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The following datasets are based on the children and youth (under age 21) beneficiary population and consist of aggregate Mental Health Service data derived from Medi-Cal claims, encounter, and eligibility systems. These datasets were developed in accordance with California Welfare and Institutions Code (WIC) § 14707.5 (added as part of Assembly Bill 470 on 10/7/17). Please contact BHData@dhcs.ca.gov for any questions or to request previous years’ versions of these datasets. Note: The Performance Dashboard AB 470 Report Application Excel tool development has been discontinued. Please see the Behavioral Health reporting data hub at https://behavioralhealth-data.dhcs.ca.gov/ for access to dashboards utilizing these datasets and other behavioral health data.
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
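The feature engineering and evaluation ideas above can be illustrated without any libraries. The sketch below derives a price-per-square-foot feature and computes Mean Absolute Error and R-squared from their standard definitions. The function names and the sample numbers are made up for illustration and do not come from the dataset.

```python
def price_per_sqft(price: float, square_feet: float) -> float:
    """Derived feature: price divided by living area."""
    return price / square_feet


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative (made-up) actual and predicted house prices.
actual = [200000.0, 300000.0, 400000.0]
predicted = [210000.0, 290000.0, 410000.0]

print(price_per_sqft(300000.0, 1500.0))        # 200.0
print(mean_absolute_error(actual, predicted))  # 10000.0
print(r_squared(actual, predicted))            # approximately 0.985
```

In practice a library such as scikit-learn provides equivalent metrics; writing them out by hand is mainly useful for seeing what each number measures.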
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Context
The dataset tabulates the Kiawah Island population by age. The dataset can be utilized to understand the age distribution and demographics of Kiawah Island.
The dataset consists of the following three datasets
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset provides a unique resource for researchers and data scientists interested in the global dynamics of the COVID-19 pandemic. It focuses on the impact of different SARS-CoV-2 variants and mutations on the duration of local epidemics. By combining variant information with epidemiological data, this dataset allows for a comprehensive analysis of factors influencing the trajectory of the pandemic.
Data Source: The data combines information from the Johns Hopkins University COVID-19 dataset (confirmed_cases.csv and deaths_cases.csv) and the covariants.org dataset (variants.csv).
This dataset is designed for a diverse set of analytical questions. Here are some ideas to inspire the Kaggle community:
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024
2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024
3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024
Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains daily stock price data for Microsoft and several key companies that have significantly contributed to its growth and success. The dataset includes historical data from 1980 to 2024 for the following companies:
This dataset is ideal for:
- Financial analysis: Study stock price trends over time and compare performance across companies.
- Time series forecasting: Predict future stock prices using historical data.
- Market correlation analysis: Analyze the relationships between Microsoft and its key affiliated companies in different market conditions.
Feel free to use this dataset for your financial and stock market projects, analysis, or machine learning models!
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This training dataset was calculated using the mechanistic modeling approach. See “Big data training data for artificial intelligence-based Li-ion diagnosis and prognosis” (Journal of Power Sources, Volume 479, 15 December 2020, 228806) and “Analysis of Synthetic Voltage vs. Capacity Datasets for Big Data Diagnosis and Prognosis” (Energies, under review) for more details.
The V vs. Q dataset was compiled with a resolution of 0.01 for the triplets and C/25 charges. This accounts for more than 5,000 different paths. Each path was simulated with at most 0.85% increases for each The training dataset, therefore, contains more than 700,000 unique voltage vs. capacity curves.
Four variables are included; see the read me file for details and an example of how to use them.
Cell info: Contains information on the setup of the mechanistic model
Qnorm: normalized capacity scale for all voltage curves
pathinfo: index for simulated conditions for all voltage curves
volt: voltage data. Each column corresponds to the voltage simulated under the conditions of the corresponding line in pathinfo.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The Calls for Service dataset includes police service requests for which patrol officers, traffic officers, bike officers and, on occasion, detectives will be dispatched to public safety response. It also includes self-initiated calls for service where an officer witnesses a violation or suspicious activity for which they would respond. This item represents a consolidated item of all records.
Why the Datasets are Organized into Separate Layers
In January of 2022, the Tempe Police Department completed a major transition in how crime data is reported, moving from the FBI Uniform Crime Report program to the enhanced National Incident-Based Reporting System, or NIBRS. NIBRS is now the required reporting method for the FBI. The Uniform Crime Report (UCR) Program's traditional Summary Reporting System (SRS) was limited in comparison to NIBRS, which offers more detailed data collection that provides a deeper understanding of crime and its circumstances. NIBRS captures a wider range of details on crime incidents and can reflect separate offenses occurring during the same event, including information on victims, known offenders, relationships between victims and offenders, arrestees, and property involved in the crimes. With greater specificity in reporting offenses, NIBRS provides for more accurate and detailed crime-related information, and helps give context to specific crime issues while affording greater analytic capability of crime. Below is the link to Tempe-specific NIBRS reports. Use the drop-down filters to select Tempe PD, the year, and the type of report. Because of these differences, trends and numbers between the two systems should not be directly compared. That’s why we treat 2022 and later (NIBRS) separately from 2021 and earlier (UCR). To make the older data easier to browse, we grouped the data from 2021 and earlier into year ranges instead of showing it all at once. This helps with performance and loading speed due to the large count of records.
For detailed guidance on interpreting calls for service data, as well as data scope and limitations, please refer to the User Guide.
Data Dictionary
Additional Information
Contact Email: PD_DataRequest@tempe.gov
Contact Phone: N/A
Link: N/A
Data Source: Versaterm Informix RMS
Data Source Type: Informix and/or SQL Server
Preparation Method: Automated process
Publish Frequency: Daily
Publish Method: Automatic
This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
https://brightdata.com/license
The Booking Hotel Listings Dataset provides a structured and in-depth view of accommodations worldwide, offering essential data for travel industry professionals, market analysts, and businesses. This dataset includes key details such as hotel names, locations, star ratings, pricing, availability, room configurations, amenities, guest reviews, sustainability features, and cancellation policies.
With this dataset, users can:
Analyze market trends to understand booking behaviors, pricing dynamics, and seasonal demand.
Enhance travel recommendations by identifying top-rated hotels based on reviews, location, and amenities.
Optimize pricing and revenue strategies by benchmarking property performance and availability patterns.
Assess guest satisfaction through sentiment analysis of ratings and reviews.
Evaluate sustainability efforts by examining eco-friendly features and certifications.
Designed for hospitality businesses, travel platforms, AI-powered recommendation engines, and pricing strategists, this dataset enables data-driven decision-making to improve customer experience and business performance.
Use Cases
Booking Hotel Listings in Greece
Gain insights into Greece’s diverse hospitality landscape, from luxury resorts in Santorini to boutique hotels in Athens. Analyze review scores, availability trends, and traveler preferences to refine booking strategies.
Booking Hotel Listings in Croatia
Explore hotel data across Croatia’s coastal and inland destinations, ideal for travel planners targeting visitors to Dubrovnik, Split, and Plitvice Lakes. This dataset includes review scores, pricing, and sustainability features.
Booking Hotel Listings with Review Scores Greater Than 9
A curated selection of high-rated hotels worldwide, ideal for luxury travel planners and market researchers focused on premium accommodations that consistently exceed guest expectations.
Booking Hotel Listings in France with More Than 1000 Reviews
Analyze well-established and highly reviewed hotels across France, ensuring reliable guest feedback for market insights and customer satisfaction benchmarking.
This dataset serves as an indispensable resource for travel analysts, hospitality businesses, and data-driven decision-makers, providing the intelligence needed to stay competitive in the ever-evolving travel industry.
https://brightdata.com/license
We'll customize a Flickr dataset to align with your unique requirements, incorporating data on image categories, user comments, photographic trends, popular photographers, demographic insights, usage statistics, and other relevant metrics.
Leverage our Flickr datasets for various applications to strengthen strategic planning and market analysis. Examining these datasets enables organizations to understand visual content preferences and photography trends, facilitating refined image curation and marketing campaigns. Tailor your access to the complete dataset or specific subsets according to your business needs.
Popular use cases include optimizing image libraries based on user engagement, refining marketing strategies through targeted audience segmentation, and identifying and predicting trends to maintain a competitive edge in the visual content and digital media market.
The NYS Department of Environmental Conservation (DEC) collects and maintains several datasets on the locations, distribution and status of species of plants and animals. Information on distribution by county from the following three databases was extracted and compiled into this dataset. First, the New York Natural Heritage Program biodiversity database: Rare animals, rare plants, and significant natural communities. Significant natural communities are rare or high-quality wetlands, forests, grasslands, ponds, streams, and other types of habitats. Next, the 2nd NYS Breeding Bird Atlas Project database: Birds documented as breeding during the atlas project from 2000-2005. And last, DEC’s NYS Reptile and Amphibian Database: Reptiles and amphibians; most records are from the NYS Amphibian & Reptile Atlas Project (Herp Atlas) from 1990-1999.
Metadata for the OpenFEMA API data set fields. It contains descriptions, data types, and other attributes for each field.
If you have media inquiries about this dataset please email the FEMA News Desk FEMA-News-Desk@dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open government program please contact the OpenFEMA team via email OpenFEMA@fema.dhs.gov.
This dataset contains data for the Healthcare Payments Data (HPD) Snapshot visualization. The Enrollment data file contains counts of claims and encounter data collected for California's statewide HPD Program. It includes counts of enrollment records, service records from medical and pharmacy claims, and the number of individuals represented across these records. Aggregate counts are grouped by payer type (Commercial, Medi-Cal, or Medicare), product type, and year. The Medical data file contains counts of medical procedures from medical claims and encounter data in HPD. Procedures are categorized using claim line procedure codes and grouped by year, type of setting (e.g., outpatient, laboratory, ambulance), and payer type. The Pharmacy data file contains counts of drug prescriptions from pharmacy claims and encounter data in HPD. Prescriptions are categorized by name and drug class using the reported National Drug Code (NDC) and grouped by year, payer type, and whether the drug dispensed is branded or a generic.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This data set contains a map and table of the locations of all Pay and Display meters in Fingal County Council's pay and display system around the county of Fingal. We currently have 146 meters in use. We have Pay and Display in the following areas: Malahide, Skerries, Balbriggan, Swords, Rush and Clonsilla. Pay & Display operates from 8.00 am to 6.00 pm Monday to Saturday inclusive. However, this can vary for particular areas; check the signage at your location. Charges can vary for different areas for both on-street and car parks. However, the charge is usually €1.20 per hour or €3 per day in the long-term parking areas. Traffic Wardens ticket illegally parked vehicles in a Pay and Display area. It is your responsibility to ensure that a valid parking ticket/permit/disc is displayed and clearly visible on your vehicle.