27 datasets found

Airline Dataset
kaggle.com
Updated Sep 26, 2023
Cite
Sourav Banerjee (2023). Airline Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/airline-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Sep 26, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.

Content

This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.

Dataset Glossary (Column-wise)

Passenger ID - Unique identifier for each passenger

First Name - First name of the passenger

Last Name - Last name of the passenger

Gender - Gender of the passenger

Age - Age of the passenger

Nationality - Nationality of the passenger

Airport Name - Name of the airport where the passenger boarded

Airport Country Code - Country code of the airport's location

Country Name - Name of the country the airport is located in

Airport Continent - Continent where the airport is situated

Continents - Continents involved in the flight route

Departure Date - Date when the flight departed

Arrival Airport - Destination airport of the flight

Pilot Name - Name of the pilot operating the flight

Flight Status - Current status of the flight (e.g., on-time, delayed, canceled)
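
As a rough illustration of how the glossary columns might be used, the sketch below tallies rows by Flight Status using only the Python standard library. The sample rows, airport names, and status spellings are made up for the example; only the column names come from the glossary above.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; the column names follow the glossary,
# but the values (including the status spellings) are hypothetical.
SAMPLE_CSV = """Passenger ID,Nationality,Airport Name,Departure Date,Flight Status
P001,Japan,Example Airport A,1/5/2022,On Time
P002,Brazil,Example Airport B,1/6/2022,Delayed
P003,Japan,Example Airport A,1/7/2022,Cancelled
P004,France,Example Airport C,1/7/2022,On Time
"""

def count_flight_status(csv_text):
    """Count rows per value of the 'Flight Status' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Flight Status"].strip() for row in reader)

status_counts = count_flight_status(SAMPLE_CSV)
for status, n in status_counts.most_common():
    print(f"{status}: {n}")
```

To run this against the downloaded file, replace the in-memory sample with an opened file handle; the exact header spelling in the real file should be checked first.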

Structure of the Dataset

https://i.imgur.com/cUFuMeU.png

Acknowledgement

The dataset provided here is a simulated example and was generated using the online platform Mockaroo. This web-based tool enables the creation of customizable synthetic datasets that closely resemble real data. It is primarily intended for developers, testers, and data experts who require sample data for a range of uses, including testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.

Cover Photo by: Kevin Woblick on Unsplash

Thumbnail by: Airplane icons created by Freepik - Flaticon
Climate Change: Earth Surface Temperature Data
kaggle.com
redivis.com
zip
Updated May 1, 2017
Cite
Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
Explore at:
Available download formats: zip (88843537 bytes)
Dataset updated
May 1, 2017
Dataset authored and provided by
Berkeley Earth (http://berkeleyearth.org/)
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Earth
Description
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.

We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example, by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have included several files:

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures

LandAverageTemperature: global average land temperature in celsius

LandAverageTemperatureUncertainty: the 95% confidence interval around the average

LandMaxTemperature: global average maximum land temperature in celsius

LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature

LandMinTemperature: global average minimum land temperature in celsius

LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature

LandAndOceanAverageTemperature: global average land and ocean temperature in celsius

LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature
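
The column pairs above (a temperature and its uncertainty) can be combined into lower and upper bounds. The sketch below is one way to do that with the standard library, under two assumptions that the description does not confirm: that each uncertainty value is the half-width of the 95% interval, and that a column starting later than 1750 is simply blank in earlier rows. The sample values are made up, and the date column name is a parameter because the header in the actual file may differ from "Date".

```python
import csv
import io

# Made-up values for illustration; these are not Berkeley Earth measurements.
SAMPLE_CSV = """Date,LandAverageTemperature,LandAverageTemperatureUncertainty
1900-01-01,2.0,0.5
1900-02-01,,
1900-03-01,5.0,0.25
"""

def interval_rows(csv_text,
                  value_col="LandAverageTemperature",
                  unc_col="LandAverageTemperatureUncertainty",
                  date_col="Date"):
    """Return (date, low, high) tuples, skipping rows with blank values.

    Assumption: the uncertainty column holds the half-width of the interval.
    """
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row[value_col] or not row[unc_col]:
            continue  # assumed: blank cells mark periods with no estimate
        value = float(row[value_col])
        half_width = float(row[unc_col])
        out.append((row[date_col], value - half_width, value + half_width))
    return out

bounds = interval_rows(SAMPLE_CSV)
for day, low, high in bounds:
    print(day, low, high)
```

The same function can be pointed at the other column pairs (for example LandMaxTemperature and LandMaxTemperatureUncertainty) by passing different column names.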

Other files include:

Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)

Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)

Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)

Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

The raw data comes from the Berkeley Earth data page.
The Bushland, Texas Sunflower Datasets
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). The Bushland, Texas Sunflower Datasets [Dataset]. https://catalog.data.gov/dataset/the-bushland-texas-sunflower-datasets-f8d80
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Area covered
Bushland, Texas
Description
This parent dataset (collection of datasets) describes the general organization of data in the datasets for the 2009 and 2011 growing seasons (year) when sunflower (Helianthus annuus L.) was grown for seed grain at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Sunflower was grown for seed grain on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The two fields were contiguous, arranged along a north-south axis, and were labeled northeast (NE), and southeast (SE). See the resource titled "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. The fields were irrigated by a linear move sprinkler system equipped with spray applicators. Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe from 0.10- to 2.4-m depth in the field. The number and spacing of neutron probe reading locations changed through the years (additional sites were added), which is one reason why subsidiary datasets and data dictionaries are needed. The lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications. The weighing lysimeters were used to measure relative soil water storage to 0.05 mm accuracy at 5-minute intervals, and the 5-minute change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), which is reported at 15-minute intervals. 
Each lysimeter was equipped with a suite of instruments to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all of which are reported at 15-minute intervals. Instruments used changed from season to season, which is another reason that subsidiary datasets and data dictionaries for each season are required. Important conventions concerning the data-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource titled "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are six datasets in this collection. Common symbols and abbreviations used in the datasets are defined in the resource titled, "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". Datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes conventions and symbols used and lists any instruments used. The remaining tabs in a file consist of dictionary and data tabs. There is a dictionary tab for every data tab. The name of the dictionary tab contains the name of the corresponding data tab. Tab names are unique so that if individual tabs were saved to CSV files, each CSV file in the entire collection would have a different name. 
The six datasets, according to their titles, are as follows:

Agronomic Calendars for the Bushland, Texas Sunflower Datasets

Growth and Yield Data for the Bushland, Texas Sunflower Datasets

Weighing Lysimeter Data for The Bushland, Texas Sunflower Datasets

Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments

Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Sunflower Datasets

Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas

See the README for descriptions of each dataset. The land slope is <1% and topography is flat. The mean annual precipitation is ~470 mm, the 20-year pan evaporation record indicates ~2,600 mm Class A pan evaporation per year, and winds are typically from the South and Southwest. The climate is semi-arid with ~70% (350 mm) of the annual precipitation occurring from May to September, during which period the pan evaporation averages ~1520 mm. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have described the facilities and research methods, and have focused on ET, crop coefficients, and crop water productivity. Crop coefficients have been used by ET networks for irrigation management. The data have utility for testing simulation models of crop ET, growth, and yield. Resources in this dataset: Resource Title: Geographic Coordinates of Experimental Assets, Weighing Lysimeter Experiments, USDA, ARS, Bushland, Texas. File Name: Geographic Coordinates, USDA, ARS, Bushland, Texas.xlsx. 
Resource Description: The file gives the UTM latitude and longitude of important experimental assets of the Bushland, Texas, USDA, ARS, Conservation & Production Research Laboratory (CPRL). Locations include weather stations [Soil and Water Management Research Unit (SWMRU) and CPRL], large weighing lysimeters, and corners of fields within which each lysimeter was centered. There were four fields designated NE, SE, NW, and SW, and a weighing lysimeter was centered in each field. The SWMRU weather station was adjacent to and immediately east of the NE and SE lysimeter fields. Resource Title: Conventions for Bushland, TX, Weighing Lysimeter Datasets. File Name: Conventions for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Descriptions of conventions and terminology used in the Bushland, TX, weighing lysimeter research program. Resource Title: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets. File Name: Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets.xlsx. Resource Description: Definitions of symbols and abbreviations used in the Bushland, TX, weighing lysimeter research datasets. Resource Title: README - Bushland Texas Sunflower collection. File Name: README_Bushland_sunflower_collection.pdf. Resource Description: Descriptions of the datasets in the Bushland Texas Sunflower collection.
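
The description above says each xlsx file has an introductory tab, that every data tab has a dictionary tab, and that the dictionary tab's name contains the data tab's name. A minimal sketch of pairing tabs on that rule follows. The tab names and the introductory tab's name are hypothetical; the real names should be read from the workbook (for instance via the `sheetnames` attribute of an openpyxl workbook, if that library is used).

```python
def pair_tabs(tab_names, intro_tab="Introduction"):
    """Map each data tab name to the dictionary tab whose name contains it.

    intro_tab is a hypothetical name for the introductory tab.
    """
    tabs = [t for t in tab_names if t != intro_tab]
    pairs = {}
    for data_tab in tabs:
        for other in tabs:
            # A dictionary tab is any other tab whose name contains this name.
            if other != data_tab and data_tab in other:
                pairs[data_tab] = other
    return pairs

# Hypothetical tab names, for illustration only.
tabs = [
    "Introduction",
    "2009 NE data",
    "2009 NE data dictionary",
    "2009 SE data",
    "2009 SE data dictionary",
]
pairs = pair_tabs(tabs)
for data_tab, dict_tab in pairs.items():
    print(f"{data_tab} -> {dict_tab}")
```

Substring matching is the simplest reading of the naming rule; if one data tab's name happens to be contained in another data tab's name, a stricter rule would be needed.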
Employee Travel 2021 (Excel)
opendata.greatersudbury.ca
arc-gis-hub-home-arcgishub.hub.arcgis.com
Updated Sep 1, 2021
+ more versions
Cite
City of Greater Sudbury (2021). Employee Travel 2021 (Excel) [Dataset]. https://opendata.greatersudbury.ca/documents/7d73d365118b46e4828f52fea7c8ce3a
Explore at:
Dataset updated
Sep 1, 2021
Dataset authored and provided by
City of Greater Sudbury
Description
Download Employee Travel Excel Sheet
This dataset contains information about the employee travel expenses for the year 2021. Details are provided on the employee (name, title, department), the travel (dates, location, purpose) and the cost (expenses, recoveries). Expenses are broken down in separate tabs by Quarter (Q1, Q2, Q3 and Q4). Updated quarterly when expenses are prepared. Expenses for other years are available in separate datasets.
Hospital Annual Financial Data - Selected Data & Pivot Tables
data.chhs.ca.gov
data.ca.gov
+6 more
csv, data, doc, html +4
Updated Apr 23, 2025
+ more versions
Cite
Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
Explore at:
Available download formats: xlsx(770931), xlsx(756356), xls(19650048), pdf(310420), xls(18445312), xls(16002048), data, xlsx, xls(18301440), xls(920576), xls, html, pdf(333268), xlsx(763636), xls(19577856), pdf(383996), xls(44967936), xlsx(765216), xlsx(771275), zip, xlsx(777616), xlsx(779866), xlsx(790979), xls(51554816), xlsx(758376), xlsx(754073), pdf(258239), pdf(121968), xlsx(758089), xlsx(752914), doc, xlsx(14714368), xls(14657536), xlsx(750199), xls(44933632), pdf(303198), csv(205488092), xls(51424256), xlsx(769128), xlsx(768036), xls(19599360), xlsx(782546), xls(19625472)
Dataset updated
Apr 23, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

Due to the large size of the complete dataset, a selected set of data, representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
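
The two groupings above differ only in how a report period's end date is bucketed. The sketch below shows that bucketing for a single date. It returns the fiscal year as a (start year, end year) pair rather than a single label, because the description does not say whether a July-June fiscal year is named after its starting or ending year.

```python
from datetime import date

def calendar_year(report_period_end):
    """Calendar-year bucket: the year in which the report period ends."""
    return report_period_end.year

def fiscal_year_span(report_period_end):
    """Return (start_year, end_year) of the July-June fiscal year
    that contains the report period's end date."""
    year = report_period_end.year
    if report_period_end.month >= 7:
        return (year, year + 1)
    return (year - 1, year)

# A report period ending 30 September 2023 falls in calendar year 2023
# and in the fiscal year running July 2023 to June 2024.
example_end = date(2023, 9, 30)
print(calendar_year(example_end), fiscal_year_span(example_end))
```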
The Bushland, Texas, Alfalfa Datasets
agdatacommons.nal.usda.gov
s.cnmilf.com
+1 more
xlsx
Updated Mar 4, 2024
+ more versions
Cite
Steven R. Evett; Karen S. Copeland; Brice B. Ruthardt; Gary W. Marek; Paul D. Colaizzi; Terry A. Howell; David K. Brauer (2024). The Bushland, Texas, Alfalfa Datasets [Dataset]. http://doi.org/10.15482/USDA.ADC/1526356
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.15482/USDA.ADC/1526356
Dataset updated
Mar 4, 2024
Dataset provided by
Ag Data Commons
Authors
Steven R. Evett; Karen S. Copeland; Brice B. Ruthardt; Gary W. Marek; Paul D. Colaizzi; Terry A. Howell; David K. Brauer
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Bushland, Texas
Description
This parent dataset (collection of datasets) describes the general organization of data in the datasets for each growing season (year) when alfalfa (Medicago sativa L.) was grown as a reference evapotranspiration (ETr) crop at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Alfalfa was grown on two large, precision weighing lysimeters, calibrated to NIST standards (Howell et al., 1995). Each lysimeter was in the center of a 4.44 ha square field on which alfalfa was also grown (Evett et al., 2000). The two fields were contiguous and arranged with one (labeled northeast, NE) directly north of the other (labeled southeast, SE). See the resource "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. Alfalfa was planted in Autumn 1995 and grown for hay in 1996, 1997, 1998, and 1999. The resource "Agronomic Calendar for the Bushland, Texas Alfalfa Datasets", gives a calendar listing by date the agronomic practices applied, severe weather, and activities (e.g. planting, thinning, fertilization, pesticide application, lysimeter maintenance, harvest) in and on lysimeters that could influence crop growth, water use, and lysimeter data. These include fertilizer and pesticide applications. There is one calendar, from before planting in autumn 1995 to after final harvest in 1999, for the NE and SE lysimeters and fields. There were 4 harvests each year except 1998 when 5 harvests were taken. Irrigation was by linear move sprinkler system equipped with pressure regulated low pressure sprays (mid-elevation spray application, MESA). 
Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings via field-calibrated (Evett and Steiner, 1995) neutron probe from 0.10- to 2.4-m depth in the field. Lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications. Weighing lysimeters measured relative soil water storage to 0.05 mm accuracy at 5-min intervals, and the 5-min change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), reported at 15-min intervals. Each lysimeter was instrumented to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all at 15-min intervals. Instruments used changed from season to season, thus subsidiary datasets and data dictionaries for each season are required. The Bushland weighing lysimeter research program is described by Evett et al. (2016), and lysimeter design is described by Marek et al. (1988). Important conventions concerning the data-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are 5 datasets in this collection. Common symbols and abbreviations used are defined in the resource "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". Datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes conventions and symbols used, and lists instruments used. The remaining tabs in a file consist of dictionary and data tabs. The 5 datasets are:

Growth and Yield Data for the Bushland, Texas Alfalfa Datasets

Weighing Lysimeter Data for The Bushland, Texas Alfalfa Datasets

Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments

Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Alfalfa Datasets

Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas

See README for descriptions of each dataset. The soil is a Pullman series fine, mixed, superactive, thermic Torrertic Paleustoll. Soil properties are given in the resource titled "Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets". Land slope in the lysimeter fields is
Vehicle licensing statistics data files
gov.uk
s3.amazonaws.com
Updated Aug 13, 2025
Cite
Department for Transport (2025). Vehicle licensing statistics data files [Dataset]. https://www.gov.uk/government/statistical-data-sets/vehicle-licensing-statistics-data-files
Explore at:
Dataset updated
Aug 13, 2025
Dataset provided by
GOV.UK
Authors
Department for Transport
Description

We welcome any feedback on the structure of our data files, their usability, or any suggestions for improvements; please contact vehicles statistics.

Data tables containing aggregated information about vehicles in the UK are also available.

How to use CSV files

CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).

When using as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.

Download data files

Make and model by quarter

df_VEH0120_GB (https://assets.publishing.service.gov.uk/media/6895d1963080e72710b2e2cf/df_VEH0120_GB.csv): Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 59.1 MB)

Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)

Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]

df_VEH0120_UK (https://assets.publishing.service.gov.uk/media/6895d276586f9c9360656a18/df_VEH0120_UK.csv): Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 34.9 MB)

Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)

Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]

df_VEH0160_GB (https://assets.publishing.service.gov.uk/media/6895ef62586f9c9360656a2d/df_VEH0160_GB.csv): Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 25.3 MB)

Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)

Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]

df_VEH0160_UK (https://assets.publishing.service.gov.uk/media/6895f187e7be62b4f06431b1/df_VEH0160_UK.csv): Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 8.53 MB)

Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)

Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
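
Because the schemas above store one column per quarter, analysis in Python often starts by reshaping the wide table into one row per vehicle group and quarter. The following is a minimal standard-library sketch of that reshape. The sample rows, the make and model names, and the quarter header labels ("2024Q2", "2024Q1") are made up for illustration, and the sketch assumes every count cell holds a plain integer, which should be verified against the real files.

```python
import csv
import io

# Identifier columns from the df_VEH0160 schema; for df_VEH0120 files,
# "LicenceStatus" would be added to this list.
ID_COLUMNS = ["BodyType", "Make", "GenModel", "Model", "Fuel"]

# Made-up rows; the quarter header labels are an assumption.
SAMPLE_CSV = """BodyType,Make,GenModel,Model,Fuel,2024Q2,2024Q1
Cars,EXAMPLEMAKE,EXAMPLEMAKE ONE,ONE S,Petrol,10,7
Cars,EXAMPLEMAKE,EXAMPLEMAKE ONE,ONE E,Battery electric,3,0
"""

def wide_to_long(csv_text, id_columns=ID_COLUMNS):
    """Yield one dict per (vehicle row, quarter) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column that is not an identifier is treated as a quarter.
    quarter_columns = [c for c in reader.fieldnames if c not in id_columns]
    for row in reader:
        for quarter in quarter_columns:
            record = {c: row[c] for c in id_columns}
            record["Quarter"] = quarter
            record["Vehicles"] = int(row[quarter])
            yield record

long_rows = list(wide_to_long(SAMPLE_CSV))
for record in long_rows:
    print(record["Model"], record["Quarter"], record["Vehicles"])
```

Reading row by row with a generator keeps memory use modest, which may matter for the larger files listed above.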

Make and model by age

In order to keep the datafile df_VEH0124 to a reasonable size, it has been split into 2 halves; 1 covering makes starting with A to M, and the other covering makes starting with N to Z.

df_VEH0124_AM (https://assets.publishing.service.gov.uk/media/68494acf91c75fd63dd3a3ae/df_VEH0124_AM.csv): Vehicles at the end of the year by licence status, body type, make (A to M), generic model, model, year of first use and year of manufacture: United Kingdom (CSV, 47.9 MB)

Scope: All licensed vehicles in the United Kingdom with Make starting with A to M; annually from 2014

Schema: BodyType, Make, GenModel, Model, YearFi
Excel, AL Population Breakdown by Gender and Age Dataset: Male and Female...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Cite
Neilsberg Research (2025). Excel, AL Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1df34fc-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
Available download formats: csv, json
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Excel
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Excel by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Excel. The dataset can be used to understand the population distribution of Excel by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Excel. Additionally, it can be used to see how the gender ratio changes from the youngest to the oldest age group, and to compare the male-to-female ratio across age groups for Excel.

Key observations

Largest age group (population): Male # 45-49 years (58) | Female # 5-9 years (55). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender:

Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the Excel population analysis. There are 18 expected values, defined above in the age groups section.

Population (Male): This column shows the male population in Excel.

Population (Female): This column shows the female population in Excel.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Excel for each age group.
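
The Gender Ratio column follows directly from the two population columns. The sketch below computes males per 100 females for one age group. The counts in the example are hypothetical, and returning None when the female count is zero is a choice made for this sketch, not something the dataset description specifies.

```python
def gender_ratio(males, females):
    """Number of males per 100 females; None when the female count is zero."""
    if females == 0:
        return None  # the ratio is undefined; the dataset's own handling is unknown
    return 100.0 * males / females

# Hypothetical counts for a single age group.
print(gender_ratio(58, 50))
```

With small populations such as this one, a few people more or fewer in an age group can move the ratio substantially.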

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Excel Population by Gender. You can refer to it here
Excel Township, Minnesota Population Breakdown by Gender and Age Dataset:...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Cite
Neilsberg Research (2025). Excel Township, Minnesota Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1df3575-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
Available download formats: csv, json
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Minnesota, Excel Township
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Excel township by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Excel township. The dataset can be utilized to understand the population distribution of Excel township by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Excel township. Additionally, it can be used to see how the gender ratio changes from the youngest to the oldest age group and to compare the male-to-female ratio across age groups for Excel township.

Key observations

Largest age group (population): Male # 55-59 years (17) | Female # 20-24 years (22). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender:

Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the Excel township population analysis. There are 18 expected values, defined above in the age groups section.

Population (Male): This column shows the male population in Excel township.

Population (Female): This column shows the female population in Excel township.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Excel township for each age group.
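
The Gender Ratio column described above is a simple derived value: males per 100 females in each age group. A minimal sketch of that calculation follows; the function name and the sample counts are hypothetical and are not taken from the dataset, and the handling of a zero female count is an assumption.

```python
from typing import Optional


def gender_ratio(males: int, females: int) -> Optional[float]:
    """Return males per 100 females, or None when the female count is zero."""
    if females == 0:
        return None  # the ratio is undefined without any females in the group
    return round(100 * males / females, 2)


# Hypothetical counts for illustration only; not taken from the dataset.
sample_rows = [
    ("Under 5 years", 12, 10),
    ("5 to 9 years", 9, 12),
    ("85 years and over", 3, 0),
]

ratios = {age: gender_ratio(m, f) for age, m, f in sample_rows}
for age, ratio in ratios.items():
    print(age, ratio)
```

With populations this small, a single person changes the ratio noticeably, which is one reason the margin-of-error caution applies.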

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Excel township Population by Gender. You can refer to the same here
Integrated Cryptocurrency Historical Data for a Predictive Data-Driven...
data.mendeley.com
Updated Oct 29, 2021
+ more versions
Cite
Abtin Ijadi Maghsoodi (2021). Integrated Cryptocurrency Historical Data for a Predictive Data-Driven Decision-Making Algorithm [Dataset]. http://doi.org/10.17632/37nb83jwtd.1
Explore at:
Unique identifier
https://doi.org/10.17632/37nb83jwtd.1
Dataset updated
Oct 29, 2021
Authors
Abtin Ijadi Maghsoodi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cryptocurrency historical datasets from January 2012 (if available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs), including Yahoo Finance, Cryptodownload, CoinMarketCap, various Kaggle datasets, and multiple APIs. While these datasets used various time formats (e.g., minutes, hours, days), a daily format was used in this research study in order to integrate them. The integrated cryptocurrency historical datasets for 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more cryptocurrencies, were uploaded to this online Mendeley data repository. Although the primary criterion for including the mentioned cryptocurrencies was Market Capitalization, a subject matter expert, i.e., a professional trader, also guided the initial selection of the cryptocurrencies by analyzing various indicators such as Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, Stochastic Oscillator and Ichimoku Cloud. The primary features of this dataset that were used as the decision-making criteria of the CLUS-MCDA II approach are Timestamps, Open, High, Low, Closed, Volume (Currency), % Change (7 days and 24 hours), Market Cap and Weighted Price values.

The available Excel and CSV files in this dataset are only part of the integrated data. The other databases, datasets and API references that were used in this study are as follows:
[1] https://finance.yahoo.com/
[2] https://coinmarketcap.com/historical/
[3] https://cryptodatadownload.com/
[4] https://kaggle.com/philmohun/cryptocurrency-financial-data
[5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
[6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
[7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
[8] https://min-api.cryptocompare.com/
[9] https://p.nomics.com/cryptocurrency-bitcoin-api
[10] https://www.coinapi.io/
[11] https://www.coingecko.com/en/api
[12] https://cryptowat.ch/
[13] https://www.alphavantage.co/

This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDAII Project: https://aimaghsoodi.github.io/CLUSMCDA-R-Package/ https://github.com/Aimaghsoodi/CLUS-MCDA-II https://github.com/azadkavian/CLUS-MCDA
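
The description says that sources recorded at minute, hour and day resolution were brought to a common daily format. The sketch below shows one plausible way to do that for price bars; it is not the authors' procedure. The tuple layout, the use of UTC day boundaries, the function name `to_daily` and the sample prices are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def to_daily(bars):
    """Collapse intraday bars into one bar per UTC day.

    `bars` is an iterable of (unix_seconds, open, high, low, close, volume)
    tuples sorted by time. This field layout is an assumption.
    """
    daily = {}
    for ts, o, h, l, c, v in bars:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        if day not in daily:
            # First bar of the day sets the open and seeds the other fields.
            daily[day] = {"open": o, "high": h, "low": l, "close": c, "volume": v}
        else:
            d = daily[day]
            d["high"] = max(d["high"], h)
            d["low"] = min(d["low"], l)
            d["close"] = c  # the last bar seen so far supplies the close
            d["volume"] += v
    return daily


# Hypothetical hourly bars: two on 2021-01-01 and one on 2021-01-02 (UTC).
hourly = [
    (1609459200, 100.0, 110.0, 95.0, 105.0, 2.0),
    (1609462800, 105.0, 120.0, 101.0, 118.0, 3.0),
    (1609545600, 118.0, 119.0, 90.0, 92.0, 1.5),
]
daily_bars = to_daily(hourly)
print(daily_bars)
```

Sources that already report daily values would pass through unchanged, since each day then contains a single bar.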
Road safety statistics: data tables
gov.uk
Updated Sep 25, 2025
Cite
Department for Transport (2025). Road safety statistics: data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/reported-road-accidents-vehicles-and-casualties-tables-for-great-britain
Explore at:
Dataset updated
Sep 25, 2025
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department for Transport
Description

These tables present high-level breakdowns and time series. A list of all tables, including those discontinued, is available in the table index. More detailed data is available in our data tools, or by downloading the open dataset.

We are proposing to make some changes to these tables in future; further details and a link to a feedback form can be found alongside the 2024 annual report.

Latest data and table index

The tables below are the latest final annual statistics for 2024, which are currently the latest available data. Provisional statistics for the first half of 2025 will be published in November 2025.

A list of all reported road collisions and casualties data tables and variables in our data download tool is available in the Tables index (ODS, 28.9 KB): https://assets.publishing.service.gov.uk/media/68d3edb0ca266424b221b287/reported-road-casualties-gb-index-of-tables.ods

All collision, casualty and vehicle tables

Reported road collisions and casualties data tables (zip file) (ZIP, 11.2 MB): https://assets.publishing.service.gov.uk/media/68d42292b6c608ff9421b2d2/ras-all-tables-excel.zip

Historic trends (RAS01)

RAS0101: Collisions, casualties and vehicles involved by road user type since 1926 (ODS, 34.7 KB): https://assets.publishing.service.gov.uk/media/68d3cdeeca266424b221b253/ras0101.ods

RAS0102: Casualties and casualty rates, by road user type and age group, since 1979 (ODS, 129 KB): https://assets.publishing.service.gov.uk/media/68d3cdfee65dc716bfb1dcf3/ras0102.ods

Road user type (RAS02)

RAS0201: Numbers and rates (ODS, 37.5 KB): https://assets.publishing.service.gov.uk/media/68d3ce0bc908572e81248c1f/ras0201.ods

RAS0202: Sex and age group (ODS, 178 KB): https://assets.publishing.service.gov.uk/media/68d3ce17b6c608ff9421b25e/ras0202.ods

RAS0203: Rates by mode, including air, water and rail modes (ODS, 24.2 KB): https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods (this table will be updated for 2024 once data is available for other modes)

Road type (RAS03)

RAS0301: Speed limit, built-up and non-built-up roads (ODS, 20.8 KB): https://assets.publishing.service.gov.uk/media/68d3ce2b8c739d679fb1dcf6/ras0301.ods

Data file Large-scale Traffic Research Freight Transport Randstad 2012
data.europa.eu
Updated Oct 22, 2015
+ more versions
Cite
(2015). Data file Large-scale Traffic Research Freight Transport Randstad 2012 [Dataset]. https://data.europa.eu/data/datasets/4djfm8nf-8kug-ve1q-6nrh-mkhpvgvr2012?locale=en
Explore at:
Dataset updated
Oct 22, 2015
Description
Excel file at postcode4 level of origins, destinations, preferences, motifs, percentages of freight transport in Randstad.
Fire statistics data tables
gov.uk
s3.amazonaws.com
Updated Sep 25, 2025
+ more versions
Cite
Ministry of Housing, Communities and Local Government (2025). Fire statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/fire-statistics-data-tables
Explore at:
Dataset updated
Sep 25, 2025
Dataset provided by
GOV.UK
Authors
Ministry of Housing, Communities and Local Government
Description

On 1 April 2025 responsibility for fire and rescue transferred from the Home Office to the Ministry of Housing, Communities and Local Government.

This information covers fires, false alarms and other incidents attended by fire crews, and the statistics include the numbers of incidents, fires, fatalities and casualties as well as information on response times to fires. The Ministry of Housing, Communities and Local Government (MHCLG) also collects information on the workforce, fire prevention work, health and safety and firefighter pensions. All data tables on fire statistics are below.

MHCLG has responsibility for fire services in England. The vast majority of data tables produced by the Ministry of Housing, Communities and Local Government are for England, but some tables (0101, 0103, 0201, 0501, 1401) are for Great Britain split by nation. In the past the Department for Communities and Local Government (which previously had responsibility for fire services in England) produced data tables for Great Britain and at times the UK. Similar information for devolved administrations is available at Scotland: Fire and Rescue Statistics (https://www.firescotland.gov.uk/about/statistics/), Wales: Community safety (https://statswales.gov.wales/Catalogue/Community-Safety-and-Social-Inclusion/Community-Safety) and Northern Ireland: Fire and Rescue Statistics (https://www.nifrs.org/home/about-us/publications/).

If you use assistive technology (for example, a screen reader) and need a version of any of these documents in a more accessible format, please email alternativeformats@communities.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

Related content

Fire statistics guidance
Fire statistics incident level datasets

Incidents attended

FIRE0101: Incidents attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 153 KB): https://assets.publishing.service.gov.uk/media/686d2aa22557debd867cbe14/FIRE0101.xlsx (Previous FIRE0101 tables)

FIRE0102: Incidents attended by fire and rescue services in England, by incident type and fire and rescue authority (MS Excel Spreadsheet, 2.19 MB): https://assets.publishing.service.gov.uk/media/686d2ab52557debd867cbe15/FIRE0102.xlsx (Previous FIRE0102 tables)

FIRE0103: Fires attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 201 KB): https://assets.publishing.service.gov.uk/media/686d2aca10d550c668de3c69/FIRE0103.xlsx (Previous FIRE0103 tables)

FIRE0104: Fire false alarms by reason for false alarm, England (MS Excel Spreadsheet, 492 KB): https://assets.publishing.service.gov.uk/media/686d2ad92557debd867cbe16/FIRE0104.xlsx (Previous FIRE0104 tables)

Dwelling fires attended

FIRE0201: Dwelling fires attended by fire and rescue services by motive, population and nation (MS Excel Spreadsheet, 192 KB): https://assets.publishing.service.gov.uk/media/686d2af42cfe301b5fb6789f/FIRE0201.xlsx (Previous FIRE0201 tables)

Walmart Dataset
kaggle.com
Updated Dec 26, 2021
Cite
M Yasser H (2021). Walmart Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/walmart-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 26, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
M Yasser H
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

Description:

Walmart, one of the leading retail stores in the US, would like to predict sales and demand accurately. Certain events and holidays affect sales on each day. Sales data are available for 45 Walmart stores. The business faces a challenge from unforeseen demand and sometimes runs out of stock because its machine learning algorithm is inappropriate. An ideal ML algorithm will predict demand accurately and ingest factors such as economic conditions, including CPI, Unemployment Index, etc.

Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.

Acknowledgements

The dataset is taken from Kaggle.

Objective:

Understand the Dataset & cleanup (if required).

Build Regression models to predict the sales w.r.t single & multiple features.

Also evaluate the models & compare their respective scores like R2, RMSE, etc.
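
The objectives above mention single-feature regression and scoring with R2 and RMSE. A minimal sketch of those pieces in plain Python follows, assuming ordinary least squares on one feature; the function names and the tiny sample data are hypothetical and are not drawn from the Walmart files.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical feature/target values for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 10.0]

slope, intercept = fit_line(x, y)
predictions = [slope * a + intercept for a in x]
score_r2 = r2(y, predictions)
score_rmse = rmse(y, predictions)
print(slope, intercept, score_r2, score_rmse)
```

Scores computed on the training rows, as here, are optimistic; comparing models fairly would need a held-out set, which this sketch leaves out.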
Bed surface adjustments to spatially variable flow in low relative...
service.tib.eu
Updated Nov 29, 2024
+ more versions
Cite
(2024). Bed surface adjustments to spatially variable flow in low relative submergence regimes, link to supplementary data in MS Excel format - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-878259
Explore at:
Dataset updated
Nov 29, 2024
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
In mountainous rivers, large relatively immobile grains partly control the local and reach-averaged flow hydraulics and sediment fluxes. When the flow depth is in low relative submergence conditions, plunging flow and the highly three-dimensional flow field can cause spatial distributions of bed surface elevations and grain size distributions, thereby causing a spatially variable sediment transport rate. We conducted a set of experiments to study how the bed surface responds to this spatial variability and in particular the effect of relative submergence on the formation of sediment patches around simulated large boulders. The same average sediment transport capacity, upstream sediment supply, initial bed thickness and grain size distribution were imposed in all experiments. The detailed flow field around the boulders was obtained using a combination of laboratory measurements and a 3D flow model based on the Volume of Fluid technique. The local shear stress field displayed substantial variability and controlled the bedload transport rates and the direction in which sediment moved. The divergence in shear stress caused by the hemispheres promoted size-selective bedload deposition, which formed patches of coarse sediment upstream of the hemisphere. Sediment deposition caused a decrease in local shear stress, which, combined with the coarser grain size, enhanced the stability of this patch. The region downstream of the hemispheres was largely controlled by a recirculation zone and had little to no change in grain size, bed elevation, and shear stress. The formation, development and stability of sediment patches in mountain streams are controlled by the shear stress divergence and the magnitude and direction of the local shear stress field.
Quantitative Content Analysis Data for Hand Labeling Road Surface Conditions...
data.niaid.nih.gov
zenodo.org
Updated Sep 27, 2023
Cite
Przybylo, Vanessa (2023). Quantitative Content Analysis Data for Hand Labeling Road Surface Conditions in New York State Department of Transportation Camera Images [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8370664
Explore at:
Dataset updated
Sep 27, 2023
Dataset provided by
Wirz, Christopher D.
Przybylo, Vanessa
Bassill, Nick P.
Sutter, Carly
Sulia, Kara
Cains, Mariana G.
Thorncroft, Christopher D.
Radford, Jacob
Evans, David Aaron
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New York
Description
Traffic camera images from the New York State Department of Transportation (511ny.org) are used to create a hand-labeled dataset of images classified into one of six road surface conditions: 1) severe snow, 2) snow, 3) wet, 4) dry, 5) poor visibility, or 6) obstructed. Six labelers (authors Sutter, Wirz, Przybylo, Cains, Radford, and Evans) went through a series of four labeling trials in which reliability across all six labelers was assessed using the Krippendorff’s alpha (KA) metric (Krippendorff, 2007). The online tool by Dr. Freelon (Freelon, 2013; Freelon, 2010) was used to calculate reliability metrics after each trial, and the group achieved inter-coder reliability with a KA of 0.888 on the 4th trial. This process is known as quantitative content analysis, and three pieces of data used in this process are shared: 1) a PDF of the codebook, which serves as a set of rules for labeling images, 2) images from each of the four labeling trials, including the use of New York State Mesonet weather observation data (Brotzge et al., 2020), and 3) an Excel spreadsheet including the calculated inter-coder reliability metrics and other summaries used to assess reliability after each trial.

The broader purpose of this work is that the six human labelers, after achieving inter-coder reliability, can then label large sets of images independently, each contributing to the creation of a larger labeled dataset used for training supervised machine learning models to predict road surface conditions from camera images. The xCITE lab (xCITE, 2023) is used to store camera images from 511ny.org, and the lab provides computing resources for training machine learning models.
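
The reliability figure quoted above is Krippendorff's alpha for categorical labels. As a rough illustration of how that metric is computed for nominal data, here is a minimal sketch built on the coincidence-matrix formulation; it is not the online tool the authors used, and the function name, the input layout and the sample labels are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list with one entry per labeled item; each entry is the list
    of labels assigned to that item (missing labels are simply left out).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label carries no pairing information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - observed / expected


# Hypothetical labels from two coders on three images, for illustration only.
perfect = [["wet", "wet"], ["dry", "dry"], ["snow", "snow"]]
mixed = [["wet", "wet"], ["dry", "dry"], ["wet", "dry"]]
alpha_perfect = krippendorff_alpha_nominal(perfect)
alpha_mixed = krippendorff_alpha_nominal(mixed)
print(alpha_perfect, alpha_mixed)
```

The same function accepts six labels per item, matching the six-labeler setup described above, since each item's list may have any length of two or more.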
Health Facilities State Enforcement Actions
catalog.data.gov
data.chhs.ca.gov
+4 more
Updated Jul 23, 2025
+ more versions
Cite
California Department of Public Health (2025). Health Facilities State Enforcement Actions [Dataset]. https://catalog.data.gov/dataset/health-facilities-state-enforcement-actions-dcd50
Explore at:
Dataset updated
Jul 23, 2025
Dataset provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Note: This web page provides data on health facilities only. To file a complaint against a facility, please see: https://www.cdph.ca.gov/Programs/CHCQ/LCP/Pages/FileAComplaint.aspx

The California Department of Public Health (CDPH), Center for Health Care Quality, Licensing and Certification (L&C) Program licenses more than 30 types of healthcare facilities. The Electronic Licensing Management System (ELMS) is a CDPH data system created to manage state licensing-related data and enforcement actions.

The “Health Facilities’ State Enforcement Actions” dataset includes summary information for state enforcement actions (state citations or administrative penalties) issued to California healthcare facilities. This file, a sub-set of the ELMS system data, includes state enforcement actions that have been issued from July 1, 1997 through June 30, 2024. Data are presented for each citation/penalty, and include information about the type of enforcement action, violation category, penalty amount, violation date, appeal status, and facility.

The “LTC Citation Narrative” dataset contains the full text of citations that were issued to long-term care (LTC) facilities between January 1, 2012 – December 31, 2017. DO NOT DOWNLOAD in Excel as this file has large blocks of text which may truncate. For example, Excel 2007 and later display, and allow up to, 32,767 characters in each cell, whereas earlier versions of Excel allow 32,767 characters, but only display the first 1,024 characters. Please refer to instructions in “E_Citation_Access_DB_How_To_Docs”, about how to download and view data.

These files enable providers and the public to identify facility non-compliance and quality issues. By making this information available, quality issues can be identified and addressed. Please refer to the background paper, “About Health Facilities’ State Enforcement Actions” for information regarding California state enforcement actions before using these data.
Data dictionaries and data summary charts are also available. Note: The Data Dictionary at the bottom of the dataset incorrectly lists the data column formats as all Text. For proper format labels, please go here.
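
Because the description warns that long citation narratives may be truncated at Excel's 32,767-character cell limit, it can help to check a CSV export for oversized cells before opening it in a spreadsheet. The following is a minimal sketch of such a check; the function name and the column names in the sample (`facility_id`, `narrative`) are hypothetical and do not come from the dataset.

```python
import csv
import io

EXCEL_CELL_LIMIT = 32_767  # characters per cell, as stated in the description

# Raise the csv module's per-field limit so long narratives can be read at all.
csv.field_size_limit(10_000_000)


def oversized_cells(text_stream, limit=EXCEL_CELL_LIMIT):
    """Yield (row_number, column_name, length) for cells longer than `limit`."""
    reader = csv.DictReader(text_stream)
    for row_number, row in enumerate(reader, start=2):  # the header is row 1
        for column, value in row.items():
            if isinstance(value, str) and len(value) > limit:
                yield row_number, column, len(value)


# Hypothetical two-row export; the second narrative exceeds the limit.
sample = "facility_id,narrative\n1,short text\n2," + "x" * 40_000 + "\n"
flagged = list(oversized_cells(io.StringIO(sample)))
print(flagged)
```

For a real file, the `io.StringIO` wrapper would be replaced by `open(path, newline="", encoding="utf-8")`, with the encoding itself being an assumption about the export.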
Lottery Mega Millions Winning Numbers: Beginning 2002
catalog.data.gov
datasets.ai
+2 more
Updated Sep 20, 2025
Cite
State of New York (2025). Lottery Mega Millions Winning Numbers: Beginning 2002 [Dataset]. https://catalog.data.gov/dataset/lottery-mega-millions-winning-numbers-beginning-2002
Explore at:
Dataset updated
Sep 20, 2025
Dataset provided by
State of New York
Description
Go to http://on.ny.gov/1J8tPSN on the New York Lottery website for past Mega Millions results and payouts.
GP Practice Prescribing Presentation-level Data - January 2017
digital.nhs.uk
csv, zip
Updated Apr 7, 2017
+ more versions
Cite
(2017). GP Practice Prescribing Presentation-level Data - January 2017 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/practice-level-prescribing-data
Explore at:
csv (1.7 MB), zip (250.3 MB), csv (283.7 kB), csv (1.4 GB) (available download formats)
Dataset updated
Apr 7, 2017
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Jan 1, 2017 - Jan 31, 2017
Area covered
United Kingdom
Description
Following a review of our processes, NHS Digital has recently decided to bring forward the publishing date for Practice Level Prescribing data. This data is currently published on the 1st Friday of the month. The February 2017 data will be published on Friday 5 May 2017, however these dates are constantly under review and may move earlier.

What does the data cover?

General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record will only be produced when this has occurred and there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):

- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity

The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.

Warning: Large file size (over 1GB). Each monthly data set is large (over 10 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or equivalent on Mac OSX). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet.

Alternatively add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, to handle larger data sets, can be used. The Microsoft PowerPivot add-on for Excel is available using the link in the 'Related Links' section below. Once PowerPivot has been installed, to load the large files, please follow the instructions below. Note that it may take at least 20 to 30 minutes to load one monthly file.

1. Start Excel as normal
2. Click on the PowerPivot tab
3. Click on the PowerPivot Window icon (top left)
4. In the PowerPivot Window, click on the "From Other Sources" icon
5. In the Table Import Wizard e.g. scroll to the bottom and select Text File
6. Browse to the file you want to open and choose the file extension you require e.g. CSV

Once the data has been imported you can view it in a spreadsheet.
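
As another way to pick out the required rows from a file of this size, the CSV can be streamed one row at a time and only the matching rows written to a much smaller file. The sketch below assumes a header row and filters on a BNF code prefix; the column names (`PRACTICE`, `BNF CODE`, `ITEMS`), the sample rows and the function name are assumptions for illustration and should be checked against the actual file.

```python
import csv
import io


def filter_rows(src, dst, column, prefix):
    """Copy the header plus every row whose `column` starts with `prefix`.

    Rows are read and written one at a time, so memory use stays small
    regardless of the size of the input. Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column].strip().startswith(prefix):
            writer.writerow(row)
            kept += 1
    return kept


# Hypothetical rows standing in for the monthly file.
sample = (
    "PRACTICE,BNF CODE,ITEMS\n"
    "A81001,0101010G0AAABAB,2\n"
    "A81001,0407010H0AAAMAM,5\n"
    "A81002,0101021B0AAAHAH,1\n"
)
output = io.StringIO()
kept_count = filter_rows(io.StringIO(sample), output, "BNF CODE", "0101")
print(kept_count)
print(output.getvalue())
```

With real files, `src` and `dst` would be opened with `open(path, newline="")`; the filtered output is then typically small enough to open directly in a spreadsheet.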
Business Activity Survey 2009 - Samoa
microdata.pacificdata.org
Updated Jul 2, 2019
+ more versions
Cite
Samoa Bureau of Statistics (2019). Business Activity Survey 2009 - Samoa [Dataset]. https://microdata.pacificdata.org/index.php/catalog/253
Explore at:
Dataset updated
Jul 2, 2019
Dataset authored and provided by
Samoa Bureau of Statistics
Time period covered
2009
Area covered
Samoa
Description
Abstract

The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate NAs, such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).

The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.

The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.

Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website in excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be made to provide, for selected analytical users, confidentialised unit record files (CURFs).

A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey will be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.

Geographic coverage

National Coverage

Analysis unit

The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that it records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.

SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.

It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.

Universe

The BAS covered all employing units and excluded small non-employing units such as market sellers. The survey also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). It covered only businesses that pay the VAGST (threshold SAT$75,000 and upwards).

Kind of data

Sample survey data [ssd]

Sampling procedure

- Total sample size was 1240.
- Of the 1240, 902 successfully completed the questionnaire.
- The remaining 338 either never responded or were omitted (some businesses were omitted from the sample because they did not meet the requirement to be surveyed).
- Selection was all employing units paying VAGST (threshold SAT$75,000 and upwards).
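As a quick check on the figures above, the completion rate can be worked out directly. The sketch below uses the full sample as the denominator; the source does not say how many of the 338 were out of scope rather than non-respondents, so this is an unadjusted rate, not an official response rate.

```python
# Figures taken from the sampling procedure text.
TOTAL_SAMPLE = 1240
COMPLETED = 902
NOT_COMPLETED = 338  # never responded, or omitted as not meeting the requirement


def response_rate(completed: int, total: int) -> float:
    """Return completed / total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * completed / total


# The two groups should account for the whole sample.
assert COMPLETED + NOT_COMPLETED == TOTAL_SAMPLE

rate = response_rate(COMPLETED, TOTAL_SAMPLE)
print(f"Unadjusted completion rate: {rate:.1f}%")  # 72.7%
```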

WILL CONFIRM LATER!!

OSO LE MEA E LE FAASA...AEA :-)

Mode of data collection

Mail Questionnaire [mail]

Research instrument

General instructions, authority for the survey, etc;

Business demography information on ownership, contact details, structure, etc.;

Employment;

Income;

Expenses;

Inventories;

Profit or loss and reconciliation to business accounts' profit and loss;

Fixed assets - purchases, disposals, book values

Thank you and signature of respondent.

Supplementary Pages

Additional pages have been prepared to collect data for a limited range of industries.

1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.

2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism would provide valuable indicators of the size of the direct impact of tourism.

Cleaning operations

Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires had been followed. Imputation applied ratios from responding units in the imputation cell to the partial data that was supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff write on the form, for example, this should only be done in red pen, to distinguish the alterations from the original information.
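The ratio step described above can be illustrated with a minimal sketch. It assumes a simple ratio of totals within one imputation cell; the choice of `income` as the reported (auxiliary) item and `expenses` as the missing (target) item, and all values shown, are hypothetical, since the source does not name the variables or define the cells.

```python
def cell_ratio(responders, target, auxiliary):
    """Ratio of summed target to summed auxiliary over complete responders."""
    aux_total = sum(r[auxiliary] for r in responders)
    if aux_total == 0:
        raise ValueError("auxiliary total is zero; ratio undefined")
    return sum(r[target] for r in responders) / aux_total


def impute(record, responders, target, auxiliary):
    """Return a copy of record with a missing target filled in and flagged.

    The original record is left untouched, mirroring the rule that the
    questionnaire as supplied is preserved and every change is recorded.
    """
    filled = dict(record)
    if filled.get(target) is None:
        ratio = cell_ratio(responders, target, auxiliary)
        filled[target] = ratio * filled[auxiliary]
        filled[f"{target}_imputed"] = True
    return filled


# Hypothetical imputation cell: two complete responders, one partial.
complete = [
    {"income": 200.0, "expenses": 150.0},
    {"income": 400.0, "expenses": 300.0},
]
partial = {"income": 100.0, "expenses": None}

result = impute(partial, complete, target="expenses", auxiliary="income")
print(result)  # {'income': 100.0, 'expenses': 75.0, 'expenses_imputed': True}
```

The `expenses_imputed` flag plays the role of the red pen: imputed values stay distinguishable from what the respondent reported.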

Additional edit checks were developed, including checking against external data at enterprise/establishment level. The external data checked against include VAGST for turnover and purchases, and SNPF for salaries and wages and employment data. Editing and imputation processes were undertaken by FSD using Excel.

Sampling error estimates

NOT APPLICABLE!!

Sourav Banerjee (2023). Airline Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/airline-dataset

Airline Dataset

Navigating the Skies: Exploring Insights from Synthetic Airline Data

2 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Sep 26, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Sourav Banerjee

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.

Content

This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.

Dataset Glossary (Column-wise)

Passenger ID - Unique identifier for each passenger
First Name - First name of the passenger
Last Name - Last name of the passenger
Gender - Gender of the passenger
Age - Age of the passenger
Nationality - Nationality of the passenger
Airport Name - Name of the airport where the passenger boarded
Airport Country Code - Country code of the airport's location
Country Name - Name of the country the airport is located in
Airport Continent - Continent where the airport is situated
Continents - Continents involved in the flight route
Departure Date - Date when the flight departed
Arrival Airport - Destination airport of the flight
Pilot Name - Name of the pilot operating the flight
Flight Status - Current status of the flight (e.g., on-time, delayed, canceled)
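As a rough illustration of working with the columns listed above, the sketch below tallies rows by Flight Status using only the standard library. It assumes the CSV header uses the glossary names exactly, which the source does not confirm; the sample rows are placeholders built in code, not values from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the glossary (assumed to match the CSV header).
COLUMNS = [
    "Passenger ID", "First Name", "Last Name", "Gender", "Age",
    "Nationality", "Airport Name", "Airport Country Code", "Country Name",
    "Airport Continent", "Continents", "Departure Date", "Arrival Airport",
    "Pilot Name", "Flight Status",
]


def flight_status_counts(csv_text: str) -> Counter:
    """Count rows per Flight Status, checking the expected columns exist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return Counter(row["Flight Status"] for row in reader)


# Placeholder rows: every field is "x" except the status being counted.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
for status in ["on-time", "delayed", "on-time"]:
    row = {name: "x" for name in COLUMNS}
    row["Flight Status"] = status
    writer.writerow(row)

counts = flight_status_counts(buffer.getvalue())
print(counts)  # Counter({'on-time': 2, 'delayed': 1})
```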

Structure of the Dataset

https://i.imgur.com/cUFuMeU.png

Acknowledgement

The dataset provided here is a simulated example and was generated using the online platform Mockaroo. This web-based tool offers a service that enables the creation of customizable synthetic datasets that closely resemble real data. It is primarily intended for use by developers, testers, and data experts who require sample data for a range of uses, including testing databases, filling applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.

Cover Photo by: Kevin Woblick on Unsplash

Thumbnail by: Airplane icons created by Freepik - Flaticon
