100+ datasets found

Means of Transportation to Work
catalog.data.gov
geodata.bts.gov
Updated Jul 17, 2025
Bureau of Transportation Statistics (BTS) (Point of Contact) (2025). Means of Transportation to Work [Dataset]. https://catalog.data.gov/dataset/means-of-transportation-to-work2
Dataset updated
Jul 17, 2025
Dataset provided by
Bureau of Transportation Statistics (http://www.rita.dot.gov/bts)
Description
The Means of Transportation to Work dataset was compiled from Bureau of Transportation Statistics (BTS) information as of December 31, 2023, updated December 12, 2024, and is part of the U.S. Department of Transportation (USDOT)/BTS National Transportation Atlas Database (NTAD). The Means of Transportation to Work table from the 2023 American Community Survey (ACS) 5-year estimates was joined to the Census Bureau's 2023 tract-level geographies for all 50 states, the District of Columbia, and Puerto Rico. The resulting file combines the demographic variables from the former with the cartographic boundaries of the latter. The national census tract layer contains the number and percentage of commuters (workers 16 years and over) who used each transportation mode to get to work. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529037.
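The join described above can be sketched as follows. This is a minimal illustration, not BTS's production process: it assumes both inputs are lists of dicts keyed by an 11-character tract `GEOID`, and the field names `total_workers`, `drove_alone`, `public_transit` and `walked` are hypothetical stand-ins for the real ACS column names, which are documented in the data dictionary linked above.

```python
from typing import Dict, List


def join_commute_to_tracts(tracts: List[dict], acs_rows: List[dict]) -> List[dict]:
    """Attach ACS commute counts to tract records and add percentage fields.

    Field names other than GEOID are hypothetical placeholders.
    """
    acs_by_geoid: Dict[str, dict] = {row["GEOID"]: row for row in acs_rows}
    joined = []
    for tract in tracts:
        acs = acs_by_geoid.get(tract["GEOID"])
        if acs is None:
            continue  # inner join: drop tracts with no ACS record
        record = {**tract, **acs}
        total = acs["total_workers"]
        for mode in ("drove_alone", "public_transit", "walked"):
            # Share of workers 16+ using this mode, in percent; 0.0 if no workers.
            record[f"pct_{mode}"] = 100.0 * acs[mode] / total if total else 0.0
        joined.append(record)
    return joined
```

An inner join is used here so that tracts without survey data are dropped; a left join that keeps them with empty attributes would be an equally reasonable choice.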
Table_1_A scalable and transparent data pipeline for AI-enabled health data...
figshare.com
docx
Updated Jul 30, 2024
Tuncay Namli; Ali Anıl Sınacı; Suat Gönül; Cristina Ruiz Herguido; Patricia Garcia-Canadilla; Adriana Modrego Muñoz; Arnau Valls Esteve; Gökçe Banu Laleci Ertürkmen (2024). Table_1_A scalable and transparent data pipeline for AI-enabled health data ecosystems.docx [Dataset]. http://doi.org/10.3389/fmed.2024.1393123.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fmed.2024.1393123.s001
Dataset updated
Jul 30, 2024
Dataset provided by
Frontiers
Authors
Tuncay Namli; Ali Anıl Sınacı; Suat Gönül; Cristina Ruiz Herguido; Patricia Garcia-Canadilla; Adriana Modrego Muñoz; Arnau Valls Esteve; Gökçe Banu Laleci Ertürkmen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Transparency and traceability are essential for establishing trustworthy artificial intelligence (AI). The lack of transparency in the data preparation process is a significant obstacle to developing reliable AI systems and can lead to issues with reproducibility, debugging AI models, bias and fairness, and compliance and regulation. We introduce a formal data preparation pipeline specification, with a focus on traceability, to improve on the manual and error-prone data extraction processes used in AI and data analytics applications.

Methods: We propose a declarative language to define the extraction of AI-ready datasets from health data adhering to a common data model, particularly data conforming to HL7 Fast Healthcare Interoperability Resources (FHIR). We use FHIR profiling to develop a common data model tailored to an AI use case, enabling the explicit declaration of the needed information such as phenotype and AI feature definitions. In our pipeline model, we convert complex, high-dimensional electronic health record data with irregular time-series sampling into a flat structure by defining a target population, feature groups, and final datasets. Our design considers the requirements of various AI use cases from different projects, which led to the implementation of many feature types exhibiting intricate temporal relations.

Results: We implement a scalable, high-performance feature repository to execute the data preparation pipeline definitions. This software not only provides reliable, fault-tolerant distributed processing to produce AI-ready datasets along with their metadata, including many statistics, but also serves as a pluggable component of a decision support application based on a trained AI model, automatically preparing the feature values of individual entities during online prediction. We deployed and tested the proposed methodology and implementation in three different research projects. We present the developed FHIR profiles as a common data model, together with the feature group definitions and feature definitions within a data preparation pipeline, used while training an AI model for “predicting complications after cardiac surgeries”.

Discussion: Implementation across various pilot use cases has demonstrated that our framework has the breadth and flexibility needed to define a diverse array of features, each tailored to specific temporal and contextual criteria.
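The core idea of flattening irregularly sampled records into one row per member of a target population can be sketched as below. This is not the authors' declarative language or feature repository. It is a minimal Python illustration, assuming a hypothetical `Observation` record, a per-patient index time that defines the target population, and a fixed look-back window for a single feature group.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Observation:
    patient_id: str
    code: str        # hypothetical observation code, e.g. "creatinine"
    value: float
    time: datetime


def flatten_features(
    observations: List[Observation],
    index_times: Dict[str, datetime],  # target population: patient -> index time
    code: str,
    window: timedelta,
) -> Dict[str, Dict[str, Optional[float]]]:
    """Build one flat feature row per patient from irregular observations."""
    rows: Dict[str, Dict[str, Optional[float]]] = {}
    for patient, t0 in index_times.items():
        # Keep this patient's observations of `code` inside [t0 - window, t0).
        in_window = sorted(
            (o for o in observations
             if o.patient_id == patient and o.code == code
             and t0 - window <= o.time < t0),
            key=lambda o: o.time,
        )
        values = [o.value for o in in_window]
        rows[patient] = {
            f"{code}_count": float(len(values)),
            f"{code}_last": values[-1] if values else None,
            f"{code}_max": max(values) if values else None,
        }
    return rows
```

Excluding observations at or after the index time is the usual guard against leaking future information into training features.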
Medical Service Study Area Data Dictionary
gis.data.chhs.ca.gov
data.ca.gov
Updated Sep 6, 2024
CA Department of Health Care Access and Information (2024). Medical Service Study Area Data Dictionary [Dataset]. https://gis.data.chhs.ca.gov/datasets/hcai::medical-service-study-area-data-dictionary
Dataset updated
Sep 6, 2024
Dataset provided by
Department of Health Care Access and Information
Authors
CA Department of Health Care Access and Information
Description
Field Name	Data Type	Description
Statefp	Number	US Census Bureau unique identifier of the state
Countyfp	Number	US Census Bureau unique identifier of the county
Countynm	Text	County name
Tractce	Number	US Census Bureau unique identifier of the census tract
Geoid	Number	US Census Bureau unique identifier of the state + county + census tract
Aland	Number	US Census Bureau defined land area of the census tract
Awater	Number	US Census Bureau defined water area of the census tract
Asqmi	Number	Land area in square miles, calculated from Aland
MSSAid	Text	ID of the Medical Service Study Area (MSSA) to which the census tract belongs
MSSAnm	Text	Name of the Medical Service Study Area (MSSA) to which the census tract belongs
Definition	Text	Type of MSSA; possible values are urban, rural, and frontier
TotalPovPop	Number	US Census Bureau total population of the census tract for whom poverty status is determined, taken from the 2020 ACS 5-year table S1701
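Two fields in this dictionary are derived from others, and the sketch below shows how that derivation typically works. It assumes the standard Census convention that a tract GEOID is the 2-digit state FIPS + 3-digit county FIPS + 6-digit tract code, and that Aland is in square meters as in Census TIGER/Line files. Neither assumption is stated in this dictionary. Because Geoid is typed as a Number here, leading zeros (California's state FIPS code is 06) can be lost, and zero-padding restores the 11-character key.

```python
# Square meters in one international square mile (1609.344 m squared).
SQ_METERS_PER_SQ_MILE = 2_589_988.110336


def make_geoid(statefp: int, countyfp: int, tractce: int) -> str:
    """Rebuild the 11-character tract GEOID from its numeric parts."""
    return f"{statefp:02d}{countyfp:03d}{tractce:06d}"


def aland_to_sq_miles(aland_m2: float) -> float:
    """Convert a land area in square meters to square miles (cf. Asqmi).

    Assumes Aland is in square meters, as in Census TIGER/Line files.
    """
    return aland_m2 / SQ_METERS_PER_SQ_MILE
```

Storing the rebuilt GEOID as text rather than a number keeps it joinable with other Census products.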

Patient Risk Profiles

kaggle.com

zip

Updated Oct 28, 2023

Sujay Kapadnis (2023). Patient Risk Profiles [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/patient-risk-profiles

Available download formats: zip (17,288 bytes)

Dataset updated

Oct 28, 2023

Authors

Sujay Kapadnis

Description

The virtual R/Pharma Conference is happening this week! To celebrate, we're exploring Patient Risk Profiles. Thank you to Jenna Reps for preparing this data!

This dataset contains medical-history features for 100 simulated patients and the predicted 1-year risk of 14 outcomes based on each patient's features. The predictions were made with real logistic regression models developed on a large real-world healthcare dataset.

Data Dictionary

`patient_risk_profiles.csv`

variable	class	description
personId	integer	A unique identifier for the simulated patient
age group: 10 - 14	integer	A binary column where 1 means the patient is aged between 10-14 (inclusive) and 0 means the patient is not in that age group
age group: 15 - 19	integer	A binary column where 1 means the patient is aged between 15-19 (inclusive) and 0 means the patient is not in that age group
age group: 20 - 24	integer	A binary column where 1 means the patient is aged between 20-24 (inclusive) and 0 means the patient is not in that age group
age group: 65 - 69	integer	A binary column where 1 means the patient is aged between 65-69 (inclusive) and 0 means the patient is not in that age group
age group: 40 - 44	integer	A binary column where 1 means the patient is aged between 40-44 (inclusive) and 0 means the patient is not in that age group
age group: 45 - 49	integer	A binary column where 1 means the patient is aged between 45-49 (inclusive) and 0 means the patient is not in that age group
age group: 55 - 59	integer	A binary column where 1 means the patient is aged between 55-59 (inclusive) and 0 means the patient is not in that age group
age group: 85 - 89	integer	A binary column where 1 means the patient is aged between 85-89 (inclusive) and 0 means the patient is not in that age group
age group: 75 - 79	integer	A binary column where 1 means the patient is aged between 75-79 (inclusive) and 0 means the patient is not in that age group
age group: 5 - 9	integer	A binary column where 1 means the patient is aged between 5-9 (inclusive) and 0 means the patient is not in that age group
age group: 25 - 29	integer	A binary column where 1 means the patient is aged between 25-29 (inclusive) and 0 means the patient is not in that age group
age group: 0 - 4	integer	A binary column where 1 means the patient is aged between 0-4 (inclusive) and 0 means the patient is not in that age group
age group: 70 - 74	integer	A binary column where 1 means the patient is aged between 70-74 (inclusive) and 0 means the patient is not in that age group
age group: 50 - 54	integer	A binary column where 1 means the patient is aged between 50-54 (inclusive) and 0 means the patient is not in that age group
age group: 60 - 64	integer	A binary column where 1 means the patient is aged between 60-64 (inclusive) and 0 means the patient is not in that age group
age group: 35 - 39	integer	A binary column where 1 means the patient is aged between 35-39 (inclusive) and 0 means the patient is not in that age group
age group: 30 - 34	integer	A binary column where 1 means the patient is aged between 30-34 (inclusive) and 0 means the patient is not in that age group
age group: 80 - 84	integer	A binary column where 1 means the patient is aged between 80-84 (inclusive) and 0 means the patient is not in that age group
age group: 90 - 94	integer	A binary column where 1 means the patient is aged between 90-94 (inclusive) and 0 means the patient is not in that age group
Sex = FEMALE	integer	A binary column where 1 means the patient's recorded sex is female
sex = MALE	integer	A binary column where 1 means the patient's recorded sex is male
Acetaminophen exposures in prior year	integer	A binary column where 1 means the patient had a record for acetaminophen in the prior year and 0 means they did not
Occurrence of Alcoholism in prior year	integer	A binary column where 1 means the patient had a record for alcoholism in the prior year and 0 means they did not
Anemia in prior year	integer	A binary column where 1 means the patient had a record for anemia in the prior year and 0 means they did not
Angina events in prior year	integer	A binary column where 1 means the patient had a record for angina in the prior year and 0 means they did not
ANTIEPILEPTICS in prior year	integer	A binary column where 1 means the patient had a record for a drug in the category ANTIEPILEPTICS in the prior year and 0 means they did not
Occurrence of Anxiety in prior year	integer	A binary column where 1 means the patient had a record for anxiety in the prior year and 0 means...
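The age groups are stored as one-hot binary columns, so each patient should have exactly one `age group: …` column set to 1. A minimal sketch for recovering a single label follows. It assumes each row is a dict keyed by the column names above, as `csv.DictReader` would produce, with values that may be strings or integers. The helper name is hypothetical.

```python
from typing import Mapping, Optional

AGE_PREFIX = "age group: "


def decode_age_group(row: Mapping[str, object]) -> Optional[str]:
    """Return e.g. '65 - 69' for the age-group column set to 1, else None."""
    hits = [
        name[len(AGE_PREFIX):]
        for name, value in row.items()
        if name.startswith(AGE_PREFIX) and str(value).strip() == "1"
    ]
    if len(hits) > 1:
        # A well-formed one-hot encoding never sets two groups at once.
        raise ValueError(f"row has more than one age group set: {hits}")
    return hits[0] if hits else None
```

Raising on multiple hits, rather than picking one, surfaces malformed rows early.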

IPPS DRG Provider Summary
kaggle.com
zip
Updated Jan 23, 2023
The Devastator (2023). IPPS DRG Provider Summary [Dataset]. https://www.kaggle.com/datasets/thedevastator/ipps-drg-provider-summary
Available download formats: zip (8,432,015 bytes)
Dataset updated
Jan 23, 2023
Authors
The Devastator
License
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
IPPS DRG Provider Summary

Average Discharges, Charges, and Medicare Payments

By Health [source]

About this dataset

This dataset is a valuable resource for gaining insight into Inpatient Prospective Payment System (IPPS) utilization, average charges, and average Medicare payments across the top 100 Diagnosis-Related Groups (DRGs). With columns such as DRG Definition, Hospital Referral Region Description, Total Discharges, Average Covered Charges, Average Medicare Payments, and Average Medicare Payments 2, it enables researchers to discover and assess healthcare trends, such as comparing provider payments by geographic location or service costs across hospitals. Visualizing the data in various ways can uncover further insights and drive hospital research.

How to use the dataset

This dataset provides a provider-level summary of Inpatient Prospective Payment System (IPPS) discharges, average charges, and average Medicare payments for the top 100 Diagnosis-Related Groups (DRGs). It can be used to analyze cost and utilization trends across DRGs and hospitals.

To make the most use of this dataset, here are some steps to consider:

Understand what each column in the table means: the columns range from DRG Definition to Hospital Referral Region Description and Average Medicare Payments.

Analyze the data by looking for patterns among the relevant columns: compare aspects such as total discharges or average Medicare payments by hospital referral region or DRG Definition. This can help identify trends among different categories within your analysis.

Generate visualizations: create charts, graphs, or maps that display your data in an easy-to-understand format using tools such as Microsoft Excel or Tableau. Such visuals may reveal patterns that reading the numerical values in a spreadsheet alone would not.
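The "compare by hospital referral region" step can be sketched with the standard library alone. This is a minimal illustration, assuming rows keyed by this dataset's column names (`hospital_referral_region_description`, `total_discharges`, `average_medicare_payments`), for example from `csv.DictReader`, with plain numeric strings in the numeric columns. Because each row already holds an average, the regional figure is weighted by discharges rather than computed as a plain mean of row averages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def payments_by_region(rows: Iterable[Mapping[str, str]]) -> Dict[str, Tuple[int, float]]:
    """Return {region: (total discharges, discharge-weighted mean Medicare payment)}."""
    discharges: Dict[str, int] = defaultdict(int)
    weighted_sum: Dict[str, float] = defaultdict(float)
    for row in rows:
        region = row["hospital_referral_region_description"]
        n = int(row["total_discharges"])
        discharges[region] += n
        weighted_sum[region] += n * float(row["average_medicare_payments"])
    return {
        region: (n, weighted_sum[region] / n if n else 0.0)
        for region, n in discharges.items()
    }
```

The resulting per-region figures are a natural input to the charts or maps suggested above.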

Research Ideas

Identifying potential areas of cost savings by drilling down to particular DRGs and hospital regions with the highest average covered charges compared to average Medicare payments.

Establishing benchmarks for typical charges and payments across different DRGs and hospital regions to help providers set market-appropriate prices.

Analyzing trends in total discharges, charges, and Medicare payments over time, allowing healthcare organizations to measure their performance against regional peers.
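The first research idea, comparing average covered charges with average Medicare payments, can be sketched as a simple ranking. This is a minimal illustration, assuming rows keyed by this dataset's column names with plain numeric strings. The charge-to-payment ratio and the `top_n` cutoff are illustrative choices, not measures defined by the source.

```python
from typing import Iterable, List, Mapping, Tuple


def top_charge_to_payment(
    rows: Iterable[Mapping[str, str]], top_n: int = 5
) -> List[Tuple[str, str, float]]:
    """Rank (DRG, region) pairs by average covered charges / average Medicare payments."""
    scored = []
    for row in rows:
        payment = float(row["average_medicare_payments"])
        if payment <= 0:
            continue  # skip malformed rows instead of dividing by zero
        ratio = float(row["average_covered_charges"]) / payment
        scored.append((row["drg_definition"], row["hospital_referral_region_description"], ratio))
    # Highest ratio first: candidate areas to examine for cost savings.
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:top_n]
```

A ratio normalizes for the overall price level of each DRG, so expensive and inexpensive procedures can be ranked on the same scale.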

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Open Database License (ODbL) v1.0
You are free to:
- Share - copy and redistribute the material in any medium or format.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit - provide a link to the license, and indicate if changes were made.
- ShareAlike - distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.
- No additional restrictions - you may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Columns

File: 97k6-zzx3.csv

| Column name | Description |
|:-----------------------------------------|:------------------------------------------------------|
| drg_definition | Diagnosis-Related Group (DRG) definition. (String) |
| average_medicare_payments | Average Medicare payments for each DRG. (Numeric) |
| hospital_referral_region_description | Description of the hospital referral region. (String) |
| total_discharges | Total number of discharges for each DRG. (Numeric) |
| average_covered_charges | Average covered charges for each DRG. (Numeric) |
| average_medicare_payments_2 | Average Medicare payments for each DRG. (Numeric) |

**File: Inpatient_Prospective_Payment_System_IPPS_Provider_Summary_for_the_Top_100_Diagnosis-Related_Groups_DRG...
K means clustering the opportunity atlas 2023APR
catalog.data.gov
s.cnmilf.com
Updated Feb 9, 2024
U.S. EPA Office of Research and Development (ORD) (2024). K means clustering the opportunity atlas 2023APR [Dataset]. https://catalog.data.gov/dataset/k-means-clustering-the-opportunity-atlas-2023apr
Dataset updated
Feb 9, 2024
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
All code and input files used in k-means clustering analysis of Opportunity Atlas data. This dataset is associated with the following publication: Zelasky, S., C. Martin, C. Weaver, L. Baxter, and K. Rappazzo. Identifying groups of children's social mobility opportunity for public health applications using k-means clustering. Heliyon. Elsevier B.V., Amsterdam, NETHERLANDS, 9(9): E20250, (2023).
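For readers unfamiliar with the method, a minimal sketch of k-means (Lloyd's algorithm) in plain Python follows. It is not the code archived in this dataset, which may use a library implementation, a different initialization, and scaled Opportunity Atlas variables. Here, initial centroids are drawn at random from the data with a fixed seed.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Cluster points into k groups; returns (centroids, label per point)."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

In practice k is chosen with diagnostics such as the elbow method or silhouette scores, and results can depend on initialization, so several seeds are usually tried.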
Data Definition Guidelines
catalog.data.gov
data.virginia.gov
Updated Sep 8, 2025
Administration for Children and Families (2025). Data Definition Guidelines [Dataset]. https://catalog.data.gov/dataset/data-definition-guidelines
Dataset updated
Sep 8, 2025
Dataset provided by
Administration for Children and Families
Description
ACF agency-wide resource. This is a metadata-only record linking to the original dataset.
Dataset: A Systematic Literature Review on the topic of High-value datasets
data.niaid.nih.gov
zenodo.org
Updated Jun 23, 2023
Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
Dataset updated
Jun 23, 2023
Dataset provided by
University of the Aegean
University of Zagreb
Gdańsk University of Technology
University of Tartu
Authors
Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to serve as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.

The protocol is intended for a systematic literature review (SLR) on high-value datasets (HVD), with the aim of gathering information on how HVD and their determination have been reflected in the literature over the years and what these studies have found to date, including the indicators they used, the stakeholders involved, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research Library (DGRL) in 2023.

Methodology

To understand how HVD determination has been reflected in the literature over the years and what these studies have found to date, all relevant literature covering this topic was studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research Library (DGRL).

These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), applied to the article title, keywords, and abstract, to limit the results to papers in which these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and checked for relevance; as a result, a total of 9 articles were examined further.
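The boolean search string above can be approximated as a filter over exported records, which is useful for re-checking a result list locally. This is a minimal sketch, assuming each record is a dict with optional `title`, `abstract` and `keywords` fields (hypothetical names). A trailing `*` is translated into a regular-expression `\w*` suffix. Scopus, Web of Science and DGRL apply their own tokenization and field rules, so this only approximates their behaviour.

```python
import re
from typing import Mapping

GROUP_A = ["open data", "open government data"]
GROUP_B = ["high-value data*", "high value data*"]


def _term_regex(term: str) -> "re.Pattern[str]":
    # Escape the phrase; turn a trailing "*" into "any further word characters".
    stem = term.rstrip("*")
    suffix = r"\w*" if term.endswith("*") else r"\b"
    return re.compile(r"\b" + re.escape(stem) + suffix, re.IGNORECASE)


def matches_query(record: Mapping[str, str]) -> bool:
    """(any GROUP_A term) AND (any GROUP_B term) over title, abstract, keywords."""
    text = " ".join(record.get(f, "") for f in ("title", "abstract", "keywords"))
    any_a = any(_term_regex(t).search(text) for t in GROUP_A)
    any_b = any(_term_regex(t).search(text) for t in GROUP_B)
    return any_a and any_b
```

Deduplication across databases (for example by DOI) would follow this step before relevance screening.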

To attain the objective of our study, we developed a protocol in which the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, and (4) HVD determination-related information.

Test procedure: Each study was independently examined by at least two authors, who, after an in-depth examination of the article's full text, filled in the structured protocol for that study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by the two researchers were then synthesized into one final version by a third researcher.

Description of the data in this data set

Protocol_HVD_SLR provides the structure of the protocol. Spreadsheet #1 provides the filled protocol for the relevant studies. Spreadsheet #2 provides the list of results from the search over the three indexing databases, i.e., before irrelevant studies were filtered out.

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, and (4) HVD determination-related information.

Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of the article in Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach
14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation of why these data are not shared
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused on HVD determination; secondary - mentioned but not studied, e.g., as part of the discussion, future work etc.)

HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

Format of the file: .xls, .csv (for the first spreadsheet only), .odt, .docx

Licenses or restrictions: CC-BY

For more info, see README.txt
CERES Energy Balanced and Filled (EBAF) TOA and Surface Monthly means data...
data.nasa.gov
Updated Apr 1, 2025
nasa.gov (2025). CERES Energy Balanced and Filled (EBAF) TOA and Surface Monthly means data in netCDF Edition 4.1 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/ceres-energy-balanced-and-filled-ebaf-toa-and-surface-monthly-means-data-in-netcdf-edition-1fbba
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
CERES_EBAF_Edition4.1 is the Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) Top-of-Atmosphere (TOA) and surface monthly means data product, Edition 4.1, in netCDF format. Data were collected using the CERES Scanner instruments on both the Terra and Aqua platforms. Data collection for this product is ongoing. CERES_EBAF_Edition4.1 data are monthly and climatological averages of TOA clear-sky (spatially complete) fluxes and all-sky fluxes, where the TOA net flux is constrained to the ocean heat storage. The product also provides computed monthly mean surface radiative fluxes consistent with the CERES EBAF-TOA product and some basic cloud properties derived from MODIS. Cloud Radiative Effects are provided at both the TOA and surface, as determined using a cloud-free profile in the Fu-Liou Radiative Transfer Model (RTM). Observed fluxes are obtained using cloud properties derived from narrow-band imagers onboard both the EOS Terra and Aqua satellites, as well as geostationary satellites, to fully model the diurnal cycle of clouds. The computations are also based on meteorological assimilation data from the Goddard Earth Observing System (GEOS) version 5.4.1 models. Unlike other CERES Level 3 clear-sky regional data sets that contain clear-sky data gaps, the clear-sky fluxes in the EBAF-TOA product are regionally complete. The EBAF-TOA product is the CERES project's best estimate of the fluxes based on all available satellite platforms and input data. CERES is a key component of the Earth Observing System (EOS) program. The CERES instruments provide radiometric measurements of the Earth's atmosphere from three broadband channels. The CERES missions follow the successful Earth Radiation Budget Experiment (ERBE) mission. The first CERES instrument, the protoflight model (PFM), was launched on November 27, 1997, as part of the Tropical Rainfall Measuring Mission (TRMM).
Two CERES instruments (FM1 and FM2) were launched into polar orbit on board the Earth Observing System (EOS) flagship Terra on December 18, 1999. Two additional CERES instruments (FM3 and FM4) were launched on board Earth Observing System (EOS) Aqua on May 4, 2002. The CERES FM5 instrument was launched on board the Suomi National Polar-orbiting Partnership (NPP) satellite on October 28, 2011. The newest CERES instrument (FM6) was launched on board the Joint Polar-Orbiting Satellite System 1 (JPSS-1) satellite, now called NOAA-20, on November 18, 2017.
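The TOA quantities mentioned above follow standard radiation-budget bookkeeping: net flux = incoming solar - reflected shortwave - outgoing longwave, and cloud radiative effect (CRE) = all-sky net flux minus clear-sky net flux. The sketch below illustrates that arithmetic with made-up numbers in W m^-2 chosen for easy checking. It does not read the EBAF files or assume their variable names.

```python
from dataclasses import dataclass


@dataclass
class ToaFluxes:
    solar_in: float  # incoming solar flux, W m^-2
    sw_up: float     # reflected shortwave flux, W m^-2
    lw_up: float     # outgoing longwave flux, W m^-2

    def net(self) -> float:
        """Net downward TOA flux: incoming minus both outgoing components."""
        return self.solar_in - self.sw_up - self.lw_up


def cloud_radiative_effect(all_sky: ToaFluxes, clear_sky: ToaFluxes) -> dict:
    """CRE split into SW and LW parts; positive values warm the system.

    Assumes both inputs share the same incoming solar flux, so that
    sw + lw equals the difference of net fluxes.
    """
    sw = clear_sky.sw_up - all_sky.sw_up  # typically negative: clouds reflect sunlight
    lw = clear_sky.lw_up - all_sky.lw_up  # typically positive: clouds trap longwave
    return {"sw": sw, "lw": lw, "net": all_sky.net() - clear_sky.net()}


all_sky = ToaFluxes(solar_in=340.0, sw_up=100.0, lw_up=240.0)   # illustrative only
clear_sky = ToaFluxes(solar_in=340.0, sw_up=53.0, lw_up=268.0)  # illustrative only
cre = cloud_radiative_effect(all_sky, clear_sky)
```

With these illustrative inputs the SW and LW parts are -47 and +28, summing to the net CRE of -19.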
Geomagnetic Observatory Annual Means Data
catalog.data.gov
s.cnmilf.com
Updated Oct 18, 2024
DOC/NOAA/NESDIS/NCEI > National Centers for Environmental Information, NESDIS, NOAA, U.S. Department of Commerce (Point of Contact) (2024). Geomagnetic Observatory Annual Means Data [Dataset]. https://catalog.data.gov/dataset/geomagnetic-observatory-annual-means-data1
Dataset updated
Oct 18, 2024
Dataset provided by
United States Department of Commerce (http://commerce.gov/)
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
National Centers for Environmental Information (https://www.ncei.noaa.gov/)
National Environmental Satellite, Data, and Information Service
Description
The NOAA National Centers for Environmental Information (formerly the National Geophysical Data Center) / World Data Center, Boulder maintains an active database of worldwide geomagnetic observatory data. Historically, magnetic observatories were established to monitor the secular change (variation) of the Earth's magnetic field, and this remains one of their most important functions. This generally involves absolute measurements sufficient in number to monitor instrumental drift and to produce annual means. While the current global network of geomagnetic observatories involves over 70 countries operating more than 200 observatories, the historical database includes observations from more than 600 observatories since the early 1800s. Magnetic observatory data are crucial to studies of secular change, investigations into the Earth's interior, navigation, communication, and global modeling efforts. The Earth's magnetic field is described by seven parameters: declination (D), inclination (I), horizontal intensity (H), vertical intensity (Z), total intensity (F), and the north (X) and east (Y) components of the horizontal intensity. By convention, declination is positive when measured east of north, inclination and vertical intensity are positive down, X is positive north, and Y is positive east. The magnetic field observed on Earth is constantly changing.
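The seven parameters are not independent: given the three orthogonal components X, Y and Z, the other four follow from standard geometry, H = sqrt(X² + Y²), F = sqrt(H² + Z²), D = atan2(Y, X) and I = atan2(Z, H). The sketch below applies the sign conventions stated above and returns angles in degrees. The function name is illustrative, and the inputs are assumed to share one intensity unit (typically nT).

```python
import math
from typing import Dict


def geomagnetic_elements(x: float, y: float, z: float) -> Dict[str, float]:
    """Derive H, F, D, I from X (north), Y (east), Z (down)."""
    h = math.hypot(x, y)                # horizontal intensity
    f = math.hypot(h, z)                # total intensity
    d = math.degrees(math.atan2(y, x))  # declination, positive east of north
    i = math.degrees(math.atan2(z, h))  # inclination, positive down
    return {"X": x, "Y": y, "Z": z, "H": h, "F": f, "D": d, "I": i}
```

Using `atan2` rather than `atan` keeps the declination in the correct quadrant when X is negative, and it handles H = 0 at the magnetic poles, where inclination is ±90°.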
Medical Service Study Areas
data.chhs.ca.gov
healthdata.gov
Updated Dec 6, 2024
Department of Health Care Access and Information (2024). Medical Service Study Areas [Dataset]. https://data.chhs.ca.gov/dataset/medical-service-study-areas
Available download formats: csv, html, geojson, kml, zip, ArcGIS GeoServices REST API
Dataset updated
Dec 6, 2024
Dataset authored and provided by
Department of Health Care Access and Information
Description
This is the current Medical Service Study Area. California Medical Service Study Areas are created by the California Department of Health Care Access and Information (HCAI).

Check the Data Dictionary for field descriptions.

Search for the Medical Service Study Area data on the CHHS Open Data Portal.

Check out the California Healthcare Atlas for more Medical Service Study Area information.
This is an update to the MSSA geometries and demographics to reflect the new 2020 Census tract data. The Medical Service Study Area (MSSA) polygon layer represents the best-fit mapping of all new 2020 California census tract boundaries to the original 2010 census tract boundaries used to construct the original 2010 MSSA file. Each of the state's 9,129 new census tracts was assigned to one of the previously established medical service study areas (excluding tracts with no land area), as identified in this data layer. HCAI aggregates the MSSA census tract data to create this MSSA data layer. This represents the final re-mapping of 2020 Census tracts to the original 2010 MSSA geometries. The 2010 MSSAs were based on U.S. Census 2010 data and public meetings held throughout California.

Source of update: American Community Survey 5-year 2006-2010 data for poverty. For source tables, refer to the InfoUSA update procedural documentation. The 2010 MSSA Detail layer was developed to update fields affected by population change. The American Community Survey 5-year 2006-2010 population data pertaining to total population, population in households, race, ethnicity, age, and poverty were used in the update. The 2010 MSSA Census Tract Detail map layer was developed to support geographic information systems (GIS) applications, representing the 2010 census tract geography that is the foundation of the 2010 medical service study area (MSSA) boundaries.

This version contains the finalized MSSA reconfiguration boundaries based on the US Census Bureau 2010 Census. The 1976 Garamendi Rural Health Services Act required the development of a geographic framework for determining which parts of the state were rural and which were urban, and which parts of counties and cities had adequate health care resources and which were "medically underserved". Thus, sub-city and sub-county geographic units called "medical service study areas [MSSAs]" were developed, using combinations of census-defined geographic units, established following General Rules promulgated by a statutory commission. After each subsequent census, the MSSAs were revised. In the scheduled revisions that followed the 1990 census, community meetings of stakeholders (including county officials and representatives of hospitals and community health centers) were held in larger metropolitan areas. The meetings were designed to develop consensus on how to draw the sub-city units so as to best display health care disparities.
The importance of involving stakeholders was heightened in 1992, when the United States Department of Health and Human Services' Health Resources and Services Administration entered into a formal agreement to recognize the state-determined MSSAs as "rational service areas" for federal recognition of "health professional shortage areas" and "medically underserved areas". After the 2000 census, two innovations transformed the process and set the stage for GIS to emerge as a major factor in health care resource planning in California. First, the Office of Statewide Health Planning and Development (OSHPD), which organizes the community stakeholder meetings and provides the staff to administer the MSSAs, entered into an Enterprise GIS contract. Second, OSHPD authorized at least one community meeting to be held in each of the 58 counties, a significant number of which were wholly rural or frontier counties. For populous Los Angeles County, 11 community meetings were held. As a result, health resource data in California are collected and organized by 541 geographic units, whose boundaries were established by community healthcare experts with the objective of maximizing their usefulness for needs assessment. The most dramatic consequence was the introduction of data displayed simultaneously in a GIS format. A two-person team, combining healthcare policy and GIS expertise, conducted the series of meetings and supervised the development of the 2000-census configuration of the MSSAs.

MSSA Configuration Guidelines (General Rules):
- Each MSSA is composed of one or more complete census tracts.
- As a general rule, MSSAs are deemed to be "rational service areas (RSAs)" for purposes of designating health professional shortage areas (HPSAs), medically underserved areas (MUAs), or medically underserved populations (MUPs).
- MSSAs will not cross county lines.
- To the extent practicable, all census-defined places within the MSSA are within 30 minutes' travel time of the largest population center within the MSSA, except where meeting this criterion would require splitting a census tract.
- To the extent practicable, areas that, standing alone, would meet the definition of both an MSSA and a Rural MSSA should not be part of an Urban MSSA.
- Any Urban MSSA whose population exceeds 200,000 shall be divided into two or more Urban MSSA Subdivisions.
- Urban MSSA Subdivisions should have a population of 75,000 to 125,000 but may not be smaller than five square miles in area. If removing any census tract on the perimeter of an Urban MSSA Subdivision would cause the area to fall below five square miles, the population of the Urban MSSA Subdivision may exceed 125,000.
- To the extent practicable, Urban MSSA Subdivisions should reflect recognized community and neighborhood boundaries and take into account demographic information such as income level and ethnicity.

Rural Definitions:
- A rural MSSA is an MSSA adopted by the Commission that has a population density of less than 250 persons per square mile and no census-defined place within the area with a population in excess of 50,000. Only the population located within the MSSA is counted in determining the population of the census-defined place.
- A frontier MSSA is a rural MSSA adopted by the Commission that has a population density of less than 11 persons per square mile.
- Any MSSA that is not a rural or frontier MSSA is an urban MSSA.
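The Rural Definitions and the 200,000-population subdivision rule reduce to a few threshold checks. A minimal sketch, assuming the MSSA's population, its area in square miles, and the population of its largest census-defined place (counted within the MSSA only) are already known; the `MssaStats` type is hypothetical. The travel-time and community-boundary guidelines require judgment and are not modeled.

```python
from dataclasses import dataclass

RURAL_MAX_DENSITY = 250.0        # persons per sq mi; rural requires density below this
FRONTIER_MAX_DENSITY = 11.0      # persons per sq mi; frontier requires density below this
RURAL_MAX_PLACE_POP = 50_000     # no census-defined place may exceed this (within the MSSA)
URBAN_SPLIT_THRESHOLD = 200_000  # urban MSSAs above this population must be subdivided


@dataclass
class MssaStats:  # hypothetical container for illustration
    population: int
    area_sq_mi: float
    largest_place_pop: int  # largest place's population counted inside the MSSA only


def classify(m: MssaStats) -> str:
    """Return 'frontier', 'rural', or 'urban' following the Rural Definitions."""
    density = m.population / m.area_sq_mi
    rural = density < RURAL_MAX_DENSITY and m.largest_place_pop <= RURAL_MAX_PLACE_POP
    if rural and density < FRONTIER_MAX_DENSITY:
        return "frontier"
    return "rural" if rural else "urban"


def needs_subdivision(m: MssaStats) -> bool:
    """An urban MSSA with a population above 200,000 must be split."""
    return classify(m) == "urban" and m.population > URBAN_SPLIT_THRESHOLD
```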
Last updated December 6th 2024.
Mapping between MEANS-InOut input data and LCI from reference database
entrepot.recherche.data.gouv.fr
pdf, tsv
Updated Jul 1, 2024
Julie Auberger; Christophe Geneste; Guilhem Rostain; Caroline Malnoë (2024). Mapping between MEANS-InOut input data and LCI from reference database [Dataset]. http://doi.org/10.57745/VHTM7A
Available download formats: tsv (401939), pdf (415380), pdf (434964), pdf (414431)
Unique identifier
https://doi.org/10.57745/VHTM7A
Dataset updated
Jul 1, 2024
Dataset provided by
Recherche Data Gouv
Authors
Julie Auberger; Christophe Geneste; Guilhem Rostain; Caroline Malnoë
License
https://spdx.org/licenses/etalab-2.0.html
Description
This dataset is a mapping between MEANS-InOut input data and Life Cycle Inventories (LCIs) from reference databases (Agribalyse, ecoinvent). The MEANS-InOut input data are inputs to agricultural production systems (fertilisers, plant protection products, agricultural operations, livestock feed, ingredients to be incorporated into livestock feed, etc.). Each input is associated with one or more LCIs, which represent the impacts of producing that input, and with the database the LCIs come from. This version of the dataset corresponds to the following database versions: Agribalyse v3.1.1 and ecoinvent v3.9. The correspondence file (mapping_data.tab) is accompanied by:
- a document describing the input types in the MEANS-InOut software (Input_type_description.pdf);
- a document describing how the value of an LCI input flow for an agricultural system studied in MEANS-InOut is obtained from the value of that input in MEANS-InOut (LCI_value_construction.pdf).
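The one-to-many structure described above (each MEANS-InOut input maps to one or more LCIs, each tagged with its source database) can be loaded into a dictionary. This is a sketch only: the column names `input`, `lci`, and `database` are hypothetical placeholders, so check the actual header of mapping_data.tab (distributed as TSV) before relying on it.

```python
import csv
import io
from collections import defaultdict


def load_mapping(tsv_text: str) -> dict[str, list[tuple[str, str]]]:
    """Group tab-separated rows into {input: [(lci, database), ...]}.

    The column names "input", "lci" and "database" are hypothetical.
    """
    mapping: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        mapping[row["input"]].append((row["lci"], row["database"]))
    return dict(mapping)
```

For a file on disk, one would pass the text read with `open("mapping_data.tab", encoding="utf-8")`; the encoding is also an assumption.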
MOPITT CO gridded monthly means (Near and Thermal Infrared Radiances) V009
data.nasa.gov
s.cnmilf.com
Updated Apr 1, 2025
nasa.gov (2025). MOPITT CO gridded monthly means (Near and Thermal Infrared Radiances) V009 [Dataset]. https://data.nasa.gov/dataset/mopitt-co-gridded-monthly-means-near-and-thermal-infrared-radiances-v009-552e7
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
MOP03JM_9 is the Measurements Of Pollution In The Troposphere (MOPITT) Carbon Monoxide (CO) gridded monthly means (Near and Thermal Infrared Radiances) version 9 data product. It contains monthly-mean gridded versions of the daily Level 2 CO profile and total column retrievals. For this data product, the averaging kernels associated with each retrieval are also gridded and included in the Level 3 files. For a description of the file contents, refer to the File Spec Document. The MOPITT Level 2 Data Quality Statement contains additional information about the retrievals' quality and limitations. MOPITT was successfully launched into a sun-synchronous polar orbit aboard Terra, NASA's first Earth Observing System spacecraft, on December 18, 1999. The MOPITT instrument was constructed by a consortium of Canadian companies and funded by the Space Science Division of the Canadian Space Agency. Data collection for this product is ongoing.
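To illustrate what "monthly-mean gridded" means, here is a minimal sketch that bins one month of point retrievals into regular latitude/longitude cells and averages each cell. The 1° default cell size and the plain unweighted mean are assumptions for illustration; the actual MOPITT Level 3 gridding, filtering, and averaging-kernel handling are specified in the File Spec Document, not here.

```python
import math
from collections import defaultdict


def grid_monthly_mean(retrievals, cell_deg: float = 1.0):
    """Average (lat, lon, value) retrievals from one month into lat/lon cells.

    Returns {(lat_index, lon_index): (mean_value, n_retrievals)}; index 0
    starts at 90°S / 180°W. Unweighted mean: an illustrative assumption.
    """
    n_lat = round(180.0 / cell_deg)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in retrievals:
        i_lat = min(math.floor((lat + 90.0) / cell_deg), n_lat - 1)  # keep 90°N in the top row
        i_lon = math.floor(((lon + 180.0) % 360.0) / cell_deg)        # wrap 180°E onto 180°W
        sums[(i_lat, i_lon)] += value
        counts[(i_lat, i_lon)] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}
```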
Conceptualization of public data ecosystems
data.niaid.nih.gov
Updated Sep 26, 2024
Nikiforova, Anastasija; Lnenicka, Martin (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
Dataset updated
Sep 26, 2024
Dataset provided by
University of Hradec Králové
University of Tartu
Authors
Nikiforova, Anastasija; Lnenicka, Martin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

Because there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems, and of how these elements form and shape the development of these ecosystems over time, efforts to develop future public data ecosystems risk being misguided. The aim of the study is therefore (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems, develops a conceptual model of the elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generations of public data ecosystems, represented by six generations and called the Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

This dataset is being made public to act as supplementary data for the paper "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" (Telematics and Informatics) and for the Systematic Literature Review component that informs the study.

Description of the data in this data set

PublicDataEcosystem_SLR provides the structure of the protocol

Spreadsheet #1 provides the list of results after the search over three indexing databases and the filtering out of irrelevant studies.

Spreadsheet #2 provides the protocol structure.

Spreadsheet #3 provides the filled protocol for relevant studies.

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, and (4) public data ecosystem-related information.

Descriptive Information

Article number

A study number, corresponding to the study number assigned in an Excel worksheet

Complete reference

The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

Year of publication

The year in which the study was published.

Journal article / conference paper / book chapter

The type of the paper, i.e., journal article, conference paper, or book chapter.

Journal / conference / book

The journal, conference, or book in which the paper is published.

DOI / Website

A link to the website where the study can be found.

Number of words

The number of words in the study.

Number of citations in Scopus and WoS

The number of citations of the paper in Scopus and WoS digital libraries.

Availability in Open Access

Whether the study is available as Open Access or Free / Full Access.

Keywords

Keywords of the paper as indicated by the authors (in the paper).

Relevance for our study (high / medium / low)

What is the relevance level of the paper for our study?

Approach- and research design-related information

Objective / Aim / Goal / Purpose & Research Questions

The research objective and the research questions (RQs) posed.

Research method (including unit of analysis)

The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

Study’s contributions

The study’s contribution as defined by the authors

Qualitative / quantitative / mixed method

Does the study use a qualitative, quantitative, or mixed-methods approach?

Availability of the underlying research data

Does the paper refer to the public availability of the underlying research data (e.g., transcriptions of interviews, collected data), or explain why these data are not openly shared?

Period under investigation

Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

Use of theory / theoretical concepts / approaches? If yes, specify them

Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

Quality-related information

Quality concerns

Are there any quality concerns (e.g., limited information about the research methods used)?

Public Data Ecosystem-related information

Public data ecosystem definition

How does the paper define the public data ecosystem or any equivalent term (most often "infrastructure")? If an alternative term is used, what is the public data ecosystem called in the paper?

Public data ecosystem evolution / development

Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

What constitutes a public data ecosystem?

What constitutes a public data ecosystem (components & relationships), i.e., its "FORM / OUTPUT" as presented in the paper (a general description, with more detailed answers in the questions that follow).

Components and relationships

What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

Stakeholders

What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

Actors and their roles

What actors does the public data ecosystem involve? What are their roles?

Data (data types, data dynamism, data categories etc.)

What data does the public data ecosystem cover (or is it intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.

Processes / activities / dimensions, data lifecycle phases

What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

Level (if relevant)

What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

Other elements or relationships (if any)

What other elements or relationships does the public data ecosystem consist of?

Additional comments

Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

New papers

Does the study refer to any other potentially relevant papers?

Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

Format of the file: .xls, .csv (for the first spreadsheet only), .docx

Licenses or restrictions: CC-BY

For more info, see README.txt
Integrated Global Radiosonde Archive (IGRA) - Monthly Means (Version...
catalog.data.gov
datasets.ai
Updated Sep 19, 2023
DOC/NOAA/NESDIS/NCEI > National Centers for Environmental Information, NESDIS, NOAA, U.S. Department of Commerce (Point of Contact) (2023). Integrated Global Radiosonde Archive (IGRA) - Monthly Means (Version Superseded) [Dataset]. https://catalog.data.gov/dataset/integrated-global-radiosonde-archive-igra-monthly-means-version-superseded2
Dataset updated
Sep 19, 2023
Dataset provided by
United States Department of Commerce (http://commerce.gov/)
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
National Centers for Environmental Information (https://www.ncei.noaa.gov/)
National Environmental Satellite, Data, and Information Service
Description
Please note, this dataset has been superseded by a newer version (see below). Users should not use this version except in rare cases (e.g., when reproducing previous studies that used this version). Integrated Global Radiosonde Archive is a digital data set archived at the former National Climatic Data Center (NCDC), now National Centers for Environmental Information (NCEI). This dataset contains monthly means of geopotential height, temperature, zonal wind, and meridional wind derived from the Integrated Global Radiosonde Archive (IGRA). IGRA consists of radiosonde and pilot balloon observations at over 1500 globally distributed stations, and monthly means are available for the surface and mandatory levels at many of these stations. The period of record varies from station to station, with many extending from 1970 to 2016. Monthly means are computed separately for the nominal times of 0000 and 1200 UTC, considering data within two hours of each nominal time. A mean is provided, along with the number of values used to calculate it, whenever there are at least 10 values for a particular station, month, nominal time, and level.
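The averaging rule above (separate means for the 0000 and 1200 UTC nominal times, using observations within two hours of each, and reporting a mean only when at least 10 values exist) can be sketched as follows. The tuple layout, the inclusive two-hour window, and assigning an observation near midnight to the month of its nominal time are assumptions, not NCEI's documented procedure. Times are naive datetimes interpreted as UTC.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

WINDOW = timedelta(hours=2)  # "within two hours of each nominal time" (inclusive: assumption)
MIN_VALUES = 10              # a mean is reported only with at least 10 values


def nearest_nominal(t: datetime) -> datetime:
    """Return the closest 0000 or 1200 UTC nominal time (may fall on the next day)."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    candidates = [midnight, midnight + timedelta(hours=12), midnight + timedelta(hours=24)]
    return min(candidates, key=lambda n: abs(t - n))


def monthly_means(observations):
    """observations: iterable of (station, utc_time, level, value) tuples.

    Returns {(station, year, month, nominal_hour, level): (mean, count)}.
    """
    groups = defaultdict(list)
    for station, t, level, value in observations:
        nominal = nearest_nominal(t)
        if abs(t - nominal) <= WINDOW:
            groups[(station, nominal.year, nominal.month, nominal.hour, level)].append(value)
    return {key: (mean(vals), len(vals)) for key, vals in groups.items() if len(vals) >= MIN_VALUES}
```

In this sketch a 23:00 UTC ascent on 31 January is counted toward the 0000 UTC mean for February; a different convention would change only `nearest_nominal`.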
GA Fields Descriptions
kaggle.com
zip
Updated Apr 19, 2025
MonicaNeli (2025). GA Fields Descriptions [Dataset]. https://www.kaggle.com/datasets/monicaneli/ga-fields-descriptions
Available download formats: zip (8389 bytes)
Dataset updated
Apr 19, 2025
Authors
MonicaNeli
Description
This dataset is a custom reference of Google Analytics field definitions.

It was specifically compiled to enhance datasets like the Google Analytics 360 data from the Google Merchandise Store, which lacks field descriptions in its original BigQuery schema. By providing detailed definitions for each field, this reference aims to improve the interpretability of the data—especially when used by language models or analytics tools that rely on contextual understanding to process and answer queries effectively.
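BigQuery JSON schema files describe each column with `name`, `type`, `mode`, and an optional `description`, and nested RECORD columns list their children under `fields`. A minimal sketch of attaching definitions like these to such a schema, assuming the definitions are keyed by dotted paths such as `totals.visits` (an assumption about how this dataset is organized; check its actual columns). The description strings in any test data are placeholders, not real Google Analytics definitions.

```python
from typing import Any


def annotate_schema(
    schema: list[dict[str, Any]],
    descriptions: dict[str, str],
    prefix: str = "",
) -> list[dict[str, Any]]:
    """Return a copy of a BigQuery JSON schema with descriptions filled in.

    Nested RECORD fields are addressed by dotted paths, e.g. "totals.visits".
    The input schema is not modified.
    """
    annotated = []
    for field in schema:
        path = prefix + field["name"]
        new_field = dict(field)  # shallow copy so the original stays untouched
        if path in descriptions:
            new_field["description"] = descriptions[path]
        if "fields" in field:
            new_field["fields"] = annotate_schema(field["fields"], descriptions, path + ".")
        annotated.append(new_field)
    return annotated
```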
ERA5 monthly averaged data on pressure levels from 1940 to present
cds.climate.copernicus.eu
grib
Updated Nov 6, 2025
ECMWF (2025). ERA5 monthly averaged data on pressure levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.6860a573
Available download formats: grib
Unique identifier
https://doi.org/10.24381/cds.6860a573
Dataset updated
Nov 6, 2025
Dataset provided by
European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
Authors
ECMWF
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called the analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to ingest improved versions of the original observations, all of which benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days (monthly means are available around the 6th of each month).
If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; users are notified if this occurs. So far this has only been the case for September 2021, and it will also be the case for October, November and December 2021. For months prior to September 2021, the final release has always been equal to ERA5T, and the goal is to align the two again after December 2021. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access, and it should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper-air fields) and on single levels (atmospheric, ocean-wave and land-surface quantities). The present entry is "ERA5 monthly mean data on pressure levels from 1940 to present".
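ERA5 itself is produced with ECMWF's 4D-Var assimilation system, so the following is only a textbook illustration of what combining a forecast with observations "in an optimal way" means in the simplest case: one scalar variable, one background (forecast) value and one observation, with known, uncorrelated Gaussian error variances. The minimum-variance analysis weights each source by the other's error variance.

```python
def analysis_update(background: float, observation: float,
                    var_background: float, var_observation: float) -> tuple[float, float]:
    """Scalar minimum-variance combination of a forecast and an observation.

    Returns (analysis, analysis_error_variance).
    """
    gain = var_background / (var_background + var_observation)  # weight given to the observation
    analysis = background + gain * (observation - background)
    var_analysis = (1.0 - gain) * var_background  # smaller than either input variance
    return analysis, var_analysis
```

With equal error variances the analysis is the midpoint of forecast and observation; a more trustworthy observation (smaller `var_observation`) pulls the analysis closer to it, and the analysis error variance is always below both inputs.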
FHIR-Profiles-Resources
kaggle.com
zip
Updated Aug 1, 2023
fhirfly (2023). FHIR-Profiles-Resources [Dataset]. https://www.kaggle.com/datasets/fhirfly/fhirr4
Available download formats: zip (3709939 bytes)
Dataset updated
Aug 1, 2023
Authors
fhirfly
Description
Kaggle Card: FHIR Profiles-Resources JSON File

Overview

Fast Healthcare Interoperability Resources (FHIR, pronounced "fire") is a standard developed by Health Level Seven International (HL7) for transferring electronic health records. The FHIR Profiles-Resources JSON file is an essential part of this standard: it provides a schema that defines the structure of FHIR resource types, including their properties and attributes.

Dataset Structure

This file is structured in JSON format, known for its versatility and human-readable nature. Each JSON object corresponds to a unique FHIR resource type, outlining its structure and providing a blueprint for the properties and attributes each resource type should contain.

Fields Description

While the precise properties and attributes differ for each FHIR resource type, the typical elements you may encounter in this file include:

- Id: The unique identifier for the resource type.
- Url: A global identifier URI for the resource type.
- Version: The business version of the resource.
- Name: The human-readable name for the resource type.
- Status: The publication status of the resource (draft, active, retired).
- Experimental: A boolean value indicating whether this resource type is experimental.
- Date: The date of the resource type's last change.
- Publisher: The individual or organization that published the resource type.
- Contact: Contact details for the publishers.
- Description: A natural language description of the resource type.
- UseContext: A list outlining the usability context for the resource type.
- Jurisdiction: Identifies the region/country where the resource type is defined.
- Purpose: An explanation of why the resource type is necessary.
- Element: A list defining the structure of the properties for the resource type, including data types and relationships with other resource types.

Potential Use Cases

- Schema Validation: Use the schema to validate FHIR data and ensure it aligns with the defined structure and types for each resource.
- Interoperability: Facilitate the exchange of healthcare information with other FHIR-compatible systems by providing a standardized structure.
- Data Mapping: Use the schema to map data from other formats into the FHIR format, or vice versa.
- System Design: Aid the design and development of healthcare systems by offering a template for data structure.
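For the schema-validation use case, here is a minimal sketch. It assumes the file follows the layout of the official HL7 `profiles-resources.json` download, a Bundle whose `entry` items wrap `StructureDefinition` resources with a `snapshot.element` list; the Kaggle card does not confirm this. The sketch indexes each definition by name and checks a resource for missing top-level required elements (`min` of 1 or more). It ignores nested paths, choice types such as `value[x]`, maximum cardinality, and data types, so it is far from a full FHIR validator.

```python
from typing import Any


def index_structure_definitions(bundle: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Map each StructureDefinition's name to its snapshot element list."""
    index: dict[str, list[dict[str, Any]]] = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "StructureDefinition":
            index[resource["name"]] = resource.get("snapshot", {}).get("element", [])
    return index


def missing_required(resource: dict[str, Any], elements: list[dict[str, Any]]) -> list[str]:
    """List direct child elements with min >= 1 that `resource` lacks.

    Only paths of the form 'Type.field' are checked; nested paths and
    choice types such as 'value[x]' are skipped in this sketch.
    """
    missing = []
    for element in elements:
        parts = element.get("path", "").split(".")
        if len(parts) != 2 or parts[1].endswith("[x]"):
            continue
        if element.get("min", 0) >= 1 and parts[1] not in resource:
            missing.append(parts[1])
    return missing
```

The bundle itself would be read with `json.load`; the file name inside the Kaggle zip is not stated on the card.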

🌆 City Lifestyle Segmentation Dataset

kaggle.com

zip

Updated Nov 15, 2025

UmutUygurr (2025). 🌆 City Lifestyle Segmentation Dataset [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/city-lifestyle-segmentation-dataset

Available download formats: zip (11274 bytes)

Dataset updated

Nov 15, 2025

Authors

UmutUygurr

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

🌆 About This Dataset

This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.

🎯 Perfect For:

📊 K-Means, DBSCAN, Agglomerative Clustering
🔬 PCA & t-SNE Dimensionality Reduction
🗺️ Geospatial Visualization (Plotly, Folium)
📈 Correlation Analysis & Feature Engineering
🎓 Educational Projects (Beginner to Intermediate)

📦 What's Inside?

Feature	Description	Range
10 Features	Economic, environmental & social indicators	Realistically scaled
300 Cities	Europe, Asia, Americas, Africa, Oceania	Diverse distributions
Strong Correlations	Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6)	ML-ready
No Missing Values	Clean, preprocessed data	Ready for analysis
4-5 Natural Clusters	Metropolitan hubs, eco-towns, developing centers	Pre-validated

🔥 Key Features

✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases

🚀 Quick Start Example

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load and prepare
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
X_scaled = StandardScaler().fit_transform(X)

# Cluster
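kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)  # explicit n_init avoids version-dependent defaults
df['cluster'] = kmeans.fit_predict(X_scaled)

# Analyze (numeric_only=True skips the text columns, which pandas >= 2.0 rejects in mean())
print(df.groupby('cluster').mean(numeric_only=True))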

🎓 Learning Outcomes

After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics

📚 Ideal For These Projects

🏆 Kaggle Competitions: Practice clustering techniques
📝 Academic Projects: Urban planning, sociology, environmental science
💼 Portfolio Work: Showcase ML skills to employers
🎓 Learning: Hands-on practice with unsupervised learning
🔬 Research: Urban lifestyle segmentation studies

🌍 Expected Clusters

Cluster	Characteristics	Example Cities
Metropolitan Tech Hubs	High income, density, rent	Silicon Valley, Singapore
Eco-Friendly Towns	Low density, clean air, high happiness	Nordic cities
Developing Centers	Mid income, high density, poor air	Emerging markets
Low-Income Suburban	Low infrastructure, income	Rural areas
Industrial Mega-Cities	Very high density, pollution	Manufacturing hubs

🛠️ Technical Details

Format: CSV (UTF-8)
Size: ~300 rows × 10 columns
Missing Values: 0%
Data Types: 2 categorical, 8 numerical
Target Variable: None (unsupervised)
Correlation Strength: Pre-validated (r: 0.4 to 0.8)

📖 What Makes This Dataset Special?

Unlike random synthetic data, this dataset was carefully engineered with:
- ✨ Realistic correlation structures based on urban research
- 🌍 Regional characteristics matching real-world patterns
- 🎯 Optimal cluster separability (validated via silhouette scores)
- 📚 Comprehensive documentation and starter code
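The card says cluster separability was validated via silhouette scores but does not show how. `sklearn.metrics.silhouette_score(X, labels)` computes the mean silhouette coefficient; the dependency-free sketch below spells out what that number measures, which helps when comparing, say, KMeans runs with different `n_clusters` on the scaled features from the quick-start example. It is O(n²) and intended only for small data such as these 300 rows.

```python
import math
from typing import Sequence


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient using Euclidean distance.

    For each point: a = mean distance to the rest of its own cluster,
    b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
    Points in singleton clusters score 0. Values near 1 mean well separated.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        sums = {c: 0.0 for c in clusters}
        sizes = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                sizes[labels[j]] += 1
        own = labels[i]
        if sizes[own] == 0:
            scores.append(0.0)
            continue
        a = sums[own] / sizes[own]
        b = min(sums[c] / sizes[c] for c in clusters if c != own)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```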

🏅 Use This Dataset If You Want To:

✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights

📊 Acknowledgments

This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.

Happy Clustering! 🎉

Shift Bookings Data
kaggle.com
zip
Updated Mar 23, 2023
DD (2023). Shift Bookings Data [Dataset]. https://www.kaggle.com/datasets/dubradave/shift-bookings-data
Available download formats: zip (13765563 bytes)
Dataset updated
Mar 23, 2023
Authors
DD
License
https://www.reddit.com/wiki/api
Description
Data Details

Each row in your shift data is a shift; the columns of that dataset are described below (HCP: healthcare professional; HCF: healthcare facility):
● “Agent ID”: HCP ID
● “Facility ID”: HCF ID
● “Start”: The shift start time
● “Agent Req”: The type of HCP being requested for this shift
● “End”: The shift end time
● “Shift Type”: Whether the shift is in the morning (AM), afternoon (PM), overnight (NOC), or custom (CUSTOM)
● “Deleted”: Whether the shift was deleted
  ○ Note: “deleted” means “canceled by the facility”
● “Created At”: When the shift was created
● “Charge”: Per-hour charge rate
● “Time”: How many hours the shift lasts
● “Verified”: Indicates that the shift was worked, as confirmed by a signed timesheet
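Because “Charge” is a per-hour rate and “Time” is the shift length in hours, the amount charged for a shift is their product. A minimal sketch that totals it over worked shifts, assuming rows have already been parsed into dicts keyed by the column names above with numeric and boolean values (CSV parsing is left out); counting only verified, non-deleted shifts is an illustrative choice, not a stated business rule.

```python
def charged_total(shifts: list[dict]) -> float:
    """Sum Charge x Time (rate per hour x hours) over shifts that were worked.

    A shift counts when "Verified" is true (a signed timesheet exists) and
    it was not "Deleted" (canceled by the facility).
    """
    return sum(s["Charge"] * s["Time"] for s in shifts if s["Verified"] and not s["Deleted"])
```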

Each row in your cancellation logs is a unique cancellation event; the following are helpful descriptions of columns within that dataset:
● “Action”: The type of cancellation action
  ○ “WORKER_CANCEL”: The HCP canceled a shift they booked
  ○ “NO_CALL_NO_SHOW”: The HCP canceled a shift they booked after the shift commenced or otherwise did not show up to the shift and did not inform the facility about their absence
● “Created At”: When the action took place
● “Facility ID”: HCF ID
● “Worker ID”: The ID of the HCP that was previously associated with the shift
● “Shift ID”: The shift ID
● “Lead Time”: The time from “action” to “shift start” (in hours)
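The log already includes “Lead Time”. Recomputing it from the timestamps is still a quick consistency check, and a negative value marks an action taken after the shift started. The sketch assumes ISO 8601 timestamps and assumes the shift start is looked up from the shift data via “Shift ID”. The 24-hour “last-minute” cutoff and the bucket names are hypothetical choices, not something the source defines.

```python
from datetime import datetime

# Hypothetical threshold: the source does not define "last minute".
LAST_MINUTE_HOURS = 24.0


def lead_time_hours(created_at: str, shift_start: str) -> float:
    """Hours from the cancellation action to the shift start.

    Negative values mean the action happened after the shift started.
    Assumes ISO 8601 timestamps; shift_start comes from the shift data via "Shift ID".
    """
    delta = datetime.fromisoformat(shift_start) - datetime.fromisoformat(created_at)
    return delta.total_seconds() / 3600


def classify_cancellation(action: str, lead_hours: float) -> str:
    """Bucket a cancellation event (bucket names are illustrative)."""
    if action == "NO_CALL_NO_SHOW":
        return "no-call-no-show"
    if lead_hours < 0:
        return "after start"
    if lead_hours < LAST_MINUTE_HOURS:
        return "last minute"
    return "advance notice"


lead = lead_time_hours("2023-03-01T20:00:00", "2023-03-02T07:00:00")
print(lead, classify_cancellation("WORKER_CANCEL", lead))  # 11.0 last minute
```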

Each row in your shift claim logs is a unique booking event; the following are helpful descriptions for columns within that dataset:

Note that we only included claim actions for a subset of the date range in the "shifts" data. Thus, there are likely shifts that don't have associated claim actions. That's OK; we're only providing this data so you can observe HCP booking behavior.
● “Action”: The type of booking action
  ○ “SHIFT_CLAIM”: The HCP instantly booked the shift. As soon as they booked the shift, it was theirs.
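Claim actions cover only part of the date range, so a shift with no claim event is not necessarily unbooked. The sketch below lists such shifts. It assumes the claim log carries a “Shift ID” column (the source documents only “Action”) and that shift IDs in the shift data match it. Before reading anything into the result, restrict the input shifts to the date range that the claim log covers.

```python
def shifts_without_claims(shift_ids: list[str], claim_rows: list[dict[str, str]]) -> list[str]:
    """Shift IDs that have no SHIFT_CLAIM event in the claim log.

    Assumes the claim log has a "Shift ID" column; the source documents only "Action".
    Missing claims do not imply the shift went unbooked, because claim actions
    cover only part of the date range.
    """
    claimed = {row["Shift ID"] for row in claim_rows if row["Action"] == "SHIFT_CLAIM"}
    return [sid for sid in shift_ids if sid not in claimed]


# Hypothetical IDs, not taken from the dataset.
claims = [{"Shift ID": "s1", "Action": "SHIFT_CLAIM"},
          {"Shift ID": "s3", "Action": "SHIFT_CLAIM"}]
print(shifts_without_claims(["s1", "s2", "s3"], claims))  # ['s2']
```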

Business Problem

You’ll likely want to know more about how the marketplace is currently operating to form your own mental model.

Data
● In this “Data” folder, you can find the following:
  ○ Shift data for one of the metropolitan statistical areas in which we have a presence
  ○ A list of cancellation logs for shifts that were canceled by HCPs
  ○ A list of shift claim logs
● The fields in these files are defined under “Data Details” above

Assumptions and Business Context
● The most damaging type of cancellation for the HCF is one in which the HCP does what we call a “No-Call-No-Show”; this means they canceled the shift after the shift started or otherwise did not show up to the shift and did not inform the facility of their absence
● The top reasons why HCPs cancel shifts at the last minute are: illness, family emergency, transportation issues (e.g., car broke down), and facility issues
● From interviews, the most important things to HCPs are: will there be shifts that fit my erratic schedule, that are close enough to home, that pay enough, and that pay on time?
● HCPs currently receive a set of notifications prior to their shift to remind them of their upcoming shift
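No-Call-No-Shows are described as the most damaging cancellation type. One starting point is the share of each worker's cancellation events that are NO_CALL_NO_SHOW. The sketch below uses only the documented “Worker ID” and “Action” columns of the cancellation logs. It divides by cancellations, not by shifts booked, so a worker whose single cancellation is a No-Call-No-Show scores 1.0. Normalizing by booked shifts would require joining the shift data.

```python
from collections import Counter


def ncns_share_by_worker(cancellations: list[dict[str, str]]) -> dict[str, float]:
    """Fraction of each worker's cancellation events that are NO_CALL_NO_SHOW.

    Uses the documented "Worker ID" and "Action" columns of the cancellation logs.
    The denominator is cancellation events, not shifts booked.
    """
    total = Counter(row["Worker ID"] for row in cancellations)
    ncns = Counter(row["Worker ID"] for row in cancellations
                   if row["Action"] == "NO_CALL_NO_SHOW")
    return {worker: ncns[worker] / n for worker, n in total.items()}


# Hypothetical log rows, not taken from the dataset.
log = [{"Worker ID": "w1", "Action": "NO_CALL_NO_SHOW"},
       {"Worker ID": "w1", "Action": "WORKER_CANCEL"},
       {"Worker ID": "w2", "Action": "WORKER_CANCEL"}]
print(ncns_share_by_worker(log))  # {'w1': 0.5, 'w2': 0.0}
```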


