The largest all-payer ambulatory surgery database in the United States, the Healthcare Cost and Utilization Project (HCUP) Nationwide Ambulatory Surgery Sample (NASS) produces national estimates of major ambulatory surgery encounters in hospital-owned facilities. Major ambulatory surgeries are defined as selected major therapeutic procedures that require the use of an operating room, penetrate or break the skin, and involve regional anesthesia, general anesthesia, or sedation to control pain (i.e., surgeries flagged as "narrow" in the HCUP Surgery Flag Software). Unweighted, the NASS contains approximately 9.0 million ambulatory surgery encounters each year and approximately 11.8 million ambulatory surgery procedures. Weighted, it estimates approximately 11.9 million ambulatory surgery encounters and 15.7 million ambulatory surgery procedures. Sampled from the HCUP State Ambulatory Surgery and Services Databases (SASD) and State Emergency Department Databases (SEDD) in order to capture both planned and emergent major ambulatory surgeries, the NASS can be used to examine selected ambulatory surgery utilization patterns. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality, HCUP data inform decision making at the national, State, and community levels. The NASS contains clinical and resource-use information that is included in a typical hospital-owned facility record, including patient characteristics, clinical diagnostic and surgical procedure codes, disposition of patients, total charges, facility characteristics, and expected source of payment, regardless of payer, including patients covered by Medicaid, private insurance, and the uninsured. The NASS excludes data elements that could directly or indirectly identify individuals, hospitals, or states. The NASS is limited to encounters with at least one in-scope major ambulatory surgery on the record, performed at hospital-owned facilities. Procedures intended primarily for diagnostic purposes are not considered in-scope. Restricted access data files are available with a data use agreement and brief online security training.
The NIS is the largest publicly available all-payer inpatient healthcare database designed to produce U.S. regional and national estimates of inpatient utilization, access, cost, quality, and outcomes. Unweighted, it contains data from around 7 million hospital stays each year. Weighted, it estimates around 35 million hospitalizations nationally. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality (AHRQ), HCUP data inform decision making at the national, State, and community levels.
Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, uncommon treatments, and special populations.
DO NOT use this data without referring to the NIS Database Documentation.
All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
You must also %3Cu%3E%3Cstrong%3Emake sure that your work meets all of the AHRQ (data owner) requirements for publishing%3C/strong%3E%3C/u%3E
with HCUP data--listed at https://hcup-us.ahrq.gov/db/nation/nis/nischecklist.jsp
For additional assistance, AHRQ has created the HCUP Online Tutorial Series, a series of free, interactive courses which provide training on technical methods for conducting research with HCUP data. Topics include an HCUP Overview Course and these tutorials:
• The HCUP Sampling Design tutorial is designed to help users learn how to account for sample design in their work with HCUP national (nationwide) databases.
• The Producing National HCUP Estimates tutorial is designed to help users understand how the three national (nationwide) databases – the NIS, Nationwide Emergency Department Sample (NEDS), and Kids' Inpatient Database (KID) – can be used to produce national and regional estimates.
• The Calculating Standard Errors tutorial shows how to accurately determine the precision of the estimates produced from the HCUP nationwide databases. Users will learn two methods for calculating standard errors for estimates produced from the HCUP national (nationwide) databases.
• The HCUP Multi-year Analysis tutorial presents solutions that may be necessary when conducting analyses that span multiple years of HCUP data.
• The HCUP Software Tools tutorial provides instructions on how to apply the AHRQ software tools to HCUP or other administrative databases.
New tutorials are added periodically, and existing tutorials are updated when necessary. The Online Tutorial Series is located on the HCUP-US website at https://hcup-us.ahrq.gov/tech_assist/tutorials.jsp
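For orientation, here is a minimal sketch of the kind of design-aware estimation those tutorials cover. It assumes the NIS discharge weight and design variables are named DISCWT, NIS_STRATUM, and HOSP_NIS (consult the NIS documentation above for the authoritative names) and uses an illustrative analysis flag has_condition; it is a sketch, not the documented HCUP method.

```python
import pandas as pd
import numpy as np

# nis: one row per discharge, with an illustrative indicator column "has_condition"
# plus the design variables (names assumed; see the NIS documentation).
nis = pd.read_csv("nis_core.csv")

# Weighted national estimate: sum of discharge weights over flagged records.
estimate = (nis["DISCWT"] * nis["has_condition"]).sum()

# Design-based variance: between-hospital (cluster) variation within sampling strata.
def stratum_var(g):
    totals = g.groupby("HOSP_NIS").apply(lambda h: (h["DISCWT"] * h["has_condition"]).sum())
    n = len(totals)
    return 0.0 if n < 2 else n * totals.var(ddof=1)

variance = nis.groupby("NIS_STRATUM").apply(stratum_var).sum()
print(f"national estimate: {estimate:,.0f}  SE: {np.sqrt(variance):,.0f}")
```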
In 2015, AHRQ restructured the data as described here:
https://hcup-us.ahrq.gov/db/nation/nis/2015HCUPNationalInpatientSample.pdf
Some key points of the 2015 redesign are described in that document.
The Healthcare Cost and Utilization Project (HCUP) Nationwide Readmissions Database (NRD) is a unique and powerful database designed to support various types of analyses of national readmission rates for all payers and the uninsured. The NRD includes discharges for patients with and without repeat hospital visits in a year and those who have died in the hospital. Repeat stays may or may not be related. The criteria used to determine the relationship between hospital admissions are left to the analyst using the NRD. This database addresses a large gap in health care data: the lack of nationally representative information on hospital readmissions for all ages. Outcomes of interest include national readmission rates, reasons for returning to the hospital for care, and the hospital costs for discharges with and without readmissions. Unweighted, the NRD contains data from approximately 18 million discharges each year. Weighted, it estimates roughly 35 million discharges. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality, HCUP data inform decision making at the national, State, and community levels. The NRD is drawn from HCUP State Inpatient Databases (SID) containing verified patient linkage numbers that can be used to track a person across hospitals within a State, while adhering to strict privacy guidelines. The NRD is not designed to support regional, State-, or hospital-specific readmission analyses. The NRD contains more than 100 clinical and non-clinical data elements provided in a hospital discharge abstract. Data elements include but are not limited to: diagnoses, procedures, patient demographics (e.g., sex, age), expected payer (regardless of payer, including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as 'no charge'), discharge month, quarter, and year, total charges, length of stay, and data elements essential to readmission analyses. The NRD excludes data elements that could directly or indirectly identify individuals. Restricted access data files are available with a data use agreement and brief online security training.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the accumulation of large amounts of health-related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized Medicine (PPPM), ultimately affecting both cost and quality of care. However, the high dimensionality and high complexity of the data involved prevent data-driven methods from being easily translated into clinically relevant models. Additionally, applying cutting-edge predictive methods and manipulating the data require substantial programming skills, limiting their direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, the authors address this problem by focusing on open, visual environments suited to be applied by the medical community. Moreover, we review code-free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner's Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL process (Extract, Transform, Load) was initiated by retrieving data from the MIMIC-II tables of interest. As a use case, the correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for automatic building, parameter optimization and evaluation of various predictive models, under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
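As a rough, code-based illustration of the showcased use case (outside the visual RapidMiner environment described above), a minimal sketch assuming a flat extract of MIMIC-II with hypothetical column names platelet_count and survived might look like this:

```python
import pandas as pd
from scipy import stats

# Hypothetical flat extract of MIMIC-II ICU stays; the file and column names are
# illustrative, not the actual MIMIC-II schema.
df = pd.read_csv("mimic2_platelets_extract.csv")  # columns: platelet_count, survived (0/1)
df = df.dropna(subset=["platelet_count", "survived"])

# Point-biserial correlation quantifies the association between a continuous variable
# (platelet count) and a binary outcome (ICU survival).
r, p = stats.pointbiserialr(df["survived"], df["platelet_count"])
print(f"point-biserial r = {r:.3f}, p = {p:.3g}")
```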
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data repository provides the Food and Agriculture Biomass Input Output (FABIO) database, a global set of multi-regional physical supply-use and input-output tables covering global agriculture and forestry.
The work is based on mostly freely available data from FAOSTAT, IEA, EIA, and UN Comtrade/BACI. FABIO currently covers 191 countries + RoW, 118 processes and 125 commodities (raw and processed agricultural and food products) for 1986-2013. All R code and auxiliary data are available on GitHub. For more information please refer to https://fabio.fineprint.global.
The database consists of the following main components, in compressed .rds format:
Z: the inter-commodity input-output matrix, displaying the relationships of intermediate use of each commodity in the production of each commodity, in physical units (tons). The matrix has 24000 rows and columns (125 commodities x 192 regions), and is available in two versions, based on the method to allocate inputs to outputs in production processes: Z_mass (mass allocation) and Z_value (value allocation). Note that the row sums of the Z matrix (= total intermediate use by commodity) are identical in both versions.
Y: the final demand matrix, denoting the consumption of all 24000 commodities by destination country and final use category. There are six final use categories (yielding 192 x 6 = 1152 columns): 1) food use, 2) other use (non-food), 3) losses, 4) stock addition, 5) balancing, and 6) unspecified.
X: the total output vector of all 24000 commodities. Total output is equal to the sum of intermediate and final use by commodity.
L: the Leontief inverse, computed as (I – A)⁻¹, where A is the matrix of input coefficients derived from Z and X. Again, there are two versions, depending on the underlying version of Z (L_mass and L_value).
E: environmental extensions for each of the 24000 commodities, including four resource categories: 1) primary biomass extraction (in tons), 2) land use (in hectares), 3) blue water use (in m³), and 4) green water use (in m³).
mr_sup_mass/mr_sup_value: For each allocation method (mass/value), the supply table gives the physical supply quantity of each commodity by producing process, with processes in the rows (118 processes x 192 regions = 22656 rows) and commodities in columns (24000 columns).
mr_use: the use table captures the quantities of each commodity (rows) used as an input in each process (columns).
A description of the included countries and commodities (i.e. the rows and columns of the Z matrix) can be found in the auxiliary file io_codes.csv. Separate lists of the country sample (including ISO3 codes and continental grouping) and commodities (including moisture content) are given in the files regions.csv and items.csv, respectively. For information on the individual processes, see auxiliary file su_codes.csv. RDS files can be opened in R. Information on how to read these files can be obtained here: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/readRDS
Except for X.rds, which contains a matrix, all variables are organized as lists, where each element contains a sparse matrix. Please note that values are always given in physical units, i.e. tonnes or head, as specified in items.csv. The suffixes value and mass only indicate the form of allocation chosen for the construction of the symmetric IO tables (for more details see Bruckner et al. 2019). Product, process and country classifications can be found in the file fabio_classifications.xlsx.
Footprint results are not contained in the database but can be calculated, e.g. by using this script: https://github.com/martinbruckner/fabio_comparison/blob/master/R/fabio_footprints.R
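For orientation, here is a minimal sketch of the standard environmentally extended input-output footprint calculation using the objects described above. It assumes Z, Y, X and one extension row E have already been loaded into dense NumPy arrays; variable names are illustrative, and the linked R script is the reference implementation.

```python
import numpy as np

# Assumed already loaded from the .rds files:
# Z: (24000, 24000) intermediate use, Y: (24000, 1152) final demand,
# X: (24000,) total output, E: (24000,) one extension row (e.g. land use per commodity).

x_inv = np.where(X > 0, 1.0 / X, 0.0)          # guard against zero-output commodities
A = Z * x_inv                                   # input coefficients A = Z diag(X)^-1 (scales each column j by 1/X[j])
L = np.linalg.inv(np.eye(len(X)) - A)           # Leontief inverse (I - A)^-1
# Note: inverting a dense 24000 x 24000 matrix is impractical; in practice use sparse solves.
e = E * x_inv                                   # extension intensity per unit of output

# Footprints by final-demand column (destination country x final-use category):
footprints = e @ L @ Y                          # shape: (1152,)
```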
How to cite:
To cite FABIO work please refer to this paper:
Bruckner, M., Wood, R., Moran, D., Kuschnig, N., Wieland, H., Maus, V., Börner, J. 2019. FABIO – The Construction of the Food and Agriculture Input–Output Model. Environmental Science & Technology 53(19), 11302–11312. DOI: 10.1021/acs.est.9b03554
License:
This data repository is distributed under the CC BY-NC-SA 4.0 License. You are free to share and adapt the material for non-commercial purposes using proper citation. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. In case you are interested in a collaboration, I am happy to receive enquiries at martin.bruckner@wu.ac.at.
Known issues:
The underlying FAO data have been manipulated to the minimum extent necessary. Data filling and supply-use balancing, however, required some adaptations. These are documented in the code and are also reflected in the balancing item in the final demand matrices. For a proper use of the database, I recommend distributing the balancing item over all other uses proportionally and doing analyses with and without balancing to illustrate uncertainties.
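As a rough illustration of that recommendation, a minimal sketch of the proportional redistribution of the balancing item, assuming Y is a NumPy array whose six final-use categories per country are ordered as listed above (the ordering is an assumption), could look like this:

```python
import numpy as np

# Y: (24000, 192 * 6) final demand; per country, columns assumed ordered as
# [food, other, losses, stock addition, balancing, unspecified].
Y = Y.reshape(24000, 192, 6)
balancing = Y[:, :, 4].copy()
Y[:, :, 4] = 0.0

other = Y.sum(axis=2)                            # total of the remaining use categories
share = np.divide(Y, other[:, :, None],
                  out=np.zeros_like(Y), where=other[:, :, None] > 0)
Y += share * balancing[:, :, None]               # spread balancing proportionally
# Rows/countries with no other use simply drop the balancing mass in this sketch.
Y = Y.reshape(24000, 192 * 6)
```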
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a comprehensive collection of free tech books available on the web, specifically sourced from the FreeTechBooks platform. It includes details such as the names and URLs of various free textbooks, covering a wide range of topics including computer science, programming, data science, artificial intelligence, and more. The dataset is designed for educational purposes, providing easy access to high-quality, freely available technical resources.
The dataset consists of two columns:
Name: The title of the book.
URL: A direct link to the page where the book can be accessed or downloaded for free.
The dataset contains 1200+ free tech books. It was scraped from the FreeTechBooks website, a platform that aggregates freely available textbooks on various technical topics, by crawling 82 pages of the site and extracting the names and URLs of books listed under different topics.
The National Cooperative Soil Survey - Soil Characterization Database (NCSS-SCD) contains laboratory data for more than 65,000 locations (i.e., x-y coordinates) throughout the United States and its Territories, and about 2,100 locations from other countries. It is a compilation of data from the Kellogg Soil Survey Laboratory (KSSL) and several cooperating laboratories. The data steward and distributor is the National Soil Survey Center (NSSC). Information contained within the database includes physical, chemical, biological, mineralogical, morphological, and mid-infrared reflectance (MIR) soil measurements, as well as a collection of calculated values. The intended use of the data is to support interpretations related to soil use and management.
Data Usage: Access to the data is provided via the following user interfaces: 1. Interactive Web Map; 2. Lab Data Mart (LDM) for querying data and generating reports; 3. Soil Data Access (SDA) web services for querying data; 4. Direct download of the entire database in several formats.
Data at each location include measurements at multiple depths (e.g., soil horizons). However, not all analyses have been conducted for each location and depth. Typically, a suite of measurements was collected based upon assumed or known conditions regarding the soil being analyzed. For example, soils of arid environments are routinely analyzed for salts and carbonates as part of the standard analysis suite. Standard morphological soil descriptions are available for about 60,000 of these locations. Mid-infrared (MIR) spectroscopy is available for about 7,000 locations. Soil fertility measurements, such as those made by Agricultural Experiment Stations, were not made. Most of the data were obtained over the last 40 years, with about 4,000 locations before 1960, 25,000 from 1960-1990, 27,000 from 1990-2010, and 13,000 from 2010 to 2021. Generally, the number of measurements recorded per location has increased over time. Typically, the data were collected to represent a soil series or map unit component concept. They may also have been sampled to determine the range of variation within a given landscape.
Although strict quality-control measures are applied, the NSSC does not warrant that the data are error free. In some cases the measurements are not within the applicability range of the laboratory methods. For example, dispersion of clay is incomplete in some soils by the standard method used for determining particle-size distribution. Soils producing incomplete dispersion include those that are derived from volcanic materials or that have a high content of iron oxides, gypsum, carbonates, or other cementing materials. Also note that determination of clay minerals by x-ray diffraction is relative. Measurements of very high or very low quantities by any method are not very precise. Other measurements have other limitations in some kinds of soils. Such data are retained in the database for research purposes. Some of the data were obtained from cooperating laboratories within the NCSS. The accuracy of the location coordinates has not been quantified but can be inferred from the precision of their decimal degrees and the presence of a map datum. Some older records may correspond to a county centroid. When the map datum is missing, it can be assumed that data recorded prior to 1990 used NAD27 and data recorded after 1995 used WGS84. For detailed information about methods used in the KSSL and other laboratories, refer to "Soil Survey Investigation Report No. 42".
For information on the application of laboratory data, refer to "Soil Survey Investigation Report No. 45". If you are unfamiliar with any terms or methods, feel free to consult your NRCS State Soil Scientist.
Terms of Use: This dataset is not designed for use as a primary regulatory tool in permitting or siting decisions but may be used as a reference source. This is public information and may be interpreted by organizations, agencies, units of government, or others based on their needs; however, they are responsible for the appropriate application. Federal, State, or local regulatory bodies are not to reassign to the Natural Resources Conservation Service or the National Cooperative Soil Survey any authority for the decisions that they make. The Natural Resources Conservation Service will not perform any evaluations of these data for purposes related solely to State or local regulatory programs.
This database automatically captures metadata, the source of which is the GOVERNMENT OF THE REPUBLIC OF SLOVENIA, STATISTICAL OFFICE OF THE REPUBLIC OF SLOVENIA, and corresponds to the source database entitled “Use of electronic identification procedures in the last 12 months for private purposes by individuals, by status of activity, Slovenia, 2018”.
Actual data are available in PC-Axis format (.px). With the additional links, you can access the source portal page for viewing and selecting the data, as well as the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, store it in different formats, view and print tables of unlimited size, and perform some basic statistical analyses and graphics.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The AneuX morphology database is an open-access, multi-centric database containing 3D geometries of 750 intracranial aneurysms curated in the context of the AneuX project (2015-2020). The database combines data from three different projects (AneuX, @neurIST and Aneurisk) standardized using a single processing pipeline. The code to process and view the 3D geometries is provided under this public repository: https://github.com/hirsch-lab/aneuxdb
The database at a glance:
750 aneurysm domes (surface meshes)
668 vessel trees (surface meshes)
3 different data sources (AneuX, @neurIST, Aneurisk)
3 different mesh resolutions (original resolution, 0.01mm² and 0.05mm² target cell area)
4 different cut configurations (including planar and free-form cuts)
5 clinical parameters (aneurysm rupture status, location and side; patient age and sex)
170 pre-computed morphometric indices for each of the aneurysm domes
Terms of use / License:
The data is provided "as is", without any warranties of any kind. It is provided under the CC BY-NC 4.0 license, with the additional requirements (A) that the use of the database is declared using the sentence below (you can omit the URLs), and (B) that the peer-reviewed journal article below is cited.
[This project] uses data from the AneuX morphology database, an open-access, multi-centric database combining data from three European projects: AneuX project (www.aneux.ch), @neurIST project (www.aneurist.org) and Aneurisk (http://ecm2.mathcs.emory.edu/aneuriskweb/index).
In accordance with the terms of use, please cite the following journal article when referring to our dataset.
Juchler, Schilling, Bijlenga, Kurtcuoglu, Hirsch. Shape trumps size: Image-based morphological analysis reveals that the 3D shape discriminates intracranial aneurysm disease status better than aneurysm size. Frontiers in Neurology (2022), DOI: 10.3389/fneur.2022.809391
The AneuX morphology database contains parts (geometric models, clinical data) of the publicly available Aneurisk dataset released under the CC BY-NC 3.0 license (which is compatible with the license used here). Like all geometric models in this database, the Aneurisk models were preprocessed using the same procedure. See here for a description.
Funding and authorizations
The AneuX project
Data collection in accordance with @neurIST protocol v5
Ethics authorisations: Geneva BASEC PB_2018-00073
Supported by a grant from the Swiss SystemsX.ch initiative, evaluated by the Swiss National Science Foundation
@neurIST project
Data collection in accordance with @neurIST protocol v1
Ethics authorisations: Amsterdam MEC 07-159, Barcelona 2007-3507, Geneva CER 07-056, Oxfordshire REC AQ05/Q1604/162, Pècs RREC MC P 06 Jul 2007
Supported by the 6th framework program of the European Commission FP6-IST-2004–027703
Acknowledgments:
The AneuX project was supported by SystemsX.ch, and evaluated by the Swiss National Science Foundation (SNSF). This database would not be possible without the support of the Zurich University of Applied Sciences (ZHAW) and University Hospitals Geneva (HUG).
We thank the following people for their support and contributions to the AneuX morphology database.
From the AneuX project (in alphabetical order):
Daniel Rüfenacht
Diana Sapina
Isabel Wanke
Karl Lovblad
Karl Schaller
Olivier Brina
Paolo Machi
Rafik Ouared
Sabine Schilling
Sandrine Morel
Ueli Ebnöther
Vartan Kurtucuoglu
Vitor Mendes Pereira
Zolt Kuscàr
From the @neurIST project (in alphabetical order)
Alan Waterworth
Alberto Marzo
Alejandro Frangi
Alison Clarke
Ana Marcos Gonzalez
Ana Paula Narata
Antonio Arbona
Bawarjan Schatlo
Daniel Rüfenacht
Elio Vivas
Ferenc Kover
Gulam Zilani
Guntram Berti
Guy Lonsdale
Istvan Hudak
James Byrne
Jimison Iavindrasana
Jordi Blasco
Juan Macho
Julia Yarnold
Mari Cruz Villa Uriol
Martin Hofmann-Apitius
Max Jägersberg
Miriam CJM Sturkenboom
Nicolas Roduit
Pankaj Singh
Patricia Lawford
Paul Summers
Peer Hasselmeyer
Peter Bukovics
Rod Hose
Roelof Risselada
Stuart Coley
Tamas Doczi
Teresa Sola
Umang Patel
From the Aneurisk project (list from AneuriskWeb, in alphabetical order):
Alessandro Veneziani
Andrea Remuzzi
Edoardo Boccardi
Francesco Migliavacca
Gabriele Dubini
Laura Sangalli
Luca Antiga
Maria Piccinelli
Piercesare Secchi
Simone Vantini
Susanna Bacigaluppi
Tiziano Passerini
PyPSA-Eur is an open model dataset of the European power system at the transmission network level that covers the full ENTSO-E area. It can be built using the code provided at https://github.com/PyPSA/PyPSA-eur.
It contains alternating current lines at and above 220 kV voltage level and all high voltage direct current lines, substations, an open database of conventional power plants, time series for electrical demand and variable renewable generator availability, and geographic potentials for the expansion of wind and solar power.
Not all data dependencies are shipped with the code repository, since git is not suited for handling large changing files. Instead we provide separate data bundles to be downloaded and extracted as noted in the documentation.
This is the full data bundle to be used for rigorous research. It includes large bathymetry and natural protection area datasets.
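To give a sense of how a built PyPSA-Eur network is typically inspected, here is a minimal sketch using the pypsa Python package; the file name elec.nc is illustrative and depends on how you configure and run the workflow.

```python
import pypsa

# Load a built PyPSA-Eur network from netCDF (file name is illustrative).
n = pypsa.Network("elec.nc")

# Basic inventory of the transmission-level model.
print(f"{len(n.buses)} buses, {len(n.lines)} AC lines, {len(n.links)} DC links")
print(f"{len(n.generators)} generators")

# Installed generator capacity by carrier (e.g. wind, solar, gas).
print(n.generators.groupby("carrier").p_nom.sum())
```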
While the code in PyPSA-Eur is released as free software under the MIT License, different licenses and terms of use apply to the various input data, which are summarised below:
corine/*
Access to data is based on a principle of full, open and free access as established by the Copernicus data and information policy Regulation (EU) No 1159/2013 of 12 July 2013. This regulation establishes registration and licensing conditions for GMES/Copernicus users and can be found here. Free, full and open access to this data set is made on the conditions that:
When distributing or communicating Copernicus dedicated data and Copernicus service information to the public, users shall inform the public of the source of that data and information.
Users shall make sure not to convey the impression to the public that the user's activities are officially endorsed by the Union.
Where that data or information has been adapted or modified, the user shall clearly state this.
The data remain the sole property of the European Union. Any information and data produced in the framework of the action shall be the sole property of the European Union. Any communication and publication by the beneficiary shall acknowledge that the data were produced “with funding by the European Union”.
eez/*
Marine Regions’ products are licensed under CC-BY-NC-SA. Please contact us for other uses of the Licensed Material beyond license terms. We kindly request our users not to make our products available for download elsewhere and to always refer to marineregions.org for the most up-to-date products and services.
natura/*
EEA standard re-use policy: unless otherwise indicated, re-use of content on the EEA website for commercial or non-commercial purposes is permitted free of charge, provided that the source is acknowledged (https://www.eea.europa.eu/legal/copyright). Copyright holder: Directorate-General for Environment (DG ENV).
naturalearth/*
All versions of Natural Earth raster + vector map data found on this website are in the public domain. You may use the maps in any manner, including modifying the content and design, electronic dissemination, and offset printing. The primary authors, Tom Patterson and Nathaniel Vaughn Kelso, and all other contributors renounce all financial claim to the maps and invite you to use them for personal, educational, and commercial purposes.
No permission is needed to use Natural Earth. Crediting the authors is unnecessary.
NUTS_2013_60M_SH/*
In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:
The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").
The permission to use the data is granted on condition that: the data will not be used for commercial purposes; the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.
gebco/GEBCO_2014_2D.nc
The GEBCO Grid is placed in the public domain and may be used free of charge. Use of the GEBCO Grid indicates that the user accepts the conditions of use and disclaimer information given below.
Users are free to:
Copy, publish, distribute and transmit The GEBCO Grid
Adapt The GEBCO Grid
Commercially exploit The GEBCO Grid, by, for example, combining it with other information, or by including it in their own product or application
Users must:
Acknowledge the source of The GEBCO Grid. A suitable form of attribution is given in the documentation that accompanies The GEBCO Grid.
Not use The GEBCO Grid in a way that suggests any official status or that GEBCO, or the IHO or IOC, endorses any particular application of The GEBCO Grid.
Not mislead others or misrepresent The GEBCO Grid or its source.
je-e-21.03.02.xls
Information on the websites of the Federal Authorities is accessible to the public. Downloading, copying or integrating content (texts, tables, graphics, maps, photos or any other data) does not entail any transfer of rights to the content.
Copyright and any other rights relating to content available on the websites of the Federal Authorities are the exclusive property of the Federal Authorities or of any other expressly mentioned owners.
Any reproduction requires the prior written consent of the copyright holder. The source of the content (statistical results) should always be given.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.
Data content areas include:
According to our latest research, the global Time-Series Database for OT Data market size reached USD 1.84 billion in 2024, driven by increasing adoption of IoT and Industry 4.0 initiatives across operational technology (OT) environments. The market is expanding at a robust CAGR of 15.2%, and is forecasted to reach USD 5.18 billion by 2033. This growth is primarily propelled by the escalating need for real-time data analytics and process optimization in critical industries such as manufacturing, energy, and transportation, which are leveraging time-series databases to efficiently store, process, and analyze massive volumes of time-stamped data generated by OT systems.
A significant growth factor in the Time-Series Database for OT Data market is the rapid digital transformation occurring across traditional industrial sectors. As organizations strive to modernize their operations, there is a marked increase in the deployment of smart sensors, connected devices, and automation solutions. These advancements generate vast streams of time-stamped data, necessitating robust, scalable, and high-performance time-series databases capable of handling the unique requirements of OT environments. The integration of advanced analytics and artificial intelligence (AI) with time-series databases further enhances their value proposition, enabling predictive maintenance, anomaly detection, and real-time decision-making, which are critical for maximizing operational efficiency and minimizing downtime.
Another critical driver is the growing emphasis on predictive maintenance and asset management. Industrial companies are shifting from reactive to proactive maintenance strategies to reduce unplanned outages and extend asset lifecycles. Time-series databases play a pivotal role in this transition by enabling the continuous collection, storage, and analysis of sensor data from machinery, equipment, and infrastructure. The ability to detect patterns, trends, and anomalies in real-time empowers organizations to schedule maintenance activities precisely when needed, thereby reducing costs and improving overall productivity. This trend is particularly pronounced in sectors such as energy & utilities, oil & gas, and transportation, where equipment reliability and uptime are paramount.
Furthermore, the increasing adoption of cloud-based solutions is accelerating the growth of the Time-Series Database for OT Data market. Cloud deployment offers enhanced scalability, flexibility, and cost-efficiency, making it an attractive option for organizations seeking to manage large volumes of time-series data without the burden of maintaining on-premises infrastructure. Cloud-based time-series databases facilitate seamless integration with other cloud-native analytics tools and platforms, supporting advanced use cases such as remote monitoring, process optimization, and cross-site data aggregation. This shift is also fostering greater adoption among small and medium enterprises (SMEs), which can now leverage enterprise-grade time-series data management capabilities without significant upfront investment.
From a regional perspective, North America continues to dominate the global Time-Series Database for OT Data market, accounting for the largest share in 2024. The region benefits from a high concentration of technologically advanced industries, robust IT infrastructure, and early adoption of IoT and digitalization initiatives. Europe follows closely, driven by stringent regulatory requirements and a strong focus on industrial automation. The Asia Pacific region, meanwhile, is witnessing the fastest growth, fueled by rapid industrialization, expanding manufacturing sectors, and increasing investments in smart infrastructure projects across countries such as China, India, and Japan. As the adoption of time-series databases for OT data accelerates globally, regional markets are expected to experience differentiated growth trajectories based on industry maturity, technological readiness, and regulatory landscapes.
This paper describes the process of creating VMLA, a language test meant to be used during awake craniotomies. It focuses on the step-by-step process and aims to help other developers build their own assessment. This project was designed as a prospective study and registered with the Ethics Committee of the Educational and Research Institute of Sirio Libanês Hospital. Ethics committee approval number: HSL 2018-37 / CAEE 90603318.9.0000.5461. Images were purchased from Shutterstock.com and generated the following receipts: SSTK-0CA8F-1358 and SSTK-0235F-6FC2. VMLA is a neuropsychological assessment of language function, comprising object naming (ON) and semantic tasks. Originally composed of 420 slides, validation among Brazilian native speakers left 368 figures plus fifteen other elements, like numbers, sentences and counting. Validation focused on educational level (EL), gender and age. Volunteers were tested in fourteen different states of Brazil. Cultural differences resulted in improvements to the final Answer Template. EL and age were identified as factors that influenced VMLA assessment results. Highly educated volunteers performed better on both ON and semantic tasks. People over 50 and 35 years old had better performance for ON and semantic, respectively. Further validation in unevaluated regions of Brazil, including a more balanced number of males and females and a more even distribution of age and EL, could confirm our statistical analysis. After validation, ON-VMLA was framed in batteries of 100 slides each, mixing images of six different complexity categories. Semantic-VMLA kept all the original seventy verbal and non-verbal combinations. The validation process resulted in increased confidence during intraoperative test application. We are now able to score and evaluate patients' language deficits. Currently, VMLA fits its purpose of dynamic application and accuracy during language area mapping. It is the first test targeted to Brazilians, representing much of our culture and collective imagery. Our experience may be of value to clinicians and researchers working with awake craniotomy who seek to develop their own language test.
The test is available for free use at www.vemotests.com (beginning in February, 2021)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
At a fundamental level, most genes, signaling pathways, biological functions and organ systems are highly conserved between man and all vertebrate species. Leveraging this conservation, researchers are increasingly using the experimental advantages of the amphibian Xenopus to model human disease. The online Xenopus resource, Xenbase, enables human disease modeling by curating the Xenopus literature published in PubMed and integrating these Xenopus data with orthologous human genes, anatomy, and more recently with links to the Online Mendelian Inheritance in Man resource (OMIM) and the Human Disease Ontology (DO). Here we review how Xenbase supports disease modeling and report on a meta-analysis of the published Xenopus research, providing an overview of the different types of diseases being modeled in Xenopus and the variety of experimental approaches being used. Text mining of over 50,000 Xenopus research articles imported into Xenbase from PubMed identified approximately 1,000 putative disease-modeling articles. These articles were manually assessed and annotated with disease ontologies, which were then used to classify papers based on disease type. We found that Xenopus is being used to study a diverse array of diseases with three main experimental approaches: cell-free egg extracts to study fundamental aspects of cellular and molecular biology, oocytes to study ion transport and channel physiology, and embryo experiments focused on congenital diseases. We integrated these data into Xenbase Disease Pages to allow easy navigation to disease information on external databases. Results of this analysis will equip Xenopus researchers with a suite of experimental approaches available to model or dissect a pathological process. Ideally, clinicians and basic researchers will use this information to foster collaborations necessary to interrogate the development and treatment of human diseases.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Digitization of healthcare data, along with algorithmic breakthroughs in AI, will have a major impact on healthcare delivery in the coming years. It is interesting to see AI applied to assist clinicians during patient treatment in a privacy-preserving way. While scientific knowledge can help guide interventions, there remains a key need to quickly cut through the space of decision policies to find effective strategies to support patients during the care process.
Offline reinforcement learning (also referred to as safe or batch reinforcement learning) is a promising sub-field of RL which provides a mechanism for solving real-world sequential decision-making problems where access to a simulator is not available. Here we learn a policy from a fixed dataset of trajectories without further interaction with the environment (the agent doesn't receive reward or punishment signals from the environment). It has been shown that such an approach can leverage vast amounts of existing logged data (in the form of previous interactions with the environment) and can outperform supervised learning approaches or heuristic-based policies for solving real-world decision-making problems. Offline RL algorithms, when trained on sufficiently large and diverse offline datasets, can produce close-to-optimal policies (the ability to generalize beyond the training data).
As part of my PhD research, I investigated the problem of developing a Clinical Decision Support System for Sepsis Management using Offline Deep Reinforcement Learning.
MIMIC-III ('Medical Information Mart for Intensive Care') is a large, open-access, anonymized, single-center database consisting of comprehensive clinical data from 61,532 critical care admissions from 2001–2012 collected at a Boston teaching hospital. The dataset consists of 47 features (including demographics, vitals, and lab test results) on a cohort of sepsis patients who meet the Sepsis-3 definition criteria.
We try to answer the following question:
Given a particular patient's characteristics and physiological information at each time step as input, can our deep RL approach learn an optimal treatment policy that prescribes the right intervention (e.g., use of a ventilator) at each stage of the treatment process, in order to improve the final outcome (e.g., patient mortality)?
We can use popular state-of-the-art algorithms such as Deep Q-Learning (DQN), Double Deep Q-Learning (DDQN), DDQN combined with BNC, Mixed Monte Carlo (MMC), and Persistent Advantage Learning (PAL). Using these methods, we can train an RL policy to recommend an optimum treatment path for a given patient.
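As a rough illustration of the value-based updates these algorithms share, here is a minimal PyTorch sketch of a (Double) DQN update on a batch of logged transitions; the state and action dimensions are illustrative, and this is not the actual code from the repository linked below.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 47, 25, 0.99  # illustrative sizes (47 features; discretized actions)

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def dqn_update(states, actions, rewards, next_states, dones):
    """One Q-learning update on a batch of logged (offline) transitions."""
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN-style target: the online net selects the action, the target net evaluates it.
        next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + GAMMA * (1.0 - dones) * next_q
    loss = nn.functional.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```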
Data acquisition, standard pre-processing, and modelling details can be found in the GitHub repo: https://github.com/asjad99/MIMIC_RL_COACH
Free and publicly accessible literature database for peer-reviewed primary and review articles in the field of human Biospecimen Science. Each entry has been created by a Ph.D. level scientist to capture relevant parameters, pre-analytical factors, and original summaries of relevant results.
This database automatically captures metadata, the source of which is the GOVERNMENT OF THE REPUBLIC OF SLOVENIA, STATISTICAL OFFICE OF THE REPUBLIC OF SLOVENIA, and corresponds to the source database entitled “Use of electronic identification procedures in the last 12 months for private purposes by individuals, by degree of urbanisation of the area in which these individuals live, Slovenia, 2018”.
Actual data are available in PC-Axis format (.px). With the additional links, you can access the source portal page for viewing and selecting the data, as well as the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, store it in different formats, view and print tables of unlimited size, and perform some basic statistical analyses and graphics.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset has been uploaded to Kaggle on the occasion of solving questions of the 365 Data Science • Practice Exams: SQL curriculum, a set of free resources designed to help test and elevate data science skills. The dataset consists of a synthetic, relational collection of data structured to simulate common employee and organizational data scenarios, ideal for practicing SQL queries and data analysis skills in a People Analytics context.
The dataset contains the following tables:
departments.csv: List of all company departments.
dept_emp.csv: Historical and current assignments of employees to departments.
dept_manager.csv: Historical and current assignments of employees as department managers.
employees.csv: Core employee demographic information.
employees.db: A SQLite database containing all the relational tables from the CSV files.
salaries.csv: Historical salary records for employees.
titles.csv: Historical job titles held by employees.
The dataset is ideal for practicing SQL queries and data analysis skills in a People Analytics context. It serves applications in both general data analytics and time series analysis.
A practical application is presented in the 🎓 365DS Practice Exams • SQL notebook, which covers in detail answers to the questions of SQL Practice Exams 1, 2, and 3 on the 365DS platform, especially illustrating the usage and the value of SQL procedures and functions.
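As a small example of working with the bundled SQLite file from Python (the table and column names follow the classic employees sample database and are assumptions here), one could run:

```python
import sqlite3

# Connect to the bundled SQLite database listed above.
conn = sqlite3.connect("employees.db")

# Example: highest recorded salary per employee, joined with core demographics.
query = """
SELECT e.emp_no, e.first_name, e.last_name, MAX(s.salary) AS max_salary
FROM employees AS e
JOIN salaries  AS s ON s.emp_no = e.emp_no
GROUP BY e.emp_no, e.first_name, e.last_name
ORDER BY max_salary DESC
LIMIT 10;
"""
for row in conn.execute(query):
    print(row)
conn.close()
```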
This dataset has a rich lineage, originating from academic research and evolving through various formats to its current relational structure:
The foundational dataset was authored by Prof. Dr. Fusheng Wang (then a PhD student at the University of California, Los Angeles - UCLA) and his advisor, Prof. Dr. Carlo Zaniolo (UCLA). This work is primarily described in their paper:
It was originally distributed as an .xml file. Giuseppe Maxia (known as @datacharmer on GitHub and LinkedIn, as well as here on Kaggle) converted it into its relational form and subsequently distributed it as a .sql file, making it accessible for relational database use.
This .sql version was then loaded to Kaggle as the « Employees Dataset » by Mirza Huzaifa on February 5th, 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder includes the shapefiles for the 10 validation countries included in the manuscript. Abstract: The study of population health through network science holds high promise, but data sources that allow complete representation of populations are limited in low- and middle-income settings. Large national health surveys designed to gather nationally representative health and development data in low- and middle-income countries are promising sources of such data. Although they provide researchers, healthcare providers, and policymakers with valuable information, they are not designed to produce small-area estimates of health indicators, and the methods for producing these tend to rely on diverse and imperfect covariate data sources, have high data input requirements, and are computationally demanding, limiting their use for network representations of populations. To reduce the sources of measurement error and allow efficient multi-country representation of populations as networks of human settlements, here we present a covariate-free multi-country method to estimate small-area health indicators using standardized georeferenced surveys. The approach utilizes interpolation via local inverse distance weighting. The estimates are compared to those obtained using a Bayesian geostatistical model and have been cross-validated. The estimates are aggregated into population settlements identified using the Global Human Settlement Layer database. The method is fully automated, requiring a single standard georeferenced survey data source for mapping populations, eliminating the need for indicator- or country-specific covariate selection by investigators. Efficient estimation is achieved by only computing values for human-occupied areas and adopting a logical aggregation of estimates into the complete range of settlement sizes. An open-access library of standardized georeferenced settlement-level datasets for 15 indicators and 10 countries was validated in this paper, as well as the code used to identify settlements and estimate indicators. The datasets are intended to be used as the basis for population health studies, and the library will continue to be expanded. The novel aspects include using harmonized input sources and estimation procedures across countries and the adoption of real-world units for population data aggregation, creating a specialized library of nodes that serve as a basis for network representations of population health in low- and middle-income countries.
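For intuition, here is a minimal sketch of the local inverse distance weighting interpolation at the core of the method; it is purely illustrative, and the published code accompanying the library is the reference implementation.

```python
import numpy as np

def idw_estimate(query_xy, sample_xy, sample_values, power=2, k=10):
    """Local inverse-distance-weighted estimate at one query point.

    query_xy: (2,) coordinates; sample_xy: (n, 2) survey cluster coordinates;
    sample_values: (n,) indicator values observed at those clusters.
    """
    d = np.linalg.norm(sample_xy - query_xy, axis=1)
    nearest = np.argsort(d)[:k]              # "local": use only the k nearest clusters
    d, v = d[nearest], sample_values[nearest]
    if d[0] == 0:                            # query coincides with a sample point
        return float(v[0])
    w = 1.0 / d ** power                     # weights decay with distance
    return float(np.sum(w * v) / np.sum(w))
```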
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains anime images for 231 different anime, with approximately 380 images for each of those anime. Please note that you might need to clean the image directories a bit, since the images might contain merchandise and live-action photos in addition to the actual anime itself.
If you'd like to take a look at the scripts used to make this dataset, you can find them on this GitHub repo.
Feel free to extend it, scrape your own images, etc. etc.
As a big anime fan, I found a lot of anime related datasets on Kaggle. I was however disappointed to find no dataset containing anime specific images for popular anime. Some other great datasets that I've been inspired by include: - Top 250 Anime 2023 - Anime Recommendations Database - Anime Recommendation Database 2020 - Anime Face Dataset - Safebooru - Anime Image Metadata