100+ datasets found
  1. HCUP Nationwide Ambulatory Surgery Sample (NASS) Database – Restricted...

    • catalog.data.gov
    • healthdata.gov
    • +1more
    Updated Jul 16, 2025
    Cite
    Agency for Healthcare Research and Quality, Department of Health & Human Services (2025). HCUP Nationwide Ambulatory Surgery Sample (NASS) Database – Restricted Access [Dataset]. https://catalog.data.gov/dataset/hcup-nationwide-ambulatory-surgery-sample-nass-database-restricted-access
    Dataset updated
    Jul 16, 2025
    Description

    The largest all-payer ambulatory surgery database in the United States, the Healthcare Cost and Utilization Project (HCUP) Nationwide Ambulatory Surgery Sample (NASS) produces national estimates of major ambulatory surgery encounters in hospital-owned facilities. Major ambulatory surgeries are defined as selected major therapeutic procedures that require the use of an operating room, penetrate or break the skin, and involve regional anesthesia, general anesthesia, or sedation to control pain (i.e., surgeries flagged as "narrow" in the HCUP Surgery Flag Software). Unweighted, the NASS contains approximately 9.0 million ambulatory surgery encounters each year and approximately 11.8 million ambulatory surgery procedures. Weighted, it estimates approximately 11.9 million ambulatory surgery encounters and 15.7 million ambulatory surgery procedures. Sampled from the HCUP State Ambulatory Surgery and Services Databases (SASD) and State Emergency Department Databases (SEDD) in order to capture both planned and emergent major ambulatory surgeries, the NASS can be used to examine selected ambulatory surgery utilization patterns. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality, HCUP data inform decision making at the national, State, and community levels. The NASS contains clinical and resource-use information that is included in a typical hospital-owned facility record, including patient characteristics, clinical diagnostic and surgical procedure codes, disposition of patients, total charges, facility characteristics, and expected source of payment, regardless of payer, including patients covered by Medicaid, private insurance, and the uninsured. The NASS excludes data elements that could directly or indirectly identify individuals, hospitals, or states. The NASS is limited to encounters with at least one in-scope major ambulatory surgery on the record, performed at hospital-owned facilities. 
Procedures intended primarily for diagnostic purposes are not considered in-scope. Restricted access data files are available with a data use agreement and brief online security training.

  2. HCUP National Inpatient Database

    • redivis.com
    • stanford.redivis.com
    application/jsonl +7
    Updated Sep 27, 2025
    Cite
    Stanford Center for Population Health Sciences (2025). HCUP National Inpatient Database [Dataset]. http://doi.org/10.57761/gr09-hq95
    Explore at:
    application/jsonl, csv, avro, arrow, parquet, stata, sas, spssAvailable download formats
    Dataset updated
    Sep 27, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 1, 2000 - Dec 31, 2022
    Description

    Abstract

    The NIS is the largest publicly available all-payer inpatient healthcare database designed to produce U.S. regional and national estimates of inpatient utilization, access, cost, quality, and outcomes. Unweighted, it contains data from around 7 million hospital stays each year. Weighted, it estimates around 35 million hospitalizations nationally. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality (AHRQ), HCUP data inform decision making at the national, State, and community levels.

    Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, uncommon treatments, and special populations.

    Usage

    DO NOT use this data without referring to the NIS Database Documentation, which includes:

    • Description of NIS Database
    • Restrictions on Use


    • Data Elements
    • Additional Resources for Data Elements
    • ICD-10-CM/PCS Data Included in the NIS Starting with 2015 (More details about this transition available here.)
    • Known Data Issues
    • NIS Supplemental Files
    • HCUP Tools: Labels and Formats
    • Obtaining HCUP Data


    Before Manuscript Submission

    All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.

    We will check your cell sizes and citations.

    For more information about how to cite PHS and PHS datasets, please visit:

    https://phsdocs.developerhub.io/need-help/citing-phs-data-core

    You must also make sure that your work meets all of the AHRQ (data owner) requirements for publishing with HCUP data, listed at https://hcup-us.ahrq.gov/db/nation/nis/nischecklist.jsp

    HCUP Online Tutorials

    For additional assistance, AHRQ has created the HCUP Online Tutorial Series, a series of free, interactive courses which provide training on technical methods for conducting research with HCUP data. Topics include an HCUP Overview Course and these tutorials:

    • The HCUP Sampling Design tutorial is designed to help users learn how to account for sample design in their work with HCUP national (nationwide) databases.
    • The Producing National HCUP Estimates tutorial is designed to help users understand how the three national (nationwide) databases – the NIS, Nationwide Emergency Department Sample (NEDS), and Kids' Inpatient Database (KID) – can be used to produce national and regional estimates.
    • The Calculating Standard Errors tutorial shows how to accurately determine the precision of the estimates produced from the HCUP nationwide databases. Users will learn two methods for calculating standard errors for estimates produced from the HCUP national (nationwide) databases.
    • The HCUP Multi-year Analysis tutorial presents solutions that may be necessary when conducting analyses that span multiple years of HCUP data.
    • The HCUP Software Tools Tutorial provides instructions on how to apply the AHRQ software tools to HCUP or other administrative databases.

    New tutorials are added periodically, and existing tutorials are updated when necessary. The Online Tutorial Series is located on the HCUP-US website at https://hcup-us.ahrq.gov/tech_assist/tutorials.jsp
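
    As a toy illustration of the weighting idea behind national estimates (a hedged sketch, not a substitute for the tutorials or proper survey procedures): DISCWT is the NIS discharge-weight variable, while the values and the `has_condition` flag below are invented.

```python
import pandas as pd

# Illustrative NIS-style records. DISCWT is the standard discharge weight;
# the weight values and the `has_condition` indicator are made up.
nis = pd.DataFrame({
    "DISCWT": [5.1, 5.1, 4.8, 5.0],
    "has_condition": [1, 0, 1, 1],
})

# Weighted national estimate of discharges with the condition:
# sum of the discharge weights over qualifying records.
national_estimate = (nis["DISCWT"] * nis["has_condition"]).sum()
print(round(national_estimate, 1))  # → 14.9
```

    A real analysis would additionally account for the sampling design (strata and clusters) when computing standard errors, as the Calculating Standard Errors tutorial explains.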

    Important notes about the 2015 data

    In 2015, AHRQ restructured the data as described here:

    https://hcup-us.ahrq.gov/db/nation/nis/2015HCUPNationalInpatientSample.pdf

    Some key points:

    • For the 2015 data, all diagnosis and procedure data elements, including any data elements derived from diagnoses and procedures, were moved out of the Core File and into the Diagnosis and Procedure Groups Files.
    • Prior to 2015, and for Q1-3 of 2015, the DX1-30 and PR1-15 variables (which use ICD-9 codes) were used; starting in Q4 of 2015, the I10_DX1-30 and I10_PR1-15 variables (which use ICD-10 codes) were used. The best way to distinguish discharges from quarters 1-3 versus quarter 4 is the diagnosis version variable (DXVER): for quarters 1-3, DXVER has a value of 9, while for quarter 4, DXVER has a value of 10.
    • Some other variables also transitioned in Q4 of 2015. Please refer to the link above for more details.
    • Starting in 2016, the diagnosis and procedure information returned to the Core file. Additional detai
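
    The DXVER rule above can be sketched as a simple filter, assuming a pandas DataFrame of 2015 Core records (column names other than DXVER and the sample values are illustrative):

```python
import pandas as pd

# Illustrative 2015 NIS-style records: DXVER = 9 for Q1-3 (ICD-9 coding),
# DXVER = 10 for Q4 (ICD-10 coding).
core = pd.DataFrame({
    "KEY_NIS": [1, 2, 3, 4],
    "DXVER": [9, 9, 10, 10],
})

icd9_discharges = core[core["DXVER"] == 9]    # quarters 1-3
icd10_discharges = core[core["DXVER"] == 10]  # quarter 4
print(len(icd9_discharges), len(icd10_discharges))  # → 2 2
```
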
  3. HCUP Nationwide Readmissions Database (NRD)- Restricted Access Files

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Jul 26, 2023
    Cite
    Agency for Healthcare Research and Quality, Department of Health & Human Services (2023). HCUP Nationwide Readmissions Database (NRD)- Restricted Access Files [Dataset]. https://catalog.data.gov/dataset/healthcare-cost-and-utilization-project-nationwide-readmissions-database-nrd
    Dataset updated
    Jul 26, 2023
    Description

    The Healthcare Cost and Utilization Project (HCUP) Nationwide Readmissions Database (NRD) is a unique and powerful database designed to support various types of analyses of national readmission rates for all payers and the uninsured. The NRD includes discharges for patients with and without repeat hospital visits in a year and those who have died in the hospital. Repeat stays may or may not be related. The criteria to determine the relationship between hospital admissions are left to the analyst using the NRD. This database addresses a large gap in health care data - the lack of nationally representative information on hospital readmissions for all ages. Outcomes of interest include national readmission rates, reasons for returning to the hospital for care, and the hospital costs for discharges with and without readmissions. Unweighted, the NRD contains data from approximately 18 million discharges each year. Weighted, it estimates roughly 35 million discharges. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality, HCUP data inform decision making at the national, State, and community levels. The NRD is drawn from HCUP State Inpatient Databases (SID) containing verified patient linkage numbers that can be used to track a person across hospitals within a State, while adhering to strict privacy guidelines. The NRD is not designed to support regional, State-, or hospital-specific readmission analyses. The NRD contains more than 100 clinical and non-clinical data elements provided in a hospital discharge abstract. Data elements include but are not limited to: diagnoses; procedures; patient demographics (e.g., sex, age); expected source of payment, regardless of expected payer, including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as 'no charge'; discharge month, quarter, and year; total charges; length of stay; and data elements essential to readmission analyses. The NRD excludes data elements that could directly or indirectly identify individuals. Restricted access data files are available with a data use agreement and brief online security training.
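
    A minimal sketch of the kind of readmission logic the NRD supports, assuming the linkage variables NRD_VisitLink (patient link) and NRD_DaysToEvent (admission timing) described in the HCUP documentation; the sample values are invented, and a real analysis must also apply weights and the database's readmission conventions.

```python
import pandas as pd

# Illustrative NRD-style records: two patients ("A", "B"), each with two stays.
nrd = pd.DataFrame({
    "NRD_VisitLink": ["A", "A", "B", "B"],
    "NRD_DaysToEvent": [0, 20, 0, 120],  # admission timing within the year
    "LOS": [5, 3, 4, 2],                 # length of stay in days
})

nrd = nrd.sort_values(["NRD_VisitLink", "NRD_DaysToEvent"])
grp = nrd.groupby("NRD_VisitLink")
# Days from each index discharge to the same patient's next admission
gap = grp["NRD_DaysToEvent"].shift(-1) - (nrd["NRD_DaysToEvent"] + nrd["LOS"])
nrd["readmit_30d"] = gap.between(0, 30)  # NaN (no next stay) counts as False
print(nrd["readmit_30d"].tolist())  # → [True, False, False, False]
```

    Patient A is readmitted 15 days after the index discharge (flagged), while patient B's second stay falls outside the 30-day window.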

  4. Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open...

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne (2023). Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open Data Analysis Platform [Dataset]. http://doi.org/10.1371/journal.pone.0145791
    Explore at:
    docxAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the accumulation of large amounts of health related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized (PPPM) Medicine, ultimately affecting both cost and quality of care. However, high-dimensionality and high-complexity of the data involved, prevents data-driven methods from easy translation into clinically relevant models. Additionally, the application of cutting edge predictive methods and data manipulation require substantial programming skills, limiting its direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, the authors address this problem by focusing on open, visual environments, suited to be applied by the medical community. Moreover, we review code free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner’s Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL process (Extract, Transform, Load) was initiated by retrieving data from the MIMIC-II tables of interest. As use case, correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for automatic building, parameter optimization and evaluation of various predictive models, under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
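
    The use case above (correlating platelet count with ICU survival) amounts to a point-biserial correlation, which can be sketched in a few lines of NumPy; the values below are synthetic, not MIMIC-II data.

```python
import numpy as np

# Synthetic toy data: platelet counts (x10^9/L) and survival outcomes.
platelets = np.array([80, 120, 200, 250, 300, 90], dtype=float)
survived = np.array([0, 0, 1, 1, 1, 0], dtype=float)  # 1 = ICU survival

# Pearson correlation with a binary variable = point-biserial correlation.
r = np.corrcoef(platelets, survived)[0, 1]
print(r > 0)  # in this toy sample, higher platelet counts accompany survival
```
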

  5. Food and Agriculture Biomass Input–Output (FABIO) database

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Jun 8, 2022
    Cite
    Bruckner, Martin; Kuschnig, Nikolas (2022). Food and Agriculture Biomass Input–Output (FABIO) database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2577066
    Dataset updated
    Jun 8, 2022
    Dataset provided by
    Vienna University of Economics and Business
    Authors
    Bruckner, Martin; Kuschnig, Nikolas
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This data repository provides the Food and Agriculture Biomass Input Output (FABIO) database, a global set of multi-regional physical supply-use and input-output tables covering global agriculture and forestry.

    The work is based on mostly freely available data from FAOSTAT, IEA, EIA, and UN Comtrade/BACI. FABIO currently covers 191 countries + RoW, 118 processes and 125 commodities (raw and processed agricultural and food products) for 1986-2013. All R codes and auxiliary data are available on GitHub. For more information please refer to https://fabio.fineprint.global.

    The database consists of the following main components, in compressed .rds format:

    Z: the inter-commodity input-output matrix, displaying the relationships of intermediate use of each commodity in the production of each commodity, in physical units (tons). The matrix has 24000 rows and columns (125 commodities x 192 regions), and is available in two versions, based on the method to allocate inputs to outputs in production processes: Z_mass (mass allocation) and Z_value (value allocation). Note that the row sums of the Z matrix (= total intermediate use by commodity) are identical in both versions.

    Y: the final demand matrix, denoting the consumption of all 24000 commodities by destination country and final use category. There are six final use categories (yielding 192 x 6 = 1152 columns): 1) food use, 2) other use (non-food), 3) losses, 4) stock addition, 5) balancing, and 6) unspecified.

    X: the total output vector of all 24000 commodities. Total output is equal to the sum of intermediate and final use by commodity.

    L: the Leontief inverse, computed as (I - A)^-1, where A is the matrix of input coefficients derived from Z and x. Again, there are two versions, depending on the underlying version of Z (L_mass and L_value).

    E: environmental extensions for each of the 24000 commodities, including four resource categories: 1) primary biomass extraction (in tons), 2) land use (in hectares), 3) blue water use (in m3), and 4) green water use (in m3).

    mr_sup_mass/mr_sup_value: For each allocation method (mass/value), the supply table gives the physical supply quantity of each commodity by producing process, with processes in the rows (118 processes x 192 regions = 22656 rows) and commodities in columns (24000 columns).

    mr_use: the use table captures the quantities of each commodity (rows) used as an input in each process (columns).

    A description of the included countries and commodities (i.e. the rows and columns of the Z matrix) can be found in the auxiliary file io_codes.csv. Separate lists of the country sample (including ISO3 codes and continental grouping) and commodities (including moisture content) are given in the files regions.csv and items.csv, respectively. For information on the individual processes, see auxiliary file su_codes.csv. RDS files can be opened in R. Information on how to read these files can be obtained here: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/readRDS

    Except for X.rds, which contains a matrix, all variables are organized as lists, where each element contains a sparse matrix. Please note that values are always given in physical units, i.e. tonnes or head, as specified in items.csv. The suffixes value and mass only indicate the form of allocation chosen for the construction of the symmetric IO tables (for more details see Bruckner et al. 2019). Product, process and country classifications can be found in the file fabio_classifications.xlsx.

    Footprint results are not contained in the database but can be calculated, e.g. by using this script: https://github.com/martinbruckner/fabio_comparison/blob/master/R/fabio_footprints.R
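
    The Leontief calculation described above can be sketched with NumPy on a tiny toy system; FABIO's real matrices are 24000 x 24000 sparse matrices, and all values below are made up for illustration.

```python
import numpy as np

Z = np.array([[0.0, 2.0, 1.0],   # toy inter-commodity IO matrix (tons)
              [1.0, 0.0, 3.0],
              [0.0, 1.0, 0.0]])
x = np.array([10.0, 12.0, 8.0])  # total output vector
e = np.array([4.0, 6.0, 2.0])    # a toy extension, e.g. land use by commodity

A = Z / x                           # input coefficients (inputs per unit output)
L = np.linalg.inv(np.eye(3) - A)    # Leontief inverse (I - A)^-1
y = np.array([5.0, 4.0, 3.0])       # a final demand vector
footprint = (e / x) @ L @ y         # extension embodied in final demand
print(float(footprint))
```

    Dividing Z column-wise by x yields A; premultiplying L @ y by the direct intensity vector e / x gives the footprint of the chosen final demand.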

    How to cite:

    To cite FABIO work please refer to this paper:

    Bruckner, M., Wood, R., Moran, D., Kuschnig, N., Wieland, H., Maus, V., Börner, J. 2019. FABIO – The Construction of the Food and Agriculture Input–Output Model. Environmental Science & Technology 53(19), 11302–11312. DOI: 10.1021/acs.est.9b03554

    License:

    This data repository is distributed under the CC BY-NC-SA 4.0 License. You are free to share and adapt the material for non-commercial purposes using proper citation. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. In case you are interested in a collaboration, I am happy to receive enquiries at martin.bruckner@wu.ac.at.

    Known issues:

    The underlying FAO data have been manipulated to the minimum extent necessary. However, data filling and supply-use balancing required some adaptations. These are documented in the code and are also reflected in the balancing item in the final demand matrices. For proper use of the database, I recommend distributing the balancing item proportionally over all other uses and running analyses both with and without balancing to illustrate uncertainties.

  6. Database of Free Tech Books

    • kaggle.com
    zip
    Updated Jan 15, 2025
    Cite
    Farhan Ali (2025). Database of Free Tech Books [Dataset]. https://www.kaggle.com/datasets/farhanali097/database-of-free-tech-books
    Explore at:
    zip(43973 bytes)Available download formats
    Dataset updated
    Jan 15, 2025
    Authors
    Farhan Ali
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Database of Free Tech Books

    This dataset is a comprehensive collection of free tech books available on the web, specifically sourced from the FreeTechBooks platform. It includes details such as the names and URLs of various free textbooks, covering a wide range of topics including computer science, programming, data science, artificial intelligence, and more. The dataset is designed for educational purposes, providing easy access to high-quality, freely available technical resources.

    Dataset Details:

    • The dataset consists of two columns:

    • Name: The title of the book.

    • URL: A direct link to the page where the book can be accessed or downloaded for free.

    Features:

    • Comprehensive: Contains a collection of over 1200 free tech books.
    • Variety of Topics: Books span various domains such as:
    • Programming Languages: (Python, Java, C++)
    • Data Science & Machine Learning
    • Artificial Intelligence
    • Cybersecurity
    • Networking
    • Web Development
    • Cloud Computing
    • And much more.

    Usage:

    The dataset can be used for:

    • Educational research and learning.
    • Building recommendation systems for tech resources.
    • Analyzing trends in the availability of open-source learning materials.
    • Supporting the creation of educational tools and resources in tech-related fields.

    Source:

    • The data was scraped from the FreeTechBooks website, a platform that aggregates freely available textbooks on various technical topics.

    Data Collection Method:

    • The data was collected by iterating through 82 pages of the FreeTechBooks website, extracting the names and URLs of books listed under different topics. The dataset includes data for a total of 1200+ books.
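
    The extraction step can be sketched as a small parser that pulls (Name, URL) pairs from a listing page; the HTML structure below is hypothetical, and the real FreeTechBooks markup may differ.

```python
import re

def extract_books(html: str) -> list[tuple[str, str]]:
    """Return (name, url) pairs for every simple anchor tag in a listing page."""
    # findall yields (url, title) pairs; swap to match the dataset's (Name, URL) columns
    return [(title, url) for url, title in
            re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)]

# Hypothetical listing snippet for illustration
page = '<li><a href="https://example.org/python-book">Intro to Python</a></li>'
print(extract_books(page))  # → [('Intro to Python', 'https://example.org/python-book')]
```

    A real scraper would fetch each of the 82 pages and accumulate the pairs into the two-column dataset described above.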

    Notes:

    • All books listed are freely available and open to the public.
    • URLs lead to external sites where users can read or download the books.

    Dataset Size:

    • Number of rows: 1200+
    • Number of columns: 2 (Name, URL)
  7. NCSS Soil Characterization Database

    • catalog.data.gov
    • ngda-soils-geoplatform.hub.arcgis.com
    Updated Jun 5, 2025
    Cite
    Natural Resources Conservation Service (2025). NCSS Soil Characterization Database [Dataset]. https://catalog.data.gov/dataset/ncss-soil-characterization-database-d2772
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Natural Resources Conservation Servicehttp://www.nrcs.usda.gov/
    Description

    The National Cooperative Soil Survey - Soil Characterization Database (NCSS-SCD) contains laboratory data for more than 65,000 locations (i.e. xy coordinates) throughout the United States and its Territories, and about 2,100 locations from other countries. It is a compilation of data from the Kellogg Soil Survey Laboratory (KSSL) and several cooperating laboratories. The data steward and distributor is the National Soil Survey Center (NSSC). Information contained within the database includes physical, chemical, biological, mineralogical, morphological, and mid infrared reflectance (MIR) soil measurements, as well as a collection of calculated values. The intended use of the data is to support interpretations related to soil use and management.

    Data Usage

    Access to the data is provided via the following user interfaces: 1. Interactive Web Map, 2. Lab Data Mart (LDM) for querying data and generating reports, 3. Soil Data Access (SDA) web services for querying data, and 4. Direct download of the entire database in several formats. Data at each location includes measurements at multiple depths (e.g. soil horizons). However, not all analyses have been conducted for each location and depth. Typically, a suite of measurements was collected based upon assumed or known conditions regarding the soil being analyzed. For example, soils of arid environments are routinely analyzed for salts and carbonates as part of the standard analysis suite. Standard morphological soil descriptions are available for about 60,000 of these locations. Mid-infrared (MIR) spectroscopy is available for about 7,000 locations. Soil fertility measurements, such as those made by Agricultural Experiment Stations, were not made. Most of the data were obtained over the last 40 years, with about 4,000 locations before 1960, 25,000 from 1960-1990, 27,000 from 1990-2010, and 13,000 from 2010 to 2021. Generally, the number of measurements recorded per location has increased over time.
Typically, the data were collected to represent a soil series or map unit component concept. They may also have been sampled to determine the range of variation within a given landscape. Although strict quality-control measures are applied, the NSSC does not warrant that the data are error free. Also, in some cases the measurements are not within the applicability range of the laboratory methods. For example, dispersion of clay is incomplete in some soils by the standard method used for determining particle-size distribution. Soils producing incomplete dispersion include those that are derived from volcanic materials or that have a high content of iron oxides, gypsum, carbonates, or other cementing materials. Also note that determination of clay minerals by x-ray diffraction is relative. Measurements of very high or very low quantities by any method are not very precise. Other measurements have other limitations in some kinds of soils. Such data are retained in the database for research purposes. Also, some of the data were obtained from cooperating laboratories within the NCSS. The accuracy of the location coordinates has not been quantified but can be inferred from the precision of their decimal degrees and the presence of a map datum. Some older records may correspond to a county centroid. When the map datum is missing, it can be assumed that data recorded prior to 1990 used NAD27 and data after 1995 used WGS84. For detailed information about methods used in the KSSL and other laboratories, refer to "Soil Survey Investigation Report No. 42". For information on the application of laboratory data, refer to "Soil Survey Investigation Report No. 45". If you are unfamiliar with any terms or methods, feel free to consult your NRCS State Soil Scientist.

    Terms of Use

    This dataset is not designed for use as a primary regulatory tool in permitting or siting decisions but may be used as a reference source.
This is public information and may be interpreted by organizations, agencies, units of government, or others based on needs; however, they are responsible for the appropriate application. Federal, State, or local regulatory bodies are not to reassign to the Natural Resources Conservation Service or the National Cooperative Soil Survey any authority for the decisions that they make. The Natural Resources Conservation Service will not perform any evaluations of these data for purposes related solely to State or local regulatory programs.

  8. Use of electronic identification procedures in the last 12 months for...

    • data.europa.eu
    html, unknown
    Updated Oct 12, 2021
    Cite
    VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE (2021). Use of electronic identification procedures in the last 12 months for private purposes by individuals, by status of activity, Slovenia, 2018 [Dataset]. https://data.europa.eu/data/datasets/surs2980415s
    Explore at:
    html, unknownAvailable download formats
    Dataset updated
    Oct 12, 2021
    Dataset provided by
    Government of Slovenia
    Authors
    VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE
    Area covered
    Slovenia
    Description

    This database automatically captures metadata from its source, the Government of the Republic of Slovenia, Statistical Office of the Republic of Slovenia, corresponding to the source database entitled “Use of electronic identification procedures in the last 12 months for private purposes by individuals, by status of activity, Slovenia, 2018”.

    Actual data are available in PC-Axis format (.px). Additional links provide access to the source portal page for viewing and selecting data, as well as to the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, store it in different formats, view and print tables of unlimited size, and perform some basic statistical analyses and graphics.

  9. Data from: AneuX morphology database

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Nov 5, 2024
    Cite
    Juchler, Norman; Bijlenga, Philippe; Hirsch, Sven (2024). AneuX morphology database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6678441
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Zurich University of Applied Sciences
    Geneva University Hospital and Faculty of Medicine
    Authors
    Juchler, Norman; Bijlenga, Philippe; Hirsch, Sven
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The AneuX morphology database is an open-access, multi-centric database containing 3D geometries of 750 intracranial aneurysms curated in the context of the AneuX project (2015-2020). The database combines data from three different projects (AneuX, @neurIST and Aneurisk) standardized using a single processing pipeline. The code to process and view the 3D geometries is provided under this public repository: https://github.com/hirsch-lab/aneuxdb

    The database at a glance:

    750 aneurysm domes (surface meshes)

    668 vessel trees (surface meshes)

    3 different data sources (AneuX, @neurIST, Aneurisk)

    3 different mesh resolutions (original resolution, 0.01mm² and 0.05mm² target cell area)

    4 different cut configurations (including planar and free-form cuts)

    5 clinical parameters (aneurysm rupture status, location and side; patient age and sex)

    170 pre-computed morphometric indices for each of the aneurysm domes

    Terms of use / License:

    The data is provided "as is", without any warranties of any kind. It is provided under the CC BY-NC 4.0 license, with the additional requirements (A) that the use of the database is declared using the sentence below (you can omit the URLs), and (B) to cite our peer reviewed journal article below.

    [This project] uses data from the AneuX morphology database, an open-access, multi-centric database combining data from three European projects: AneuX project (www.aneux.ch), @neurIST project (www.aneurist.org) and Aneurisk (http://ecm2.mathcs.emory.edu/aneuriskweb/index).

    In accordance with the terms of use, please cite the following journal article when referring to our dataset.

    Juchler, Schilling, Bijlenga, Kurtcuoglu, Hirsch. Shape trumps size: Image-based morphological analysis reveals that the 3D shape discriminates intracranial aneurysm disease status better than aneurysm size. Frontiers in Neurology (2022), DOI: 10.3389/fneur.2022.809391

    The AneuX morphology database contains parts (geometric models, clinical data) of the publicly available Aneurisk dataset released under the CC BY-NC 3.0 license (which is compatible with the license used here). Like all geometric models in this database, the Aneurisk models were preprocessed using the same procedure. See here for a description.

    Funding and authorizations

    The AneuX project

    Data collection in accordance with @neurIST protocol v5

    Ethics authorisations Geneva BASEC PB_2018‐00073

    Supported by the grant from the Swiss SystemsX.ch initiative, evaluated by the Swiss National Science Foundation

    @neurIST project

    Data collection in accordance with @neurIST protocol v1

    Ethics authorisations Amsterdam MEC 07-159, Barcelona 2007-3507, Geneva CER 07-056, Oxfordshire REC AQ05/Q1604/162, Pècs RREC MC P 06 Jul 2007

    Supported by the 6th framework program of the European Commission FP6-IST-2004–027703

    Acknowledgments:

    The AneuX project was supported by SystemsX.ch, and evaluated by the Swiss National Science Foundation (SNSF). This database would not be possible without the support of the Zurich University of Applied Sciences (ZHAW) and University Hospitals Geneva (HUG).

    We thank the following people for their support and contributions to the AneuX morphology database.

    From the AneuX project (in alphabetical order):

    Daniel Rüfenacht

    Diana Sapina

    Isabel Wanke

    Karl Lovblad

    Karl Schaller

    Olivier Brina

    Paolo Machi

    Rafik Ouared

    Sabine Schilling

    Sandrine Morel

    Ueli Ebnöther

    Vartan Kurtcuoglu

    Vitor Mendes Pereira

    Zolt Kuscàr

    From the @neurIST project (in alphabetical order)

    Alan Waterworth

    Alberto Marzo

    Alejandro Frangi

    Alison Clarke

    Ana Marcos Gonzalez

    Ana Paula Narata

    Antonio Arbona

    Bawarjan Schatlo

    Daniel Rüfenacht

    Elio Vivas

    Ferenc Kover

    Gulam Zilani

    Guntram Berti

    Guy Lonsdale

    Istvan Hudak

    James Byrne

    Jimison Iavindrasana

    Jordi Blasco

    Juan Macho

    Julia Yarnold

    Mari Cruz Villa Uriol

    Martin Hofmann-Apitius

    Max Jägersberg

    Miriam CJM Sturkenboom

    Nicolas Roduit

    Pankaj Singh

    Patricia Lawford

    Paul Summers

    Peer Hasselmeyer

    Peter Bukovics

    Rod Hose

    Roelof Risselada

    Stuart Coley

    Tamas Doczi

    Teresa Sola

    Umang Patel

    From the Aneurisk project (list from AneuriskWeb, in alphabetical order):

    Alessandro Veneziani

    Andrea Remuzzi

    Edoardo Boccardi

    Francesco Migliavacca

    Gabriele Dubini

    Laura Sangalli

    Luca Antiga

    Maria Piccinelli

    Piercesare Secchi

    Simone Vantini

    Susanna Bacigaluppi

    Tiziano Passerini

  10. Data Bundle for PyPSA-Eur: An Open Optimisation Model of the European...

    • zenodo.org
    • data.niaid.nih.gov
    xz, zip
    Updated Jul 17, 2024
    + more versions
    Cite
    Jonas Hörsch; Fabian Hofmann; David Schlachtberger; Philipp Glaum; Fabian Neumann; Tom Brown; Iegor Riepin; Bobby Xiong (2024). Data Bundle for PyPSA-Eur: An Open Optimisation Model of the European Transmission System [Dataset]. http://doi.org/10.5281/zenodo.12760663
    Explore at:
    Available download formats: zip, xz
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jonas Hörsch; Fabian Hofmann; David Schlachtberger; Philipp Glaum; Fabian Neumann; Tom Brown; Iegor Riepin; Bobby Xiong
    Description

    PyPSA-Eur is an open model dataset of the European power system at the transmission network level that covers the full ENTSO-E area. It can be built using the code provided at https://github.com/PyPSA/PyPSA-eur.

    It contains alternating current lines at and above 220 kV voltage level and all high voltage direct current lines, substations, an open database of conventional power plants, time series for electrical demand and variable renewable generator availability, and geographic potentials for the expansion of wind and solar power.

    Not all data dependencies are shipped with the code repository, since git is not suited for handling large changing files. Instead we provide separate data bundles to be downloaded and extracted as noted in the documentation.

    This is the full data bundle to be used for rigorous research. It includes large bathymetry and natural protection area datasets.
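
    Fetching and unpacking the bundle can be scripted. The sketch below assumes the Zenodo file URL and archive name, which should be checked against the PyPSA-Eur documentation:

    ```python
    import tarfile
    from pathlib import Path
    from urllib.request import urlretrieve

    # Bundle URL is an assumption based on the Zenodo record above; check the
    # PyPSA-Eur documentation for the canonical location and file name.
    BUNDLE_URL = "https://zenodo.org/records/12760663/files/data-bundle.tar.xz"

    def fetch_bundle(url: str, dest: Path) -> Path:
        """Download the data bundle archive if it is not already present."""
        archive = dest / Path(url).name
        if not archive.exists():
            dest.mkdir(parents=True, exist_ok=True)
            urlretrieve(url, archive)
        return archive

    def extract_bundle(archive: Path, dest: Path) -> None:
        """Extract a .tar.xz data bundle into dest, as the docs instruct."""
        with tarfile.open(archive, mode="r:xz") as tar:
            tar.extractall(dest)

    # Usage (the real bundle is several GB, so it is not fetched here):
    # extract_bundle(fetch_bundle(BUNDLE_URL, Path("data")), Path("data"))
    ```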

    While the code in PyPSA-Eur is released as free software under the MIT License, different licenses and terms of use apply to the various input data, which are summarised below:

    corine/*

    Access to data is based on a principle of full, open and free access as established by the Copernicus data and information policy Regulation (EU) No 1159/2013 of 12 July 2013. This regulation establishes registration and licensing conditions for GMES/Copernicus users and can be found here. Free, full and open access to this data set is made on the conditions that:

    • When distributing or communicating Copernicus dedicated data and Copernicus service information to the public, users shall inform the public of the source of that data and information.

    • Users shall make sure not to convey the impression to the public that the user's activities are officially endorsed by the Union.

    • Where that data or information has been adapted or modified, the user shall clearly state this.

    • The data remain the sole property of the European Union. Any information and data produced in the framework of the action shall be the sole property of the European Union. Any communication and publication by the beneficiary shall acknowledge that the data were produced “with funding by the European Union”.

    eez/*

    Marine Regions’ products are licensed under CC-BY-NC-SA. Please contact us for other uses of the Licensed Material beyond license terms. We kindly request our users not to make our products available for download elsewhere and to always refer to marineregions.org for the most up-to-date products and services.

    natura/*

    EEA standard re-use policy: unless otherwise indicated, re-use of content on the EEA website for commercial or non-commercial purposes is permitted free of charge, provided that the source is acknowledged (https://www.eea.europa.eu/legal/copyright). Copyright holder: Directorate-General for Environment (DG ENV).

    naturalearth/*

    All versions of Natural Earth raster + vector map data found on this website are in the public domain. You may use the maps in any manner, including modifying the content and design, electronic dissemination, and offset printing. The primary authors, Tom Patterson and Nathaniel Vaughn Kelso, and all other contributors renounce all financial claim to the maps and invite you to use them for personal, educational, and commercial purposes.

    No permission is needed to use Natural Earth. Crediting the authors is unnecessary.

    NUTS_2013_60M_SH/*

    In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:

    1. The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").

    2. The permission to use the data is granted on condition that: the data will not be used for commercial purposes; the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.

    gebco/GEBCO_2014_2D.nc

    The GEBCO Grid is placed in the public domain and may be used free of charge. Use of the GEBCO Grid indicates that the user accepts the conditions of use and disclaimer information given below.

    Users are free to:

    • Copy, publish, distribute and transmit The GEBCO Grid

    • Adapt The GEBCO Grid

    • Commercially exploit The GEBCO Grid, by, for example, combining it with other information, or by including it in their own product or application

    Users must:

    • Acknowledge the source of The GEBCO Grid. A suitable form of attribution is given in the documentation that accompanies The GEBCO Grid.

    • Not use The GEBCO Grid in a way that suggests any official status or that GEBCO, or the IHO or IOC, endorses any particular application of The GEBCO Grid.

    • Not mislead others or misrepresent The GEBCO Grid or its source.

    je-e-21.03.02.xls

    Information on the websites of the Federal Authorities is accessible to the public. Downloading, copying or integrating content (texts, tables, graphics, maps, photos or any other data) does not entail any transfer of rights to the content.

    Copyright and any other rights relating to content available on the websites of the Federal Authorities are the exclusive property of the Federal Authorities or of any other expressly mentioned owners.

    Any reproduction requires the prior written consent of the copyright holder. The source of the content (statistical results) should always be given.

  11. O*NET Database

    • onetcenter.org
    excel, mysql, oracle +2
    Cite
    National Center for O*NET Development, O*NET Database [Dataset]. https://www.onetcenter.org/database.html
    Explore at:
    Available download formats: oracle, sql server, text, mysql, excel
    Dataset provided by
    Occupational Information Network
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Dataset funded by
    US Department of Labor, Employment and Training Administration
    Description

    The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.

    Data content areas include:

    • Worker Characteristics (e.g., Abilities, Interests, Work Styles)
    • Worker Requirements (e.g., Education, Knowledge, Skills)
    • Experience Requirements (e.g., On-the-Job Training, Work Experience)
    • Occupational Requirements (e.g., Detailed Work Activities, Work Context)
    • Occupation-Specific Information (e.g., Job Titles, Tasks, Technology Skills)
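
    The downloadable text and Excel distributions are plain tables. Here is a sketch of parsing one tab-delimited table (file name and column headers are assumed from the typical O*NET layout; verify against the database documentation at onetcenter.org):

    ```python
    import csv
    import io

    # The O*NET text distribution ships tab-delimited files such as
    # "Occupation Data.txt" (file name and columns assumed here).
    sample = (
        "O*NET-SOC Code\tTitle\tDescription\n"
        "15-1252.00\tSoftware Developers\tDevelop applications...\n"
        "29-1141.00\tRegistered Nurses\tProvide patient care...\n"
    )

    def load_occupations(text):
        """Parse a tab-delimited O*NET table into a list of dicts."""
        return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

    rows = load_occupations(sample)
    ```

    For a real file, replace `io.StringIO(text)` with an open file handle.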

  12. Time-series database for OT data Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 4, 2025
    Cite
    Growth Market Reports (2025). Time-series database for OT data Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/time-series-database-for-ot-data-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Oct 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Time-Series Database for OT Data Market Outlook



    According to our latest research, the global Time-Series Database for OT Data market size reached USD 1.84 billion in 2024, driven by increasing adoption of IoT and Industry 4.0 initiatives across operational technology (OT) environments. The market is expanding at a robust CAGR of 15.2%, and is forecasted to reach USD 5.18 billion by 2033. This growth is primarily propelled by the escalating need for real-time data analytics and process optimization in critical industries such as manufacturing, energy, and transportation, which are leveraging time-series databases to efficiently store, process, and analyze massive volumes of time-stamped data generated by OT systems.



    A significant growth factor in the Time-Series Database for OT Data market is the rapid digital transformation occurring across traditional industrial sectors. As organizations strive to modernize their operations, there is a marked increase in the deployment of smart sensors, connected devices, and automation solutions. These advancements generate vast streams of time-stamped data, necessitating robust, scalable, and high-performance time-series databases capable of handling the unique requirements of OT environments. The integration of advanced analytics and artificial intelligence (AI) with time-series databases further enhances their value proposition, enabling predictive maintenance, anomaly detection, and real-time decision-making, which are critical for maximizing operational efficiency and minimizing downtime.



    Another critical driver is the growing emphasis on predictive maintenance and asset management. Industrial companies are shifting from reactive to proactive maintenance strategies to reduce unplanned outages and extend asset lifecycles. Time-series databases play a pivotal role in this transition by enabling the continuous collection, storage, and analysis of sensor data from machinery, equipment, and infrastructure. The ability to detect patterns, trends, and anomalies in real-time empowers organizations to schedule maintenance activities precisely when needed, thereby reducing costs and improving overall productivity. This trend is particularly pronounced in sectors such as energy & utilities, oil & gas, and transportation, where equipment reliability and uptime are paramount.
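
    The rolling-statistics queries underpinning such predictive maintenance can be sketched in a few lines (an illustrative toy, independent of any product covered by this report; the sensor trace is invented):

    ```python
    from collections import deque
    from statistics import mean, stdev

    def rolling_anomalies(readings, window=10, threshold=3.0):
        """Flag (timestamp, value) readings more than `threshold` standard
        deviations from the rolling mean of the previous `window` values —
        the kind of query time-series databases are tuned to serve."""
        history = deque(maxlen=window)
        flagged = []
        for t, value in readings:
            if len(history) == window:
                mu, sigma = mean(history), stdev(history)
                if sigma > 0 and abs(value - mu) > threshold * sigma:
                    flagged.append((t, value))
            history.append(value)
        return flagged

    # Steady temperature trace with one spike injected at t=15
    trace = [(t, 20.0 + 0.1 * (t % 3)) for t in range(30)]
    trace[15] = (15, 35.0)
    ```

    In production this logic would run as a continuous query over ingested sensor data rather than over an in-memory list.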



    Furthermore, the increasing adoption of cloud-based solutions is accelerating the growth of the Time-Series Database for OT Data market. Cloud deployment offers enhanced scalability, flexibility, and cost-efficiency, making it an attractive option for organizations seeking to manage large volumes of time-series data without the burden of maintaining on-premises infrastructure. Cloud-based time-series databases facilitate seamless integration with other cloud-native analytics tools and platforms, supporting advanced use cases such as remote monitoring, process optimization, and cross-site data aggregation. This shift is also fostering greater adoption among small and medium enterprises (SMEs), which can now leverage enterprise-grade time-series data management capabilities without significant upfront investment.



    From a regional perspective, North America continues to dominate the global Time-Series Database for OT Data market, accounting for the largest share in 2024. The region benefits from a high concentration of technologically advanced industries, robust IT infrastructure, and early adoption of IoT and digitalization initiatives. Europe follows closely, driven by stringent regulatory requirements and a strong focus on industrial automation. The Asia Pacific region, meanwhile, is witnessing the fastest growth, fueled by rapid industrialization, expanding manufacturing sectors, and increasing investments in smart infrastructure projects across countries such as China, India, and Japan. As the adoption of time-series databases for OT data accelerates globally, regional markets are expected to experience differentiated growth trajectories based on industry maturity, technological readiness, and regulatory landscapes.





    Database Type Analysis


  13. Verst-Maldaun Language Assessment (VMLA) Validation Process Database

    • narcis.nl
    • data.mendeley.com
    Updated Dec 3, 2020
    Cite
    Verst, S (via Mendeley Data) (2020). Verst-Maldaun Language Assessment (VMLA) Validation Process Database [Dataset]. http://doi.org/10.17632/zjhfk7mm7v.3
    Explore at:
    Dataset updated
    Dec 3, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Verst, S (via Mendeley Data)
    Description

    This paper describes the process of creating the VMLA, a language test meant to be used during awake craniotomies. It focuses on the step-by-step process and aims to help other developers build their own assessments. This project was designed as a prospective study and registered with the Ethics Committee of the Educational and Research Institute of Sirio Libanês Hospital. Ethics committee approval number: HSL 2018-37 / CAEE 90603318.9.0000.5461. Images were bought from Shutterstock.com and generated the following receipts: SSTK-0CA8F-1358 and SSTK-0235F-6FC2. The VMLA is a neuropsychological assessment of language function, comprising object naming (ON) and semantic tasks. Originally composed of 420 slides, validation among Brazilian native speakers left 368 figures plus fifteen other elements, such as numbers, sentences and counting. Validation focused on educational level (EL), gender and age. Volunteers were tested in fourteen different states of Brazil. Cultural differences resulted in improvements to the final Answer Template. EL and age were identified as factors that influenced VMLA assessment results. Highly educated volunteers performed better on both ON and semantic tasks. People over 50 and 35 years old had better performance on ON and semantic tasks, respectively. Further validation in unevaluated regions of Brazil, including a more balanced number of males and females and a more even distribution of age and EL, could confirm our statistical analysis. After validation, the ON-VMLA was framed in batteries of 100 slides each, mixing images of six different complexity categories. The semantic-VMLA kept all seventy original verbal and non-verbal combinations. The validation process resulted in increased confidence during intraoperative test application. We are now able to score and evaluate patients' language deficits. Currently, the VMLA fits its purpose of dynamic application and accuracy during the mapping of language areas.
It is the first test targeted at Brazilians, representing much of our culture and collective imagery. Our experience may be of value to clinicians and researchers working with awake craniotomy who seek to develop their own language test.

    The test is available for free use at www.vemotests.com (beginning in February, 2021)

  14. Data_Sheet_1_Xenbase: Facilitating the Use of Xenopus to Model Human...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jun 2, 2023
    + more versions
    Cite
    Mardi J. Nenni; Malcolm E. Fisher; Christina James-Zorn; Troy J. Pells; Virgilio Ponferrada; Stanley Chu; Joshua D. Fortriede; Kevin A. Burns; Ying Wang; Vaneet S. Lotay; Dong Zhou Wang; Erik Segerdell; Praneet Chaturvedi; Kamran Karimi; Peter D. Vize; Aaron M. Zorn (2023). Data_Sheet_1_Xenbase: Facilitating the Use of Xenopus to Model Human Disease.ZIP [Dataset]. http://doi.org/10.3389/fphys.2019.00154.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Mardi J. Nenni; Malcolm E. Fisher; Christina James-Zorn; Troy J. Pells; Virgilio Ponferrada; Stanley Chu; Joshua D. Fortriede; Kevin A. Burns; Ying Wang; Vaneet S. Lotay; Dong Zhou Wang; Erik Segerdell; Praneet Chaturvedi; Kamran Karimi; Peter D. Vize; Aaron M. Zorn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    At a fundamental level most genes, signaling pathways, biological functions and organ systems are highly conserved between man and all vertebrate species. Leveraging this conservation, researchers are increasingly using the experimental advantages of the amphibian Xenopus to model human disease. The online Xenopus resource, Xenbase, enables human disease modeling by curating the Xenopus literature published in PubMed and integrating these Xenopus data with orthologous human genes, anatomy, and more recently with links to the Online Mendelian Inheritance in Man resource (OMIM) and the Human Disease Ontology (DO). Here we review how Xenbase supports disease modeling and report on a meta-analysis of the published Xenopus research providing an overview of the different types of diseases being modeled in Xenopus and the variety of experimental approaches being used. Text mining of over 50,000 Xenopus research articles imported into Xenbase from PubMed identified approximately 1,000 putative disease-modeling articles. These articles were manually assessed and annotated with disease ontologies, which were then used to classify papers based on disease type. We found that Xenopus is being used to study a diverse array of diseases with three main experimental approaches: cell-free egg extracts to study fundamental aspects of cellular and molecular biology, oocytes to study ion transport and channel physiology, and embryo experiments focused on congenital diseases. We integrated these data into Xenbase Disease Pages to allow easy navigation to disease information on external databases. Results of this analysis will equip Xenopus researchers with a suite of experimental approaches available to model or dissect a pathological process. Ideally clinicians and basic researchers will use this information to foster collaborations necessary to interrogate the development and treatment of human diseases.

  15. MIMIC-III - Deep Reinforcement Learning

    • kaggle.com
    zip
    Updated Apr 7, 2022
    Cite
    Asjad K (2022). MIMIC-III - Deep Reinforcement Learning [Dataset]. https://www.kaggle.com/datasets/asjad99/mimiciii
    Explore at:
    Available download formats: zip (11100065 bytes)
    Dataset updated
    Apr 7, 2022
    Authors
    Asjad K
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Digitization of healthcare data, along with algorithmic breakthroughs in AI, will have a major impact on healthcare delivery in the coming years. It is interesting to see AI applied to assist clinicians during patient treatment in a privacy-preserving way. While scientific knowledge can help guide interventions, there remains a key need to quickly cut through the space of decision policies to find effective strategies to support patients during the care process.

    Offline reinforcement learning (also referred to as safe or batch reinforcement learning) is a promising sub-field of RL which provides a mechanism for solving real-world sequential decision-making problems where access to a simulator is not available. Here we learn a policy from a fixed dataset of trajectories, without further interaction with the environment (the agent receives no reward or punishment signal from the environment). It has been shown that such an approach can leverage vast amounts of existing logged data (in the form of previous interactions with the environment) and can outperform supervised learning approaches or heuristic-based policies for solving real-world decision-making problems. Offline RL algorithms, when trained on sufficiently large and diverse offline datasets, can produce close-to-optimal policies (the ability to generalize beyond the training data).

    As part of my PhD research, I investigated the problem of developing a clinical decision support system for sepsis management using offline deep reinforcement learning.

    MIMIC-III ('Medical Information Mart for Intensive Care') is a large open-access anonymized single-center database which consists of comprehensive clinical data on 61,532 critical care admissions from 2001–2012, collected at a Boston teaching hospital. The dataset consists of 47 features (including demographics, vitals, and lab test results) on a cohort of sepsis patients who meet the Sepsis-3 definition criteria.

    We try to answer the following question:

    Given a particular patient's characteristics and physiological information at each time step as input, can our deep RL approach learn an optimal treatment policy that prescribes the right intervention (e.g., use of a ventilator) to the patient at each stage of the treatment process, in order to improve the final outcome (e.g., patient mortality)?

    We can use popular state-of-the-art algorithms such as Deep Q-Learning (DQN), Double Deep Q-Learning (DDQN), DDQN combined with BNC, Mixed Monte Carlo (MMC) and Persistent Advantage Learning (PAL). Using these methods we can train an RL policy to recommend an optimum treatment path for a given patient.
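
    The offline setting these algorithms share can be illustrated with a minimal tabular sketch (a toy stand-in, not the MIMIC pipeline or the repository's code; the states, actions and rewards below are invented):

    ```python
    import numpy as np

    def fitted_q_from_batch(transitions, n_states, n_actions,
                            gamma=0.99, lr=0.1, sweeps=200):
        """Tabular Q-learning run over a *fixed* batch of logged transitions
        (state, action, reward, next_state, done) — the offline-RL setting:
        no further interaction with the environment."""
        Q = np.zeros((n_states, n_actions))
        for _ in range(sweeps):
            for s, a, r, s2, done in transitions:
                target = r if done else r + gamma * Q[s2].max()
                Q[s, a] += lr * (target - Q[s, a])
        return Q

    # Toy 2-state "treatment" MDP: action 1 in state 0 leads to recovery (+1)
    batch = [
        (0, 0, 0.0, 0, False),   # ineffective intervention, stay in state 0
        (0, 1, 0.0, 1, False),   # effective intervention, move to state 1
        (1, 0, 1.0, 1, True),    # discharge with good outcome
    ]
    Q = fitted_q_from_batch(batch, n_states=2, n_actions=2)
    policy = Q.argmax(axis=1)
    ```

    Deep variants (DQN, DDQN, etc.) replace the Q-table with a neural network, but the batch-only update loop is the same idea.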

    Data acquisition, standard pre-processing and modelling details can be found here in Github repo: https://github.com/asjad99/MIMIC_RL_COACH

  16. Biospecimen Research Database

    • scicrunch.org
    • rrid.site
    • +2more
    Updated Oct 17, 2019
    Cite
    (2019). Biospecimen Research Database [Dataset]. http://identifiers.org/RRID:SCR_001944
    Explore at:
    Dataset updated
    Oct 17, 2019
    Description

    Free and publicly accessible literature database for peer-reviewed primary and review articles in the field of human Biospecimen Science. Each entry has been created by a Ph.D. level scientist to capture relevant parameters, pre-analytical factors, and original summaries of relevant results.

  17. Use of electronic identification procedures in the last 12 months for...

    • data.europa.eu
    html, unknown
    Updated Jun 11, 2024
    + more versions
    Cite
    VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE (2024). Use of electronic identification procedures in the last 12 months for private purposes by individuals, by degree of urbanisation of the area in which these individuals live, Slovenia, 2018 [Dataset]. https://data.europa.eu/data/datasets/surs2980420s
    Explore at:
    Available download formats: unknown, html
    Dataset updated
    Jun 11, 2024
    Dataset provided by
    Government of Slovenia
    Authors
    VLADA REPUBLIKE SLOVENIJE STATISTIČNI URAD REPUBLIKE SLOVENIJE
    Area covered
    Slovenia
    Description

    This database automatically captures metadata, the source of which is the GOVERNMENT OF THE REPUBLIC OF SLOVENIA, STATISTICAL OFFICE OF THE REPUBLIC OF SLOVENIA, and which corresponds to the source database entitled “Use of electronic identification procedures in the last 12 months for private purposes by individuals, by degree of urbanisation of the area in which these individuals live, Slovenia, 2018”.

    Actual data are available in PC-Axis format (.px). Additional links provide access to the source portal page for viewing and selecting data, as well as to the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, store it in different formats, view and print tables of unlimited size, and perform some basic statistical analyses and graphics.
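
    As a rough illustration of what a .px file contains, here is a minimal parser sketch (a simplified view: metadata as KEY="value"; pairs followed by a DATA= section; the real PC-Axis specification has many more keywords and encodings, so this is not a general-purpose reader):

    ```python
    def parse_px(text):
        """Split a simplified PC-Axis (.px) document into metadata and data."""
        meta = {}
        head, _, body = text.partition("DATA=")
        for entry in head.split(";"):
            key, _, value = entry.strip().partition("=")
            if key:
                meta[key] = value.strip().strip('"')
        # DATA values are whitespace-separated numbers terminated by ';'
        data = [float(tok) for tok in body.replace(";", "").split()]
        return meta, data

    # Hypothetical miniature .px document
    sample = 'TITLE="e-ID use by urbanisation";\nUNITS="%";\nDATA=\n12.3 45.6 7.8;'
    meta, values = parse_px(sample)
    ```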

  18. 🎓 365DS Practice Exams • People Analytics Dataset

    • kaggle.com
    zip
    Updated May 20, 2025
    Cite
    Ísis Santos Costa (2025). 🎓 365DS Practice Exams • People Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/isissantoscosta/365ds-practice-exams-people-analytics-dataset
    Explore at:
    Available download formats: zip (61775349 bytes)
    Dataset updated
    May 20, 2025
    Authors
    Ísis Santos Costa
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This dataset has been uploaded to Kaggle on the occasion of solving questions of the 365 Data Science • Practice Exams: SQL curriculum, a set of free resources designed to help test and elevate data science skills. The dataset consists of a synthetic, relational collection of data structured to simulate common employee and organizational data scenarios, ideal for practicing SQL queries and data analysis skills in a People Analytics context.

    The dataset contains the following tables:

    • departments.csv: List of all company departments.
    • dept_emp.csv: Historical and current assignments of employees to departments.
    • dept_manager.csv: Historical and current assignments of employees as department managers.
    • employees.csv: Core employee demographic information.
    • employees.db: A SQLite database containing all the relational tables from the CSV files.
    • salaries.csv: Historical salary records for employees.
    • titles.csv: Historical job titles held by employees.
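
    A typical practice-exam query against these tables might look like the following sketch. The schema below is a guess modeled on the classic "employees" sample database the CSVs derive from (column names assumed; inspect employees.db to confirm), and the rows are invented:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")  # swap in "employees.db" for the real file
    conn.executescript("""
        CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, first_name TEXT,
                                last_name TEXT, hire_date TEXT);
        CREATE TABLE salaries  (emp_no INTEGER, salary INTEGER,
                                from_date TEXT, to_date TEXT);
        INSERT INTO employees VALUES (1, 'Ada', 'Lovelace', '1985-01-01'),
                                     (2, 'Alan', 'Turing',  '1986-06-15');
        INSERT INTO salaries  VALUES (1, 60000, '1985-01-01', '9999-01-01'),
                                     (2, 55000, '1986-06-15', '9999-01-01');
    """)

    # Current salary per employee (open-ended records use the '9999-01-01'
    # sentinel end date in this sample schema)
    rows = conn.execute("""
        SELECT e.first_name, s.salary
        FROM employees e
        JOIN salaries s ON s.emp_no = e.emp_no
        WHERE s.to_date = '9999-01-01'
        ORDER BY s.salary DESC
    """).fetchall()
    ```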

    Usage

    The dataset is ideal for practicing SQL queries and data analysis skills in a People Analytics context. It serves applications on both general Data Analytics, and also Time Series Analysis.

    A practical application is presented in the 🎓 365DS Practice Exams • SQL notebook, which covers in detail the answers to the questions of SQL Practice Exams 1, 2, and 3 on the 365DS platform, especially illustrating the usage and the value of SQL procedures and functions.

    Acknowledgements & Data Origin

    This dataset has a rich lineage, originating from academic research and evolving through various formats to its current relational structure:

    Original Authors

    The foundational dataset was authored by Prof. Dr. Fusheng Wang (then a PhD student at the University of California, Los Angeles - UCLA) and his advisor, Prof. Dr. Carlo Zaniolo (UCLA). This work is primarily described in their paper:

    Relational Conversion

    It was originally distributed as an .xml file. Giuseppe Maxia (known as @datacharmer on GitHub and LinkedIn, as well as here on Kaggle) converted it into its relational form and subsequently distributed it as a .sql file, making it accessible for relational database use.

    Kaggle Upload

    This .sql version was then uploaded to Kaggle as the « Employees Dataset » by Mirza Huzaifa on February 5th, 2023.

  19. (Shapefiles - Validation countries) SEEDNet: A covariate-free multi-country settlement-level database of epidemiological estimates for network analysis

    • borealisdata.ca
    • search.dataone.org
    Updated Apr 25, 2025
    Cite
    Amir Hossein Darooneh; Jean-Luc Kortenaar; Celine M Goulart; Katie McLaughlin; Sean P Cornelius; Diego G Bassani (2025). (Shapefiles - Validation countries) SEEDNet: A covariate-free multi-country settlement-level database of epidemiological estimates for network analysis [Dataset]. http://doi.org/10.5683/SP3/R0DPTZ
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Borealis
    Authors
    Amir Hossein Darooneh; Jean-Luc Kortenaar; Celine M Goulart; Katie McLaughlin; Sean P Cornelius; Diego G Bassani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Benin, Nigeria, Gabon, Senegal, Mali, Angola, Malawi, Cambodia, Zambia, Mozambique
    Description

    This folder includes the shapefiles for the 10 validation countries included in the manuscript. Abstract: The study of population health through network science holds high promise, but data sources that allow complete representation of populations are limited in low- and middle-income settings. Large national health surveys designed to gather nationally representative health and development data in low- and middle-income countries are promising sources of such data. Although they provide researchers, healthcare providers, and policymakers with valuable information, they are not designed to produce small-area estimates of health indicators, and the methods for producing these tend to rely on diverse and imperfect covariate data sources, have high data input requirements and are computationally demanding, limiting their use for network representations of populations. To reduce the sources of measurement error and allow efficient multi-country representation of populations as networks of human settlements here, we present a covariate-free multi-country method to estimate small-area health indicators using standardized georeferenced surveys. The approach utilizes interpolation via local inverse distance weighting. The estimates are compared to those obtained using a Bayesian Geostatistical Model and have been cross-validated. The estimates are aggregated into population settlements and identified using the Global Human Settlement Layer database. The method is fully automated, requiring a single standard georeferenced survey data source for mapping populations, eliminating the need for indicator or country-specific covariate selection by investigators. Efficient estimation is achieved by only computing values for human-occupied areas and adopting a logical aggregation of estimates into the complete range of settlement sizes. 
An open-access library of standardized georeferenced settlement-level datasets for 15 indicators and 10 countries was validated in this paper, as well as the code used to identify settlements and estimate indicators. The datasets are intended to be used as the basis for population health studies, and the library will continue to be expanded. The novel aspects include using harmonized input sources and estimation procedures across countries and the adoption of real-world units for population data aggregation, creating a specialized library of nodes that serve as a basis for network representations of population health in low- and middle-income countries.
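The core estimator described in the abstract, local inverse distance weighting, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the `(x, y, value)` cluster format, and the choice of restricting to the `k` nearest survey clusters are all illustrative assumptions.

```python
import math

def idw_estimate(x, y, clusters, power=2.0, k=5):
    """Estimate an indicator at (x, y) by local inverse distance weighting.

    `clusters` is a list of (cx, cy, value) survey cluster points; only the
    `k` nearest clusters contribute (the "local" part of the method).
    """
    # Rank clusters by squared distance to the target point and keep the k nearest.
    ranked = sorted(clusters, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)[:k]
    num = den = 0.0
    for cx, cy, value in ranked:
        d = math.hypot(cx - x, cy - y)
        if d == 0.0:  # target coincides with a survey cluster
            return value
        w = 1.0 / d ** power  # closer clusters get larger weights
        num += w * value
        den += w
    return num / den
```

As the abstract notes, efficiency comes from evaluating such an estimator only over human-occupied grid cells and then aggregating the cell estimates per settlement, rather than over the full national raster.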

  20. Anime Images Dataset

    • kaggle.com
    zip
    Updated Jun 1, 2023
    Cite
    Banana_Leopard (2023). Anime Images Dataset [Dataset]. https://www.kaggle.com/datasets/diraizel/anime-images-dataset
    Explore at:
    Available download formats: zip (910502838 bytes)
    Dataset updated
    Jun 1, 2023
    Authors
    Banana_Leopard
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset contains anime images for 231 different anime, with approximately 380 images for each. Please note that you might need to clean the image directories a bit, since they may contain merchandise and live-action photos in addition to the actual anime itself.

    Scripts

    If you'd like to take a look at the scripts used to make this dataset, you can find them on this GitHub repo.
    Feel free to extend it, scrape your own images, etc. etc.

    Inspiration

    As a big anime fan, I found a lot of anime-related datasets on Kaggle. I was, however, disappointed to find no dataset containing anime-specific images for popular anime. Some other great datasets that I've been inspired by include:
      • Top 250 Anime 2023
      • Anime Recommendations Database
      • Anime Recommendation Database 2020
      • Anime Face Dataset
      • Safebooru - Anime Image Metadata

    Process

    1. You need a list of anime to scrape it. You can either:
      • Make your own list. This is what I do in the directory called "scraped_anime_list".
      • Use someone else's list. This is what I do in the directory called "kaggle_anime_list" and "top_anime_list".
    2. I wanted to make my own list first. To build it, I used JikanPy, the Python wrapper for the unofficial MAL (MyAnimeList) API, which scrapes MAL under the hood.
    3. Anime on MAL have a unique identifier called the anime id; think of it as a unique number for each anime. These ids are supposed to be sequential, but there are a lot of gaps from one valid anime id to the next, which I discovered based on this post.
    4. These ids can go from 1 to 100,000 and maybe beyond. However, I decided to go through the anime ids one by one from 1 to 50,000 and retrieve the id, rank, and anime_name. This is what you will find in the folder called "scraped_anime_list". Note that I prefer using the English name of the anime if it exists; if it doesn't, I use the Japanese name. Please use this list to obtain the anime ids if you intend to scrape MAL yourself; it will save you a LOT of time.
    5. I thought that someone else might've gone through the same process, and voila: I found the MyAnimeList Dataset on Kaggle. I didn't want to wait for my scraper to finish scraping, so I decided to use the "anime_cleaned.csv" version of that dataset's list. The lists from this dataset are what you find in the "kaggle_anime_list" folder.
    6. Cleaning anime names is a task in and of itself. Within the GitHub repo, refer to the file called "notes_and_todo.md" to look at all the cleaning troubles. I tried my best to remove all:
      • Anime Movies: Since you have for instance One Piece (the anime) and One Piece Movie 1, One Piece Movie 2, and so on.
      • Seasons: MAL is an anime ranker, and different seasons of the same anime can show up on the list with different ranks. I retain the original anime name (the most basic one; for instance, just "Gintama" instead of "Gintama Season 4").
    7. Ultimately, I manually curated around 300 anime names, which reduced to 231 after removing duplicates, since after the curation, "Gintama" and "Gintama: Enchousen" would both be named "Gintama". This list with the duplicates is what you find in the file called "UsableAnimeList.xlsx" within the "top_anime_list" folder.
    8. This list is then rid of the duplicates and used to scrape the image URLs for each anime found in the folder called "anime_img_urls".
    9. These URLs are then used to scrape the anime images themselves, found in the folder called "anime_images".
    10. Also, the tags are only a guide; feel free to use this dataset for any Deep Learning task.
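Steps 6 and 7 above can be sketched as a small cleaning pass. The suffix patterns below are hypothetical stand-ins; the actual cleaning rules live in "notes_and_todo.md" in the GitHub repo.

```python
import re

# Illustrative suffix patterns for movie/season/subtitle variants of a title.
SUFFIX = re.compile(
    r"\s*(?::.*|Season\s*\d+.*|Movie\s*\d+.*)$",
    re.IGNORECASE,
)

def base_title(name):
    """Reduce an anime title to its base franchise name (step 6)."""
    return SUFFIX.sub("", name).strip()

def dedupe(names):
    """Keep the first occurrence of each base title (step 7)."""
    seen, out = set(), []
    for n in names:
        b = base_title(n)
        if b not in seen:
            seen.add(b)
            out.append(b)
    return out
```

With rules like these, "Gintama", "Gintama: Enchousen", and "Gintama Season 4" all collapse to a single "Gintama" entry, which is how the curated ~300 names reduce to 231.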

    Sources

HCUP Nationwide Ambulatory Surgery Sample (NASS) Database – Restricted Access
Procedures intended primarily for diagnostic purposes are not considered in-scope. Restricted access data files are available with a data use agreement and brief online security training.
