94 datasets found
  1. Minimal data set containing the radius determined for each sample used in...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 27, 2021
    Cite
    Hutson, M. Shane; O’Connor, James; Page-McCaw, Andrea; Akbar, Fabiha Bushra (2021). Minimal data set containing the radius determined for each sample used in this study, organized by Figure. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000900449
    Explore at:
    Dataset updated
    Sep 27, 2021
    Authors
    Hutson, M. Shane; O’Connor, James; Page-McCaw, Andrea; Akbar, Fabiha Bushra
    Description

    This dataset also includes the values for the linear regression analyses used to derive Eqs 1–22. (XLSX)

  2. Minimal Example Dataset

    • kaggle.com
    zip
    Updated Mar 30, 2020
    Cite
    smartcaveman (2020). Minimal Example Dataset [Dataset]. https://www.kaggle.com/datasets/smartcaveman/minimal-example-dataset
    Explore at:
    Available download formats: zip (441 bytes)
    Dataset updated
    Mar 30, 2020
    Authors
    smartcaveman
    Description

    Dataset

    This dataset was created by smartcaveman

    Contents

  3. WIC Participant and Program Characteristics 2020

    • agdatacommons.nal.usda.gov
    docx
    Updated Nov 21, 2025
    + more versions
    Cite
    USDA Food and Nutrition Service, Office of Policy Support (2025). WIC Participant and Program Characteristics 2020 [Dataset]. http://doi.org/10.15482/USDA.ADC/1527885
    Explore at:
    Available download formats: docx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    United States Department of Agriculture (http://usda.gov/)
    Food and Nutrition Service (https://www.fns.usda.gov/)
    Authors
    USDA Food and Nutrition Service, Office of Policy Support
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Background: In 1986, the Congress enacted Public Laws 99-500 and 99-591, requiring a biennial report on the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). In response to these requirements, FNS developed a prototype system that allowed for the routine acquisition of information on WIC participants from WIC State Agencies. Since 1992, State Agencies have provided electronic copies of these data to FNS on a biennial basis.

    FNS and the National WIC Association (formerly National Association of WIC Directors) agreed on a set of data elements for the transfer of information. In addition, FNS established a minimum standard dataset for reporting participation data. For each biennial reporting cycle, each State Agency is required to submit a participant-level dataset containing standardized information on persons enrolled at local agencies for the reference month of April. The 2020 Participant and Program Characteristics (PC2020) is the 17th to be completed using the prototype PC reporting system. In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).

    Processing methods and equipment used: Specifications on formats (“Guidance for States Providing Participant Data”) were provided to all State agencies in January 2020. This guide specified 20 minimum dataset (MDS) elements and 11 supplemental dataset (SDS) elements to be reported on each WIC participant. Each State Agency was required to submit all 20 MDS items and any SDS items collected by the State agency.

    Study date(s) and duration: The information for each participant was from the participants’ most current WIC certification as of April 2020.

    Study spatial scale (size of replicates and spatial scale of study area): In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).

    Level of true replication: Unknown

    Sampling precision (within-replicate sampling or pseudoreplication):

    State Agency Data Submissions. PC2020 is a participant dataset consisting of 7,036,867 active records. The records, submitted to USDA by the State Agencies, comprise a census of all WIC enrollees, so there is no sampling involved in the collection of this data.

    PII Analytic Datasets. State agency files were combined to create a national census participant file of approximately 7 million records. The census dataset contains potentially personally identifiable information (PII) and is therefore not made available to the public.

    National Sample Dataset. The public use SAS analytic dataset made available to the public has been constructed from a nationally representative sample drawn from the census of WIC participants, selected by participant category. The national sample consists of 1 percent of the total number of participants, or 70,368 records. The distribution by category is 5,469 pregnant women, 6,131 breastfeeding women, 4,373 postpartum women, 16,817 infants, and 37,578 children.

    Level of subsampling (number and repeat or within-replicate sampling): The proportionate (or self-weighting) sample was drawn by WIC participant category: pregnant women, breastfeeding women, postpartum women, infants, and children. In this type of sample design, each WIC participant has the same probability of selection across all strata. Sampling weights are not needed when the data are analyzed. In a proportionate stratified sample, the largest stratum accounts for the highest percentage of the analytic sample.

    Study design (before–after, control–impacts, time series, before–after-control–impacts): None – Non-experimental

    Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains all MDS and SDS information submitted by the State agency on the sampled WIC participant. In addition, the file contains constructed variables used for analytic purposes. To protect individual privacy, the public use file does not include State agency, local agency, or case identification numbers.

    Description of any gaps in the data or other limiting factors: All State agencies provided data on a census of their WIC participants.

    Resources in this dataset:

    • Resource Title: WIC PC 2020 National Sample File Public Use Codebook.; File Name: PC2020 National Sample File Public Use Codebook.docx; Resource Description: WIC PC 2020 National Sample File Public Use Codebook
    • Resource Title: WIC PC 2020 Public Use CSV Data.; File Name: wicpc2020_public_use.csv; Resource Description: WIC PC 2020 Public Use CSV Data
    • Resource Title: WIC PC 2020 Data Set SAS, R, SPSS, Stata.; File Name: PC2020 Ag Data Commons.zip; Resource Description: WIC PC 2020 Data Set SAS, R, SPSS, Stata. One dataset in multiple formats
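
    The proportionate (self-weighting) design described in this entry can be illustrated with a short sketch. This is not the procedure USDA used; the function name, the toy population, and the seed are assumptions made for illustration. Only the category counts at the end come from the description.

```python
import random


def proportionate_sample(strata, fraction, seed=0):
    """Draw the same fraction from every stratum (a self-weighting design).

    `strata` maps a stratum name to a list of records. Because every record
    has the same selection probability, no sampling weights are needed.
    """
    rng = random.Random(seed)
    sample = {}
    for name, records in strata.items():
        k = round(len(records) * fraction)
        sample[name] = rng.sample(records, k)
    return sample


# Hypothetical toy population, NOT the real WIC census.
population = {
    "pregnant": list(range(500)),
    "infants": list(range(1700)),
    "children": list(range(3800)),
}
sample = proportionate_sample(population, fraction=0.01)
sizes = {name: len(rows) for name, rows in sample.items()}

# Category counts quoted in the description for the public-use sample.
pc2020_counts = {
    "pregnant women": 5469,
    "breastfeeding women": 6131,
    "postpartum women": 4373,
    "infants": 16817,
    "children": 37578,
}
total = sum(pc2020_counts.values())
shares = {name: count / total for name, count in pc2020_counts.items()}
largest = max(shares, key=shares.get)
```

    The quoted category counts sum to the stated 70,368 records, and the largest stratum (children) accounts for the largest share of the sample, as the description says.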

  4. CMS Payroll Based Journal Daily Non-Nurse Staffing

    • datalumos.org
    delimited
    Updated May 29, 2025
    + more versions
    Cite
    United States Department of Health and Human Services. Centers for Medicare and Medicaid Services (2025). CMS Payroll Based Journal Daily Non-Nurse Staffing [Dataset]. http://doi.org/10.3886/E231310V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    May 29, 2025
    Dataset provided by
    United States Department of Health and Human Services (http://www.hhs.gov/)
    Centers for Medicare & Medicaid Services
    Authors
    United States Department of Health and Human Services. Centers for Medicare and Medicaid Services
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2017 - Dec 31, 2024
    Description

    The Payroll Based Journal (PBJ) Nurse Staffing and Non-Nurse Staffing datasets provide information submitted by nursing homes, including rehabilitation services, on a quarterly basis. The data include the hours staff are paid to work each day, for each facility. Examples of reporting categories include Director of Nursing, Administrative Registered Nurses, Registered Nursing, Administrative Licensed Practical Nurses, Licensed Practical Nurses, Certified Nurse Aides, Certified Medication Aides, and Nurse Aides in Training. The data also cover other non-nurse staff categories such as Respiratory Therapist, Occupational Therapist, and Social Worker. The datasets also include a facility’s daily census calculated using the Minimum Data Set (MDS) submission.

    The Payroll Based Journal (PBJ) Employee Detail Nursing Home Staffing datasets and technical information have been moved to a new location.

    Note: This full dataset contains more records than most spreadsheet programs can handle, which will result in an incomplete load of data. Use of a database or statistical software is required.
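
    Because the full file is too large for a spreadsheet, one option is to stream it row by row. The sketch below uses only Python's standard library. The column names `facility_id` and `hours` and the demo rows are hypothetical placeholders, not the actual PBJ field names or values.

```python
import csv
import io
from collections import defaultdict


def total_hours_by_facility(csv_file, id_col="facility_id", hours_col="hours"):
    """Stream a delimited file and sum an hours column per facility.

    Only one row is held in memory at a time, so the file size is not a
    limit. The default column names are hypothetical; adjust them to the
    header of the real download.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row[id_col]] += float(row[hours_col])
    return dict(totals)


# Tiny in-memory stand-in for the real download (made-up values).
demo = io.StringIO(
    "facility_id,work_date,hours\n"
    "A,20240101,8.0\n"
    "A,20240102,7.5\n"
    "B,20240101,6.0\n"
)
totals = total_hours_by_facility(demo)
print(totals)
```

    For a real file, pass an open file handle instead of the in-memory buffer.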

  5. MHEALTH Dataset Data Set CSV

    • kaggle.com
    zip
    Updated Jan 4, 2023
    Cite
    Nirmal Sankalana (2023). MHEALTH Dataset Data Set CSV [Dataset]. https://www.kaggle.com/datasets/nirmalsankalana/mhealth-dataset-data-set-csv
    Explore at:
    Available download formats: zip (78174751 bytes)
    Dataset updated
    Jan 4, 2023
    Authors
    Nirmal Sankalana
    Description

    Source:

    Oresti Banos, Department of Computer Architecture and Computer Technology, University of Granada
    Rafael Garcia, Department of Computer Architecture and Computer Technology, University of Granada
    Alejandro Saez, Department of Computer Architecture and Computer Technology, University of Granada

    Email to whom correspondence should be addressed: oresti '@' ugr.es (oresti.bl '@' gmail.com)

    Data Set Information:

    The MHEALTH (Mobile HEALTH) dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing several physical activities. Sensors placed on the subject's chest, right wrist, and left ankle measure the motion experienced by diverse body parts, namely acceleration, rate of turn, and magnetic field orientation. The sensor positioned on the chest also provides 2-lead ECG measurements, which can potentially be used for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG.

    DATASET SUMMARY:

    • Activities: 12
    • Sensor devices: 3
    • Subjects: 10

    EXPERIMENTAL SETUP

    The collected dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing 12 physical activities (Table 1). Shimmer2 [BUR10] wearable sensors were used for the recordings. The sensors were placed on the subject's chest, right wrist, and left ankle, respectively, and attached using elastic straps (as shown in the figure in the attachment). The use of multiple sensors makes it possible to measure the motion experienced by diverse body parts, namely the acceleration, the rate of turn, and the magnetic field orientation, thus better capturing the body dynamics. The sensor positioned on the chest also provides 2-lead ECG measurements, which are not used for the development of the recognition model but were collected for future work. This information can be used, for example, for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG. All sensing modalities are recorded at a sampling rate of 50 Hz, which is considered sufficient for capturing human activity. Each session was recorded using a video camera. This dataset is found to generalize to common activities of daily living, given the diversity of body parts involved in each one (e.g., the frontal elevation of arms vs. knees bending), the intensity of the actions (e.g., cycling vs. sitting and relaxing) and their execution speed or dynamicity (e.g., running vs. standing still). The activities were collected in an out-of-lab environment with no constraints on the way these must be executed, with the exception that the subject should try their best when executing them.

    ACTIVITY SET

    The activity set is as follows:

    • L1: Standing still (1 min)
    • L2: Sitting and relaxing (1 min)
    • L3: Lying down (1 min)
    • L4: Walking (1 min)
    • L5: Climbing stairs (1 min)
    • L6: Waist bends forward (20x)
    • L7: Frontal elevation of arms (20x)
    • L8: Knees bending (crouching) (20x)
    • L9: Cycling (1 min)
    • L10: Jogging (1 min)
    • L11: Running (1 min)
    • L12: Jump front & back (20x)

    NOTE: In brackets are the number of repetitions (Nx) or the duration of the exercises (min).
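
    The activity list and the 50 Hz sampling rate stated in this entry imply a nominal number of samples per timed activity. A minimal sketch follows. It assumes the integer labels follow the L1 to L12 numbering, and the helper name `expected_samples` is made up for illustration. Real recordings may be longer or shorter than the nominal duration.

```python
SAMPLING_RATE_HZ = 50  # sampling rate stated in the dataset description

# label -> (activity name, nominal duration in seconds, or None when the
# activity is defined by repetitions rather than by duration)
ACTIVITIES = {
    1: ("Standing still", 60),
    2: ("Sitting and relaxing", 60),
    3: ("Lying down", 60),
    4: ("Walking", 60),
    5: ("Climbing stairs", 60),
    6: ("Waist bends forward", None),
    7: ("Frontal elevation of arms", None),
    8: ("Knees bending (crouching)", None),
    9: ("Cycling", 60),
    10: ("Jogging", 60),
    11: ("Running", 60),
    12: ("Jump front & back", None),
}


def expected_samples(label):
    """Nominal sample count for a timed activity, or None for repetition-based ones."""
    _, duration_s = ACTIVITIES[label]
    if duration_s is None:
        return None
    return duration_s * SAMPLING_RATE_HZ


walking_samples = expected_samples(4)
bends_samples = expected_samples(6)
```

    A one-minute activity therefore corresponds to 60 s × 50 Hz = 3000 rows per subject.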

    A complete and illustrated description (including table of activities, sensor setup, etc.) of the dataset is provided in the papers presented in the section “Citation Requests”.

    Attribute Information:

    The data collected for each subject is stored in a different log file: 'mHealth_subject.log'. Each file contains the samples (by rows) recorded for all sensors (by columns). The labels used to identify the activities are similar to the abovementioned (e.g., the label for walking is '4').

    The meaning of each column is detailed next:

    • Column 1: acceleration from the chest sensor (X axis)
    • Column 2: acceleration from the chest sensor (Y axis)
    • Column 3: acceleration from the chest sensor (Z axis)
    • Column 4: electrocardiogram signal (lead 1)
    • Column 5: electrocardiogram signal (lead 2)
    • Column 6: acceleration from the left-ankle sensor (X axis)
    • Column 7: acceleration from the left-ankle sensor (Y axis)
    • Column 8: acceleration from the left-ankle sensor (Z axis)
    • Column 9: gyro from the left-ankle sensor (X axis)
    • Column 10: gyro from the left-ankle sensor (Y axis)
    • Column 11: gyro from the left-ankle sensor (Z axis)
    • Column 12: magnetometer from the left-ankle sensor (X axis)
    • Column 13: magnetometer from the left-ankle sensor (Y axis)
    • Column 14: magnetometer from the left-ankle sensor (Z axis)
    • Column 15: acceleration from the right-lower-arm sensor (X axis)
    • Column 16: acceleration from the right-lower-arm sensor (Y axis)
    • Column 17: acceleration from the right-lower-arm sensor (Z axis)
    • Column 18: gyro from the right-lower-arm sensor (X axis)
    • Column 19: gyro from the right-lower-arm sensor (Y axis)
    • Column 20: gyro fro...

  6. Sample data: Article minimum data set.

    • plos.figshare.com
    application/x-rar
    Updated Feb 18, 2025
    Cite
    Zhaoyang Pan; Liang Ma (2025). Sample data: Article minimum data set. [Dataset]. http://doi.org/10.1371/journal.pone.0316206.s001
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zhaoyang Pan; Liang Ma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent decades, supported by a number of key policy steps, China’s sports industry has grown substantially in scale, and the optimization of its industrial structure has made new progress. From "nascent" to "strong", China’s sports industry has grown in importance to the national economy. Sport is also a significant way to promote health. With the rapid growth of people’s requirements for sport and health, it is urgent to re-evaluate the past development path and formulate new directions so as to continuously improve and optimize the system. This study systematically sorts China’s sports industry documents by stage, and describes the focus of each stage and the overall evolution track. On this basis, text mining and quantitative evaluation are used to extract high-frequency words from sports industry documents, and a sports industry document evaluation system including 9 first-level indicators and 47 second-level indicators is established. Text similarity analysis is used to realize intelligent PMC index analysis, which improves analysis efficiency and makes up for the deficiencies of purely qualitative analysis. According to the study, China’s sports industry policies are scientific and effective. Combined with the development direction of industrial transformation, the study provides ideas for the future adjustment and optimization of the sports industry's evolution path.

  7. Minimal dataset for ConFindr testing using pytest

    • figshare.com
    txt
    Updated Jun 13, 2023
    Cite
    Liam Brown (2023). Minimal dataset for ConFindr testing using pytest [Dataset]. http://doi.org/10.6084/m9.figshare.22852937.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Liam Brown
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A .tar.gz archive containing 14 .fastq.gz files, which correspond to paired-end Illumina whole-genome sequence data from different foodborne pathogens, and an associated tab-separated value file for the metadata of these samples. These sequence data were obtained by selecting samples from the originally published ConFindr dataset (doi: 10.7717/peerj.6995) and downsampling them. The metadata for these samples was obtained from the Supplemental Information of the original publication. The DownsampleFactor column in the metadata file corresponds to the factor by which the original samples were downsampled (e.g. 0.5 is 2-fold downsampling, 0.1 is 10-fold).
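
    The relationship between the DownsampleFactor column and the fold reduction can be written out as a short sketch. The helper names and the read count used below are hypothetical. This is not part of ConFindr or of the dataset's tooling.

```python
def fold_downsampling(factor):
    """Convert a DownsampleFactor (fraction of reads kept) into an N-fold reduction."""
    if not 0 < factor <= 1:
        raise ValueError("DownsampleFactor is expected to lie in (0, 1]")
    return 1 / factor


def expected_reads(original_reads, factor):
    """Approximate number of reads remaining after downsampling."""
    return round(original_reads * factor)


two_fold = fold_downsampling(0.5)
ten_fold = fold_downsampling(0.1)
remaining = expected_reads(1_000_000, 0.1)  # made-up original read count
```

    A factor of 0.5 gives a 2-fold reduction and 0.1 gives a 10-fold reduction, matching the examples in the description.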

    Changelog

    Version 3

    Changed test_samples archive from .zip to .tar.gz, as it was in Version 1.

    Version 2

    Renamed '_1' and '_2' file patterns to '_R1' and '_R2' to reflect default ConFindr parameters.

  8. Environmental data associated to particular health events example dataset

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zenodo (2025). Environmental data associated to particular health events example dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-5823426?locale=el
    Explore at:
    Available download formats: unknown (6689542)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data set is a collection of environmental records associated with individual events. It was generated using the serdif-api wrapper (https://github.com/navarral/serdif-api) by sending a CSV file with example events for the Republic of Ireland. The serdif-api sends a semantic query that (i) selects the environmental data sets within the region of the event, (ii) filters by the specific period of interest from the event, and (iii) aggregates the data sets using the minimum, maximum, average or sum for each of the available variables for a specific time unit. The aggregation method and the time unit can be passed to the serdif-api through the Command Line Interface (CLI) (see example in https://github.com/navarral/serdif-api). The resulting data set format can also be specified as a data table (CSV) or as a graph (RDF) for analysis and publication as FAIR data. The open-ready data for research is retrieved as a zip file that contains:

    (i) data as csv: environmental data associated with particular events as a data table
    (ii) data as rdf: environmental data associated with particular events as a graph
    (iii) metadata for publication as rdf: metadata record with generalized information about the data that does not contain personal data, as a graph; therefore, publishable
    (iv) metadata for research as rdf: metadata records with detailed information about the data, such as individual dates, regions, data sets used and data lineage, which could lead to data privacy issues if published without approval from the Data Protection Officer (DPO) and data controller
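
    The aggregation step (minimum, maximum, average or sum per time unit) can be illustrated independently of the serdif-api. The sketch below is not the serdif-api's implementation and does not call it. The function name, the `(timestamp, value)` record layout, and the sample values are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# The four aggregation methods named in the description.
AGGREGATORS = {"min": min, "max": max, "avg": mean, "sum": sum}


def aggregate(records, method="avg", time_format="%Y-%m-%d"):
    """Group (timestamp, value) pairs by a time unit and aggregate each group.

    `time_format` controls the time unit: "%Y-%m-%d" groups by day,
    "%Y-%m-%d %H" by hour, and so on.
    """
    groups = defaultdict(list)
    for timestamp, value in records:
        groups[timestamp.strftime(time_format)].append(value)
    func = AGGREGATORS[method]
    return {unit: func(values) for unit, values in sorted(groups.items())}


# Made-up readings of one environmental variable.
records = [
    (datetime(2020, 1, 1, 0), 4.0),
    (datetime(2020, 1, 1, 12), 8.0),
    (datetime(2020, 1, 2, 0), 5.0),
]
daily_avg = aggregate(records, "avg")
daily_max = aggregate(records, "max")
```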

  9. Data from: Large Landing Trajectory Data Set for Go-Around Analysis

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Dec 16, 2022
    Cite
    Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7148116
    Explore at:
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    ZHAW
    Authors
    Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large go-around, also referred to as missed approach, data set. The data set is in support of the paper presented at the OpenSky Symposium on November the 10th.

    If you use this data for a scientific publication, please consider citing our paper.

    The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:

    go_arounds_minimal.csv.gz

    Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:

        Column name         Type        Description
        ------------------  ----------  ------------------------------------------------------------
        time                date time   UTC time of landing or first GA attempt
        icao24              string      Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
        callsign            string      Aircraft identifier in air-ground communications
        airport             string      ICAO airport code where the aircraft is landing
        runway              string      Runway designator on which the aircraft landed
        has_ga              string      "True" if at least one GA was performed, otherwise "False"
        n_approaches        integer     Number of approaches identified for this flight
        n_rwy_approached    integer     Number of unique runways approached by this flight

    The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop rows with n_approaches > 2.
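
    The filter on n_approaches can be sketched with the standard library alone. The demo rows below are invented, and the threshold of 2 is taken from the text above. The helper name `drop_training_flights` is an assumption.

```python
import csv
import io


def drop_training_flights(rows, max_approaches=2):
    """Keep only rows whose n_approaches does not exceed the threshold.

    Flights with many approaches are likely training or calibration
    flights, so rows with n_approaches > max_approaches are dropped.
    """
    return [row for row in rows if int(row["n_approaches"]) <= max_approaches]


# Made-up rows using a subset of the documented column names.
demo_csv = io.StringIO(
    "callsign,airport,has_ga,n_approaches,n_rwy_approached\n"
    "ABC123,EGLC,False,1,1\n"
    "DEF456,EGLC,True,2,1\n"
    "TRAIN1,EGLC,True,7,2\n"
)
rows = list(csv.DictReader(demo_csv))
kept = drop_training_flights(rows)
kept_callsigns = [row["callsign"] for row in kept]
```

    With pandas, the equivalent selection on a loaded DataFrame would be `df[df["n_approaches"] <= 2]`.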

    go_arounds_augmented.csv.gz

    Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:

        Column name            Type        Description
        ---------------------  ----------  ------------------------------------------------------------
        time                   date time   UTC time of landing or first GA attempt
        icao24                 string      Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
        callsign               string      Aircraft identifier in air-ground communications
        airport                string      ICAO airport code where the aircraft is landing
        runway                 string      Runway designator on which the aircraft landed
        has_ga                 string      "True" if at least one GA was performed, otherwise "False"
        n_approaches           integer     Number of approaches identified for this flight
        n_rwy_approached       integer     Number of unique runways approached by this flight
        registration           string      Aircraft registration
        typecode               string      Aircraft ICAO typecode
        icaoaircrafttype       string      ICAO aircraft type
        wtc                    string      ICAO wake turbulence category
        glide_slope_angle      float       Angle of the ILS glide slope in degrees
        has_intersection       string      Boolean that is true if the runway has another runway intersecting it, otherwise false
        rwy_length             float       Length of the runway in kilometres
        airport_country        string      ISO Alpha-3 country code of the airport
        airport_region         string      Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
        operator_country       string      ISO Alpha-3 country code of the operator
        operator_region        string      Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
        wind_speed_knts        integer     METAR, surface wind speed in knots
        wind_dir_deg           integer     METAR, surface wind direction in degrees
        wind_gust_knts         integer     METAR, surface wind gust speed in knots
        visibility_m           float       METAR, visibility in m
        temperature_deg        integer     METAR, temperature in degrees Celsius
        press_sea_level_p      float       METAR, sea level pressure in hPa
        press_p                float       METAR, QNH in hPa
        weather_intensity      list        METAR, list of present weather codes: qualifier - intensity
        weather_precipitation  list        METAR, list of present weather codes: weather phenomena - precipitation
        weather_desc           list        METAR, list of present weather codes: qualifier - descriptor
        weather_obscuration    list        METAR, list of present weather codes: weather phenomena - obscuration
        weather_other          list        METAR, list of present weather codes: weather phenomena - other

    This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

    go_arounds_agg.csv.gz

    Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

        Column name         Type        Description
        ------------------  ----------  ------------------------------------------------------------
        airport             string      ICAO airport code where the aircraft is landing
        runway              string      Runway designator on which the aircraft landed
        n_landings          integer     Total number of landings observed on this runway in 2019
        ga_rate             float       Go-around rate, per 1000 landings
        glide_slope_angle   float       Angle of the ILS glide slope in degrees
        has_intersection    string      Boolean that is true if the runway has another runway intersecting it, otherwise false
        rwy_length          float       Length of the runway in kilometres
        airport_country     string      ISO Alpha-3 country code of the airport
        airport_region      string      Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
    

    This aggregated data set is used in the paper for the generalized linear regression model.
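
    A go-around rate per 1000 landings can in principle be derived from per-landing rows by grouping on airport and runway. The sketch below shows that arithmetic on invented rows. How the authors actually computed ga_rate is not stated here, so treating each row as one landing flagged by has_ga is an assumption, as is the function name.

```python
from collections import defaultdict


def ga_rate_per_1000(landings):
    """Go-arounds per 1000 landings for each (airport, runway) pair.

    `landings` is an iterable of (airport, runway, has_ga) tuples, one per
    landing, with has_ga given as a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # [n_landings, n_go_arounds]
    for airport, runway, has_ga in landings:
        entry = counts[(airport, runway)]
        entry[0] += 1
        if has_ga:
            entry[1] += 1
    return {key: 1000 * n_ga / n for key, (n, n_ga) in counts.items()}


# Invented rows: 4 landings on one runway (1 go-around), 2 on another (none).
landings = (
    [("EGLC", "09", False)] * 3
    + [("EGLC", "09", True)]
    + [("LSZH", "14", False)] * 2
)
rates = ga_rate_per_1000(landings)
```

    For scale, the totals quoted in the description (more than 33000 GAs in almost 9 million landings) correspond to an overall rate on the order of 3 to 4 per 1000 landings.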

    Downloading the trajectories

    Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds on 4 January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:

        import datetime

        from tqdm.auto import tqdm
        import pandas as pd
        from traffic.data import opensky
        from traffic.core import Traffic

        # load minimum data set
        df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
        df["time"] = pd.to_datetime(df["time"])

        # select London City Airport, go-arounds, and 2019-01-04
        airport = "EGLC"
        start = datetime.datetime(year=2019, month=1, day=4).replace(
            tzinfo=datetime.timezone.utc
        )
        stop = datetime.datetime(year=2019, month=1, day=5).replace(
            tzinfo=datetime.timezone.utc
        )

        df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

        # iterate over flights and pull the data from OpenSky Network
        flights = []
        delta_time = pd.Timedelta(minutes=10)
        for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
            # take at most 10 minutes before and 10 minutes after the landing or go-around
            start_time = row["time"] - delta_time
            stop_time = row["time"] + delta_time

            # fetch the data from OpenSky Network
            flights.append(
                opensky.history(
                    start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
                    stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
                    callsign=row["callsign"],
                    return_flight=True,
                )
            )

        # The flights can be converted into a Traffic object
        Traffic.from_flights(flights)

    Additional files

    Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:

    validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.

    validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.

  10. Minimal data set.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    xlsx
    Updated Dec 27, 2023
    Cite
    Mila Pacheco; Pedro Sá; Gláucia Santos; Ney Boa-Sorte; Kilma Domingues; Larissa Assis; Marina Silva; Ana Oliveira; Daniel Santos; Jamile Ferreira; Rosemeire Fernandes; Flora Fortes; Raquel Rocha; Genoile Santana (2023). Minimal data set. [Dataset]. http://doi.org/10.1371/journal.pone.0295832.s004
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Dec 27, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Mila Pacheco; Pedro Sá; Gláucia Santos; Ney Boa-Sorte; Kilma Domingues; Larissa Assis; Marina Silva; Ana Oliveira; Daniel Santos; Jamile Ferreira; Rosemeire Fernandes; Flora Fortes; Raquel Rocha; Genoile Santana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Aims: Evaluate the impact of an intervention program in non-adherent patients with ulcerative colitis. Methods: Parallel controlled randomized clinical trial (1:1), approved by the ethics committee (No. 3.068.511/2018) and registered at The Brazilian Clinical Trials Registry (No. RBR-79dn4k). Non-adherent ulcerative colitis patients according to the Morisky-Green-Levine test were included. Recruitment ran from August 2019 to August 2020, with 6-month follow-up. All participants received standard usual care; the intervention group additionally received educational interventions (video, educational leaflet, verbal guidance) and behavioral interventions (therapeutic scheme, motivational and reminder short message services). Researchers were blinded to allocation prior to data collection at Visits 1 and 2 (0 and 6 months). Primary outcome: 180-day adherence rate, with relative risk and 95% CI. Secondary outcome: 180-day quality of life according to SF-36 domains, using Student’s t test. Variables with p

  11. Minimal dataset.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 17, 2024
    + more versions
    Cite
    Mujakovic, Suhreta; Gorgels, Koen M. F.; Hoebe, Christian J. P. A.; Hackert, Volker H.; Stallenberg, Eline (2024). Minimal dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001426515
    Explore at:
    Dataset updated
    Jun 17, 2024
    Authors
    Mujakovic, Suhreta; Gorgels, Koen M. F.; Hoebe, Christian J. P. A.; Hackert, Volker H.; Stallenberg, Eline
    Description

    There has been a lot of discussion about the role of schools in the transmission of severe acute respiratory coronavirus 2 (SARS-CoV-2) during the coronavirus 2019 (COVID-19) pandemic, where many countries responded with school closures in 2020. Reopening of primary schools in the Netherlands in February 2021 was sustained by various non-pharmaceutical interventions (NPIs) following national recommendations. Our study attempted to assess the degree of regional implementation and effectiveness of these NPIs in South Limburg, Netherlands. We approached 150 primary schools with a structured questionnaire containing items on the implementation of NPIs, including items on ventilation. Based on our registry of cases, we determined the number of COVID-19 cases linked to each school, classifying cases by their source of transmission. We calculated a crude secondary attack rate by dividing the number of cases of within-school transmission by the total number of children and staff members. Two-sample proportion tests were performed to compare these rates between schools stratified by the presence of a ventilation system and mask mandates for staff members. A total of 69 schools responded. Most implemented NPIs were aimed at students, except for masking mandates, which preferentially targeted teachers over students (63% versus 22%). We observed lower crude secondary attack rates in schools with a ventilation system compared to schools without a ventilation system (1.2% versus 2.8%, p<0.01). Mandatory masking for staff members had no effect on the overall crude secondary attack rate (2.0% versus 2.1%, p = 0.03) but decreased the crude secondary attack rate among staff members (2.3% versus 1.7%, p<0.01). Schools varied in their implementation of NPIs, most of which targeted students. Rates of within-school transmission were higher compared to other studies, possibly due to a lack of proper ventilation. 
    Our research may help improve guidance for primary schools in future outbreaks.
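
    The rate calculation and comparison described in this abstract can be illustrated with a short sketch. It assumes a pooled two-sided z-test as the "two-sample proportion test" (the abstract does not say which variant was used), and the case and population counts below are hypothetical numbers chosen only to reproduce rates of 1.2% and 2.8%; they are not taken from the dataset.

```python
import math


def crude_attack_rate(within_school_cases, children_and_staff):
    """Crude secondary attack rate: within-school transmission cases
    divided by the total number of children and staff members."""
    return within_school_cases / children_and_staff


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference of two proportions
    (an assumed variant of the test named in the abstract)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 5000 (ventilated) versus 140 of 5000 (not ventilated).
print(crude_attack_rate(60, 5000), crude_attack_rate(140, 5000))
z, p = two_proportion_z_test(60, 5000, 140, 5000)
print(p < 0.01)
```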

  12. Phishing URL Content Dataset

    • kaggle.com
    zip
    Updated Nov 25, 2024
    Cite
    Aaditey Pillai (2024). Phishing URL Content Dataset [Dataset]. https://www.kaggle.com/datasets/aaditeypillai/phishing-website-content-dataset
    Explore at:
    Available download formats: zip (62701 bytes)
    Dataset updated
    Nov 25, 2024
    Authors
    Aaditey Pillai
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Phishing URL Content Dataset

    Executive Summary

    Motivation:
    Phishing attacks are one of the most significant cyber threats in today’s digital era, tricking users into divulging sensitive information like passwords, credit card numbers, and personal details. This dataset aims to support research and development of machine learning models that can classify URLs as phishing or benign.

    Applications:
    - Building robust phishing detection systems.
    - Enhancing security measures in email filtering and web browsing.
    - Training cybersecurity practitioners in identifying malicious URLs.

    The dataset contains diverse features extracted from URL structures, HTML content, and website metadata, enabling deep insights into phishing behavior patterns.

    Description of Data

    This dataset comprises two types of URLs:
    1. Phishing URLs: Malicious URLs designed to deceive users.
    2. Benign URLs: Legitimate URLs posing no harm to users.

    Key Features:
    - URL-based features: Domain, protocol type (HTTP/HTTPS), and IP-based links.
    - Content-based features: Link density, iframe presence, external/internal links, and metadata.
    - Certificate-based features: SSL/TLS details like validity period and organization.
    - WHOIS data: Registration details like creation and expiration dates.

    Statistics:
    - Total Samples: 800 (400 phishing, 400 benign).
    - Features: 22 including URL, domain, link density, and SSL attributes.

    Power Analysis

    To ensure statistical reliability, a power analysis was conducted to determine the minimum sample size required for binary classification with 22 features. Using a medium effect size (0.15), alpha = 0.05, and power = 0.80, the analysis indicated a minimum sample size of ~325 per class. Our dataset exceeds this requirement with 400 examples per class, ensuring robust model training.

    Exploratory Data Analysis (EDA)

    Insights from EDA:
    - Distribution Plots: Histograms and density plots for numerical features like link density, URL length, and iframe counts.
    - Bar Plots: Class distribution and protocol usage trends.
    - Correlation Heatmap: Highlights relationships between numerical features to identify multicollinearity or strong patterns.
    - Box Plots: For SSL certificate validity and URL lengths, comparing phishing versus benign URLs.

    EDA visualizations are provided in the repository.

    Link to Publicly Available Data and Code

    The repository contains the Python code used to extract features, conduct EDA, and build the dataset.

    Ethics Statement

    Phishing detection datasets must balance the need for security research with the risk of misuse. This dataset:
    1. Protects User Privacy: No personally identifiable information is included.
    2. Promotes Ethical Use: Intended solely for academic and research purposes.
    3. Avoids Reinforcement of Bias: Balanced class distribution ensures fairness in training models.

    Risks:
    - Misuse of the dataset for creating more deceptive phishing attacks.
    - Over-reliance on outdated features as phishing tactics evolve.

    Researchers are encouraged to pair this dataset with continuous updates and contextual studies of real-world phishing.

    Open Source License

    This dataset is shared under the MIT License, allowing free use, modification, and distribution for academic and non-commercial purposes. License details can be found here.

  13. Data from: A field-based characterization of conductivity in areas of...

    • catalog.data.gov
    • datasets.ai
    • +3 more
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). A field-based characterization of conductivity in areas of minimal alteration: a case example in the Cascades of northwestern United States [Dataset]. https://catalog.data.gov/dataset/a-field-based-characterization-of-conductivity-in-areas-of-minimal-alteration-a-case-examp
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Area covered
    Cascade Range, Northwestern United States, United States
    Description

    The data set contains 3 files from three sources: (1) state data (Washington and Oregon), (2) combined EPA survey data from Griffith, and (3) data from USGS. This dataset is associated with the following publication: Cormier, S., L. Zheng, G. Hayslip, and C. Flaherty. A field-based characterization of conductivity in areas of minimal alteration: a case example in the Cascades of northwestern United States. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 633: 1657-1666, (2018).

  14. The associations between TOPICS-CEP scores and sample characteristics for...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis (2023). The associations between TOPICS-CEP scores and sample characteristics for the complete study sample and stratified by subgroup. [Dataset]. http://doi.org/10.1371/journal.pone.0173081.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The associations between TOPICS-CEP scores and sample characteristics for the complete study sample and stratified by subgroup.

  15. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Explore at:
    Available download formats: zip (49305 bytes)
    Dataset updated
    Jan 12, 2023
    Authors
    Calvin Oko Mensah
    Description

    This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
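
    As a rough illustration of how the added columns could be derived, the sketch below works on a single record held as a dictionary. The input column names ("Order Date", "Ship Date", "Unit Price", "Unit Cost") and the definition of unit margin as price minus cost are assumptions, since the description does not state them; the sample values are invented.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def add_derived_columns(record):
    """Return a copy of the record with the five derived columns added."""
    order_date = record["Order Date"]  # assumed input column
    ship_date = record["Ship Date"]    # assumed input column
    enriched = dict(record)
    # Assumed definition: margin per unit is price minus cost.
    enriched["Unit margin"] = record["Unit Price"] - record["Unit Cost"]
    enriched["Order year"] = order_date.year
    enriched["Order month"] = order_date.month
    enriched["Order weekday"] = WEEKDAYS[order_date.weekday()]
    enriched["Order_Ship_Days"] = (ship_date - order_date).days
    return enriched


# Invented sample record for illustration only.
row = {
    "Order Date": date(2023, 1, 12),
    "Ship Date": date(2023, 1, 20),
    "Unit Price": 10.0,
    "Unit Cost": 6.5,
}
print(add_derived_columns(row))
```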

  16. Article minimum data set.

    • plos.figshare.com
    xlsx
    Updated Jun 2, 2025
    + more versions
    Cite
    Qiong Wu; Yi Sun; Lei Gao; Wanxing Yin (2025). Article minimum data set. [Dataset]. http://doi.org/10.1371/journal.pone.0316200.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Qiong Wu; Yi Sun; Lei Gao; Wanxing Yin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To improve the competitive state of badminton athletes and summarize the technical characteristics of badminton players, this paper introduces multi-dimensional fuzzy removal intelligent computing. Taking 120 badminton students from a sports school as data samples, the sports images of athletes are collected, the images are enhanced using histogram equalization, and then the fuzzy clustering algorithm is used to analyze the characteristics of the pictures. The following results were obtained from the analysis of the understanding degree of motion decomposition, the analysis of the lasting effect, the study of the number of repetitions, and the analysis of the simulation results: The degree of understanding was 17.75% higher than that of traditional training methods; the effect was better than that of conventional training methods; the traditional training method had a small number of action repetitions; the performance of boys and girls in the temporary mock exam would be related to different training methods. Therefore, this paper had practical significance for this research, to help promote such academic and give reference. At the same time, most optimization problems needed to comprehensively consider many factors, so multi-objective optimization algorithms became a hot spot in academic research.

  17. PPFT

    • huggingface.co
    Updated Jun 23, 2024
    Cite
    Silin Meng (2024). PPFT [Dataset]. https://huggingface.co/datasets/SilinMeng0510/PPFT
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 23, 2024
    Authors
    Silin Meng
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    PPFT: Path Planning on Fifty x Thirty

    Our dataset consists of 100 manually selected 50 × 30 maps from a randomly generated collection, each with 10 different start and goal positions. Therefore, there are 1000 samples in total. Our data conform to the standard of search-based algorithm environments in a continuous space. Each map includes the following parameters:

    x range: The minimum and maximum x-coordinates of the environment boundary range as [x min, x max]. y range: The minimum and… See the full description on the dataset page: https://huggingface.co/datasets/SilinMeng0510/PPFT.

  18. The mean (±SD) scores and floor and ceiling effects for the complete sample...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis (2023). The mean (±SD) scores and floor and ceiling effects for the complete sample and stratified by subgroup. [Dataset]. http://doi.org/10.1371/journal.pone.0173081.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The mean (±SD) scores and floor and ceiling effects for the complete sample and stratified by subgroup.

  19. Minimal data set.

    • datasetcatalog.nlm.nih.gov
    Updated Aug 10, 2023
    + more versions
    Cite
    Chen, Xiaodong; Ma, Yongqing; Qu, Xiaofu; Qu, Weiguo; Yang, Miaomiao; He, Ping (2023). Minimal data set. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000960638
    Explore at:
    Dataset updated
    Aug 10, 2023
    Authors
    Chen, Xiaodong; Ma, Yongqing; Qu, Xiaofu; Qu, Weiguo; Yang, Miaomiao; He, Ping
    Description

    Objective: The aim of this meta-analysis was to evaluate the efficacy of photobiomodulation (PBM) therapy in the treatment of inferior alveolar nerve (IAN) injury due to orthognathic surgeries, extraction of impacted third molars and mandibular fractures. Methods and materials: An electronic search was conducted by a combination of manual search and four electronic databases including PubMed, Embase, Cochrane Library and Web of Science, with no limitation on language and publication date. Gray literature was searched in ClinicalTrials.gov and Google Scholar. All retrieved articles were imported into EndNote software (version X9) and screened by two independent reviewers. All analysis was performed using the RevMan software (version 5.3). Results: Finally, 15 randomized controlled trials met the inclusion criteria for qualitative analysis and 14 for meta-analysis from 219 articles. The results showed that PBM therapy had no effect on nerve injury in a short period of time (0-48 h, 14 days), but had a significant effect over 30 days. However, the effect of photobiomodulation therapy on thermal discrimination was still controversial; most authors supported no significant improvement. By calculating the effective rate of PBM, it was found that there was no significant difference in the onset time of treatment, whether within or over 6 months. Conclusions: The results of this meta-analysis show that PBM therapy is effective in the treatment of IAN injuries whether it begins early or later. However, due to the limited number of well-designed RCTs and the small number of patients in each study, it would be necessary to conduct randomized controlled trials with large sample sizes, long follow-up times and more standardized treatment and evaluation methods in the future to provide more accurate and clinically meaningful results.

  20. Minimal Dataset for Characterization of a C9orf72 Knockout Danio rerio Model...

    • borealisdata.ca
    Updated Mar 21, 2025
    Cite
    Alexandre Emond; Carl Laflamme; Martine Therrien; Meijiang Liao; Claudia Maios; Audrey Labarre; Pierre Drapeau; Alex Parker (2025). Minimal Dataset for Characterization of a C9orf72 Knockout Danio rerio Model for ALS and Cross-Species Validation of Therapeutics in Caenorhabditis elegans [Dataset]. http://doi.org/10.5683/SP3/RTGSAG
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 21, 2025
    Dataset provided by
    Borealis
    Authors
    Alexandre Emond; Carl Laflamme; Martine Therrien; Meijiang Liao; Claudia Maios; Audrey Labarre; Pierre Drapeau; Alex Parker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    The Canadian Institutes of Health Research (CIHR)
    The Department of Defense (DoD)
    Centre Hospitalier de l’Université de Montréal (CHUM)
    The Amyotrophic Lateral Sclerosis (ALS) Society of Canada
    Description

    This dataset is associated with the manuscript "Characterization of a C9orf72 Knockout Danio rerio Model for ALS and Cross-Species Validation of Potential Therapeutics Screened in Caenorhabditis elegans", currently under review at PLOS Genetics. It represents the minimal dataset required for reproducibility under PLOS Genetics data-sharing policies. Abstract: Intronic hexanucleotide repeat expansions in the C9orf72 gene are the most common genetic cause of the neurodegenerative diseases amyotrophic lateral sclerosis (ALS) and frontotemporal dementia. This expansion reduces C9orf72 expression in affected patients, implicating loss of C9orf72 function (LOF) as a pathogenic mechanism. Various Danio rerio (zebrafish) models of C9orf72 depletion have been developed to investigate disease mechanisms and the effects of C9orf72 LOF. However, there are inconsistencies in reported phenotypes, and most have yet to be validated in stable germline ablation models. To address this, we generated a zebrafish C9orf72 knockout model using CRISPR/Cas9. The C9orf72 LOF model exhibits, in a generally dose-dependent manner, increased larval mortality, persistent growth reduction, and motor deficits. Additionally, homozygous C9orf72 LOF larvae displayed mild overbranching of spinal motoneurons. To identify potential therapeutic compounds, we conducted a screen in an established Caenorhabditis elegans (C. elegans) C9orf72 homologue (alfa-1) LOF model, identifying 12 compounds that improved the motility, neurodegeneration, and paralysis phenotypes. Prompted by the shared motor phenotype, 2 of those compounds were tested in our zebrafish C9orf72 LOF model. Pizotifen malate was found to significantly improve motor deficits in C9orf72 LOF zebrafish larvae. We present a novel zebrafish C9orf72 knockout model that exhibits phenotypic differences from depletion models, providing a valuable tool for in vivo C9orf72 research and ALS therapeutic validation. 
    Furthermore, we identify pizotifen malate as a promising compound for further preclinical evaluation.

    Dataset Information: This dataset includes raw and processed numerical data from key experimental assays in zebrafish (Danio rerio) and Caenorhabditis elegans. In zebrafish, it contains data from qPCR, sequencing, western blots, survival assays, motor activity tracking, spinal motor neuron morphology analysis, neuromuscular junction integrity evaluation, and drug screening experiments. For C. elegans, it includes data on swimming activity, paralysis assays, neurodegeneration analysis, and drug screening experiments. This dataset represents the minimal dataset required for reproducibility in accordance with PLOS Genetics guidelines and provides all necessary numerical data to replicate the study’s findings. Where relevant, sample raw files are included to ensure data provenance and reproducibility.
