94 datasets found

Minimal data set containing the radius determined for each sample used in...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Sep 27, 2021
Cite
Hutson, M. Shane; O’Connor, James; Page-McCaw, Andrea; Akbar, Fabiha Bushra (2021). Minimal data set containing the radius determined for each sample used in this study, organized by Figure. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000900449
Explore at:
Dataset updated
Sep 27, 2021
Authors
Hutson, M. Shane; O’Connor, James; Page-McCaw, Andrea; Akbar, Fabiha Bushra
Description
This dataset also includes the values for the linear regression analyses used to derive Eqs 1–22. (XLSX)
Minimal Example Dataset
kaggle.com
zip
Updated Mar 30, 2020
Cite
smartcaveman (2020). Minimal Example Dataset [Dataset]. https://www.kaggle.com/datasets/smartcaveman/minimal-example-dataset
Explore at:
Available download formats: zip (441 bytes)
Dataset updated
Mar 30, 2020
Authors
smartcaveman
Description
Dataset

This dataset was created by smartcaveman

Contents
WIC Participant and Program Characteristics 2020
agdatacommons.nal.usda.gov
docx
Updated Nov 21, 2025
+ more versions
Cite
USDA Food and Nutrition Service, Office of Policy Support (2025). WIC Participant and Program Characteristics 2020 [Dataset]. http://doi.org/10.15482/USDA.ADC/1527885
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.15482/USDA.ADC/1527885
Dataset updated
Nov 21, 2025
Dataset provided by
United States Department of Agriculture (http://usda.gov/)
Food and Nutrition Service (https://www.fns.usda.gov/)
Authors
USDA Food and Nutrition Service, Office of Policy Support
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
Background: In 1986, the Congress enacted Public Laws 99-500 and 99-591, requiring a biennial report on the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). In response to these requirements, FNS developed a prototype system that allowed for the routine acquisition of information on WIC participants from WIC State Agencies. Since 1992, State Agencies have provided electronic copies of these data to FNS on a biennial basis. FNS and the National WIC Association (formerly National Association of WIC Directors) agreed on a set of data elements for the transfer of information. In addition, FNS established a minimum standard dataset for reporting participation data. For each biennial reporting cycle, each State Agency is required to submit a participant-level dataset containing standardized information on persons enrolled at local agencies for the reference month of April. The 2020 Participant and Program Characteristics (PC2020) is the 17th to be completed using the prototype PC reporting system. In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).

Processing methods and equipment used: Specifications on formats (“Guidance for States Providing Participant Data”) were provided to all State agencies in January 2020. This guide specified 20 minimum dataset (MDS) elements and 11 supplemental dataset (SDS) elements to be reported on each WIC participant. Each State Agency was required to submit all 20 MDS items and any SDS items collected by the State agency.

Study date(s) and duration: The information for each participant was from the participants’ most current WIC certification as of April 2020.

Study spatial scale (size of replicates and spatial scale of study area): In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).

Level of true replication: Unknown

Sampling precision (within-replicate sampling or pseudoreplication):
State Agency Data Submissions. PC2020 is a participant dataset consisting of 7,036,867 active records. The records, submitted to USDA by the State Agencies, comprise a census of all WIC enrollees, so there is no sampling involved in the collection of this data.
PII Analytic Datasets. State agency files were combined to create a national census participant file of approximately 7 million records. The census dataset contains potentially personally identifiable information (PII) and is therefore not made available to the public.
National Sample Dataset. The public use SAS analytic dataset made available to the public has been constructed from a nationally representative sample drawn from the census of WIC participants, selected by participant category. The national sample consists of 1 percent of the total number of participants, or 70,368 records. The distribution by category is 5,469 pregnant women, 6,131 breastfeeding women, 4,373 postpartum women, 16,817 infants, and 37,578 children.

Level of subsampling (number and repeat or within-replicate sampling): The proportionate (or self-weighting) sample was drawn by WIC participant category: pregnant women, breastfeeding women, postpartum women, infants, and children. In this type of sample design, each WIC participant has the same probability of selection across all strata. Sampling weights are not needed when the data are analyzed. In a proportionate stratified sample, the largest stratum accounts for the highest percentage of the analytic sample.

Study design (before–after, control–impacts, time series, before–after-control–impacts): None – Non-experimental

Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains all MDS and SDS information submitted by the State agency on the sampled WIC participant. In addition, the file contains constructed variables used for analytic purposes. To protect individual privacy, the public use file does not include State agency, local agency, or case identification numbers.

Description of any gaps in the data or other limiting factors: All State agencies provided data on a census of their WIC participants.

Resources in this dataset:
Resource Title: WIC PC 2020 National Sample File Public Use Codebook; File Name: PC2020 National Sample File Public Use Codebook.docx; Resource Description: WIC PC 2020 National Sample File Public Use Codebook
Resource Title: WIC PC 2020 Public Use CSV Data; File Name: wicpc2020_public_use.csv; Resource Description: WIC PC 2020 Public Use CSV Data
Resource Title: WIC PC 2020 Data Set SAS, R, SPSS, Stata; File Name: PC2020 Ag Data Commons.zip; Resource Description: WIC PC 2020 Data Set SAS, R, SPSS, Stata. One dataset in multiple formats
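The proportionate (self-weighting) design described for the WIC national sample can be illustrated with a short sketch. The category counts below are the ones quoted in the description; the `proportionate_sample` helper, its truncating rounding rule, and the fixed seed are assumptions made for illustration, not the procedure FNS actually used.

```python
import random

# Sample counts by participant category, as quoted in the description.
SAMPLE_COUNTS = {
    "pregnant women": 5_469,
    "breastfeeding women": 6_131,
    "postpartum women": 4_373,
    "infants": 16_817,
    "children": 37_578,
}


def stratum_shares(counts):
    """Return each stratum's share of the total sample."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def proportionate_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction from every stratum (hypothetical helper).

    Because every record has the same selection probability, the sample
    is self-weighting and needs no sampling weights.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for rec in records:
        by_stratum.setdefault(stratum_of(rec), []).append(rec)
    sample = []
    for members in by_stratum.values():
        k = int(len(members) * fraction)  # truncation is an assumption
        sample.extend(rng.sample(members, k))
    return sample


total = sum(SAMPLE_COUNTS.values())
shares = stratum_shares(SAMPLE_COUNTS)
largest = max(shares, key=shares.get)
```

Here `total` reproduces the 70,368 records quoted above, and `largest` is the children stratum, consistent with the remark that the largest stratum accounts for the highest percentage of the analytic sample.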
CMS Payroll Based Journal Daily Non-Nurse Staffing
datalumos.org
delimited
Updated May 29, 2025
+ more versions
Cite
United States Department of Health and Human Services. Centers for Medicare and Medicaid Services (2025). CMS Payroll Based Journal Daily Non-Nurse Staffing [Dataset]. http://doi.org/10.3886/E231310V1
Explore at:
Available download formats: delimited
Unique identifier
https://doi.org/10.3886/E231310V1
Dataset updated
May 29, 2025
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Centers for Medicare & Medicaid Services
Authors
United States Department of Health and Human Services. Centers for Medicare and Medicaid Services
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Jan 1, 2017 - Dec 31, 2024
Description
The Payroll Based Journal (PBJ) Nurse Staffing and Non-Nurse Staffing datasets provide information submitted by nursing homes, including rehabilitation services, on a quarterly basis. The data include the hours staff are paid to work each day, for each facility. Examples of reporting categories include Director of Nursing, Administrative Registered Nurses, Registered Nurses, Administrative Licensed Practical Nurses, Licensed Practical Nurses, Certified Nurse Aides, Certified Medication Aides, and Nurse Aides in Training. The data also cover other non-nurse staff categories, such as Respiratory Therapist, Occupational Therapist, and Social Worker. The datasets also include a facility’s daily census calculated using the Minimum Data Set (MDS) submission. The Payroll Based Journal (PBJ) Employee Detail Nursing Home Staffing datasets and technical information have been moved to a new location. Note: This full dataset contains more records than most spreadsheet programs can handle, which will result in an incomplete load of data. Use of a database or statistical software is required.
MHEALTH Dataset Data Set CSV
kaggle.com
zip
Updated Jan 4, 2023
Cite
Nirmal Sankalana (2023). MHEALTH Dataset Data Set CSV [Dataset]. https://www.kaggle.com/datasets/nirmalsankalana/mhealth-dataset-data-set-csv
Explore at:
Available download formats: zip (78174751 bytes)
Dataset updated
Jan 4, 2023
Authors
Nirmal Sankalana
Description
Source:

Oresti Banos, Department of Computer Architecture and Computer Technology, University of Granada
Rafael Garcia, Department of Computer Architecture and Computer Technology, University of Granada
Alejandro Saez, Department of Computer Architecture and Computer Technology, University of Granada

Email to whom correspondence should be addressed: oresti '@' ugr.es (oresti.bl '@' gmail.com)

Data Set Information:

The MHEALTH (Mobile HEALTH) dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing several physical activities. Sensors placed on the subject's chest, right wrist, and left ankle are used to measure the motion experienced by diverse body parts, namely, acceleration, rate of turn, and magnetic field orientation. The sensor positioned on the chest also provides 2-lead ECG measurements, which can potentially be used for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG.

DATASET SUMMARY:

Activities: 12

Sensor devices: 3

Subjects: 10

EXPERIMENTAL SETUP

The collected dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing 12 physical activities (Table 1). Shimmer2 [BUR10] wearable sensors were used for the recordings. The sensors were placed on the subject's chest, right wrist, and left ankle, respectively, and attached using elastic straps (as shown in the figure in the attachment). The use of multiple sensors permits us to measure the motion experienced by diverse body parts, namely, the acceleration, the rate of turn, and the magnetic field orientation, thus better capturing the body dynamics. The sensor positioned on the chest also provides 2-lead ECG measurements, which are not used for the development of the recognition model but rather collected for future work purposes. This information can be used, for example, for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG. All sensing modalities are recorded at a sampling rate of 50 Hz, which is considered sufficient for capturing human activity. Each session was recorded using a video camera. This dataset is found to generalize to common activities of daily living, given the diversity of body parts involved in each one (e.g., the frontal elevation of arms vs. knees bending), the intensity of the actions (e.g., cycling vs. sitting and relaxing) and their execution speed or dynamicity (e.g., running vs. standing still). The activities were collected in an out-of-lab environment with no constraints on the way these must be executed, with the exception that the subject should try their best when executing them.
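As a rough illustration of what the 50 Hz sampling rate implies, the sketch below converts durations to sample counts and cuts a recording into fixed-length windows. The 2-second window and 1-second step are assumptions chosen for the example; the description does not specify any windowing scheme.

```python
SAMPLING_RATE_HZ = 50  # sampling rate stated in the dataset description


def samples_for(duration_s, rate_hz=SAMPLING_RATE_HZ):
    """Number of samples recorded over duration_s seconds."""
    return int(round(duration_s * rate_hz))


def sliding_windows(n_samples, window_s, step_s, rate_hz=SAMPLING_RATE_HZ):
    """Yield (start, stop) sample-index pairs for fixed-length windows."""
    window = samples_for(window_s, rate_hz)
    step = samples_for(step_s, rate_hz)
    for start in range(0, n_samples - window + 1, step):
        yield start, start + window


# A one-minute activity recorded at 50 Hz.
one_minute = samples_for(60)

# Assumed windowing: 2 s windows with a 1 s step (50% overlap).
windows = list(sliding_windows(one_minute, window_s=2, step_s=1))
```

Under these assumptions a one-minute activity holds 3000 samples per channel, and each window covers 100 consecutive samples.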

ACTIVITY SET

The activity set is listed in the following:
L1: Standing still (1 min)
L2: Sitting and relaxing (1 min)
L3: Lying down (1 min)
L4: Walking (1 min)
L5: Climbing stairs (1 min)
L6: Waist bends forward (20x)
L7: Frontal elevation of arms (20x)
L8: Knees bending (crouching) (20x)
L9: Cycling (1 min)
L10: Jogging (1 min)
L11: Running (1 min)
L12: Jump front & back (20x)
NOTE: In brackets are the number of repetitions (Nx) or the duration of the exercises (min).

A complete and illustrated description (including table of activities, sensor setup, etc.) of the dataset is provided in the papers presented in the section “Citation Requests”.

Attribute Information:

The data collected for each subject is stored in a different log file: 'mHealth_subject.log'. Each file contains the samples (by rows) recorded for all sensors (by columns). The labels used to identify the activities are similar to the abovementioned (e.g., the label for walking is '4').

The meaning of each column is detailed next:
Column 1: acceleration from the chest sensor (X axis)
Column 2: acceleration from the chest sensor (Y axis)
Column 3: acceleration from the chest sensor (Z axis)
Column 4: electrocardiogram signal (lead 1)
Column 5: electrocardiogram signal (lead 2)
Column 6: acceleration from the left-ankle sensor (X axis)
Column 7: acceleration from the left-ankle sensor (Y axis)
Column 8: acceleration from the left-ankle sensor (Z axis)
Column 9: gyro from the left-ankle sensor (X axis)
Column 10: gyro from the left-ankle sensor (Y axis)
Column 11: gyro from the left-ankle sensor (Z axis)
Column 12: magnetometer from the left-ankle sensor (X axis)
Column 13: magnetometer from the left-ankle sensor (Y axis)
Column 14: magnetometer from the left-ankle sensor (Z axis)
Column 15: acceleration from the right-lower-arm sensor (X axis)
Column 16: acceleration from the right-lower-arm sensor (Y axis)
Column 17: acceleration from the right-lower-arm sensor (Z axis)
Column 18: gyro from the right-lower-arm sensor (X axis)
Column 19: gyro from the right-lower-arm sensor (Y axis)
Column 20: gyro fro...
Sample data: Article minimum data set.
plos.figshare.com
application/x-rar
Updated Feb 18, 2025
Cite
Zhaoyang Pan; Liang Ma (2025). Sample data: Article minimum data set. [Dataset]. http://doi.org/10.1371/journal.pone.0316206.s001
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.1371/journal.pone.0316206.s001
Dataset updated
Feb 18, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Zhaoyang Pan; Liang Ma
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
In recent decades, supported by a number of key policy steps, the scale of China’s sports industry has grown substantially, and the optimization of its industrial structure has made new progress. From "nascent" to "strong", China’s sports industry has grown in importance to the national economy. At the same time, sport is a significant way to promote health. With the rapid growth of people’s requirements for sport and health, it is urgent to re-evaluate the past development path and formulate new directions so as to continuously improve and optimize the system. This study systematically sorts China’s sports industry documents by stage and describes the focus of each stage and the overall evolution track. On this basis, text mining and quantitative evaluation are used to extract high-frequency words from sports industry documents, and a sports industry document evaluation system comprising 9 first-level indicators and 47 second-level indicators is established. Text similarity analysis is used to realize intelligent PMC index analysis, which effectively improves analysis efficiency and compensates for the deficiencies of purely qualitative analysis. According to the study, China’s sports industry policies are scientific and effective. Combined with the development direction of industrial transformation, the study provides ideas for the future adjustment and optimization of the sports industry’s evolution path.
Minimal dataset for ConFindr testing using pytest
figshare.com
txt
Updated Jun 13, 2023
Cite
Liam Brown (2023). Minimal dataset for ConFindr testing using pytest [Dataset]. http://doi.org/10.6084/m9.figshare.22852937.v3
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22852937.v3
Dataset updated
Jun 13, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Liam Brown
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
A .tar.gz archive containing 14 .fastq.gz files, which correspond to paired-end Illumina whole-genome sequence data from different foodborne pathogens, and an associated tab-separated value file for the metadata of these samples. These sequence data were obtained by selecting samples from the originally published ConFindr dataset (doi: 10.7717/peerj.6995) and downsampling them. The metadata for these samples was obtained from the Supplemental Information of the original publication. The DownsampleFactor column in the metadata file corresponds to the factor by which the original samples were downsampled (e.g. 0.5 is 2-fold downsampling, 0.1 is 10-fold).
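The relation between the DownsampleFactor column and the fold reduction quoted above (0.5 is 2-fold, 0.1 is 10-fold) can be written out as a small sketch. The helper names and the example read count are hypothetical and are not taken from the metadata file.

```python
def fold_reduction(downsample_factor):
    """Convert a DownsampleFactor value into an N-fold reduction."""
    if not 0 < downsample_factor <= 1:
        raise ValueError("DownsampleFactor is expected to lie in (0, 1]")
    return 1 / downsample_factor


def expected_reads(original_reads, downsample_factor):
    """Approximate read count after downsampling (hypothetical helper)."""
    return round(original_reads * downsample_factor)


two_fold = fold_reduction(0.5)
ten_fold = fold_reduction(0.1)
kept = expected_reads(1_000_000, 0.1)  # made-up read count for illustration
```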

Changelog

Version 3

Changed test_samples archive from .zip to .tar.gz, as it was in Version 1.

Version 2

Renamed '_1' and '_2' file patterns to '_R1' and '_R2' to reflect default ConFindr parameters.
Environmental data associated to particular health events example dataset
data.europa.eu
unknown
Updated Jul 3, 2025
Cite
Zenodo (2025). Environmental data associated to particular health events example dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-5823426?locale=el
Explore at:
Available download formats: unknown (6689542)
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The data set is a collection of environmental records associated with individual events. It was generated with the serdif-api wrapper (https://github.com/navarral/serdif-api) by sending a CSV file with example events for the Republic of Ireland. The serdif-api sends a semantic query that:
(i) selects the environmental data sets within the region of the event,
(ii) filters by the specific period of interest from the event,
(iii) aggregates the data sets using the minimum, maximum, average or sum for each of the available variables for a specific time unit.
The aggregation method and the time unit can be passed to the serdif-api through the Command Line Interface (CLI) (see example in https://github.com/navarral/serdif-api). The resulting data set format can also be specified as a data table (CSV) or as a graph (RDF) for analysis and publication as FAIR data. The open-ready data for research is retrieved as a zip file that contains:
(i) data as csv: environmental data associated to particular events as a data table
(ii) data as rdf: environmental data associated to particular events as a graph
(iii) metadata for publication as rdf: metadata record with generalized information about the data that does not contain personal data, as a graph; therefore, publishable.
(iv) metadata for research as rdf: metadata records with detailed information about the data, such as individual dates, regions, data sets used and data lineage, which could lead to data privacy issues if published without approval from the Data Protection Officer (DPO) and data controller.
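The filter-and-aggregate steps of the query can be mimicked in plain Python to make the idea concrete. This is only a sketch under stated assumptions: it does not call serdif-api, it skips the region-selection step, and the `aggregate` function, its parameters, and the hourly readings are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

# The four aggregation methods named in the description.
AGGREGATORS = {"min": min, "max": max, "avg": mean, "sum": sum}


def aggregate(records, event_time, lookback, unit, method):
    """Aggregate (timestamp, value) records in the window before an event.

    Keeps records with event_time - lookback <= timestamp < event_time,
    groups them into consecutive buckets of length `unit`, and reduces
    each bucket with the chosen method.
    """
    reduce_fn = AGGREGATORS[method]
    start = event_time - lookback
    buckets = {}
    for ts, value in records:
        if start <= ts < event_time:
            index = (ts - start) // unit  # timedelta // timedelta gives an int
            buckets.setdefault(index, []).append(value)
    return {start + i * unit: reduce_fn(vals) for i, vals in sorted(buckets.items())}


# Made-up hourly readings for two days before a made-up event.
event = datetime(2019, 1, 3)
readings = [(datetime(2019, 1, 1) + timedelta(hours=h), float(h)) for h in range(48)]

daily_max = aggregate(readings, event, timedelta(days=2), timedelta(days=1), "max")
daily_avg = aggregate(readings, event, timedelta(days=2), timedelta(days=1), "avg")
```

Changing `unit` or `method` corresponds to the time unit and aggregation method that the description says can be passed on the command line.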
Data from: Large Landing Trajectory Data Set for Go-Around Analysis
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Dec 16, 2022
Cite
Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7148116
Explore at:
Dataset updated
Dec 16, 2022
Dataset provided by
ZHAW
Authors
Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Large go-around, also referred to as missed approach, data set. The data set is in support of the paper presented at the OpenSky Symposium on November the 10th.

If you use this data for a scientific publication, please consider citing our paper.

The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:

go_arounds_minimal.csv.gz

Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:

Column name | Type | Description
time | date time | UTC time of landing or first GA attempt
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign | string | Aircraft identifier in air-ground communications
airport | string | ICAO airport code where the aircraft is landing
runway | string | Runway designator on which the aircraft landed
has_ga | string | "True" if at least one GA was performed, otherwise "False"
n_approaches | integer | Number of approaches identified for this flight
n_rwy_approached | integer | Number of unique runways approached by this flight

The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These flights usually have a large number of approaches, so an easy way to exclude them is to drop the rows where n_approaches > 2.
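A minimal pandas sketch of that filter is shown below. The rows are made up for illustration and only mimic the column layout of go_arounds_minimal.csv.gz; the threshold of 2 follows the suggestion in the text.

```python
import pandas as pd

# Made-up rows that mimic a few columns of the minimal data set.
landings = pd.DataFrame(
    {
        "callsign": ["AAA111", "BBB222", "CAL001", "DDD444"],
        "has_ga": [False, True, True, False],
        "n_approaches": [1, 2, 9, 1],
        "n_rwy_approached": [1, 1, 3, 1],
    }
)

# Flights with more than two approaches are treated as likely
# training or calibration flights and dropped.
MAX_APPROACHES = 2
regular = landings[landings["n_approaches"] <= MAX_APPROACHES]
```

The same boolean mask can be applied to the full data frame after loading it with `pd.read_csv`.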

go_arounds_augmented.csv.gz

Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:

Column name | Type | Description
time | date time | UTC time of landing or first GA attempt
icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign | string | Aircraft identifier in air-ground communications
airport | string | ICAO airport code where the aircraft is landing
runway | string | Runway designator on which the aircraft landed
has_ga | string | "True" if at least one GA was performed, otherwise "False"
n_approaches | integer | Number of approaches identified for this flight
n_rwy_approached | integer | Number of unique runways approached by this flight
registration | string | Aircraft registration
typecode | string | Aircraft ICAO typecode
icaoaircrafttype | string | ICAO aircraft type
wtc | string | ICAO wake turbulence category
glide_slope_angle | float | Angle of the ILS glide slope in degrees
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false
rwy_length | float | Length of the runway in kilometres
airport_country | string | ISO Alpha-3 country code of the airport
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
operator_country | string | ISO Alpha-3 country code of the operator
operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
wind_speed_knts | integer | METAR, surface wind speed in knots
wind_dir_deg | integer | METAR, surface wind direction in degrees
wind_gust_knts | integer | METAR, surface wind gust speed in knots
visibility_m | float | METAR, visibility in m
temperature_deg | integer | METAR, temperature in degrees Celsius
press_sea_level_p | float | METAR, sea level pressure in hPa
press_p | float | METAR, QNH in hPa
weather_intensity | list | METAR, list of present weather codes: qualifier - intensity
weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation
weather_desc | list | METAR, list of present weather codes: qualifier - descriptor
weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration
weather_other | list | METAR, list of present weather codes: weather phenomena - other

This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft data base, the METAR information is from the Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

go_arounds_agg.csv.gz

Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

Column name | Type | Description
airport | string | ICAO airport code where the aircraft is landing
runway | string | Runway designator on which the aircraft landed
n_landings | integer | Total number of landings observed on this runway in 2019
ga_rate | float | Go-around rate, per 1000 landings
glide_slope_angle | float | Angle of the ILS glide slope in degrees
has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false
rwy_length | float | Length of the runway in kilometres
airport_country | string | ISO Alpha-3 country code of the airport
airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)

This aggregated data set is used in the paper for the generalized linear regression model.
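For readers who want to see how per-runway columns such as n_landings and ga_rate relate to the per-landing rows, here is a hedged pandas sketch. It assumes that ga_rate is 1000 times the share of landings with at least one go-around, which matches the "per 1000 landings" wording but is not confirmed by the description; the rows are made up.

```python
import pandas as pd

# Made-up per-landing rows with the columns needed for aggregation.
landings = pd.DataFrame(
    {
        "airport": ["EGLC", "EGLC", "EGLC", "LSZH", "LSZH"],
        "runway": ["09", "09", "27", "14", "14"],
        "has_ga": [True, False, False, False, False],
    }
)

# One row per airport-runway pair: landing count and go-around count.
agg = (
    landings.groupby(["airport", "runway"])
    .agg(n_landings=("has_ga", "size"), n_ga=("has_ga", "sum"))
    .reset_index()
)

# Assumed definition: go-arounds per 1000 landings.
agg["ga_rate"] = 1000 * agg["n_ga"] / agg["n_landings"]
```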

Downloading the trajectories

Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical data base with a few lines of Python code. For example, you want to get all the go-arounds of the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:

    import datetime
    from tqdm.auto import tqdm
    import pandas as pd
    from traffic.data import opensky
    from traffic.core import Traffic

    # load minimum data set
    df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
    df["time"] = pd.to_datetime(df["time"])

    # select London City Airport, go-arounds, and 2019-01-04
    airport = "EGLC"
    start = datetime.datetime(year=2019, month=1, day=4).replace(
        tzinfo=datetime.timezone.utc
    )
    stop = datetime.datetime(year=2019, month=1, day=5).replace(
        tzinfo=datetime.timezone.utc
    )

    df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

    # iterate over flights and pull the data from OpenSky Network
    flights = []
    delta_time = pd.Timedelta(minutes=10)
    for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
        # take at most 10 minutes before and 10 minutes after the landing or go-around
        start_time = row["time"] - delta_time
        stop_time = row["time"] + delta_time

        # fetch the data from OpenSky Network
        flights.append(
            opensky.history(
                start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
                stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
                callsign=row["callsign"],
                return_flight=True,
            )
        )

The flights can be converted into a Traffic object:

    Traffic.from_flights(flights)

Additional files

Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:

validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.

validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
Minimal data set.
plos.figshare.com
datasetcatalog.nlm.nih.gov
+1more
xlsx
Updated Dec 27, 2023
Cite
Mila Pacheco; Pedro Sá; Gláucia Santos; Ney Boa-Sorte; Kilma Domingues; Larissa Assis; Marina Silva; Ana Oliveira; Daniel Santos; Jamile Ferreira; Rosemeire Fernandes; Flora Fortes; Raquel Rocha; Genoile Santana (2023). Minimal data set. [Dataset]. http://doi.org/10.1371/journal.pone.0295832.s004
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0295832.s004
Dataset updated
Dec 27, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mila Pacheco; Pedro Sá; Gláucia Santos; Ney Boa-Sorte; Kilma Domingues; Larissa Assis; Marina Silva; Ana Oliveira; Daniel Santos; Jamile Ferreira; Rosemeire Fernandes; Flora Fortes; Raquel Rocha; Genoile Santana
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Aims: Evaluate the impact of an intervention program in non-adherent patients with ulcerative colitis.
Methods: Parallel controlled randomized clinical trial (1:1), approved by the ethics committee (No. 3.068.511/2018) and registered at The Brazilian Clinical Trials Registry (No. RBR-79dn4k). Non-adherent ulcerative colitis patients according to the Morisky-Green-Levine-test were included. Recruitment began in August 2019 until August 2020, with 6-month follow-up. All participants received standard usual care, and additionally the intervention group received educational (video, educational leaflet, verbal guidance) and behavioral interventions (therapeutic scheme, motivational and reminder type short message services). Researchers were blinded for allocation prior to data collection at Visits 1 and 2 (0 and 6 months). Primary outcome: 180-day adherence rate, with relative risk 95%CI. Secondary outcome: 180-day quality of life according to SF-36 domains, using Student’s t test. Variables with p
Minimal dataset.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jun 17, 2024
+ more versions
Cite
Mujakovic, Suhreta; Gorgels, Koen M. F.; Hoebe, Christian J. P. A.; Hackert, Volker H.; Stallenberg, Eline (2024). Minimal dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001426515
Explore at:
Dataset updated
Jun 17, 2024
Authors
Mujakovic, Suhreta; Gorgels, Koen M. F.; Hoebe, Christian J. P. A.; Hackert, Volker H.; Stallenberg, Eline
Description
There has been a lot of discussion about the role of schools in the transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the coronavirus disease 2019 (COVID-19) pandemic, where many countries responded with school closures in 2020. Reopening of primary schools in the Netherlands in February 2021 was sustained by various non-pharmaceutical interventions (NPIs) following national recommendations. Our study attempted to assess the degree of regional implementation and effectiveness of these NPIs in South Limburg, Netherlands. We approached 150 primary schools with a structured questionnaire containing items on the implementation of NPIs, including items on ventilation. Based on our registry of cases, we determined the number of COVID-19 cases linked to each school, classifying cases by their source of transmission. We calculated a crude secondary attack rate by dividing the number of cases of within-school transmission by the total number of children and staff members. Two-sample proportion tests were performed to compare these rates between schools stratified by the presence of a ventilation system and mask mandates for staff members. A total of 69 schools responded. Most implemented NPIs were aimed at students, except for masking mandates, which preferentially targeted teachers over students (63% versus 22%). We observed lower crude secondary attack rates in schools with a ventilation system compared to schools without a ventilation system (1.2% versus 2.8%, p<0.01). Mandatory masking for staff members had no effect on the overall crude secondary attack rate (2.0% versus 2.1%, p = 0.03) but decreased the crude secondary attack rate among staff members (2.3% versus 1.7%, p<0.01). Schools varied in their implementation of NPIs, most of which targeted students. Rates of within-school transmission were higher than in other studies, possibly due to a lack of proper ventilation. Our research may help improve guidance for primary schools in future outbreaks.
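The attack-rate calculation and the two-sample proportion test described above can be illustrated with a minimal Python sketch. The counts below are hypothetical, chosen only so that the rates match the reported 1.2% and 2.8%; the study's actual denominators are not given in this listing, and the pooled z-test shown here is an assumed form of the "two-sample proportion test", not necessarily the authors' exact procedure.

```python
import math

def crude_attack_rate(within_school_cases, population):
    """Cases of within-school transmission divided by children plus staff."""
    return within_school_cases / population

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts (NOT from the study): 60 cases among 5000 people in
# ventilated schools, 140 cases among 5000 people in unventilated schools.
rate_ventilated = crude_attack_rate(60, 5000)
rate_unventilated = crude_attack_rate(140, 5000)
z, p_value = two_proportion_z_test(60, 5000, 140, 5000)
print(f"{rate_ventilated:.1%} vs {rate_unventilated:.1%}, z = {z:.2f}, p = {p_value:.2g}")
```

With real data, the same two functions would be applied once per stratification (ventilation system, staff mask mandate).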
Phishing URL Content Dataset
kaggle.com
zip
Updated Nov 25, 2024
Cite
Aaditey Pillai (2024). Phishing URL Content Dataset [Dataset]. https://www.kaggle.com/datasets/aaditeypillai/phishing-website-content-dataset
Explore at:
Available download formats: zip (62701 bytes)
Dataset updated
Nov 25, 2024
Authors
Aaditey Pillai
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Phishing URL Content Dataset

Executive Summary

Motivation:
Phishing attacks are one of the most significant cyber threats in today’s digital era, tricking users into divulging sensitive information like passwords, credit card numbers, and personal details. This dataset aims to support research and development of machine learning models that can classify URLs as phishing or benign.

Applications:
- Building robust phishing detection systems.
- Enhancing security measures in email filtering and web browsing.
- Training cybersecurity practitioners in identifying malicious URLs.

The dataset contains diverse features extracted from URL structures, HTML content, and website metadata, enabling deep insights into phishing behavior patterns.

Description of Data

This dataset comprises two types of URLs:
1. Phishing URLs: Malicious URLs designed to deceive users.
2. Benign URLs: Legitimate URLs posing no harm to users.

Key Features:
- URL-based features: Domain, protocol type (HTTP/HTTPS), and IP-based links.
- Content-based features: Link density, iframe presence, external/internal links, and metadata.
- Certificate-based features: SSL/TLS details like validity period and organization.
- WHOIS data: Registration details like creation and expiration dates.

Statistics:
- Total Samples: 800 (400 phishing, 400 benign).
- Features: 22 including URL, domain, link density, and SSL attributes.

Power Analysis

To ensure statistical reliability, a power analysis was conducted to determine the minimum sample size required for binary classification with 22 features. Using a medium effect size (0.15), alpha = 0.05, and power = 0.80, the analysis indicated a minimum sample size of ~325 per class. Our dataset exceeds this requirement with 400 examples per class, ensuring robust model training.
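The listing does not state which test or effect-size measure the power analysis used, so the ~325 figure cannot be reproduced from the text alone. Purely as an illustration of how such a calculation is typically set up, the sketch below uses the normal-approximation formula for a two-sided comparison of two group means with a standardized effect size d. The function name, the choice of test, and the example effect size are assumptions, not the dataset author's method.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    effect_size is a standardized mean difference (Cohen's d); this is an
    assumed convention, since the source does not name its effect-size measure.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = std_normal.inv_cdf(power)           # quantile for target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Example with a conventional "medium" Cohen's d of 0.5 (illustrative only).
n_medium_d = n_per_group(0.5)
print(n_medium_d)
```

The required sample size scales with the inverse square of the effect size, so the result depends heavily on which effect-size convention and which test are assumed.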

Exploratory Data Analysis (EDA)

Insights from EDA:
- Distribution Plots: Histograms and density plots for numerical features like link density, URL length, and iframe counts.
- Bar Plots: Class distribution and protocol usage trends.
- Correlation Heatmap: Highlights relationships between numerical features to identify multicollinearity or strong patterns.
- Box Plots: For SSL certificate validity and URL lengths, comparing phishing versus benign URLs.

EDA visualizations are provided in the repository.

Link to Publicly Available Data and Code

Dataset: Phishing URL Dataset

Code Repository: GitHub - Phishing Detection

The repository contains the Python code used to extract features, conduct EDA, and build the dataset.

Ethics Statement

Phishing detection datasets must balance the need for security research with the risk of misuse. This dataset:
1. Protects User Privacy: No personally identifiable information is included.
2. Promotes Ethical Use: Intended solely for academic and research purposes.
3. Avoids Reinforcement of Bias: Balanced class distribution ensures fairness in training models.

Risks:
- Misuse of the dataset for creating more deceptive phishing attacks.
- Over-reliance on outdated features as phishing tactics evolve.

Researchers are encouraged to pair this dataset with continuous updates and contextual studies of real-world phishing.

Open Source License

This dataset is shared under the MIT License, allowing free use, modification, and distribution for academic and non-commercial purposes. License details can be found here.
Data from: A field-based characterization of conductivity in areas of...
catalog.data.gov
datasets.ai
+3 more
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). A field-based characterization of conductivity in areas of minimal alteration: a case example in the Cascades of northwestern United States [Dataset]. https://catalog.data.gov/dataset/a-field-based-characterization-of-conductivity-in-areas-of-minimal-alteration-a-case-examp
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency: http://www.epa.gov/
Area covered
Cascade Range, Northwestern United States, United States
Description
The data set contains three files from three sources: (1) state data (Washington and Oregon), (2) combined EPA survey data from Griffith, and (3) data from USGS. This dataset is associated with the following publication: Cormier, S., L. Zheng, G. Hayslip, and C. Flaherty. A field-based characterization of conductivity in areas of minimal alteration: a case example in the Cascades of northwestern United States. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 633: 1657-1666, (2018).
The associations between TOPICS-CEP scores and sample characteristics for...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis (2023). The associations between TOPICS-CEP scores and sample characteristics for the complete study sample and stratified by subgroup. [Dataset]. http://doi.org/10.1371/journal.pone.0173081.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0173081.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The associations between TOPICS-CEP scores and sample characteristics for the complete study sample and stratified by subgroup.
New 1000 Sales Records Data 2
kaggle.com
zip
Updated Jan 12, 2023
Cite
Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
Explore at:
Available download formats: zip (49305 bytes)
Dataset updated
Jan 12, 2023
Authors
Calvin Oko Mensah
Description
This dataset was downloaded from excelbianalytics.com, where it was generated with random VBA logic. I recently performed an extensive exploratory data analysis on it and added new columns: Unit margin, Order year, Order month, Order weekday, and Order_Ship_Days, which I think can help with analysis of the data. I shared it because I thought it was a great dataset for newcomers like myself to practice analytical processes on.
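As a rough illustration of how the added columns could be derived, here is a minimal Python sketch. The input field names ("Order Date", "Ship Date", "Unit Price", "Unit Cost") and the definition of unit margin as price minus cost are assumptions; the description lists only the names of the new columns, not how they were computed.

```python
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def derive_columns(record):
    """Return a copy of the record with the five derived columns added."""
    order, ship = record["Order Date"], record["Ship Date"]
    return {
        **record,
        # Assumed definition: margin per unit = unit price - unit cost
        "Unit margin": round(record["Unit Price"] - record["Unit Cost"], 2),
        "Order year": order.year,
        "Order month": order.month,
        "Order weekday": WEEKDAY_NAMES[order.weekday()],
        # Days elapsed between placing the order and shipping it
        "Order_Ship_Days": (ship - order).days,
    }

# Hypothetical sample row (values are illustrative, not taken from the dataset).
row = derive_columns({
    "Order Date": date(2023, 1, 12),
    "Ship Date": date(2023, 1, 20),
    "Unit Price": 9.33,
    "Unit Cost": 6.92,
})
print(row)
```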
Article minimum data set.
plos.figshare.com
xlsx
Updated Jun 2, 2025
+ more versions
Cite
Qiong Wu; Yi Sun; Lei Gao; Wanxing Yin (2025). Article minimum data set. [Dataset]. http://doi.org/10.1371/journal.pone.0316200.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0316200.s001
Dataset updated
Jun 2, 2025
Dataset provided by
PLOS: http://plos.org/
Authors
Qiong Wu; Yi Sun; Lei Gao; Wanxing Yin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To improve the competitive state of badminton athletes and summarize the technical characteristics of badminton players, this paper introduces multi-dimensional fuzzy removal intelligent computing. Taking 120 badminton students from a sports school as data samples, sports images of the athletes are collected, the images are enhanced using histogram equalization, and a fuzzy clustering algorithm is then used to analyze the characteristics of the pictures. Analyses of the degree of understanding of motion decomposition, the lasting effect, the number of repetitions, and the simulation results gave the following findings: the degree of understanding was 17.75% higher than with traditional training methods; the effect was better than with conventional training methods; the traditional training method involved a small number of action repetitions; and the performance of boys and girls in the temporary mock exam was related to the training method used. The paper therefore has practical significance for research in this area and may serve as a reference. In addition, because most optimization problems need to consider many factors comprehensively, multi-objective optimization algorithms have become a hot spot in academic research.
PPFT
huggingface.co
Updated Jun 23, 2024
Cite
Silin Meng (2024). PPFT [Dataset]. https://huggingface.co/datasets/SilinMeng0510/PPFT
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 23, 2024
Authors
Silin Meng
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
PPFT: Path Planning on Fifty x Thirty
Our dataset consists of 100 manually selected 50 × 30 maps from a randomly generated collection, each with 10 different start and goal positions. Therefore, there are 1000 samples in total. Our data conform to the standard of search-based algorithm environments in a continuous space. Each map includes the following parameters:

x range: The minimum and maximum x-coordinates of the environment boundary range as [x min, x max]. y range: The minimum and… See the full description on the dataset page: https://huggingface.co/datasets/SilinMeng0510/PPFT.
The mean (±SD) scores and floor and ceiling effects for the complete sample...
plos.figshare.com
xls
Updated May 30, 2023
Cite
Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis (2023). The mean (±SD) scores and floor and ceiling effects for the complete sample and stratified by subgroup. [Dataset]. http://doi.org/10.1371/journal.pone.0173081.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0173081.t001
Dataset updated
May 30, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Cynthia S. Hofman; Jennifer E. Lutomski; Han Boter; Bianca M. Buurman; Anton J. M. de Craen; Rogier Donders; Marcel G. M. Olde Rikkert; Peter Makai; René J. F. Melis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The mean (±SD) scores and floor and ceiling effects for the complete sample and stratified by subgroup.
Minimal data set.
datasetcatalog.nlm.nih.gov
Updated Aug 10, 2023
+ more versions
Cite
Chen, Xiaodong; Ma, Yongqing; Qu, Xiaofu; Qu, Weiguo; Yang, Miaomiao; He, Ping (2023). Minimal data set. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000960638
Explore at:
Dataset updated
Aug 10, 2023
Authors
Chen, Xiaodong; Ma, Yongqing; Qu, Xiaofu; Qu, Weiguo; Yang, Miaomiao; He, Ping
Description
Objective: The aim of this meta-analysis was to evaluate the efficacy of photobiomodulation (PBM) therapy in the treatment of inferior alveolar nerve (IAN) injury due to orthognathic surgeries, extraction of impacted third molars and mandibular fractures.
Methods and materials: An electronic search was conducted by a combination of manual search and four electronic databases including PubMed, Embase, Cochrane Library and Web of Science, with no limitation on language and publication date. Gray literature was searched in ClinicalTrials.gov and Google Scholar. All retrieved articles were imported into ENDNOTE software (version X9) and screened by two independent reviewers. All analysis was performed using the REVMAN software (version 5.3).
Results: Finally, 15 randomized controlled trials met the inclusion criteria for qualitative analysis and 14 for meta-analysis from 219 articles. The results showed that PBM therapy had no effect on nerve injury in a short period of time (0-48h, 14 days), but had a significant effect over 30 days. However, the effect of photobiomodulation therapy on thermal discrimination was still controversial; most authors supported no significant improvement. By calculating the effective rate of PBM, it was found that there was no significant difference in the onset time of treatment, whether within or over 6 months.
Conclusions: The results of this meta-analysis show that PBM therapy is effective in the treatment of IAN injuries whether it begins early or late. However, due to the limited number of well-designed RCTs and the small number of patients in each study, it would be necessary to conduct randomized controlled trials with large sample size, long follow-up time and more standardized treatment and evaluation methods in the future to provide more accurate and clinically meaningful results.
Minimal Dataset for Characterization of a C9orf72 Knockout Danio rerio Model...
borealisdata.ca
Updated Mar 21, 2025
Cite
Alexandre Emond; Carl Laflamme; Martine Therrien; Meijiang Liao; Claudia Maios; Audrey Labarre; Pierre Drapeau; Alex Parker (2025). Minimal Dataset for Characterization of a C9orf72 Knockout Danio rerio Model for ALS and Cross-Species Validation of Therapeutics in Caenorhabditis elegans [Dataset]. http://doi.org/10.5683/SP3/RTGSAG
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/RTGSAG
Dataset updated
Mar 21, 2025
Dataset provided by
Borealis
Authors
Alexandre Emond; Carl Laflamme; Martine Therrien; Meijiang Liao; Claudia Maios; Audrey Labarre; Pierre Drapeau; Alex Parker
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
The Canadian Institutes of Health Research (CIHR)
The Department of Defense (DoD)
Centre Hospitalier de l’Université de Montréal (CHUM)
The Amyotrophic Lateral Sclerosis (ALS) Society of Canada
Description
This dataset is associated with the manuscript "Characterization of a C9orf72 Knockout Danio rerio Model for ALS and Cross-Species Validation of Potential Therapeutics Screened in Caenorhabditis elegans", currently under review at PLOS Genetics. It represents the minimal dataset required for reproducibility under PLOS Genetics data-sharing policies.
Abstract: Intronic hexanucleotide repeat expansions in the C9orf72 gene are the most common genetic cause of the neurodegenerative diseases amyotrophic lateral sclerosis (ALS) and frontotemporal dementia. This expansion reduces C9orf72 expression in affected patients, implicating loss of C9orf72 function (LOF) as a pathogenic mechanism. Various Danio rerio (zebrafish) models of C9orf72 depletion have been developed to investigate disease mechanisms and the effects of C9orf72 LOF. However, there are inconsistencies in reported phenotypes, and most have yet to be validated in stable germline ablation models. To address this, we generated a zebrafish C9orf72 knockout model using CRISPR/Cas9. The C9orf72 LOF model exhibits, in a generally dose-dependent manner, increased larval mortality, persistent growth reduction, and motor deficits. Additionally, homozygous C9orf72 LOF larvae displayed mild overbranching of spinal motoneurons. To identify potential therapeutic compounds, we conducted a screen in an established Caenorhabditis elegans (C. elegans) C9orf72 homologue (alfa-1) LOF model, identifying 12 compounds that improved the motility, neurodegeneration, and paralysis phenotypes. Prompted by the shared motor phenotype, 2 of those compounds were tested in our zebrafish C9orf72 LOF model. Pizotifen malate was found to significantly improve motor deficits in C9orf72 LOF zebrafish larvae. We present a novel zebrafish C9orf72 knockout model that exhibits phenotypic differences from depletion models, providing a valuable tool for in vivo C9orf72 research and ALS therapeutic validation. Furthermore, we identify pizotifen malate as a promising compound for further preclinical evaluation.
Dataset Information: This dataset includes raw and processed numerical data from key experimental assays in zebrafish (Danio rerio) and Caenorhabditis elegans. In zebrafish, it contains data from qPCR, sequencing, western blots, survival assays, motor activity tracking, spinal motor neuron morphology analysis, neuromuscular junction integrity evaluation, and drug screening experiments. For C. elegans, it includes data on swimming activity, paralysis assays, neurodegeneration analysis, and drug screening experiments. This dataset represents the minimal dataset required for reproducibility in accordance with PLOS Genetics guidelines and provides all necessary numerical data to replicate the study’s findings. Where relevant, sample raw files are included to ensure data provenance and reproducibility.
