Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration). The filtered range data were generated by applying frequency-domain filtering to eliminate image disturbances (e.g., surface variations and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different types of image data on deep convolutional neural network (DCNN) performance.
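Because the intensity and range images are pixel-to-pixel co-registered, channel-wise fusion is straightforward to experiment with. As a rough illustration only (this is not the fusion scheme used to build FIND, which is described in refs [2,3], and it assumes the patches load as 2-D arrays):

```python
import numpy as np

def fuse_patches(intensity, elevation):
    """Normalize each modality to [0, 1] and stack as channels.
    Generic channel-wise fusion sketch, not the FIND fusion method."""
    def norm(a):
        a = a.astype(np.float32)
        lo, hi = float(a.min()), float(a.max())
        if hi <= lo:
            return np.zeros_like(a, dtype=np.float32)
        return (a - lo) / (hi - lo)
    return np.stack([norm(intensity), norm(elevation)], axis=-1)

# Synthetic 256x256 patches standing in for one co-registered FIND pair
intensity = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
elevation = np.random.rand(256, 256).astype(np.float32)
fused = fuse_patches(intensity, elevation)  # shape (256, 256, 2)
```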
If you share or use this dataset, please cite [4] and [5] in any relevant documentation.
In addition, an image dataset for crack classification has also been published at [6].
References:
[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873
[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605
[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434
[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678
[5] Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044
[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
Five chemicals [2-ethylhexyl 4-hydroxybenzoate (2-EHHB), 4-nonylphenol-branched (4-NP), 4-tert-octylphenol (4-OP), benzyl butyl phthalate (BBP) and dibutyl phthalate (DBP)] were subjected to a 21-day Amphibian Metamorphosis Assay (AMA) following OCSPP 890.1100 test guidelines. The selected chemicals exhibited estrogenic or androgenic bioactivity in high throughput screening data obtained from US EPA ToxCast models. Xenopus laevis larvae were exposed nominally to each chemical at 3.6, 10.9, 33.0 and 100 µg/L, except 4-NP, for which concentrations were 1.8, 5.5, 16.5 and 50 µg/L. Endpoint data were collected daily or on given study days (SD): mortality (daily), developmental stage (SD 7 and 21), hind limb length (HLL) (SD 7 and 21), snout-vent length (SVL) (SD 7 and 21), wet body weight (BW) (SD 7 and 21), and thyroid histopathology (SD 21). 4-OP and BBP caused accelerated development compared to controls at mean measured concentrations of 39.8 and 3.5 µg/L, respectively. Normalized HLL was increased on SD 21 for all chemicals except 4-NP. Histopathology revealed mild thyroid follicular cell hypertrophy at all BBP concentrations, while moderate thyroid follicular cell hypertrophy occurred at the 105 µg/L BBP concentration. Evidence of accelerated metamorphic development was also observed histopathologically in BBP-treated frogs at concentrations as low as 3.5 µg/L. Increased BW relative to control occurred for all chemicals except 4-OP. Increase in SVL was observed in larvae exposed to 4-NP, BBP and DBP on SD 21. With the exception of 4-NP, the four other chemicals tested appeared to alter thyroid axis-driven metamorphosis, albeit through different lines of evidence, with BBP and DBP providing the strongest evidence of effects on the thyroid axis. Citation information for this dataset can be found in Data.gov's References section.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Graphs for all figures are provided along with codes that implement the results described in the paper. We simulate how a spin chain subject to timed local pulses develops long-range entanglement and how timed pulses can also drive a Hubbard chain to a maximally-correlated $\eta$-pairing state. All simulations are performed using exact diagonalization in Mathematica. In Figure 2 we obtain how the central-spin magnetization and the bipartite entanglement in an XY spin-1/2 chain evolve in time. We also obtain the distribution among symmetry sectors with different levels of entanglement, and concurrence matrices that show the build-up of long-range Bell pairs. In Figure 3 we show how the result generalizes to larger systems and how the entanglement and preparation time scale with the system size. We also show that the protocol is not sensitive to random timing errors of the pulses. In Figure 4 we calculate how the fidelity is affected by several types of imperfections, showing it is relatively robust. In Figure 7 we compute experimentally measurable spin-spin correlations at different stages of the protocol. In Figure 8 we calculate level statistics in the presence of integrability breaking and show that the scaling of the entanglement and preparation time is largely unaffected. In Figure 5 we illustrate the protocol for $\eta$-pairing by simulating the evolution of a strongly-interacting, finite Hubbard chain. In Figure 6 we compute signatures of $\eta$-pairing, including the average number of $\eta$ pairs, their momentum distribution, and the overlap with the maximally-correlated state as a function of system size.
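The simulations above use exact diagonalization in Mathematica. As a minimal illustration of the same technique in Python (open boundary conditions and uniform coupling assumed; the paper's pulse protocol is not reproduced here), a small XY spin-1/2 chain can be built from Kronecker products and diagonalized directly:

```python
import numpy as np

# Spin-1/2 operators
sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
I2 = np.eye(2)

def op_at(op, site, n):
    """Embed a single-site operator at position `site` in an n-site chain."""
    mats = [I2] * n
    mats[site] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def xy_hamiltonian(n, J=1.0):
    """Nearest-neighbour XY Hamiltonian: H = J * sum_i (Sx_i Sx_{i+1} + Sy_i Sy_{i+1})."""
    dim = 2 ** n
    H = np.zeros((dim, dim), dtype=complex)
    for i in range(n - 1):
        H += J * (op_at(sx, i, n) @ op_at(sx, i + 1, n)
                  + op_at(sy, i, n) @ op_at(sy, i + 1, n))
    return H

H = xy_hamiltonian(4)               # 16x16 Hermitian matrix
evals = np.linalg.eigvalsh(H)       # full (real) spectrum
```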
https://spdx.org/licenses/CC0-1.0.html
Home-range estimation is an important application of animal tracking data that is frequently complicated by autocorrelation, sampling irregularity, and small effective sample sizes. We introduce a novel, optimal weighting method that accounts for temporal sampling bias in autocorrelated tracking data. This method corrects for irregular and missing data, such that oversampled times are downweighted and undersampled times are upweighted to minimize error in the home-range estimate. We also introduce computationally efficient algorithms that make this method feasible with large datasets. Generally speaking, there are three situations where weight optimization improves the accuracy of home-range estimates: with marine data, where the sampling schedule is highly irregular, with duty cycled data, where the sampling schedule changes during the observation period, and when a small number of home-range crossings are observed, making the beginning and end times more independent and informative than the intermediate times. Using both simulated data and empirical examples including reef manta ray, Mongolian gazelle, and African buffalo, optimal weighting is shown to reduce the error and increase the spatial resolution of home-range estimates. With a conveniently packaged and computationally efficient software implementation, this method broadens the array of datasets with which accurate space-use assessments can be made.
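The intuition of downweighting oversampled times can be sketched with a toy reweighting scheme. Note this is not the paper's optimal weighting, which solves for error-minimizing weights under a fitted autocorrelation model; it only illustrates why burst-sampled fixes should count for less:

```python
import numpy as np

def density_weights(times, bandwidth=1.0):
    """Weight each fix inversely to the local sampling density of timestamps,
    so bursts of rapid sampling are downweighted and isolated fixes upweighted.
    Illustrative only; not the error-minimizing weights of the paper."""
    t = np.asarray(times, dtype=float)
    diffs = (t[:, None] - t[None, :]) / bandwidth
    dens = np.exp(-0.5 * diffs ** 2).sum(axis=1)  # Gaussian kernel density
    w = 1.0 / dens
    return w / w.sum()                             # normalize to sum to 1

# Irregular schedule: a dense burst followed by sparse fixes
times = [0, 0.1, 0.2, 0.3, 5, 10, 15]
w = density_weights(times)
# Fixes in the burst receive smaller weights than the isolated ones
```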
Merged HDR images of many multi-exposure datasets can be improved with accurate exposure estimation.
The primary objective of this study is to establish the dose-response relationship with regard to efficacy and safety of BIBR 1048 (50 mg bis in die(b.i.d), 150 mg b.i.d, 225 mg b.i.d. and 300 mg quaque die(q.d) ) in preventing venous thromboembolism(VTE) in patients undergoing primary elective total hip and knee replacement.
Aim: Species adapt differently to contrasting environments, such as open habitats with sparse vegetation and forested habitats with dense forest cover. We investigated colonization patterns in the open and forested environments in the Diagonal of Open Formations and surrounding rain forests (i.e., Amazon and Atlantic Forest) in Brazil, tested whether the diversification rates were affected by the environmental conditions, and identified traits that enabled species to persist in those environments.
Location: South America, Brazil.
Taxon: Squamata, Lizards
Methods: We estimated ancestral ranges to identify range shifts relative to traditional open and forested habitats for all species. We used phylogenetic information and the current distribution of species in open and forested environments. To evaluate whether these environments influenced species diversification, we tested 12 models using a Hidden Geographic State Speciation and Extinction analysis. Finally, we combined phylogenetic ...
The dataset consists of 181 HDR images. Each image includes: 1) a RAW exposure stack, 2) an HDR image, 3) simulated camera images at two different exposures, and 4) results of 6 single-image HDR reconstruction methods: Endo et al. 2017, Eilertsen et al. 2017, Marnerides et al. 2018, Lee et al. 2018, Liu et al. 2020, and Santos et al. 2020.
Project web page More details can be found at: https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr_benchmark/
Overview This dataset contains 181 RAW exposure stacks selected to cover a wide range of image content and lighting conditions. Each scene is composed of 5 RAW exposures merged into an HDR image using an estimator that accounts for photon noise [3]. A simple color correction was applied using a reference white point, and all merged HDR images were resized to 1920×1280 pixels.
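As a rough sketch of noise-aware merging (not the exact estimator of [3]): for photon-limited noise, the variance of each exposure's radiance estimate scales inversely with exposure time, so weighting by exposure time approximates inverse-variance merging:

```python
import numpy as np

def merge_hdr(raw_stack, exposure_times):
    """Merge a stack of linear RAW exposures into one HDR estimate.
    Each frame is divided by its exposure time to estimate radiance, then
    frames are averaged with weights proportional to exposure time, which
    approximates inverse-variance weighting under Poisson (photon) noise.
    Sketch only; saturation handling and the full estimator of [3] omitted."""
    t = np.asarray(exposure_times, dtype=float).reshape(-1, 1, 1)
    radiance = np.asarray(raw_stack, dtype=float) / t  # per-exposure estimates
    w = t / t.sum()                                    # longer exposure => lower relative noise
    return (w * radiance).sum(axis=0)

# Toy stack: three 4x4 frames whose values equal their exposure times,
# so every frame implies the same radiance of 1.0
stack = np.stack([k * np.ones((4, 4)) for k in (1.0, 2.0, 4.0)])
hdr = merge_hdr(stack, [1.0, 2.0, 4.0])
```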
The primary purpose of the dataset was to compare various single-image HDR (SI-HDR) methods [1]. Thus, we selected a wide variety of content covering nature, portraits, cities, indoor and outdoor, daylight and night scenes. After merging and resizing, we simulated captures by applying a custom CRF and adding realistic camera noise based on estimated noise parameters of a Canon 5D Mark III.
The simulated captures were inputs to six selected SI-HDR methods. You can view the reconstructions of various methods for select scenes on our interactive viewer. For the remaining scenes, please download the appropriate zip files. We conducted a rigorous pairwise comparison experiment on these images and found that widely used metrics did not correlate well with subjective data. We then proposed an improved evaluation protocol for SI-HDR [1].
If you find this dataset useful, please cite [1].
References [1] Param Hanji, Rafał K. Mantiuk, Gabriel Eilertsen, Saghi Hajisharif, and Jonas Unger. 2022. “Comparison of single image hdr reconstruction methods — the caveats of quality assessment.” In Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings (SIGGRAPH ’22 Conference Proceedings). [Online]. Available: https://www.cl.cam.ac.uk/research/rainbow/projects/sihdr_benchmark/
[2] Gabriel Eilertsen, Saghi Hajisharif, Param Hanji, Apostolia Tsirikoglou, Rafał K. Mantiuk, and Jonas Unger. 2021. “How to cheat with metrics in single-image HDR reconstruction.” In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. 3998–4007.
[3] Param Hanji, Fangcheng Zhong, and Rafał K. Mantiuk. 2020. “Noise-Aware Merging of High Dynamic Range Image Stacks without Camera Calibration.” In Advances in Image Manipulation (ECCV workshop). Springer, 376–391. [Online]. Available: https://www.cl.cam.ac.uk/research/rainbow/projects/noise-aware-merging/
Raw data to calculate rate of adaptation: dataall.csv (raw dataset for rate of adaptation calculations, Figure 1, and related statistics)
R code to analyze raw data for rate of adaptation: Competition Analysis.R
Raw data to calculate effective population sizes: datacount.csv
R code to analyze effective population sizes (Figure 2): Cell Count Ne.R
R code to determine the best estimate of the dominance coefficient in each environment (produces Figures 3, S4, S5): what is h.R. Note: the competition and effective population size R code must be run first in the same session.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The lethal dose or concentration which kills 50% of the animals (LD50 or LC50) is an important parameter for scientists to understand the toxicity of chemicals in different scenarios that can be used to make go-no-go decisions, and ultimately assist in the choice of the right personal protective equipment needed for containment. The LD50 assessment process has also required the use of many animals although modern methods have reduced the number of rats needed. Since a compound is usually considered highly toxic when the LD50 is lower than 25 mg/kg, such a classification provides potentially valuable safety information to synthetic chemists and other safety assessment scientists. The need for finding alternative approaches such as computational methods is important to ultimately reduce animal use for this testing further still. We now summarize our efforts to use public data for building in vivo LD50 or LC50 classification and regression machine learning models for various species (rat, mouse, fish, and daphnia) and their fivefold cross-validation statistics with different machine learning algorithms as well as an external curated test set for mouse LD50. These datasets consist of different molecule classes, may cover different activity ranges, and also have a range of dataset sizes. The challenges of using such computational models are that their applicability domain will also need to be understood so that they can be used to make reliable predictions for novel molecules. These machine learning models will also need to be backed up with experimental validation. However, such models could also be used for efforts to bridge gaps in individual toxicity datasets. Making such models available also opens them up to potential misuse or dual use. 
We will summarize these efforts and propose that they could be used for scoring the millions of commercially available molecules, most of which likely do not have a known LD50 or for that matter any data in vitro or in vivo for toxicity.
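The 25 mg/kg cut-off mentioned above translates directly into the binary label such classification models learn; a trivial sketch:

```python
# The text above notes a compound is usually considered highly toxic when
# its LD50 is below 25 mg/kg; classification models can use that cut-off
# directly as a binary label.
HIGH_TOX_THRESHOLD = 25.0  # mg/kg

def toxicity_class(ld50_mg_per_kg):
    """1 = highly toxic (LD50 < 25 mg/kg), 0 = otherwise."""
    return int(ld50_mg_per_kg < HIGH_TOX_THRESHOLD)

labels = [toxicity_class(v) for v in (3.2, 25.0, 180.0)]
print(labels)  # [1, 0, 0]
```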
This study will test the safety, tolerability, and pharmacokinetics of escalating doses of nusinersen (ISIS 396443) administered into the spinal fluid either two or three times over the duration of the trial, in participants with spinal muscular atrophy (SMA). Four dose levels will be evaluated sequentially. Each dose level will be studied in a cohort of approximately 8 participants, where all participants will receive active drug.
This geodatabase of point, line and polygon features is an effort to consolidate all of the range improvement locations on BLM-managed land in Idaho into one database. Currently, the line feature class has some data for all of the BLM field offices except the Coeur d'Alene and Cottonwood field offices. Range improvements are structures intended to enhance rangeland resources, including wildlife, watershed, and livestock management. Examples of range improvements include water troughs, spring headboxes, culverts, fences, water pipelines, gates, wildlife guzzlers, artificial nest structures, reservoirs, developed springs, corrals, exclosures, etc. These structures were first tracked by the Bureau of Land Management (BLM) in the Job Documentation Report (JDR) System in the early 1960s, which was predominately a paper-based tracking system. In 1988 the JDRs were migrated into and replaced by the automated Range Improvement Project System (RIPS), and version 2.0 is currently being used today. It tracks inventory, status, objectives, treatment, maintenance cycle, maintenance inspection, monetary contributions and reporting. Not all range improvements are documented in the RIPS database; there may be some older range improvements that were built before the JDR tracking system was established. There also may be unauthorized projects that are not in RIPS. Official project files of paper maps, reports, NEPA documents, checklists, etc., document the status of each project and are physically kept in the office with management authority for that project area. In addition, project data is entered into the RIPS system to enable managers to access the data to track progress, run reports, analyze the data, etc. Before Geographic Information System technology most offices kept paper atlases or overlay systems that mapped the locations of the range improvements. 
The objective of this geodatabase is to migrate the location of historic range improvement projects into a GIS for geospatial use with other data and to centralize the range improvement data for the state. This data set is a work in progress and does not have all range improvement projects that are on BLM lands. Some field offices have not migrated their data into this database, and others are partially completed. New projects may have been built but have not been entered into the system. Historic or unauthorized projects may not have case files and are being mapped and documented as they are found. Many field offices are trying to verify the locations and status of range improvements with GPS, and locations may change or projects that have been abandoned or removed on the ground may be deleted. Attributes may be incomplete or inaccurate. This data was created using the standard for range improvements set forth in Idaho IM 2009-044, dated 6/30/2009. However, it does not have all of the fields the standard requires. Fields that are missing from the line feature class that are in the standard are: ALLOT_NO, MGMT_AGCY, ADMIN_ST, ADMIN_OFF, SRCE_AGCY, MAX_PDOP, MAX_HDOP, CORR_TYPE, RCVR_TYPE, GPS_TIME, UPDATE_STA, UNFILT_POS, FILT_POS, DATA_DICTI, GPS_LENGTH, GPS_3DLGTH, AVE_VERT_P, AVE_HORZ_P, WORST_VERT, WORST_HORZ and CONF_LEVEL. Several additional fields have been added that are not part of the standard: top_fence, btm_fence, admin_fo_line and year_checked. There is no National BLM standard for GIS range improvement data at this time. For more information contact us at blm_id_stateoffice@blm.gov.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and codes for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023"
The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".
Files and variables
File 1: Data_and_Code.zip
Directory: Main_function
Description: MATLAB scripts and functions. Each script includes a description that guides the user on how to use it and where to find the dataset used for processing.
MATLAB Main Scripts: Contain all the steps to process the data, output figures, and output videos.
Script_1_Ice_velocity_process_flow.m
Script_2_strain_rate_process_flow.m
Script_3_DROT_grounding_line_extraction.m
Script_4_Read_ICESat2_h5_files.m
Script_5_Extraction_results.m
MATLAB functions: Directories containing the MATLAB functions that support the main scripts:
1_Ice_velocity_code: MATLAB functions for ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse-weighted averaging, and error estimation.
2_strain_rate: MATLAB functions for strain rate calculation.
3_DROT_extract_grounding_line_code: MATLAB functions for converting the range offset results output from GAMMA to differential vertical displacement and using the result to extract the grounding line.
4_Extract_data_from_2D_result: MATLAB functions for extracting profiles from 2D data.
5_NeRD_Damage_detection: Modified code from Izeboud et al. 2023. When applying this code, please also cite Izeboud et al. 2023 (https://www.sciencedirect.com/science/article/pii/S0034425722004655).
6_Figure_plotting_code: MATLAB functions that produce the figures in the paper and the supporting information.
Directory: data_and_result
Description: Directories that store the results output from MATLAB. Users only need to modify the paths in the MATLAB scripts to their own paths.
1_origin: Sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from the GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.
2_maskccpN: Outliers removed where ccp < 0.05 and displacement converted to velocity (m/day).
3_rockpoint: Velocities extracted at non-moving regions.
4_constant_detrend: Orbit error removed.
5_Tidal_correction: Atmospheric and tidally induced errors removed.
6_rockpoint: Non-aggregated velocities extracted at non-moving regions.
6_vx_vy_v: Velocities transformed from va/vr to vx/vy.
7_rockpoint: Aggregated velocities extracted at non-moving regions.
7_vx_vy_v_aggregate_and_error_estimate: Inverse-weighted average of three ice velocity maps and calculation of the error maps.
8_strain_rate: Strain rate calculated from the aggregated ice velocity.
9_compare: Stores the results before and after tidal correction and aggregation.
10_Block_result: Time series results extracted from 2D data.
11_MATLAB_output_png_result: Stores .png files and time series results.
12_DROT: Differential Range Offset Tracking results.
13_ICESat_2: ICESat-2 .h5 and .mat files can be put here (only samples from tracks 0965 and 1094 are included).
14_MODIS_images: MODIS images can be stored here.
shp: Grounding line, rock region, ice front, and other shape files.
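The aggregation step performed in 7_vx_vy_v_aggregate_and_error_estimate can be illustrated generically (in Python rather than MATLAB, and using standard inverse-variance weighting, which may differ in detail from the actual code):

```python
import numpy as np

def aggregate_velocity(v_maps, err_maps):
    """Inverse-error-weighted average of several co-registered velocity maps,
    plus the propagated error of the weighted mean. Generic sketch of the
    aggregation step; the MATLAB implementation may differ."""
    v = np.asarray(v_maps, dtype=float)
    e = np.asarray(err_maps, dtype=float)
    w = 1.0 / e ** 2                                # inverse-variance weights
    v_agg = (w * v).sum(axis=0) / w.sum(axis=0)     # weighted mean per pixel
    err_agg = np.sqrt(1.0 / w.sum(axis=0))          # error of the weighted mean
    return v_agg, err_agg

# Three toy 3x3 velocity maps with equal per-pixel errors
v_maps = [np.full((3, 3), 10.0), np.full((3, 3), 12.0), np.full((3, 3), 11.0)]
errs = [np.full((3, 3), 1.0)] * 3
v_agg, err_agg = aggregate_velocity(v_maps, errs)   # mean 11.0, error 1/sqrt(3)
```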
File 2 : PIG_front_1947_2023.zip
Includes ice front position shape files from 1947 to 2023, which are used for plotting Figure 1 in the paper.
File 3 : PIG_DROT_GL_2016_2021.zip
Includes grounding line position shape files from 2016 to 2021, which are used for plotting Figure 1 in the paper.
Data was derived from the following sources:
Those links can be found in the MATLAB scripts or in the paper's "Open Research" section.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the digitized treatments in Plazi based on the original journal article Anker, Arthur (2021): Second finding, first complete specimen and range extension of the rare alpheid shrimp Bermudacaris britayevi Anker, Poddoubtchenko & Marin, 2006 (Caridea Alpheidae). Zootaxa 4966 (1): 54-60, DOI: 10.11646/zootaxa.4966.1.5
The extend_search extension enhances the CKAN data catalog by adding advanced search capabilities. It focuses on improving how users find datasets by introducing date range filtering based on the 'modified-on' metadata, and enables searching datasets by custodian. By incorporating these features, extend_search makes it easier for users to discover relevant datasets within a CKAN instance.
Key Features:
Date Range Search Filter: Allows users to filter datasets based on a date range applied to the 'modified-on' metadata field. This feature utilizes the bootstrap-daterangepicker library, crediting Dan Grossman's work, to provide a user-friendly interface for selecting date ranges.
Custodian Search Filter: Introduces the ability to search datasets based on the custodian responsible for the dataset. This facilitates finding datasets managed by specific organizations or individuals.
Technical Integration: The extension is installed via standard CKAN extension installation procedures: cloning the repository, installing the required Python packages with pip, installing the extension using setup.py, and enabling the extend_search plugin in the CKAN configuration file (.ini).
Benefits & Impact: By implementing the extend_search extension, CKAN installations can improve the findability of datasets, saving users time and effort. Date range filtering is especially useful when searching for recently updated datasets, while custodian filtering is helpful when looking for datasets managed by specific entities.
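Conceptually, the date range filter reduces to a predicate on the 'modified-on' field. A minimal Python sketch of that idea (field name taken from the description above; the extension itself implements this inside CKAN's search interface, not as a standalone function):

```python
from datetime import date

def filter_by_modified_on(datasets, start, end):
    """Keep datasets whose 'modified-on' metadata falls inside [start, end].
    Standalone sketch of the filter's logic, not the CKAN plugin code."""
    return [d for d in datasets
            if start <= date.fromisoformat(d["modified-on"]) <= end]

# Toy catalog entries standing in for CKAN dataset metadata
catalog = [
    {"name": "a", "modified-on": "2023-01-15"},
    {"name": "b", "modified-on": "2023-06-01"},
    {"name": "c", "modified-on": "2024-02-10"},
]
hits = filter_by_modified_on(catalog, date(2023, 5, 1), date(2023, 12, 31))
print([d["name"] for d in hits])  # ['b']
```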
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NOTE: The data was recorded erroneously and will be updated soon.
This dataset contains high quality GNSS and jammer raw data generated during the in-lab validation activities of jamming detection and localization performed in the frame of the GATEMAN project. These files are grouped for each type of validation scenario defined. The scenarios are combinations of three different GNSS and three different jamming signals.
Content:
The archive contains a folder for each validation scenario. Each folder contains two files. One contains the GNSS data and the second file contains jamming data. A function to read the data with GNU Octave/Matlab is also included.
├── GALE1+AMtone
│ ├── AMtone@40MSps.bin
│ └── GALE1@40MSps.bin
├── GALE1+Chirp10MHz
│ ├── Chirp10MHz@40MSps.bin
│ └── GALE1@40MSps.bin
├── GALE1+Chirp20MHz
│ ├── Chirp20MHz@40MSps.bin
│ └── GALE1@40MSps.bin
├── GPSL1+AMtone
│ ├── AMtone@40MSps.bin
│ └── GPSL1@40MSps.bin
├── GPSL1+Chirp10MHz
│ ├── Chirp10MHz@40MSps.bin
│ └── GPSL1@40MSps.bin
├── GPSL1+Chirp20MHz
│ ├── Chirp20MHz@40MSps.bin
│ └── GPSL1@40MSps.bin
├── GPSL5+AMtone
│ ├── AMtone@40MSps.bin
│ └── GPSL5@40MSps.bin
├── GPSL5+Chirp10MHz
│ ├── Chirp10MHz@40MSps.bin
│ └── GPSL5@40MSps.bin
├── GPSL5+Chirp20MHz
│ ├── Chirp20MHz@40MSps.bin
│ └── GPSL5@40MSps.bin
└── readBinData.m
Data format:
Each file stores the received baseband samples as an array of complex, 16-bit signed integer data (range -32,768 to 32,767) of the corresponding signal. The data is stored in Big-endian, network order format, i.e. the most-significant byte occupies the lowest memory address. The real and imaginary components of the data correspond to the in-phase (I) and quadrature-phase (Q) data, respectively. I and Q-samples are interleaved [I, Q, I, Q, ...] in the array.
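Following this format description, the samples can also be read outside MATLAB. A minimal Python analogue of the provided readBinData.m (written from the layout stated above, not from the original reader):

```python
import numpy as np

def read_iq(path, count=-1):
    """Read interleaved big-endian 16-bit I/Q samples into a complex array.
    Sketch based on the format description above."""
    n = -1 if count < 0 else 2 * count             # two int16 words per sample
    raw = np.fromfile(path, dtype=">i2", count=n)  # ">i2" = big-endian int16
    iq = raw.astype(np.float32).reshape(-1, 2)     # column 0 = I, column 1 = Q
    return iq[:, 0] + 1j * iq[:, 1]

# Round-trip demo with two synthetic samples (1+2j and -3+4j)
np.array([1, 2, -3, 4], dtype=">i2").tofile("demo_iq.bin")
samples = read_iq("demo_iq.bin")
```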
Data:
The recorded data of each validation scenario consists of two files: one containing the GNSS signal (L1, L5 or E1 band), the other containing the jamming signal (tone, or chirp with 10 or 20 MHz bandwidth). The RF GNSS signal was generated with a GNSS signal generator and contains only a single spreading sequence. The RF jamming signal was generated with a Vector Signal Transceiver. Both signals of a scenario were recorded synchronously with a USRP at an I/Q rate of 40 MS/s.
GNSS signals:
GNSS band  | PRN | Elevation (deg) | Azimuth (deg) | Power (dBm)
GPS L1     | G7  | 68              | 85            | -70
GPS L5     | G23 | 70              | 172           | -70
Galileo E1 | E15 | 76              | 229           | -68.5
Jamming signals:
The data uploaded here are truncated samples. Longer samples of data are available upon request.
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087. Abstract The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Public Health Emergency of International Concern (PHEIC) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 571,831 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.
Data Description The dataset consists of a total of 571,831 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 11th November 2022 (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The following represents the details of these dataset files:
TweetIDs_Part1.txt: 13,926 Tweet IDs; May 7, 2022, to May 21, 2022
TweetIDs_Part2.txt: 17,705 Tweet IDs; May 21, 2022, to May 27, 2022
TweetIDs_Part3.txt: 17,585 Tweet IDs; May 27, 2022, to June 5, 2022
TweetIDs_Part4.txt: 19,718 Tweet IDs; June 5, 2022, to June 11, 2022
TweetIDs_Part5.txt: 46,718 Tweet IDs; June 12, 2022, to June 30, 2022
TweetIDs_Part6.txt: 138,711 Tweet IDs; July 1, 2022, to July 23, 2022
TweetIDs_Part7.txt: 105,890 Tweet IDs; July 24, 2022, to July 31, 2022
TweetIDs_Part8.txt: 93,959 Tweet IDs; August 1, 2022, to August 9, 2022
TweetIDs_Part9.txt: 50,832 Tweet IDs; August 10, 2022, to August 24, 2022
TweetIDs_Part10.txt: 39,042 Tweet IDs; August 25, 2022, to September 19, 2022
TweetIDs_Part11.txt: 12,341 Tweet IDs; September 20, 2022, to October 9, 2022
TweetIDs_Part12.txt: 15,404 Tweet IDs; October 10, 2022, to November 11, 2022
Please note: The dataset contains only Tweet IDs, in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset, the Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) may be used.
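Before hydration, the per-part files can be combined into one deduplicated ID list. A short sketch, assuming one Tweet ID per line in each .txt file (the exact layout is not stated above):

```python
def load_tweet_ids(paths):
    """Combine several Tweet ID files into one sorted, deduplicated list.
    Assumes one ID per line; blank lines are skipped."""
    ids = set()
    for p in paths:
        with open(p) as fh:
            ids.update(line.strip() for line in fh if line.strip())
    return sorted(ids)

# Demo with two tiny stand-in files (real files are TweetIDs_Part1..12.txt)
for name, rows in [("part1.txt", ["111", "222"]), ("part2.txt", ["222", "333"])]:
    with open(name, "w") as fh:
        fh.write("\n".join(rows))
print(load_tweet_ids(["part1.txt", "part2.txt"]))  # ['111', '222', '333']
```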
https://spdx.org/licenses/CC0-1.0.html
The objectives of this study are to evaluate the safety, tolerability, and pharmacokinetics of a single dose of nusinersen (ISIS 396443) administered intrathecally to participants with Spinal Muscular Atrophy (SMA).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Machine learning (ML) techniques have become powerful tools in both industrial and academic settings. Their ability to facilitate analysis of complex data and generation of predictive insights is transforming how scientific problems are approached across a wide range of disciplines. In this tutorial, we present a cursory introduction to three widely used ML techniques (logistic regression, random forest, and multilayer perceptron) applied to the analysis of molecular dynamics (MD) trajectory data. We apply the chosen ML models to the study of the SARS-CoV-2 spike protein receptor binding domain interacting with the receptor ACE2. We develop a pipeline for processing MD simulation trajectory data and identifying residues that significantly impact the stability of the complex.
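The kind of pipeline described above can be sketched with scikit-learn. The snippet below is only an illustration of the three named techniques, not the tutorial's actual workflow: the feature matrix (per-frame, per-residue values), the stability labels, and the "influential residues" are all synthetic stand-ins.

```python
# Illustrative sketch: fit logistic regression, random forest, and a multilayer
# perceptron on synthetic per-frame residue features, then rank residues by
# random-forest feature importance. All data here is made up for demonstration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_frames, n_residues = 400, 20
X = rng.normal(size=(n_frames, n_residues))   # e.g., one feature per residue per MD frame
y = (X[:, 3] + X[:, 7] > 0).astype(int)       # synthetic label driven by residues 3 and 7

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
    "multilayer perceptron": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.2f}")

# Feature importances suggest which residues most affect the predicted label.
importances = models["random forest"].feature_importances_
print("most influential residue indices:", np.argsort(importances)[::-1][:2])
```

In a real analysis the feature matrix would be derived from the MD trajectory (e.g., residue-residue distances) rather than random numbers, but the fit/score/rank structure is the same.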
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data: raw intensity images, raw range (i.e., elevation) images, filtered range images, and fused raw images. The FIND dataset consists of 2,500 image patches (256x256 pixels each) and their ground truth crack maps for each of the four data types.
The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration). The filtered range data were generated by applying frequency-domain filtering to the raw range data to eliminate image disturbances (e.g., surface variations and grooved patterns) [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different image data types on deep convolutional neural network (DCNN) performance.
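Because the intensity and range channels are co-registered pixel to pixel, they can be combined array-wise. The sketch below blends two patches with a simple weighted average purely as an illustration of that idea; the actual fusion scheme used to build FIND is the one described in references [2,3], not this one.

```python
# Hedged sketch: blend two co-registered patches into one "fused" patch.
# This weighted average is illustrative only, not the FIND fusion method.
import numpy as np

def fuse(intensity: np.ndarray, range_img: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Normalize each co-registered channel to [0, 1] and blend them."""
    if intensity.shape != range_img.shape:
        raise ValueError("co-registration requires identical image shapes")

    def norm(a: np.ndarray) -> np.ndarray:
        a = a.astype(float)
        span = a.max() - a.min()
        return (a - a.min()) / span if span else np.zeros_like(a)

    return alpha * norm(intensity) + (1 - alpha) * norm(range_img)
```

With real FIND data, `intensity` and `range_img` would be the 256x256 raw intensity and raw range patches for the same location.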
If you share or use this dataset, please cite [4] and [5] in any relevant documentation.
In addition, an image dataset for crack classification has also been published at [6].
References:
[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873
[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605
[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434
[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678
[5] Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044
[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78