22 datasets found

Data Sheet 1_Use of AI-methods over MD simulations in the sampling of...
frontiersin.figshare.com
pdf
Updated Apr 8, 2025
Cite
Souradeep Sil; Ishita Datta; Sankar Basu (2025). Data Sheet 1_Use of AI-methods over MD simulations in the sampling of conformational ensembles in IDPs.pdf [Dataset]. http://doi.org/10.3389/fmolb.2025.1542267.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fmolb.2025.1542267.s001
Dataset updated
Apr 8, 2025
Dataset provided by
Frontiers
Authors
Souradeep Sil; Ishita Datta; Sankar Basu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Intrinsically Disordered Proteins (IDPs) challenge traditional structure-function paradigms by existing as dynamic ensembles rather than stable tertiary structures. Capturing these ensembles is critical to understanding their biological roles, yet Molecular Dynamics (MD) simulations, though accurate and widely used, are computationally expensive and struggle to sample rare, transient states. Artificial intelligence (AI) offers a transformative alternative, with deep learning (DL) enabling efficient and scalable conformational sampling. DL models leverage large-scale datasets to learn complex, non-linear, sequence-to-structure relationships, allowing for the modeling of conformational ensembles in IDPs without the constraints of traditional physics-based approaches. Such DL approaches have been shown to outperform MD in generating diverse ensembles with comparable accuracy. Most models rely primarily on simulated data for training, while experimental data serve a critical role in validation, aligning the generated conformational ensembles with observable physical and biochemical properties. However, challenges remain, including dependence on data quality, limited interpretability, and scalability for larger proteins. Hybrid approaches combining AI and MD can bridge the gaps by integrating statistical learning with thermodynamic feasibility. Future directions include incorporating physics-based constraints and learning experimental observables into DL frameworks to refine predictions and enhance applicability. AI-driven methods hold significant promise in IDP research, offering novel insights into protein dynamics and therapeutic targeting while overcoming the limitations of traditional MD simulations.
Data for Artificial Intelligence: Data-Centric AI for Transportation: Work...
catalog.data.gov
data.virginia.gov
+1more
Updated Jun 16, 2025
Cite
Federal Highway Administration (2025). Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case Raw Maryland Speed Data [Dataset]. https://catalog.data.gov/dataset/data-for-artificial-intelligence-data-centric-ai-for-transportation-work-zone-use-case-raw-14c41
Explore at:
Dataset updated
Jun 16, 2025
Dataset provided by
Federal Highway Administration
Description
Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case proposes a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms and introduces a novel deep learning model to predict the traffic speed and traffic collision likelihood during planned work zone events. This dataset is a raw sample of Maryland roadway speed data.
champ_trainning_sample
huggingface.co
Updated May 7, 2024
Cite
Fudan Generative AI (2024). champ_trainning_sample [Dataset]. https://huggingface.co/datasets/fudan-generative-ai/champ_trainning_sample
Explore at:
Dataset updated
May 7, 2024
Dataset authored and provided by
Fudan Generative AI
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset samples for Champ training

These dataset samples are used for Champ. Before training, you need to process the datasets with the SMPL and DWPose methods. Refer to https://github.com/fudan-generative-vision/champ/blob/master/docs/data_process.md
invoices-example
huggingface.co
Cite
Parsee.ai, invoices-example [Dataset]. https://huggingface.co/datasets/parsee-ai/invoices-example
Explore at:
Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Dataset provided by
Parsee.ai
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Invoices Sample Dataset

This is a sample dataset generated on app.parsee.ai for invoices. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found here: https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md
parsee-core version used: 0.1.3.11
This dataset was created on the basis of 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.
Medical Imaging Data Resource Center (MIDRC) - RSNA International COVID-19...
cancerimagingarchive.net
dicom, json and zip +2
Updated Jan 15, 2021
Cite
The Cancer Imaging Archive (2021). Medical Imaging Data Resource Center (MIDRC) - RSNA International COVID-19 Open Radiology Database (RICORD) Release 1c - Chest x-ray Covid+ [Dataset]. http://doi.org/10.7937/91ah-v663
Explore at:
Available download formats: dicom, n/a, json and zip, xlsx
Unique identifier
https://doi.org/10.7937/91ah-v663
Dataset updated
Jan 15, 2021
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Jan 15, 2021
Dataset funded by
National Cancer Institute: http://www.cancer.gov/
Description
Background
The COVID-19 pandemic is a global healthcare emergency. Prediction models for COVID-19 imaging are rapidly being developed to support medical decision making in imaging. However, inadequate availability of a diverse annotated dataset has limited the performance and generalizability of existing models.
Purpose
To create the first multi-institutional, multi-national expert annotated COVID-19 imaging dataset made freely available to the machine learning community as a research and educational resource for COVID-19 chest imaging. The Radiological Society of North America (RSNA) assembled the RSNA International COVID-19 Open Radiology Database (RICORD) collection of COVID-related imaging datasets and expert annotations to support research and education. RICORD data will be incorporated in the Medical Imaging and Data Resource Center (MIDRC), a multi-institutional research data repository funded by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health.
Materials and Methods
This dataset was created through a collaboration between the RSNA and Society of Thoracic Radiology (STR). Clinical annotation by thoracic radiology subspecialists was performed for all COVID positive chest radiography (CXR) imaging studies using a labeling schema based upon guidelines for reporting classification of COVID-19 findings in CXRs (see Review of Chest Radiograph Findings of COVID-19 Pneumonia and Suggested Reporting Language, Journal of Thoracic Imaging).
Results
The RSNA International COVID-19 Open Annotated Radiology Database (RICORD) consists of 998 chest x-rays from 361 patients at four international sites annotated with diagnostic labels.
Patient Selection: Patients at least 18 years of age who received a positive diagnosis for COVID-19.
Data Abstract
998 Chest x-ray examinations from 361 patients.
Annotations with labels:
Classification
Typical Appearance
Multifocal bilateral, peripheral opacities, and/or Opacities with rounded morphology
Lower lung-predominant distribution (Required Feature - must be present with either or both of the first two opacity patterns)
Indeterminate Appearance
Absence of typical findings AND Unilateral, central or upper lung predominant distribution of airspace disease
Atypical Appearance
Pneumothorax or pleural effusion, Pulmonary Edema, Lobar Consolidation, Solitary lung nodule or mass, Diffuse tiny nodules, Cavity
Negative for Pneumonia
No lung opacities
Airspace Disease Grading
Lungs are divided on the frontal chest x-ray into 3 zones per lung (6 zones total). The upper zone extends from the apices to the superior hilum. The mid zone spans between the superior and inferior hilar margins. The lower zone extends from the inferior hilar margins to the costophrenic sulci.
Mild - Required if not negative for pneumonia
Opacities in 1-2 lung zones
Moderate - Required if not negative for pneumonia
Opacities in 3-4 lung zones
Severe - Required if not negative for pneumonia
Opacities in >4 lung zones
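The grading rule above can be written as a small function. This is only an illustrative sketch: the function name `airspace_grade` is hypothetical and is not part of the RICORD annotation tooling, and returning `None` for zero affected zones is an assumption based on the statement that a grade is required only when the study is not negative for pneumonia.

```python
from typing import Optional


def airspace_grade(zones_with_opacities: int) -> Optional[str]:
    """Map the number of lung zones with opacities (0-6) to a grade label.

    Hypothetical helper following the listed thresholds:
    1-2 zones -> Mild, 3-4 zones -> Moderate, more than 4 zones -> Severe.
    """
    if not 0 <= zones_with_opacities <= 6:
        raise ValueError("expected a zone count between 0 and 6")
    if zones_with_opacities == 0:
        # Assumption: no opacities means no grade is assigned.
        return None
    if zones_with_opacities <= 2:
        return "Mild"
    if zones_with_opacities <= 4:
        return "Moderate"
    return "Severe"
```

For example, a study with opacities in three of the six zones would be graded "Moderate" under this reading.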
Supporting clinical variables: MRN*, Age, Study Date*, Exam Description, Sex, Study UID*, Image Count, Modality, Testing Result, Specimen Source (* pseudonymous values).
How to use the JSON annotations
More information about how the JSON annotations are organized can be found on https://docs.md.ai/data/json/. Steps 2 & 3 in this example code demonstrate how to load the JSON into a DataFrame. The JSON file can be downloaded via the data access table below; it is not available via MD.ai. This Jupyter Notebook may also be helpful.
Research Benefits
RICORD is available for non-commercial use (and further enrichment) by the research and education communities which may include development of educational resources for COVID-19, use of RICORD to create AI systems for diagnosis and quantification, benchmarking performance for existing solutions, exploration of distributed/federated learning, further annotation or data augmentation efforts, and evaluation of the examinations for disease entities beyond COVID-19 pneumonia. Deliberate consideration of the detailed annotation schema, demographics, and other included meta-data will be critical when generating cohorts with RICORD, particularly as more public COVID-19 imaging datasets are made available via complementary and parallel efforts. It is important to emphasize that there are limitations to the clinical “ground truth” as the SARS-CoV-2 RT-PCR tests have widely documented limitations and are subject to both false-negative and false-positive results which impact the distribution of the included imaging data, and may have led to an unknown epidemiologic distortion of patients based on the inclusion criteria. These limitations notwithstanding, RICORD has achieved the stated objectives for data complexity, heterogeneity, and high-quality expert annotations as a comprehensive COVID-19 thoracic imaging data resource.
MD iMAP: Maryland Stream Health - Stream Reaches
s.cnmilf.com
opendata.maryland.gov
+3more
Updated May 10, 2025
Cite
opendata.maryland.gov (2025). MD iMAP: Maryland Stream Health - Stream Reaches [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/md-imap-maryland-stream-health-stream-reaches
Explore at:
Dataset updated
May 10, 2025
Dataset provided by
opendata.maryland.gov
Area covered
Maryland
Description
This is a MD iMAP hosted service layer. Find more information at http://imap.maryland.gov. The stream reaches are color coded according to the Combined Index of Biotic Integrity (CIBI) scores of the MBSS sites. The CIBI is the average of the fish IBI (FIBI) and the benthic IBI (BIBI). A CIBI score below 3.0 is red (Poor), 3.0-3.9 is yellow (Fair), and 4.0-5.0 is green (Good). The Maryland Biological Stream Survey (MBSS) was Maryland's first probability-based, or random design, stream sampling program intended to provide unbiased estimates of stream conditions with known precision at various spatial scales, ranging from large 6-digit river basins and medium-sized 8-digit watersheds to the entire state.
Last Updated: 06/2010
Feature Service Layer Link: https://mdgeodata.md.gov/imap/rest/services/Hydrology/MD_StreamHealth/FeatureServer
ADDITIONAL LICENSE TERMS: The Spatial Data and the information therein (collectively "the Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential, or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
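The color coding described for this layer can be illustrated with a short sketch. The function names `cibi_score` and `cibi_category` are hypothetical and not part of the MD iMAP service. The description leaves scores between 3.9 and 4.0 unassigned; treating everything below 4.0 as Fair is an assumption made here.

```python
from typing import Tuple


def cibi_score(fibi: float, bibi: float) -> float:
    """Combined IBI: the average of the fish IBI and the benthic IBI."""
    return (fibi + bibi) / 2


def cibi_category(cibi: float) -> Tuple[str, str]:
    """Return (color, rating) for a CIBI score, per the listed ranges."""
    if cibi < 3.0:
        return ("red", "Poor")
    if cibi < 4.0:
        # Assumption: scores between 3.9 and 4.0 are treated as Fair.
        return ("yellow", "Fair")
    return ("green", "Good")
```

Under this sketch, a site with a FIBI of 3.0 and a BIBI of 4.0 has a CIBI of 3.5 and would be drawn in yellow (Fair).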
The Impact of AI and ChatGPT on Bangladeshi University Students
data.mendeley.com
Updated Jan 6, 2025
Cite
Md Jhirul Islam (2025). The Impact of AI and ChatGPT on Bangladeshi University Students [Dataset]. http://doi.org/10.17632/zykphpvbr7.2
Explore at:
Unique identifier
https://doi.org/10.17632/zykphpvbr7.2
Dataset updated
Jan 6, 2025
Authors
Md Jhirul Islam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh
Description
The dataset records the perceptions of Bangladeshi university students on the influence that AI tools, especially ChatGPT, have on their academic practices, learning experiences, and problem-solving abilities. It covers the varying role of AI in education, including common usage statistics, the effect of AI on creative abilities, its impact on learning, and whether it could invade privacy. The dataset offers a perspective on how AI tools are changing education in the country and provides valuable information for researchers, educators, and policymakers seeking to understand trends, challenges, and opportunities in the adoption of AI in the academic context.

Methodology
Data Collection Method: Online survey using Google Forms
Participants: A total of 3,512 students from various Bangladeshi universities participated.
Survey Questions: The survey included questions on demographic information, frequency of AI tool usage, perceived benefits, concerns regarding privacy, and impacts on creativity and learning.
Sampling Technique: Random sampling of university students
Data Collection Period: June 2024 to December 2024

Privacy Compliance
This dataset has been anonymized to remove any personally identifiable information (PII). It adheres to relevant privacy regulations to ensure the confidentiality of participants.

For further inquiries, please contact: Name: Md Jhirul Islam, Daffodil International University Email: jhirul15-4063@diu.edu.bd Phone: 01316317573
‘Dashboard Template - Maryland Energy Administration’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Dashboard Template - Maryland Energy Administration’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-dashboard-template-maryland-energy-administration-dbd7/62e2bb22/?iid=001-671&v=presentation
Explore at:
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Maryland
Description
Analysis of ‘Dashboard Template - Maryland Energy Administration’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/702a5047-8470-4a53-a1e7-d629756d95c1 on 26 January 2022.

--- Dataset description provided by original source is as follows ---

This dataset includes data from the Maryland Energy Administration (MEA) about statewide energy consumption, energy savings programs, renewable energy, and electric and hybrid vehicles

--- Original source retains full ownership of the source dataset ---
PMRAM: Bangladeshi Brain Cancer - MRI Dataset
data.mendeley.com
Updated Dec 19, 2024
Cite
Prottoy Md Shahriar Mannan (2024). PMRAM: Bangladeshi Brain Cancer - MRI Dataset [Dataset]. http://doi.org/10.17632/m7w55sw88b.1
Explore at:
Unique identifier
https://doi.org/10.17632/m7w55sw88b.1
Dataset updated
Dec 19, 2024
Authors
Prottoy Md Shahriar Mannan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This Bangladeshi Brain Cancer MRI Dataset is a large dataset of Magnetic Resonance Imaging (MRI) images created to aid researchers in medical diagnosis, especially for brain cancer research. The collection contains a total of 1,600 raw images (400 per class). After augmentation it contains 6,000 images, divided evenly into four main categories:

Glioma -1500 images

Meningioma -1500 images

Pituitary-1500 images

No Tumor-1500 images

All the images in this dataset were collected from different hospitals around Bangladesh, which brings diversity and representation to the sample. To make the images compatible with as many image processing, machine learning, and deep learning pipelines as possible, they were resized to a standard size of 512×512.

This dataset is significant because high-quality medical imaging data are scarce and difficult to obtain, particularly for brain cancer. Four prominent doctors collaborated on data collection to provide more accurate and useful content, which made the dataset feasible. This cooperation underlines the dataset's potential to improve current medical practice by providing a dependable supply of diagnoses for the creation and testing of diagnostic tools.

Researchers and practitioners can use this dataset with a variety of models, such as DenseNet201, YOLOv8x/s, CNN, ResNet50V2, VGG-16, and MobileNetV2.

Image Processing Details:

Images are randomly rotated within a range of 45 degrees. (rotation_range=45)

Images are horizontally shifted by up to 20% of the width of the image. (width_shift_range=0.2)

Images are vertically shifted by up to 20% of the height of the image. (height_shift_range=0.2)

Shear transformation is applied to the image within a range of 20%. (shear_range=0.2)

Images are randomly zoomed in or out by up to 20%. (zoom_range=0.2)

Images are randomly flipped horizontally. (horizontal_flip=True)

When transformations like rotations or shifts leave empty areas in the image, they are filled in by the nearest pixel values. (fill_mode='nearest')
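The settings listed above and the image counts stated earlier can be collected into one configuration sketch. The key names resemble the keyword arguments of Keras's `ImageDataGenerator`, but the description does not name the library that was used, so that link is an assumption. The variable names are hypothetical.

```python
# Class labels and per-class counts as stated in the dataset description.
CLASSES = ["Glioma", "Meningioma", "Pituitary", "No Tumor"]
RAW_PER_CLASS = 400
AUGMENTED_PER_CLASS = 1500

# Augmentation settings as listed; the key spelling (underscores) is an
# assumption modeled on common image-augmentation APIs.
AUGMENTATION_KWARGS = {
    "rotation_range": 45,        # random rotation, in degrees
    "width_shift_range": 0.2,    # horizontal shift, fraction of width
    "height_shift_range": 0.2,   # vertical shift, fraction of height
    "shear_range": 0.2,          # shear intensity
    "zoom_range": 0.2,           # random zoom in or out
    "horizontal_flip": True,     # random left-right flip
    "fill_mode": "nearest",      # fill empty areas with nearest pixels
}

raw_total = len(CLASSES) * RAW_PER_CLASS
augmented_total = len(CLASSES) * AUGMENTED_PER_CLASS
print(raw_total, augmented_total)  # prints "1600 6000"
```

The arithmetic confirms that four classes of 400 raw images give 1,600 images, and four classes of 1,500 augmented images give 6,000.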

Hospital List (for Data Collection):

Ibn Sina Medical College, Kollanpur, 1, 1-B Mirpur Rd, Dhaka 1207

Dhaka Medical College & Hospital, Secretariat Rd, Dhaka 1000

Cumilla Medical College, Kuchaitoli, Dr. Akhtar Hameed Khan Road, Cumilla 3500, Bangladesh

Supervisor & investigator:

Md. Mizanur Rahman

Lecturer,

Computer Science and Engineering

Daffodil International University

Dhaka, Bangladesh

mizanurrahman.cse@diu.edu.bd

Data Collectors:

Md Shahriar Mannan Prottoy

Mahtab Chowdhury

Redwan Rahman

Azim Ullah Tamim
Development of an AI/ML-ready knee ultrasound dataset in a population-based...
dataone.org
Updated Nov 8, 2023
Cite
Nelson, Amanda (2023). Development of an AI/ML-ready knee ultrasound dataset in a population-based cohort [Dataset]. http://doi.org/10.7910/DVN/SKP9IB
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SKP9IB
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Nelson, Amanda
Description
About this data
An ultrasound dataset for discovering ultrasound features associated with pain and radiographic change in KOA is highly innovative and will be a major step forward for the field. These ultrasound images originate from the diverse and inclusive population-based Johnston County Health Study (JoCoHS). This dataset is designed to adhere to FAIR principles and was funded in part by an Administrative Supplement to Improve the AI/ML-Readiness of NIH-Supported Data (3R01AR077060-03S1).

Working with this dataset
WorkingWithTheDataset.ipynb Jupyter notebook: If you are familiar with Jupyter notebooks, we recommend using the WorkingWithTheDataset.ipynb notebook to retrieve, validate, and learn more about the dataset. Download the latest WorkingWithTheDataset.ipynb file and upload it to an online Jupyter environment such as https://colab.research.google.com, or use the notebook in your Jupyter environment of choice. You will also need to download the CONFIGURATION_SETTINGS.template.md file from this dataset, since its contents are used to configure the notebook.
Note: at the time of this writing, we do not recommend using Binder (mybinder.org) if you are interested only in reviewing the WorkingWithTheDataset.ipynb notebook. When Binder loads the dataset, it downloads all files from this dataset, resulting in a long build time. However, if you plan to work with all files in the dataset, Binder might work for you. We do not offer support for this service or other Jupyter Lab environments.

Metadata
The DatasetMetadata.json file contains general information about the files and variables within this dataset. We use it as our validation metadata to verify the data we are importing into this Dataverse dataset. This file is also the most comprehensive with regard to the dataset metadata.

Data collection in progress
This dataset is not complete and will be updated regularly as additional data is collected.
Two Hypothetical Examples of the Probability of Needing Referral for...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Hui-Fen Mao; Ling-Hui Chang; Athena Yi-Jung Tsai; Wen-Ni Huang; Jye Wang (2023). Two Hypothetical Examples of the Probability of Needing Referral for Community-Based OT Based on the IADL Model of Referral Protocol. [Dataset]. http://doi.org/10.1371/journal.pone.0148414.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0148414.t004
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Hui-Fen Mao; Ling-Hui Chang; Athena Yi-Jung Tsai; Wen-Ni Huang; Jye Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Two Hypothetical Examples of the Probability of Needing Referral for Community-Based OT Based on the IADL Model of Referral Protocol.
Simulation data and code for "Optimal Rejection-Free Path Sampling"
zenodo.org
bin, zip
Updated Mar 25, 2025
Cite
Gianmarco Lazzeri (2025). Simulation data and code for "Optimal Rejection-Free Path Sampling" [Dataset]. http://doi.org/10.5281/zenodo.14922168
Explore at:
Available download formats: bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.14922168
Dataset updated
Mar 25, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Gianmarco Lazzeri
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2025
Description
This repository contains the main data of the paper "Optimal Rejection-Free Path Sampling," and the source code for generating/appending the independent RFPS-AIMMD and AIMMD runs.

Due to size constraints, the data has been split into separate repositories. The following repositories contain the trajectory files generated by the runs:

all the WQ runs: 10.5281/zenodo.14830317
chignolin, fps0: 10.5281/zenodo.14826023
chignolin, fps1: 10.5281/zenodo.14830200
chignolin, fps2: 10.5281/zenodo.14830224
chignolin, tps0: 10.5281/zenodo.14830251
chignolin, tps1: 10.5281/zenodo.14830270
chignolin, tps2: 10.5281/zenodo.14830280

The trajectory files are not required for running the main analysis, as all necessary information for machine learning and path reweighting is contained in the "PathEnsemble" object files stored in this repository. However, these trajectories are essential for projecting the path ensemble estimate onto an arbitrary set of collective variables.

To reconstruct the full dataset, please merge all the data folders you find in the supplemental repositories.
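The merge step could be scripted roughly as follows. This is a minimal sketch and not the authors' tooling: the function name `merge_data_folders` is hypothetical, and it assumes each supplemental repository has already been downloaded and unpacked so that its data folder exists on disk.

```python
import shutil
from pathlib import Path
from typing import Iterable


def merge_data_folders(supplemental_data_dirs: Iterable[Path],
                       target_data_dir: Path) -> None:
    """Copy every supplemental data folder into one target data folder.

    Hypothetical helper: files with the same relative path are
    overwritten by the folder that is merged last.
    """
    target_data_dir.mkdir(parents=True, exist_ok=True)
    for source in supplemental_data_dirs:
        # dirs_exist_ok=True (Python 3.8+) lets copytree merge into an
        # existing directory tree instead of raising FileExistsError.
        shutil.copytree(source, target_data_dir, dirs_exist_ok=True)
```

Calling it with the unpacked data folders of the supplemental repositories and the data folder of this repository as the target would produce a single combined tree. Copying rather than moving keeps the downloads intact at the cost of extra disk space.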

Data structure and content

analysis (code for analyzing the data and generating the figures of the
| paper)
|- figures.ipynb (Jupyter notebook for the analysis)
|- figures (the figures created by the Jupyter notebook)
|- ...

data (all the AIMMD and reference runs, plus general info about the
| simulated systems)
|- chignolin
|- *.py: (code for generating/appending AIMMD runs on a Workstation or
| HPC cluster via Slurm; see the "src" folder below)
|- run.gro (full system positions in the native conformation)
|- mol.pdb (only the peptide positions in the native conformation)
|- topol.top (the system's topology for the GROMACS MD engine)
|- charmmm22star.ff (force field parameter files)
|- run.mdp (GROMACS MD parameters when appending a simulation)
|- randomvelocities.mdp (GROMACS MD parameters when initializing a
| simulation with random velocities)
|- signature.npy, r0.npy (parameters for defining the fraction of native
| contacts involved in the folded/unfolded states
| definition; used by params.py function
| "states_function")
|- dmax.npy, dmin.npy (parameters for defining the feature representation
| of the AIMMD NN model; used by params.py
| function "descriptors_function")
|- equilibrium (reference long equilibrium trajectory files; only the
| peptide positions are saved!)
|- run0.xtc, ..., run3.xtc
|- validation
|- validation.xtc (the validation SPs all together in an XTC file)
|- validation.npy (for each SP, collects the cumulative shooting results
after 10 two-way shooting simulations)
|- fps0 (the first AIMMD-RFPS independent run)
|- equilibriumA (the free simulations around A, already processed
| in PathEnsemble files)
|- traj000001.h5
|- traj000001.tpr (for running the simulation; in that case, please
| retrieve all the trajectory files in the right
| supplemental repository first)
|- traj000001.cpt (for appending the simulation; in that case, please
| retrieve all the trajectory files in the right
| supplemental repository first)
|- traj000002.h5 (in case of re-initialization)
|- ...
|- equilibriumB (the free simulations around B, ...)
|- ...
|- shots0
|- chain.h5 (the path sampling chain)
|- pool.h5 (the selection pool, containing the frames from which
| shooting points are currently selected)
|- params.py (file containing the states and descriptors definitions,
| the NN fit function, and the AIMMD runs hyperparameters;
| it can be modified to allow for RFPS-AIMMD or the original
| algorithm AIMMD runs)
|- initial.trr (the initial transition for path sampling)
|- manager.log (reports info about the run)
|- network

src (code for generating/appending AIMMD runs on a Workstation or HPC
| cluster via Slurm)
|- generate.py (on a Workstation: initializes the processes; on an HPC
| cluster: creates the sh file for submitting a job)
|- slurm_options.py (to customize and use in case of running on HPC)
|- manager.py (controls SP selection; reweights the paths)
|- shooter.py (performs path sampling simulations)
|- equilibrium.py (performs free simulations)
|- pathensemble.py (code of the PathEnsemble class)
|- utils.py (auxiliary functions for data production and analysis)

Running/appending AIMMD runs

* To initialize a new RFPS-AIMMD (or AIMMD) run for the systems of this paper:

1. Create a "run directory" folder (same depth as "fps0")

2. Copy "initial.trr" and "params.py" from another AIMMD run folder. It is possible to change "params.py" to customize the run.

3. (On a Workstation) call:

python generate.py

where nsteps is the final number of path sampling steps for the run, n the number of independent path sampling chains, nA the number of independent free simulators around A, and nB that of free simulators around B.

4. (On a HPC cluster) call:

python generate.py
sbatch .

* To append to an existing RFPS-AIMMD or AIMMD run

1. Merge the supplemental repository with the trajectory files into this one.

2. Just call again (on a Workstation)

python generate.py

or (on a HPC cluster)

sbatch .

after updating the "nsteps" parameter.

* To run enhanced sampling for a new system: please keep the data structure as close as possible to the original. Different names for the files can generate incompatibilities. We are currently trying to make it easier.

Reproducing the analysis

Run the analysis/figures.ipynb notebook. Some groups of cells have to be run multiple times after changing the parameters in the preamble.
Grok 3 personas - AI Prompt Template
godtierprompts.com
jsonld
Updated Jun 26, 2025
Cite
xiaolongbao (2025). Grok 3 personas - AI Prompt Template [Dataset]. https://www.godtierprompts.com/prompt/65c935f4-4eaa-473d-9dc5-ff28168b5452
Explore at:
Available download formats: jsonld
Dataset updated
Jun 26, 2025
Authors
xiaolongbao
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Quality Score
Description
A curated prompt template for AI language models: the system prompts for all the different personas used in Grok 3

h/t: https://github.com/asgeirtj/system_prompts_leaks/blob/main/X.ai/grok-personas.md
Gemini 2.5 pro system prompt - AI Prompt Template
godtierprompts.com
jsonld
Updated Jun 30, 2025
Cite
Anonymous (2025). Gemini 2.5 pro system prompt - AI Prompt Template [Dataset]. https://www.godtierprompts.com/prompt/757942bb-dc48-40ce-aa83-8274f8a517a9
Explore at:
Available download formats: jsonld
Dataset updated
Jun 30, 2025
Authors
Anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Quality Score
Description
A curated prompt template for AI language models: system prompts for Gemini 2.5 Pro

h/t: https://github.com/asgeirtj/system_prompts_leaks/blob/main/Google/gemini-2.5-pro-webapp.md, https://g.co/gemini/share/7390bd8330ef
Fish communities in PA and MD Piedmont mixed agricultural streams, 2023
catalog.data.gov
data.usgs.gov
+1more
Updated Jul 20, 2024
Cite
U.S. Geological Survey (2024). Fish communities in PA and MD Piedmont mixed agricultural streams, 2023 [Dataset]. https://catalog.data.gov/dataset/fish-communities-in-pa-and-md-piedmont-mixed-agricultural-streams-2023
Explore at:
Dataset updated
Jul 20, 2024
Dataset provided by
U.S. Geological Survey
Description
From 2-27 June, 2023, a Virginia Tech team of 5 sampled the fish community in 30 Piedmont streams (lower Susquehanna River and upper Chesapeake Bay tributaries, Pennsylvania and Maryland, USA) spanning a gradient of agricultural intensity as part of a larger stream-health study including other teams who surveyed geomorphology, water quality, flow, temperature, and macroinvertebrates at the same 30 streams. Upstream drainage area of these 30 streams ranged from approximately 10 to 50 sq. km, and width from 2 to 10 m. At each stream, we sampled fish from two reaches using two-pass backpack electrofishing and seining. Reach A was the main reach surveyed by all the interdisciplinary teams; Reach B was surveyed for fish communities only. Reach length was 20 wetted channel widths, capped at 150 meters. Fish were identified to species, assigned to a length class, inspected for anomalies, and released. Other measurements included water temperature, electrical conductivity, pH, dissolved oxygen, and clarity, and streambed area comprising pool, riffle, and run habitats.
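The reach-length rule in this description (20 wetted channel widths, capped at 150 meters) can be illustrated with a short calculation. The function name `reach_length_m` and its parameter names are hypothetical and are not taken from the survey protocol.

```python
def reach_length_m(wetted_width_m: float,
                   widths_per_reach: int = 20,
                   cap_m: float = 150.0) -> float:
    """Reach length in meters: a multiple of the wetted width, capped."""
    if wetted_width_m <= 0:
        raise ValueError("wetted width must be positive")
    return min(widths_per_reach * wetted_width_m, cap_m)
```

For the stated width range of 2 to 10 m, a 2 m wide stream gives a 40 m reach, while any stream wider than 7.5 m reaches the 150 m cap.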
nextjs-standard
huggingface.co
Cite
RELAI, nextjs-standard [Dataset]. https://huggingface.co/datasets/relai-ai/nextjs-standard
Dataset authored and provided by
RELAI
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Samples in this benchmark were generated by RELAI using the following data source(s):
Data Source Name: Next.js Documentation
Data Source Link: https://nextjs.org/docs
Data Source License: https://github.com/vercel/next.js/blob/canary/license.md
Data Source Authors: Vercel
AI Benchmarks by Data Agents © 2025 RELAI.AI · Licensed under CC BY 4.0. Source: https://relai.ai
SDDF Energy Dataset
zenodo.org
application/gzip, bin +2
Updated May 2, 2025
Cite
Vahagn Altunyan; Tsolak Ghukasyan; Aram Bughdaryan; Tigran Aghajanyan; Khachik Smbatyan; Garegin Papoian; Garik Petrosyan (2025). SDDF Energy Dataset [Dataset]. http://doi.org/10.5281/zenodo.14008357
Explore at:
Available download formats: bin, application/gzip, tsv, csv
Unique identifier
https://doi.org/10.5281/zenodo.14008357
Dataset updated
May 2, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Vahagn Altunyan; Tsolak Ghukasyan; Aram Bughdaryan; Tigran Aghajanyan; Khachik Smbatyan; Garegin Papoian; Garik Petrosyan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This conformational energy dataset, developed as part of the Smart Distributed Data Factory (SDDF) project, contains over 2.17 million molecular conformations of drug-like molecules sourced from the ENAMINE database. Energies were calculated using DFT with the ωB97X density functional and the 6-31G(d) basis set. The conformations were generated from SMILES using RDKit, MMFF94 optimization, and molecular dynamics (MD) simulations, providing a diverse set of molecular structures and energy states.

RDKit Conformations: 535,338

RDKit + MMFF94 Optimized: 1,151,936

MD-Generated: 483,279

This dataset serves as a benchmark for energy prediction models, with training (638,617 examples), validation (134,732 examples), and test (24,890 examples) subsets created using a strict scaffold-based split to ensure no overlap and less than 70% similarity between the training and test sets.

Dataset contents:

data.tar.gz: contains the conformations in Structured Data File format, grouped into separate folders based on the molecule ID. Each conformation's label is provided within its SDF file as a property named "energy".

INDEX.smi: specifies the molecule IDs and their corresponding SMILES.

SOURCES.csv: specifies the conformation generation method for each conformation.

SDDF_train.tsv, SDDF_validation.tsv, and SDDF_test.tsv specify the molecule IDs and conformations for each subset of the benchmark.

A detailed description is provided in the accompanying paper.
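As a rough illustration of how the "energy" label could be read back, here is a minimal sketch using only the Python standard library. It relies on the general SDF convention that a data item is introduced by a header line beginning with `>` with the field name in angle brackets, followed by the value, and that records end with `$$$$`. The function name, the sample record, and the sample values are made up for illustration; the folder layout inside data.tar.gz and the energy units are not assumed here.

```python
def read_sdf_property(sdf_text, name="energy"):
    """Return the first-line value of data item `name` for each SDF record."""
    values = []
    for record in sdf_text.split("$$$$"):
        lines = record.splitlines()
        for i, line in enumerate(lines):
            # A data header starts with '>' and names the field in angle brackets.
            if line.startswith(">") and f"<{name}>" in line:
                if i + 1 < len(lines):
                    values.append(lines[i + 1].strip())
                break
    return values


# Hypothetical two-record fragment; atom blocks are omitted for brevity.
sample = (
    "mol_a\n  hypothetical\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    ">  <energy>\n-1.5\n\n"
    "$$$$\n"
    "mol_b\n  hypothetical\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    ">  <energy>\n-2.25\n\n"
    "$$$$\n"
)
energies = [float(v) for v in read_sdf_property(sample)]
print(energies)
```

A cheminformatics toolkit would normally be the more robust choice for real files; the plain-text parser above is only meant to show where the label sits in each record.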
svelte-reasoning
huggingface.co
Cite
RELAI, svelte-reasoning [Dataset]. https://huggingface.co/datasets/relai-ai/svelte-reasoning
Dataset authored and provided by
RELAI
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Samples in this benchmark were generated by RELAI using the following data source(s):
Data Source Name: Svelte Documentation
Data Source Link: https://svelte.dev/docs
Data Source License: https://github.com/sveltejs/svelte/blob/main/LICENSE.md
Data Source Authors: Svelte Contributors
AI Benchmarks by Data Agents © 2025 RELAI.AI · Licensed under CC BY 4.0. Source: https://relai.ai
Total active physicians in the U.S. 2025, by state
statista.com
ai-chatbox.pro
Updated Apr 15, 2025
Cite
Statista (2025). Total active physicians in the U.S. 2025, by state [Dataset]. https://www.statista.com/statistics/186269/total-active-physicians-in-the-us/
Dataset updated
Apr 15, 2025
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Apr 2025
Area covered
United States
Description
The number of physicians varies significantly across the United States, with California leading at nearly ******* active doctors as of April 2025. This concentration of medical professionals in populous states highlights the ongoing challenge of ensuring adequate healthcare access nationwide. The stark contrast between California's physician count and Wyoming's mere ***** doctors underscores the need for targeted efforts to address healthcare workforce shortages in less populated areas.

Primary care and specialist distribution

California also leads in both primary care physicians and specialists, accounting for over ** percent of each category nationally. This concentration of medical expertise in California reflects broader trends, with New York and Texas following as the states with the highest numbers of active primary care physicians. The distribution of specialists also mirrors national patterns, with psychiatry, surgery, and anaesthesiology among the most common specialties.

Physician burnout

While the number of physicians continues to grow, physician burnout remains a significant issue. Burnout rates vary widely with a physician's gender and specialty. For example, burnout is disproportionately high among women, affecting ** percent of female physicians compared with ** percent of male physicians. Meanwhile, emergency medicine physicians reported the highest levels of burnout among specialists, highlighting the need for targeted interventions that support doctors according to their different circumstances.
The relative root mean square error (RRMSE) and relative bias (RBIAS) with...
figshare.com
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Md Nabiul Islam Khan; Renske Hijbeek; Uta Berger; Nico Koedam; Uwe Grueters; S. M. Zahirul Islam; Md Asadul Hasan; Farid Dahdouh-Guebas (2023). The relative root mean square error (RRMSE) and relative bias (RBIAS) with “aggregated” spatial pattern, varying aggregation radius (AR), varying aggregation intensity (AI) and a fixed true density of 3000 ha-1. [Dataset]. http://doi.org/10.1371/journal.pone.0157985.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0157985.t004
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Md Nabiul Islam Khan; Renske Hijbeek; Uta Berger; Nico Koedam; Uwe Grueters; S. M. Zahirul Islam; Md Asadul Hasan; Farid Dahdouh-Guebas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The relative root mean square error (RRMSE) and relative bias (RBIAS) with “aggregated” spatial pattern, varying aggregation radius (AR), varying aggregation intensity (AI) and a fixed true density of 3000 ha-1.


