22 datasets found
  1. f

    Data Sheet 1_Use of AI-methods over MD simulations in the sampling of...

    • frontiersin.figshare.com
    pdf
    Updated Apr 8, 2025
    Cite
    Souradeep Sil; Ishita Datta; Sankar Basu (2025). Data Sheet 1_Use of AI-methods over MD simulations in the sampling of conformational ensembles in IDPs.pdf [Dataset]. http://doi.org/10.3389/fmolb.2025.1542267.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Frontiers
    Authors
    Souradeep Sil; Ishita Datta; Sankar Basu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Intrinsically Disordered Proteins (IDPs) challenge traditional structure-function paradigms by existing as dynamic ensembles rather than stable tertiary structures. Capturing these ensembles is critical to understanding their biological roles, yet Molecular Dynamics (MD) simulations, though accurate and widely used, are computationally expensive and struggle to sample rare, transient states. Artificial intelligence (AI) offers a transformative alternative, with deep learning (DL) enabling efficient and scalable conformational sampling. DL models leverage large-scale datasets to learn complex, non-linear sequence-to-structure relationships, allowing for the modeling of conformational ensembles in IDPs without the constraints of traditional physics-based approaches. Such DL approaches have been shown to outperform MD in generating diverse ensembles with comparable accuracy. Most models rely primarily on simulated data for training, while experimental data plays a critical role in validation, aligning the generated conformational ensembles with observable physical and biochemical properties. However, challenges remain, including dependence on data quality, limited interpretability, and scalability for larger proteins. Hybrid approaches combining AI and MD can bridge the gaps by integrating statistical learning with thermodynamic feasibility. Future directions include incorporating physics-based constraints and learning experimental observables into DL frameworks to refine predictions and enhance applicability. AI-driven methods hold significant promise in IDP research, offering novel insights into protein dynamics and therapeutic targeting while overcoming the limitations of traditional MD simulations.

  2. d

    Data for Artificial Intelligence: Data-Centric AI for Transportation: Work...

    • catalog.data.gov
    • data.virginia.gov
    • +1 more
    Updated Jun 16, 2025
    + more versions
    Cite
    Federal Highway Administration (2025). Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case Raw Maryland Speed Data [Dataset]. https://catalog.data.gov/dataset/data-for-artificial-intelligence-data-centric-ai-for-transportation-work-zone-use-case-raw-14c41
    Explore at:
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    Federal Highway Administration
    Description

    Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case proposes a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms and introduces a novel deep learning model to predict the traffic speed and traffic collision likelihood during planned work zone events. This dataset is a raw sample of Maryland roadway speed data.

  3. h

    champ_trainning_sample

    • huggingface.co
    Updated May 7, 2024
    Cite
    Fudan Generative AI (2024). champ_trainning_sample [Dataset]. https://huggingface.co/datasets/fudan-generative-ai/champ_trainning_sample
    Explore at:
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    Fudan Generative AI
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset samples for Champ training

    These dataset samples are used for Champ. Before training, you need to process the datasets with the SMPL & DWPOSE methods. Refer to https://github.com/fudan-generative-vision/champ/blob/master/docs/data_process.md

  4. h

    invoices-example

    • huggingface.co
    Cite
    Parsee.ai, invoices-example [Dataset]. https://huggingface.co/datasets/parsee-ai/invoices-example
    Explore at:
    Dataset provided by
    Parsee.ai
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Invoices Sample Dataset

    This is a sample dataset generated on app.parsee.ai for invoices. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found here: https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md parsee-core version used: 0.1.3.11 This dataset was created on the basis of 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.

  5. c

    Medical Imaging Data Resource Center (MIDRC) - RSNA International COVID-19...

    • cancerimagingarchive.net
    dicom, json and zip +2
    Updated Jan 15, 2021
    + more versions
    Cite
    The Cancer Imaging Archive (2021). Medical Imaging Data Resource Center (MIDRC) - RSNA International COVID-19 Open Radiology Database (RICORD) Release 1c - Chest x-ray Covid+ [Dataset]. http://doi.org/10.7937/91ah-v663
    Explore at:
    Available download formats: dicom, n/a, json and zip, xlsx
    Dataset updated
    Jan 15, 2021
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Jan 15, 2021
    Dataset funded by
    National Cancer Institute: http://www.cancer.gov/
    Description

    Background

    The COVID-19 pandemic is a global healthcare emergency. Prediction models for COVID-19 imaging are rapidly being developed to support medical decision making in imaging. However, inadequate availability of a diverse annotated dataset has limited the performance and generalizability of existing models.

    Purpose

    To create the first multi-institutional, multi-national expert annotated COVID-19 imaging dataset made freely available to the machine learning community as a research and educational resource for COVID-19 chest imaging. The Radiological Society of North America (RSNA) assembled the RSNA International COVID-19 Open Radiology Database (RICORD) collection of COVID-related imaging datasets and expert annotations to support research and education. RICORD data will be incorporated in the Medical Imaging and Data Resource Center (MIDRC), a multi-institutional research data repository funded by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health.

    Materials and Methods

    This dataset was created through a collaboration between the RSNA and Society of Thoracic Radiology (STR). Clinical annotation by thoracic radiology subspecialists was performed for all COVID positive chest radiography (CXR) imaging studies using a labeling schema based upon guidelines for reporting classification of COVID-19 findings in CXRs (see Review of Chest Radiograph Findings of COVID-19 Pneumonia and Suggested Reporting Language, Journal of Thoracic Imaging).

    Results

    The RSNA International COVID-19 Open Annotated Radiology Database (RICORD) consists of 998 chest x-rays from 361 patients at four international sites annotated with diagnostic labels.

    Patient Selection: Patients at least 18 years in age receiving positive diagnosis for COVID-19.

    Data Abstract

    1. 998 Chest x-ray examinations from 361 patients.

    2. Annotations with labels:

      1. Classification

        • Typical Appearance
          Multifocal bilateral, peripheral opacities, and/or Opacities with rounded morphology
          Lower lung-predominant distribution (Required Feature - must be present with either or both of the first two opacity patterns)

        • Indeterminate Appearance
          Absence of typical findings AND Unilateral, central or upper lung predominant distribution of airspace disease

        • Atypical Appearance
          Pneumothorax or pleural effusion, Pulmonary Edema, Lobar Consolidation, Solitary lung nodule or mass, Diffuse tiny nodules, Cavity
        • Negative for Pneumonia
          No lung opacities

      2. Airspace Disease Grading
        Lungs are divided on frontal chest xray into 3 zones per lung (6 zones total). The upper zone extends from the apices to the superior hilum. The mid zone spans between the superior and inferior hilar margins. The lower zone extends from the inferior hilar margins to the costophrenic sulci.

        • Mild - Required if not negative for pneumonia
          Opacities in 1-2 lung zones

        • Moderate - Required if not negative for pneumonia
          Opacities in 3-4 lung zones

        • Severe - Required if not negative for pneumonia
          Opacities in >4 lung zones

    3. Supporting clinical variables: MRN*, Age, Study Date*, Exam Description, Sex, Study UID*, Image Count, Modality, Testing Result, Specimen Source (* pseudonymous values).
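
    The airspace disease grading rule above (opacities counted over 6 lung zones) can be sketched as a small function. This is only an illustration of the thresholds listed in the description; the function name is hypothetical, and returning no grade for zero affected zones is an assumption based on the "Required if not negative for pneumonia" wording.

```python
from typing import Optional


def grade_airspace_disease(zones_with_opacities: int) -> Optional[str]:
    """Map the number of lung zones with opacities (0-6) to a grade.

    Thresholds follow the description: 1-2 zones Mild, 3-4 zones
    Moderate, more than 4 zones Severe.
    """
    if not 0 <= zones_with_opacities <= 6:
        raise ValueError("expected a zone count between 0 and 6")
    if zones_with_opacities == 0:
        # Assumption: no opacities means no grade is assigned.
        return None
    if zones_with_opacities <= 2:
        return "Mild"
    if zones_with_opacities <= 4:
        return "Moderate"
    return "Severe"


if __name__ == "__main__":
    for zones in range(7):
        print(zones, grade_airspace_disease(zones))
```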

    How to use the JSON annotations

    More information about how the JSON annotations are organized can be found on https://docs.md.ai/data/json/. Steps 2 & 3 in this example code demonstrate how to load the JSON into a Dataframe. The JSON file can be downloaded via the data access table below; it is not available via MD.ai. This Jupyter Notebook may also be helpful.

    Research Benefits

    RICORD is available for non-commercial use (and further enrichment) by the research and education communities which may include development of educational resources for COVID-19, use of RICORD to create AI systems for diagnosis and quantification, benchmarking performance for existing solutions, exploration of distributed/federated learning, further annotation or data augmentation efforts, and evaluation of the examinations for disease entities beyond COVID-19 pneumonia. Deliberate consideration of the detailed annotation schema, demographics, and other included meta-data will be critical when generating cohorts with RICORD, particularly as more public COVID-19 imaging datasets are made available via complementary and parallel efforts. It is important to emphasize that there are limitations to the clinical “ground truth” as the SARS-CoV-2 RT-PCR tests have widely documented limitations and are subject to both false-negative and false-positive results which impact the distribution of the included imaging data, and may have led to an unknown epidemiologic distortion of patients based on the inclusion criteria. These limitations notwithstanding, RICORD has achieved the stated objectives for data complexity, heterogeneity, and high-quality expert annotations as a comprehensive COVID-19 thoracic imaging data resource.

  6. c

    MD iMAP: Maryland Stream Health - Stream Reaches

    • s.cnmilf.com
    • opendata.maryland.gov
    • +3 more
    Updated May 10, 2025
    + more versions
    Cite
    opendata.maryland.gov (2025). MD iMAP: Maryland Stream Health - Stream Reaches [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/md-imap-maryland-stream-health-stream-reaches
    Explore at:
    Dataset updated
    May 10, 2025
    Dataset provided by
    opendata.maryland.gov
    Area covered
    Maryland
    Description

    This is a MD iMAP hosted service layer. Find more information at http://imap.maryland.gov.

    The stream reaches are color coded according to the Combined Index of Biotic Integrity (CIBI) scores of the MBSS sites. The CIBI is the average of the fish IBI (FIBI) and the benthic IBI (BIBI). Anything below a CIBI score of 3.0 is red (Poor), 3.0-3.9 is yellow (Fair), and 4.0-5.0 is green (Good). The Maryland Biological Stream Survey (MBSS) was Maryland's first probability-based or random design stream sampling program intended to provide unbiased estimates of stream conditions with known precision at various spatial scales ranging from large 6-digit river basins and medium-sized 8-digit watersheds to the entire state.

    Last Updated: 06/2010

    Feature Service Layer Link: https://mdgeodata.md.gov/imap/rest/services/Hydrology/MD_StreamHealth/FeatureServer

    ADDITIONAL LICENSE TERMS: The Spatial Data and the information therein (collectively "the Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential, or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
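
    The CIBI color coding described in this entry can be sketched as follows. The function names are hypothetical, and treating scores between 3.9 and 4.0 as Fair is an assumption, since the description only lists the ranges 3.0-3.9 and 4.0-5.0.

```python
from typing import Tuple


def cibi_score(fibi: float, bibi: float) -> float:
    """Combined IBI: the average of the fish IBI and the benthic IBI."""
    return (fibi + bibi) / 2.0


def cibi_category(cibi: float) -> Tuple[str, str]:
    """Return (color, rating) for a CIBI score.

    Below 3.0 is red (Poor); 3.0 up to (but not including) 4.0 is
    yellow (Fair), which is an assumption for the 3.9-4.0 gap;
    4.0 and above is green (Good).
    """
    if cibi < 3.0:
        return ("red", "Poor")
    if cibi < 4.0:
        return ("yellow", "Fair")
    return ("green", "Good")


if __name__ == "__main__":
    score = cibi_score(fibi=2.0, bibi=3.0)
    print(score, cibi_category(score))
```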

  7. m

    The Impact of AI and ChatGPT on Bangladeshi University Students

    • data.mendeley.com
    Updated Jan 6, 2025
    Cite
    Md Jhirul Islam (2025). The Impact of AI and ChatGPT on Bangladeshi University Students [Dataset]. http://doi.org/10.17632/zykphpvbr7.2
    Explore at:
    Dataset updated
    Jan 6, 2025
    Authors
    Md Jhirul Islam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    The dataset records the perceptions of Bangladeshi university students on the influence that AI tools, especially ChatGPT, have on their academic practices, learning experiences, and problem-solving abilities. It covers the varying role of AI in education, including common usage statistics, the effect of AI on creative abilities, its impact on learning, and whether it could invade privacy. The dataset offers a perspective on how AI tools are changing education in the country and provides valuable information for researchers, educators, and policymakers to understand trends, challenges, and opportunities in the adoption of AI in the academic context.

    Methodology

    Data Collection Method: Online survey using a Google Form
    Participants: A total of 3,512 students from various Bangladeshi universities participated.
    Survey Questions: The survey included questions on demographic information, frequency of AI tool usage, perceived benefits, concerns regarding privacy, and impacts on creativity and learning.
    Sampling Technique: Random sampling of university students
    Data Collection Period: June 2024 to December 2024

    Privacy Compliance

    This dataset has been anonymized to remove any personally identifiable information (PII). It adheres to relevant privacy regulations to ensure the confidentiality of participants.

    For further inquiries, please contact: Name: Md Jhirul Islam, Daffodil International University Email: jhirul15-4063@diu.edu.bd Phone: 01316317573

  8. A

    ‘Dashboard Template - Maryland Energy Administration’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Dashboard Template - Maryland Energy Administration’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-dashboard-template-maryland-energy-administration-dbd7/62e2bb22/?iid=001-671&v=presentation
    Explore at:
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Maryland
    Description

    Analysis of ‘Dashboard Template - Maryland Energy Administration’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/702a5047-8470-4a53-a1e7-d629756d95c1 on 26 January 2022.

    --- Dataset description provided by original source is as follows ---

    This dataset includes data from the Maryland Energy Administration (MEA) about statewide energy consumption, energy savings programs, renewable energy, and electric and hybrid vehicles

    --- Original source retains full ownership of the source dataset ---

  9. m

    PMRAM: Bangladeshi Brain Cancer - MRI Dataset

    • data.mendeley.com
    Updated Dec 19, 2024
    Cite
    Prottoy Md Shahriar Mannan (2024). PMRAM: Bangladeshi Brain Cancer - MRI Dataset [Dataset]. http://doi.org/10.17632/m7w55sw88b.1
    Explore at:
    Dataset updated
    Dec 19, 2024
    Authors
    Prottoy Md Shahriar Mannan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This Bangladeshi Brain Cancer MRI Dataset is a large dataset of Magnetic Resonance Imaging (MRI) images created to aid researchers in medical diagnosis, especially for brain cancer research. The collection contains a total of 1600 raw images (400 per class); after augmentation it contains a total of 6000 images, divided evenly into four main categories:

    Glioma -1500 images

    Meningioma -1500 images

    Pituitary-1500 images

    No Tumor-1500 images

    All the images in this dataset were collected from different hospitals around Bangladesh, which brings diversity and representation into the sample. To make the images compatible with as many image processing, machine learning, and deep learning pipelines as possible, they were resized to a standard size of 512×512.

    This dataset is significant because high-quality data, such as medical imaging data, are scarce and difficult to obtain, particularly in the context of brain cancer. Four prominent doctors collaborated on the data collection to make the content more accurate and useful, which made the dataset feasible. This cooperation underlines the dataset's potential to improve current medical practice by providing a dependable source of diagnoses for developing and testing diagnostic tools.

    This dataset can be used by researchers and practitioners with a variety of models such as DenseNet-201, YOLOv8x/s, CNN, ResNet50V2, VGG-16, MobileNetV2, etc.

    Image Processing Details:

    Images are randomly rotated within a range of 45 degrees. (rotation_range=45)

    Images are horizontally shifted by up to 20% of the width of the image. (width_shift_range=0.2)

    Images are vertically shifted by up to 20% of the height of the image. (height_shift_range=0.2)

    Shear transformation is applied to the image within a range of 20%. (shear_range=0.2)

    Images are randomly zoomed in or out by up to 20%. (zoom_range=0.2)

    Images are randomly flipped horizontally. (horizontal_flip=True)

    When transformations like rotations or shifts leave empty areas in the image, they are filled in by the nearest pixel values. (fill_mode='nearest')
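
    The augmentation settings listed above can be collected in one configuration mapping. This is a minimal sketch: the description does not name the library that was used, so the assumption here is that the key names follow the keyword arguments of Keras' `ImageDataGenerator`, which uses these same names. The constant and function names are hypothetical.

```python
# Augmentation settings as listed in the dataset description.
# Assumption: key names follow Keras' ImageDataGenerator keyword
# arguments; the description does not say which library was used.
AUGMENTATION_PARAMS = {
    "rotation_range": 45,        # random rotation, in degrees
    "width_shift_range": 0.2,    # horizontal shift, fraction of width
    "height_shift_range": 0.2,   # vertical shift, fraction of height
    "shear_range": 0.2,          # shear setting as given in the description
    "zoom_range": 0.2,           # random zoom in or out
    "horizontal_flip": True,     # random horizontal flips
    "fill_mode": "nearest",      # fill empty areas with nearest pixels
}

CLASSES = ["Glioma", "Meningioma", "Pituitary", "No Tumor"]
RAW_PER_CLASS = 400          # raw images per class
AUGMENTED_PER_CLASS = 1500   # images per class after augmentation


def dataset_totals():
    """Return (total raw images, total images after augmentation)."""
    return (RAW_PER_CLASS * len(CLASSES), AUGMENTED_PER_CLASS * len(CLASSES))


if __name__ == "__main__":
    print(dataset_totals())
```

    If TensorFlow/Keras is installed and the version still ships `ImageDataGenerator`, the mapping could be unpacked as `ImageDataGenerator(**AUGMENTATION_PARAMS)`; that step is left out here so the sketch runs without extra dependencies.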

    Hospital List(for Data Collection):

    Ibn Sina Medical College, Kollanpur, 1, 1-B Mirpur Rd, Dhaka 1207

    Dhaka Medical College & Hospital, Secretariat Rd, Dhaka 1000

    Cumilla Medical College, Kuchaitoli, Dr. Akhtar Hameed Khan Road, Cumilla 3500, Bangladesh

    Supervisor & investigator:

    Md. Mizanur Rahman

    Lecturer,

    Computer Science and Engineering

    Daffodil International University

    Dhaka, Bangladesh

    mizanurrahman.cse@diu.edu.bd

    Data Collectors:

    Md Shahriar Mannan Prottoy

    Mahtab Chowdhury

    Redwan Rahman

    Azim Ullah Tamim

  10. d

    Development of an AI/ML-ready knee ultrasound dataset in a population-based...

    • dataone.org
    Updated Nov 8, 2023
    Cite
    Nelson, Amanda (2023). Development of an AI/ML-ready knee ultrasound dataset in a population-based cohort [Dataset]. http://doi.org/10.7910/DVN/SKP9IB
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Nelson, Amanda
    Description

    About this data

    An ultrasound dataset to use in the discovery of ultrasound features associated with pain and radiographic change in KOA is highly innovative and will be a major step forward for the field. These ultrasound images originate from the diverse and inclusive population-based Johnston County Health Study (JoCoHS). This dataset is designed to adhere to FAIR principles and was funded in part by an Administrative Supplement to Improve the AI/ML-Readiness of NIH-Supported Data (3R01AR077060-03S1).

    Working with this dataset

    WorkingWithTheDataset.ipynb Jupyter notebook: If you are familiar with working with Jupyter notebooks, we recommend using the WorkingWithTheDataset.ipynb Jupyter notebook to retrieve, validate, and learn more about the dataset. You should download the latest WorkingWithTheDataset.ipynb file and upload it to an online Jupyter environment such as https://colab.research.google.com, or use the notebook in your Jupyter environment of choice. You will also need to download the CONFIGURATION_SETTINGS.template.md file from this dataset since the contents are used to configure the Jupyter notebook.

    Note: at the time of this writing, we do not recommend using Binder (mybinder.org) if you are interested in only reviewing the WorkingWithTheDataset.ipynb notebook. When Binder loads the dataset, it will download all files from this dataset, resulting in a long build time. However, if you plan to work with all files in the dataset, then Binder might work for you. We do not offer support for this service or other Jupyter Lab environments.

    Metadata

    The DatasetMetadata.json file contains general information about the files and variables within this dataset. We use it as our validation metadata to verify the data we are importing into this Dataverse dataset. This file is also the most comprehensive with regard to the dataset metadata.

    Data collection in progress

    This dataset is not complete and will be updated regularly as additional data is collected.

  11. f

    Two Hypothetical Examples of the Probability of Needing Referral for...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Hui-Fen Mao; Ling-Hui Chang; Athena Yi-Jung Tsai; Wen-Ni Huang; Jye Wang (2023). Two Hypothetical Examples of the Probability of Needing Referral for Community-Based OT Based on the IADL Model of Referral Protocol. [Dataset]. http://doi.org/10.1371/journal.pone.0148414.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hui-Fen Mao; Ling-Hui Chang; Athena Yi-Jung Tsai; Wen-Ni Huang; Jye Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Two Hypothetical Examples of the Probability of Needing Referral for Community-Based OT Based on the IADL Model of Referral Protocol.

  12. Simulation data and code for "Optimal Rejection-Free Path Sampling"

    • zenodo.org
    bin, zip
    Updated Mar 25, 2025
    Cite
    Gianmarco Lazzeri (2025). Simulation data and code for "Optimal Rejection-Free Path Sampling" [Dataset]. http://doi.org/10.5281/zenodo.14922168
    Explore at:
    Available download formats: bin, zip
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Gianmarco Lazzeri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025
    Description

    This repository contains the main data of the paper "Optimal Rejection-Free Path Sampling," and the source code for generating/appending the independent RFPS-AIMMD and AIMMD runs.

    Due to size constraints, the data has been split into separate repositories. The following repositories contain the trajectory files generated by the runs:

    all the WQ runs: 10.5281/zenodo.14830317
    chignolin, fps0: 10.5281/zenodo.14826023
    chignolin, fps1: 10.5281/zenodo.14830200
    chignolin, fps2: 10.5281/zenodo.14830224
    chignolin, tps0: 10.5281/zenodo.14830251
    chignolin, tps1: 10.5281/zenodo.14830270
    chignolin, tps2: 10.5281/zenodo.14830280

    The trajectory files are not required for running the main analysis, as all necessary information for machine learning and path reweighting is contained in the "PathEnsemble" object files stored in this repository. However, these trajectories are essential for projecting the path ensemble estimate onto an arbitrary set of collective variables.

    To reconstruct the full dataset, please merge all the data folders you find in the supplemental repositories.
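
    The merge step described above can be sketched with the Python standard library. This is a minimal illustration, assuming each downloaded supplemental repository has been unpacked into its own directory containing a "data" folder; the function name and the example directory names are hypothetical, and files that exist on both sides are overwritten by the supplemental copy.

```python
import shutil
from pathlib import Path
from typing import Iterable


def merge_data_folders(supplemental_roots: Iterable[Path], main_root: Path) -> None:
    """Copy every supplemental 'data' folder into main_root / 'data'.

    Existing subfolders are merged rather than replaced
    (dirs_exist_ok requires Python 3.8 or newer).
    """
    target = main_root / "data"
    for root in supplemental_roots:
        source = Path(root) / "data"
        if source.is_dir():
            shutil.copytree(source, target, dirs_exist_ok=True)


if __name__ == "__main__":
    # Hypothetical local directory names for the unpacked repositories.
    merge_data_folders(
        [Path("zenodo_14826023"), Path("zenodo_14830200")],
        Path("zenodo_14922168"),
    )
```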

    Data structure and content

    analysis (code for analyzing the data and generating the figures of the
    | paper)
    |- figures.ipynb (Jupyter notebook for the analysis)
    |- figures (the figures created by the Jupyter notebook)
    |- ...

    data (all the AIMMD and reference runs, plus general info about the
    | simulated systems)
    |- chignolin
    |- *.py: (code for generating/appending AIMMD runs on a Workstation or
    | HPC cluster via Slurm; see the "src" folder below)
    |- run.gro (full system positions in the native conformation)
    |- mol.pdb (only the peptide positions in the native conformation)
    |- topol.top (the system's topology for the GROMACS MD engine)
    |- charmmm22star.ff (force field parameter files)
    |- run.mdp (GROMACS MD parameters when appending a simulation)
    |- randomvelocities.mdp (GROMACS MD parameters when initializing a
    | simulation with random velocities)
    |- signature.npy, r0.npy (parameters for defining the fraction of native
    | contacts involved in the folded/unfolded states
    | definition; used by params.py function
    | "states_function")
    |- dmax.npy, dmin.npy (parameters for defining the feature representation
    | of the AIMMD NN model; used by params.py
    | function "descriptors_function")
    |- equilibrium (reference long equilibrium trajectory files; only the
    | peptide positions are saved!)
    |- run0.xtc, ..., run3.xtc
    |- validation
    |- validation.xtc (the validation SPs all together in an XTC file)
    |- validation.npy (for each SP, collects the cumulative shooting results
    after 10 two-way shooting simulations)
    |- fps0 (the first AIMMD-RFPS independent run)
    |- equilibriumA (the free simulations around A, already processed
    | in PathEnsemble files)
    |- traj000001.h5
    |- traj000001.tpr (for running the simulation; in that case, please
    | retrieve all the trajectory files in the right
    | supplemental repository first)
    |- traj000001.cpt (for appending the simulation; in that case, please
    | retrieve all the trajectory files in the right
    | supplemental repository first)
    |- traj000002.h5 (in case of re-initialization)
    |- ...
    |- equilibriumB (the free simulations around B, ...)
    |- ...
    |- shots0
    |- chain.h5 (the path sampling chain)
    |- pool.h5 (the selection pool, containing the frames from which
    | shooting points are currently selected)
    |- params.py (file containing the states and descriptors definitions,
    | the NN fit function, and the AIMMD runs hyperparameters;
    | it can be modified to allow for RFPS-AIMMD or the original
    | algorithm AIMMD runs)
    |- initial.trr (the initial transition for path sampling)
    |- manager.log (reports info about the run)
    |- network

    src (code for generating/appending AIMMD runs on a Workstation or HPC
    | cluster via Slurm)
    |- generate.py (on a Workstation: initializes the processes; on an HPC
    | cluster: creates the sh file for submitting a job)
    |- slurm_options.py (to customize and use in case of running on HPC)
    |- manager.py (controls SP selection; reweights the paths)
    |- shooter.py (performs path sampling simulations)
    |- equilibrium.py (performs free simulations)
    |- pathensemble.py (code of the PathEnsemble class)
    |- utils.py (auxiliary functions for data production and analysis)

    Running/appending AIMMD runs

    * To initialize a new RFPS-AIMMD (or AIMMD) run for the systems of this paper:

    1. Create a "run directory" folder (same depth as "fps0")

    2. Copy "initial.trr" and "params.py" from another AIMMD run folder. It is possible to change "params.py" to customize the run.

    3. (On a Workstation) call:

    python generate.py

    where nsteps is the final number of path sampling steps for the run, n the number of independent path sampling chains, nA the number of independent free simulators around A, and nB that of free simulators around B.

    4. (On a HPC cluster) call:

    python generate.py
    sbatch .

    * To append to an existing RFPS-AIMMD or AIMMD run

    1. Merge the supplemental repository with the trajectory files into this one.

    2. Just call again (on a Workstation)

    python generate.py

    or (on a HPC cluster)

    sbatch .

    after updating the "nsteps" parameters.

    * To run enhanced sampling for a new system: please keep the data structure as close as possible to the original. Different names for the files can generate incompatibilities. We are currently trying to make it easier.

    Reproducing the analysis

    Run the analysis/figures.ipynb notebook. Some groups of cells have to be run multiple times after changing the parameters in the preamble.

  13. g

    Grok 3 personas - AI Prompt Template

    • godtierprompts.com
    jsonld
    Updated Jun 26, 2025
    + more versions
    Cite
    xiaolongbao (2025). Grok 3 personas - AI Prompt Template [Dataset]. https://www.godtierprompts.com/prompt/65c935f4-4eaa-473d-9dc5-ff28168b5452
    Explore at:
    Available download formats: jsonld
    Dataset updated
    Jun 26, 2025
    Authors
    xiaolongbao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Quality Score
    Description

    A curated prompt template for AI language models: the system prompts for all the different personas used in Grok 3.

    h/t: https://github.com/asgeirtj/system_prompts_leaks/blob/main/X.ai/grok-personas.md

  14. Gemini 2.5 pro system prompt - AI Prompt Template

    • godtierprompts.com
    jsonld
    Updated Jun 30, 2025
    Cite
    Anonymous (2025). Gemini 2.5 pro system prompt - AI Prompt Template [Dataset]. https://www.godtierprompts.com/prompt/757942bb-dc48-40ce-aa83-8274f8a517a9
    Explore at:
    Available download formats: jsonld
    Dataset updated
    Jun 30, 2025
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Quality Score
    Description

    A curated prompt template for AI language models: system prompts for Gemini 2.5 Pro.

    h/t: https://github.com/asgeirtj/system_prompts_leaks/blob/main/Google/gemini-2.5-pro-webapp.md, https://g.co/gemini/share/7390bd8330ef

  15. Fish communities in PA and MD Piedmont mixed agricultural streams, 2023

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 20, 2024
    Cite
    U.S. Geological Survey (2024). Fish communities in PA and MD Piedmont mixed agricultural streams, 2023 [Dataset]. https://catalog.data.gov/dataset/fish-communities-in-pa-and-md-piedmont-mixed-agricultural-streams-2023
    Explore at:
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    From 2-27 June, 2023, a Virginia Tech team of 5 sampled the fish community in 30 Piedmont streams (lower Susquehanna River and upper Chesapeake Bay tributaries, Pennsylvania and Maryland, USA) spanning a gradient of agricultural intensity as part of a larger stream-health study including other teams who surveyed geomorphology, water quality, flow, temperature, and macroinvertebrates at the same 30 streams. Upstream drainage area of these 30 streams ranged from approximately 10 to 50 sq. km, and width from 2 to 10 m. At each stream, we sampled fish from two reaches using two-pass backpack electrofishing and seining. Reach A was the main reach surveyed by all the interdisciplinary teams; Reach B was surveyed for fish communities only. Reach length was 20 wetted channel widths, capped at 150 meters. Fish were identified to species, assigned to a length class, inspected for anomalies, and released. Other measurements included water temperature, electrical conductivity, pH, dissolved oxygen, and clarity, and streambed area comprising pool, riffle, and run habitats.
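
    The reach-length rule stated above (20 wetted channel widths, capped at 150 meters) amounts to taking a minimum. A minimal Python sketch follows; the function and parameter names are illustrative and not part of the dataset.

```python
def reach_length_m(wetted_width_m, widths=20, cap_m=150.0):
    """Reach length in meters: `widths` wetted channel widths, capped at `cap_m`."""
    if wetted_width_m <= 0:
        raise ValueError("wetted width must be positive")
    return min(float(widths) * wetted_width_m, cap_m)
```

    Under this rule the cap applies to any channel wider than 7.5 m, since 20 x 7.5 m = 150 m.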

  16. nextjs-standard

    • huggingface.co
    Cite
    RELAI, nextjs-standard [Dataset]. https://huggingface.co/datasets/relai-ai/nextjs-standard
    Explore at:
    Dataset authored and provided by
    RELAI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Samples in this benchmark were generated by RELAI using the following data source(s):

    • Data Source Name: Next.js Documentation
    • Data Source Link: https://nextjs.org/docs
    • Data Source License: https://github.com/vercel/next.js/blob/canary/license.md
    • Data Source Authors: Vercel

    AI Benchmarks by Data Agents © 2025 RELAI.AI · Licensed under CC BY 4.0. Source: https://relai.ai

  17. SDDF Energy Dataset

    • zenodo.org
    application/gzip, bin +2
    Updated May 2, 2025
    Cite
    Vahagn Altunyan; Tsolak Ghukasyan; Aram Bughdaryan; Tigran Aghajanyan; Khachik Smbatyan; Garegin Papoian; Garik Petrosyan (2025). SDDF Energy Dataset [Dataset]. http://doi.org/10.5281/zenodo.14008357
    Explore at:
    Available download formats: bin, application/gzip, tsv, csv
    Dataset updated
    May 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vahagn Altunyan; Tsolak Ghukasyan; Aram Bughdaryan; Tigran Aghajanyan; Khachik Smbatyan; Garegin Papoian; Garik Petrosyan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This conformational energy dataset, developed as part of the Smart Distributed Data Factory (SDDF) project, contains over 2.17 million molecular conformations of drug-like molecules sourced from the ENAMINE database. Energies were calculated using DFT with the ωB97X density functional and the 6-31G(d) basis set. The conformations were generated from SMILES using RDKit, MMFF94 optimization, and molecular dynamics (MD) simulations, providing a diverse set of molecular structures and energy states.

    • RDKit Conformations: 535,338
    • RDKit + MMFF94 Optimized: 1,151,936
    • MD-Generated: 483,279

    This dataset serves as a benchmark for energy prediction models, with training (638,617 examples), validation (134,732 examples), and test (24,890 examples) subsets created using a strict scaffold-based split to ensure no overlap and less than 70% similarity between the training and test sets.

    Dataset contents:

    • data.tar.gz: contains the conformations in Structured Data File format, grouped into separate folders based on the molecule ID. Each conformation's label is provided within its SDF file as a property named "energy".
    • INDEX.smi: specifies the molecule IDs and their corresponding SMILES.
    • SOURCES.csv: specifies the conformation generation method for each conformation.
    • SDDF_train.tsv, SDDF_validation.tsv, and SDDF_test.tsv specify the molecule IDs and conformations for each subset of the benchmark.

    A detailed description is provided in the accompanying paper.
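
    Since each conformation's label is stored as an SDF property named "energy", it can be read without a chemistry toolkit. The sketch below is a minimal pure-Python reader that relies only on the standard SDF layout (a "> <tag>" data header followed by the value, with records separated by "$$$$"). The function name is hypothetical, and the value is assumed to be a single number on the line after the header; the dataset's energy units are not stated here, so none are assumed.

```python
def read_sdf_energies(sdf_text, tag="energy"):
    """Return the numeric value of `tag` for each SDF record that carries it."""
    energies = []
    # Records in a Structured Data File are terminated by a "$$$$" line.
    for record in sdf_text.split("$$$$"):
        lines = record.splitlines()
        for i, line in enumerate(lines):
            # A data header looks like ">  <energy>"; the value is on the next line.
            if line.startswith(">") and f"<{tag}>" in line and i + 1 < len(lines):
                energies.append(float(lines[i + 1].strip()))
                break
    return energies
```

    For larger workflows a toolkit such as RDKit, which the dataset authors used to generate the conformations, would be the more robust way to parse these files.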

  18. svelte-reasoning

    • huggingface.co
    Cite
    RELAI, svelte-reasoning [Dataset]. https://huggingface.co/datasets/relai-ai/svelte-reasoning
    Explore at:
    Dataset authored and provided by
    RELAI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Samples in this benchmark were generated by RELAI using the following data source(s):

    • Data Source Name: Svelte Documentation
    • Data Source Link: https://svelte.dev/docs
    • Data Source License: https://github.com/sveltejs/svelte/blob/main/LICENSE.md
    • Data Source Authors: Svelte Contributors

    AI Benchmarks by Data Agents © 2025 RELAI.AI · Licensed under CC BY 4.0. Source: https://relai.ai

  19. Total active physicians in the U.S. 2025, by state

    • statista.com
    • ai-chatbox.pro
    Updated Apr 15, 2025
    Cite
    Statista (2025). Total active physicians in the U.S. 2025, by state [Dataset]. https://www.statista.com/statistics/186269/total-active-physicians-in-the-us/
    Explore at:
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 2025
    Area covered
    United States
    Description

    The number of physicians varies significantly across the United States, with California leading at nearly ******* active doctors as of April 2025. This concentration of medical professionals in populous states highlights the ongoing challenge of ensuring adequate healthcare access nationwide. The stark contrast between California's physician count and Wyoming's mere ***** doctors underscores the need for targeted efforts to address healthcare workforce shortages in less populated areas.

    Primary care and specialist distribution

    California also leads in both primary care physicians and specialists, accounting for over ** percent of each category nationally. This concentration of medical expertise in California reflects broader trends, with New York and Texas following as the states with the highest numbers of active primary care physicians. The distribution of specialists also mirrors national patterns, with psychiatry, surgery, and anaesthesiology among the most common specialties.

    Physician burnout

    While the number of physicians continues to grow, physician burnout remains a significant issue. Rates of burnout vary widely with a physician's gender and specialty. For example, burnout is disproportionately high among women, affecting ** percent of female physicians and ** percent of male physicians. Meanwhile, emergency medicine physicians reported the highest levels of burnout among specialists, highlighting the need for targeted interventions that support doctors according to their circumstances.

  20. The relative root mean square error (RRMSE) and relative bias (RBIAS) with...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Md Nabiul Islam Khan; Renske Hijbeek; Uta Berger; Nico Koedam; Uwe Grueters; S. M. Zahirul Islam; Md Asadul Hasan; Farid Dahdouh-Guebas (2023). The relative root mean square error (RRMSE) and relative bias (RBIAS) with “aggregated” spatial pattern, varying aggregation radius (AR), varying aggregation intensity (AI) and a fixed true density of 3000 ha-1. [Dataset]. http://doi.org/10.1371/journal.pone.0157985.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Md Nabiul Islam Khan; Renske Hijbeek; Uta Berger; Nico Koedam; Uwe Grueters; S. M. Zahirul Islam; Md Asadul Hasan; Farid Dahdouh-Guebas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The relative root mean square error (RRMSE) and relative bias (RBIAS) with “aggregated” spatial pattern, varying aggregation radius (AR), varying aggregation intensity (AI) and a fixed true density of 3000 ha-1.
