100+ datasets found
  1. Data-Synthesis-422K

    • huggingface.co
    Updated Dec 15, 2024
    Cite
    Kasım Yıldırım (2024). Data-Synthesis-422K [Dataset]. https://huggingface.co/datasets/Kasimyildirim/Data-Synthesis-422K
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2024
    Authors
    Kasım Yıldırım
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    About the Datasets

    This document summarizes the features and use cases of various datasets.

    anthracite-org/kalo-opus-instruct-22k-no-refusal

    Description: This dataset is a large collection of instruction and response pairs, designed for use in training and evaluation… See the full description on the dataset page: https://huggingface.co/datasets/Kasimyildirim/Data-Synthesis-422K.

  2. Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training...

    • datarade.ai
    Updated Dec 10, 2023
    Cite
    Nexdata (2023). Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-speech-synthesis-data-400-hours-a-nexdata
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 10, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Finland, Belgium, Philippines, Singapore, Colombia, Sweden, Hong Kong, Austria, Canada, Malaysia
    Description
    1. Specifications

    Format: 44.1 kHz/48 kHz, 16-bit/24-bit, uncompressed WAV, mono channel.

    Recording environment: professional recording studio.

    Recording content: general narrative sentences, interrogative sentences, etc.

    Speaker: native speakers.

    Annotation features: word transcription, part-of-speech, phoneme boundary, four-level accents, four-level prosodic boundary.

    Device: microphone.

    Languages: American English, British English, Japanese, French, Dutch, Cantonese, Canadian French, Australian English, Italian, New Zealand English, Spanish, Mexican Spanish.

    Application scenarios: speech synthesis.

    Accuracy rate:
    • Word transcription: the sentence accuracy rate is not less than 99%.
    • Part-of-speech annotation: the sentence accuracy rate is not less than 98%.
    • Phoneme annotation: the sentence accuracy rate is not less than 98% (the error rate of voiced and swallowed phonemes is not included, because the labelling is more subjective).
    • Accent annotation: the word accuracy rate is not less than 95%.
    • Prosodic boundary annotation: the sentence accuracy rate is not less than 97%.
    • Phoneme boundary annotation: the phoneme accuracy rate is not less than 95% (the error range of the boundary is within 5%).

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 1 million hours of audio data and 800 TB of annotated imagery data. These ready-to-go AI and ML training datasets support instant delivery and can quickly improve the accuracy of AI models. For more details, please visit https://www.nexdata.ai/datasets/tts?source=Datarade

  3. NOAA/WDS Paleoclimatology - PAGES Ocean2k Synthesis Data Set

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Jun 1, 2025
    Cite
    (Point of Contact); NOAA World Data Service for Paleoclimatology (Point of Contact) (2025). NOAA/WDS Paleoclimatology - PAGES Ocean2k Synthesis Data Set [Dataset]. https://catalog.data.gov/dataset/noaa-wds-paleoclimatology-pages-ocean2k-synthesis-data-set1
    Explore at:
    Dataset updated
    Jun 1, 2025
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Paleoceanography. The data include parameters of paleocean (reconstruction) with a geographic location of Global. The time period coverage is from 1950 to -50 in calendar years before present (BP). See metadata information for parameter and study location details. Please cite this study when using the data.
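
    The coverage above is expressed in years before present (BP). As a minimal sketch, assuming the catalog follows the usual convention in which "present" is fixed at 1950 CE, the stated range can be converted to calendar years as follows (the function name is hypothetical):

```python
# Assumption: "present" in BP dating is the conventional reference year 1950 CE.
BP_REFERENCE_YEAR = 1950


def bp_to_ce(years_bp: float) -> float:
    """Convert years before present (BP) to a calendar year.

    The result uses astronomical year numbering, so 0 corresponds to 1 BCE.
    Negative BP values are years after 1950.
    """
    return BP_REFERENCE_YEAR - years_bp


# The coverage "from 1950 to -50 BP" stated in the description.
start_year = bp_to_ce(1950)
end_year = bp_to_ce(-50)
print(start_year, end_year)
```

    Under that assumption the study spans roughly the last two millennia, ending in the year 2000 CE.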

  4. High-resolution image synthesis dataset - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    + more versions
    Cite
    (2024). High-resolution image synthesis dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/high-resolution-image-synthesis-dataset
    Explore at:
    Dataset updated
    Dec 2, 2024
    Description

    The dataset used in the paper is a high-resolution image synthesis dataset, which consists of images generated using a latent diffusion model.

  5. Data from: Eicosanoid synthesis

    • wikipathways.org
    Updated Nov 14, 2008
    Cite
    WikiPathways (2008). Eicosanoid synthesis [Dataset]. https://www.wikipathways.org/pathways/WP167.html
    Explore at:
    Dataset updated
    Nov 14, 2008
    Dataset authored and provided by
    WikiPathways (http://wikipathways.org/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    In biochemistry, eicosanoids are signaling molecules made by oxidation of twenty-carbon essential fatty acids (EFAs). They exert complex control over many bodily systems, mainly in inflammation or immunity, and as messengers in the central nervous system. Source: Wikipedia. This pathway has been updated with information from LIPID MAPS > Eicosanoids. Metabolites and proteins from this pathway are orange coloured and have a rounded rectangle shape (where a rectangle shape indicates that the node only occurs in the LIPID MAPS pathway). Reactions occurring in the LIPID MAPS pathways are coloured orange (where a dashed line indicates that the reaction only occurs in the LIPID MAPS pathway). Proteins on this pathway have targeted assays available via the CPTAC Assay Portal.

  6. Gene Synthesis Service Market Size, Share | CAGR Of 22.5%

    • market.us
    csv, pdf
    Updated Apr 25, 2025
    Cite
    Market.us (2025). Gene Synthesis Service Market Size, Share | CAGR Of 22.5% [Dataset]. https://market.us/report/gene-synthesis-service-market/
    Explore at:
    Available download formats: pdf, csv
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Market.us
    License

    https://market.us/privacy-policy/

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Gene Synthesis Service Market size is expected to reach US$ 16 Billion by 2034, from US$ 2.1 Billion in 2024, growing at a CAGR of 22.5%.
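
    The quoted growth rate can be checked against the start and end values with the standard compound annual growth rate formula. The snippet below is a minimal sketch; the function name is hypothetical, and the ten-year span is taken from the 2024 and 2034 figures in the description:

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# US$ 2.1 billion in 2024 growing to US$ 16 billion by 2034 (10 years).
rate = cagr(2.1, 16.0, 10)
print(f"{rate:.1%}")  # prints 22.5%
```

    The result agrees with the 22.5% CAGR stated in the listing.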

  7. ULB ChocoFountainBxl

    • zenodo.org
    • data.niaid.nih.gov
    bin, json, png, zip
    Updated Jul 17, 2024
    Cite
    Daniele Bonatto; Sarah Fachada; Mehrdad Teratani; Gauthier Lafruit (2024). ULB ChocoFountainBxl [Dataset]. http://doi.org/10.5281/zenodo.5960227
    Explore at:
    Available download formats: bin, zip, png, json
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniele Bonatto; Sarah Fachada; Mehrdad Teratani; Gauthier Lafruit
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    ULB ChocoFountainBxl sequence by LISA ULB

    The test sequence "ULB ChocoFountainBxl" is provided by Daniele Bonatto, Sarah Fachada, Mehrdad Teratani and Gauthier Lafruit, members of the LISA department, EPB (Ecole Polytechnique de Bruxelles), ULB (Université Libre de Bruxelles), Belgium.

    View Synthesis Sample

    License

    Creative Commons 4.0 - CC BY 4.0

    Terms of Use

    Any kind of publication or report using this sequence should refer to the following references.

    [1] Daniele Bonatto, Sarah Fachada, Mehrdad Teratani, Gauthier Lafruit, "ULB ChocoFountainBxl", Zenodo, 10.5281/zenodo.5960227, 2022.

    @misc{bonatto_chocofountainbxl_2022,
    title = {{ULB} {ChocoFountainBxl}},
    author = {Bonatto, Daniele and Fachada, Sarah and Teratani, Mehrdad and Lafruit, Gauthier},
    publisher = {Zenodo},
    month = feb,
    year = {2022},
    doi = {10.5281/zenodo.5960227}
    }

    [2] A. Schenkel, D. Bonatto, S. Fachada, H.-L. Guillaume, and G. Lafruit, "Natural Scenes Datasets for Exploration in 6DOF Navigation", in 2018 International Conference on 3D Immersion (IC3D), Brussels, Belgium, Dec. 2018, pp. 1-8. doi: 10.1109/IC3D.2018.8657865.

    @inproceedings{schenkel_natural_b_2018,
    address = {Brussels, Belgium},
    title = {Natural {Scenes} {Datasets} for {Exploration} in {6DOF} {Navigation}},
    isbn = {978-1-5386-7590-8},
    url = {https://doi.org/10.1109/IC3D.2018.8657865},
    doi = {10.1109/IC3D.2018.8657865},
    language = {en},
    urldate = {2019-04-11},
    booktitle = {2018 {International} {Conference} on {3D} {Immersion} ({IC3D})},
    publisher = {IEEE},
    author = {Schenkel, Arnaud and Bonatto, Daniele and Fachada, Sarah and Guillaume, Henry-Louis and Lafruit, Gauthier},
    month = dec,
    year = {2018},
    pages = {1--8}
    }

    Production

    Laboratory of Image Synthesis and Analysis, LISA department, Ecole Polytechnique de Bruxelles, Universite Libre de Bruxelles, Belgium.

    Content

    This dataset contains a dynamic test scene created using the acquisition system described in [2] (3x5 array with a baseline of 10 cm (vertical) and 15 cm (horizontal)).
    We provide 97 frames of color-corrected [4] RGB textures (YUV420p10le format) captured using 15 Blackmagic Micro Studio 4K cameras (3840x2160 pixels @ 30 fps, cropped to 3712x2064).
    We also provide corresponding depth maps (YUV420p16le format) estimated using MPEG's Immersive Video Depth Estimation (IVDE) [5] and refined using PDR [6].

    The scene displays two actors interacting with objects that are difficult to render in view synthesis. In particular, the scene contains transparent, specular and smooth objects and areas.
    The videos were taken in a controlled light environment.

    The views are arranged as follows:

    v00 v01 v02 v03 v04
    v10 v11 v12 v13 v14
    v20 v21 v22 v23 v24

    In addition to the images and their depth maps, an accurate camera calibration file is provided following the format of [8].

    The dataset contains:

    - a `camera.json` file in the OMAF coordinate system (camera position: X forward, Y left, Z up; rotation: yaw, pitch, roll) [9],
    - a `view_synthesis_config.zip` folder containing configuration files for RVS [7,8] to synthesize every view with its closest 4 neighbors in a "plus" configuration,
    - a `view_synthesis_results.zip` folder containing videos (scaled to 710x516) corresponding to the configuration files in `view_synthesis_config`, and a multiview video displaying all the results merged together. The views were synthesized with RVS [7,8],
    - `vXY_depth_3712x2064_yuv420p16le.zip` archives: depth maps for every XY view in yuv420p16le format,
    - `vXY_texture_3712x2064_yuv420p10le.zip` archives: RGB textures for every XY view in yuv420p10le format.
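
    The naming scheme and frame geometry listed above can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, the XY digits are read as row then column of the 3x5 grid, and the frame size assumes that both yuv420p10le and yuv420p16le store each sample in a 16-bit little-endian word with chroma planes subsampled by two in each direction.

```python
from typing import List, Tuple

# Values taken from the dataset description; helper names are illustrative.
ROWS, COLS = 3, 5            # 3x5 camera array (v00 ... v24)
WIDTH, HEIGHT = 3712, 2064   # cropped resolution in pixels


def view_names(rows: int = ROWS, cols: int = COLS) -> List[str]:
    """Return view identifiers in row-major order: v00, v01, ..., v24."""
    return [f"v{r}{c}" for r in range(rows) for c in range(cols)]


def archive_names(view: str) -> Tuple[str, str]:
    """Build the texture and depth archive names for one view."""
    texture = f"{view}_texture_{WIDTH}x{HEIGHT}_yuv420p10le.zip"
    depth = f"{view}_depth_{WIDTH}x{HEIGHT}_yuv420p16le.zip"
    return texture, depth


def yuv420_frame_bytes(width: int, height: int, bytes_per_sample: int = 2) -> int:
    """Size in bytes of one planar 4:2:0 frame.

    Assumption: one full-resolution luma plane plus two chroma planes at
    half width and half height, each sample stored in `bytes_per_sample` bytes.
    """
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return (luma + 2 * chroma) * bytes_per_sample


for name in view_names():
    print(name, *archive_names(name))
print(yuv420_frame_bytes(WIDTH, HEIGHT))
```

    Under these assumptions one frame occupies 22,984,704 bytes, so a raw 97-frame stream would take about 2.23 GB per view before compression. Whether each archive holds a single headerless raw stream is not stated in the description.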

    References and links

    [4] A. Dziembowski, D. Mieloch, S. Różek and M. Domański, "Color Correction for Immersive Video Applications," in IEEE Access, vol. 9, pp. 75626-75640, 2021, doi: 10.1109/ACCESS.2021.3081870.
    [5] D. Mieloch, O. Stankiewicz and M. Domański, "Depth Map Estimation for Free-Viewpoint Television and Virtual Navigation", IEEE Access, vol. 8, pp. 5760-5776, 2020, doi: 10.1109/ACCESS.2019.2963487.
    [6] D. Mieloch, A. Dziembowski and M. Domański, "Depth Map Refinement for Immersive Video," in IEEE Access, vol. 9, pp. 10778-10788, 2021, doi: 10.1109/ACCESS.2021.3050554.
    [7] D. Bonatto, S. Fachada, S. Rogge, A. Munteanu and G. Lafruit, "Real-Time Depth Video-Based Rendering for 6-DoF HMD Navigation and Light Field Displays," in IEEE Access, vol. 9, pp. 146868-146887, 2021, doi: 10.1109/ACCESS.2021.3123529.
    [8] S. Fachada, B. Kroon, D. Bonatto, B. Sonneveldt, and G. Lafruit, "Reference View Synthesizer (RVS) 2.0 manual, [N17759]", July 2018.
    [9] S. Fachada, D. Bonatto, M. Teratani, and G. Lafruit, "Intechopen - View Synthesis tool for VR Immersive Video", 2022.

    Acknowledgments

    [G1] EU project HoviTron, Grant Agreement No. 951989 on Interactive Technologies, Horizon 2020.
    [G2] Innoviris, the Brussels Institute for Research and Innovation, Belgium, under contract No.: 2015-DS-39a/b & 2015-R-39c/d, 3DLicorneA.
    [G3] Sarah Fachada is a Research Fellow of the Fonds de la Recherche Scientifique - FNRS, Belgium.

  8. Data from: Selenocysteine synthesis

    • reactome.org
    biopax2, biopax3 +5
    Cite
    Selenocysteine synthesis [Dataset]. https://reactome.org/content/detail/R-HSA-2408557
    Explore at:
    Available download formats: biopax3, pdf, owl, biopax2, docx, sbgn, sbml
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Selenocysteine, the 21st genetically encoded amino acid, is the major form of the antioxidant trace element selenium in the human body. In eukaryotes and archaea its synthesis proceeds through a phosphorylated intermediate in a tRNA-dependent fashion. The final step of selenocysteine formation is catalyzed by O-phosphoseryl-tRNA:selenocysteinyl-tRNA synthase (SEPSECS) that converts phosphoseryl-tRNA(Sec) to selenocysteinyl-tRNA(Sec).

  9. gpt4o-mini-context-synthesis

    • huggingface.co
    Updated Feb 27, 2025
    + more versions
    Cite
    Wenhao Zhu (2025). gpt4o-mini-context-synthesis [Dataset]. https://huggingface.co/datasets/Wenhao97/gpt4o-mini-context-synthesis
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 27, 2025
    Authors
    Wenhao Zhu
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Wenhao97/gpt4o-mini-context-synthesis dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. Synthesis of CT images from digital body phantoms using CycleGAN [dataset]

    • heidata.uni-heidelberg.de
    zip
    Updated Feb 23, 2023
    Cite
    Frank Zöllner (2023). Synthesis of CT images from digital body phantoms using CycleGAN [dataset] [Dataset]. http://doi.org/10.11588/DATA/7NRFYC
    Explore at:
    Available download formats: zip (53512131857)
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    heiDATA
    Authors
    Frank Zöllner
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Dataset funded by
    German Federal Ministry of Education and Research (BMBF)
    Description

    The potential of medical image analysis with neural networks is limited by the restricted availability of extensive data sets. The incorporation of synthetic training data is one approach to bypass this shortcoming, as synthetic data offer accurate annotations and unlimited data size. We evaluated eleven CycleGANs for the synthesis of computed tomography (CT) images based on XCAT body phantoms.

  11. Oligonucleotide Synthesis Market Study by Reagents & Consumables, Equipment,...

    • factmr.com
    csv, pdf
    Updated Apr 2, 2024
    Cite
    Fact.MR (2024). Oligonucleotide Synthesis Market Study by Reagents & Consumables, Equipment, and Synthesized Oligonucleotides for Research, Therapeutics, and Diagnostics from 2024 to 2034 [Dataset]. https://www.factmr.com/report/4853/oligonucleotide-synthesis-market
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Fact.MR
    License

    https://www.factmr.com/privacy-policy

    Time period covered
    2024 - 2034
    Area covered
    Worldwide
    Description

    The global oligonucleotide synthesis market is approximated at US$ 6.76 billion in 2024 and is foreseen to expand at a CAGR of 10.9% to reach US$ 19.01 billion by the end of 2034.

    Report Attributes | Details
    Oligonucleotide Synthesis Market Size (2024E) | US$ 6.76 Billion
    Forecasted Market Value (2034F) | US$ 19.01 Billion
    Global Market Growth Rate (2024 to 2034) | 10.9% CAGR
    South Korea Market Value (2034F) | US$ 795.3 Million
    Key Companies Profiled
    • Kaneka Eurogentec S.A.
    • TriLink Biotechnologies LLC
    • BioAutomation
    • Danaher Corporation
    • Dharmacon Inc.
    • Thermo Fisher Scientific Corporation
    • GE Healthcare
    • ATDBio Ltd.
    • Twist Bioscience
    • Merck KGaA
    • Integrated DNA Technologies Inc.
    • Sigma-Aldrich Corporation
    • Agilent Technologies
    • LGC Biosearch Technologies

    Country-wise Analysis

    Attribute | United States | South Korea
    Market Value (2024E) | US$ 2.16 Billion | US$ 269 Million
    Growth Rate (2024 to 2034) | 12.7% CAGR | 11.4% CAGR
    Projected Value (2034F) | US$ 7.15 Billion | US$ 795.3 Million

    Category-wise Analysis

    Attribute | Research | Reagents & Consumables
    Segment Value (2024E) | US$ 2.68 Billion | US$ 3.43 Billion
    Growth Rate (2024 to 2034) | 11.7% CAGR | 11.1% CAGR
    Projected Value (2034F) | US$ 8.09 Billion | US$ 9.86 Billion

  12. REVIVE Final Synthesis Study Data

    • catalog.data.gov
    • gimi9.com
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). REVIVE Final Synthesis Study Data [Dataset]. https://catalog.data.gov/dataset/revive-final-synthesis-study-data-82851
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    Final study conducted on the REVIVE project in the Bale Zone of Ethiopia. Focus of the study is on project outcomes related to resilience, use and perceived value of the SAPARM intervention, project sustainability, application of the D-RISK process and other outcomes.

  13. Gene Synthesis Market Size, Analysis, Forecast & Growth Drivers 2030

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Updated Jun 20, 2025
    Cite
    Mordor Intelligence (2025). Gene Synthesis Market Size, Analysis, Forecast & Growth Drivers 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/gene-synthesis-market
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    Global
    Description

    The Gene Synthesis Market is segmented by Synthesis Method (Chemical Oligonucleotide Synthesis and Gene Assembly [PCR-Mediated and Ligation-Mediated]), Service Type (Antibody DNA Synthesis and More), Application (Gene and Cell Therapy Developments and More), End User (Biopharmaceutical Companies and More), and Geography (North America, Europe, Asia-Pacific, and More). The market and forecasts are provided in terms of value (USD).

  14. DNA Synthesis Market Size & Share Analysis - Industry Research Report -...

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Updated Feb 20, 2025
    Cite
    Mordor Intelligence (2025). DNA Synthesis Market Size & Share Analysis - Industry Research Report - Growth Trends [Dataset]. https://www.mordorintelligence.com/industry-reports/dna-synthesis-market
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Feb 20, 2025
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    Global
    Description

    The DNA Synthesis Market report segments the industry into By Product and Service (Instruments, Reagents and Consumables, DNA Synthesis Services), By Type (Oligonucleotide Synthesis, Gene Synthesis), By Application (Diagnostics, Therapeutics, Research and Development), By End User (Pharmaceutical and Biotechnology Companies, CROs and CDMOs, Academic and Research Institutes, and more), and Geography.

  15. clinical-synthetic-text-llm

    • huggingface.co
    Updated Jul 5, 2024
    + more versions
    Cite
    Ran Xu (2024). clinical-synthetic-text-llm [Dataset]. https://huggingface.co/datasets/ritaranx/clinical-synthetic-text-llm
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 5, 2024
    Authors
    Ran Xu
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Data Description

    We release the synthetic data generated using the method described in the paper Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models (ACL 2024 Findings). The external knowledge we use is based on LLM-generated topics and writing styles.

    Generated Datasets

    The original train/validation/test data, and the generated synthetic training data are listed as follows. For each dataset, we generate 5000… See the full description on the dataset page: https://huggingface.co/datasets/ritaranx/clinical-synthetic-text-llm.

  16. Synthesis Product for Ocean Time Series (SPOTS)

    • bco-dmo.org
    • datacart.bco-dmo.org
    • +1 more
    csv
    Updated Feb 22, 2024
    Cite
    Nico Lange; Björn Fiedler; Marta Álvarez; Alice Benoit-Cattin; Heather Benway; Pier Luigi Buttigieg; Laurent Coppola; Kim I. Currie; Susana Flecha; Dana Stuart Gerlach; Makio C Honda; Emma I. Huertas; Danie Kinkade; Frank Muller-Karger; Siv Kari Lauvset; Arne Körtzinger; Kevin M. O'Brien; Sólveig Ólafsdóttir; Fernando Carvalho Pacheco; Digna Rueda-Roa; Ingunn Skjelvan; Masahide Wakita; Angelicque E. White; Toste Tanhua (2024). Synthesis Product for Ocean Time Series (SPOTS) [Dataset]. http://doi.org/10.26008/1912/bco-dmo.896862.2
    Explore at:
    Available download formats: csv (52.67 MB)
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    Biological and Chemical Data Management Office
    Authors
    Nico Lange; Björn Fiedler; Marta Álvarez; Alice Benoit-Cattin; Heather Benway; Pier Luigi Buttigieg; Laurent Coppola; Kim I. Currie; Susana Flecha; Dana Stuart Gerlach; Makio C Honda; Emma I. Huertas; Danie Kinkade; Frank Muller-Karger; Siv Kari Lauvset; Arne Körtzinger; Kevin M. O'Brien; Sólveig Ólafsdóttir; Fernando Carvalho Pacheco; Digna Rueda-Roa; Ingunn Skjelvan; Masahide Wakita; Angelicque E. White; Toste Tanhua
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Mar 5, 1983 - Jul 30, 2021
    Area covered
    Variables measured
    DOC, DOI, NH4, POC, PON, POP, TPC, TPN, TPP, DATE, and 99 more
    Measurement technique
    CTD Sea-Bird, Elemental Analyzer, Oxygen Sensor, Nutrient Autoanalyzer, Salinometer, Titrator, Inorganic Carbon Analyzer, Total Organic Carbon Analyzer, Salinity Sensor, Spectrophotometer, and 1 more
    Description

    This time-series data synthesis pilot product includes data from 12 fixed ship-based time-series programs with a focus on biogeochemical essential ocean variables. Data used in this synthesis product were made possible with funding through the following:

    • EU Horizon 2020 through the EuroSea Innovation Action (grant agreement 862626)
    • EU Horizon 2020 iAtlantic programme (grant agreement 818123)
    • European Union’s Horizon 2020 research and innovation program (grant agreement 820989; COMFORT).
    • WASCAL MRP-CCMS project from the German Federal Ministry of Education and Research (BMBF; grant agreement no. 01LG1805A).
    • National Science Foundation (OCE-1259043, OCE-175651, and RISE-2028291).
    • Norwegian Environment Agency under grant agreement nos. 14078029, 15078033, 16078007, 17018007, and 21087110.
    • Grant-in-Aid for Scientific Research (20H04349) from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) KAKENHI.
    • Mediterranean Ocean Observing System for the Environment program (MOOSE) coordinated by CNRS-INSU and the Research Infrastructure ILICO (CNRS-IFREMER).
    • The European projects CARBOOCEAN, CARBOCHANGE, SESAME, PERSEUS and COMFORT
    • The Spanish Ministry of Science through the grants CTM2005/01091-MAR and CTM2008-05680-C02-01 and the Junta de Andalucía through the TECADE project (PY20_00293)
    • Centro Nacional Instituto Español de Oceanografía (IEO-CSIC)
  17. Data from: Prostaglandin synthesis and regulation

    • wikipathways.org
    Updated Nov 14, 2008
    + more versions
    Cite
    WikiPathways (2008). Prostaglandin synthesis and regulation [Dataset]. https://www.wikipathways.org/pathways/WP98.html
    Explore at:
    Dataset updated
    Nov 14, 2008
    Dataset authored and provided by
    WikiPathways (http://wikipathways.org/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    A prostaglandin is any member of a group of lipid compounds that are derived enzymatically from fatty acids and have important functions in the animal body. Every prostaglandin contains 20 carbon atoms, including a 5-carbon ring. They are mediators and have a variety of strong physiological effects, such as regulating the contraction and relaxation of smooth muscle tissue. Prostaglandins are not hormones but autocrine or paracrine factors, which are locally acting messenger molecules. They differ from hormones in that they are not produced at a discrete site but in many places throughout the human body. Also, their target cells are present in the immediate vicinity of the site of their secretion (of which there are many). The prostaglandins, together with the thromboxanes and prostacyclins, form the prostanoid class of fatty acid derivatives, a subclass of eicosanoids. Adapted from Gross, G et al. 2000, Society for Gynecologic Investigation; 7:88-95. Description adapted from Wikipedia. Proteins on this pathway have targeted assays available via the CPTAC Assay Portal.

  18. REVIVE Final Synthesis Study

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). REVIVE Final Synthesis Study [Dataset]. https://catalog.data.gov/dataset/revive-final-synthesis-study
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (http://usaid.gov/)
    Description

    Final study conducted on the REVIVE (Restoring Vibrant Villages and Environments) project in the Bale Zone of Ethiopia. Focus of the study is on project outcomes related to resilience, use and perceived value of the SAPARM (Satellite Assisted Pastoral Resource Management) intervention, project sustainability, application of the D-RISK process and other outcomes.

  19. Gene Synthesis Tool Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Gene Synthesis Tool Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/gene-synthesis-tool-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Gene Synthesis Tool Market Outlook



    The global gene synthesis tool market size was valued at USD 1.2 billion in 2023 and is projected to reach USD 3.8 billion by 2032, growing at a CAGR of 13.4% during the forecast period. This remarkable growth is driven by advancements in synthetic biology and the increasing demand for gene synthesis across various sectors, such as pharmaceuticals, biotechnology, and academic research. Improvements in gene editing technologies and the decreasing cost of DNA sequencing are also significant factors contributing to the market's expansion.



    The growing field of synthetic biology is a major growth factor for the gene synthesis tool market. Synthetic biology involves the redesigning and constructing of new biological entities such as enzymes, genetic circuits, and cells, and gene synthesis tools are fundamental to these processes. The ability to design and assemble long DNA sequences accurately and efficiently has revolutionized various applications, including the development of new therapeutic approaches, creating genetically modified organisms, and advancing agricultural biotechnology. As the synthetic biology field expands, the demand for precise and efficient gene synthesis tools continues to surge.



    Another driving force is the escalating investment in research and development by pharmaceutical and biotechnology companies. The pursuit of innovative therapies for diseases, including genetic disorders and cancers, relies heavily on gene synthesis tools for creating custom gene sequences tailored to specific research needs. Additionally, the increasing adoption of personalized medicine, which requires custom synthesis of patient-specific genes, further propels the market growth. With the continuous push for novel drug discovery and the expansion of biopharmaceutical pipelines, the gene synthesis tool market is set for substantial growth.



    Technological advancements in gene synthesis methods and automation are also crucial contributors to the market's growth. The development of high-throughput synthesis platforms has significantly boosted the speed and accuracy of gene synthesis, reducing turnaround times and costs. Furthermore, innovations such as error-correction technologies and improved synthesis chemistries have enhanced the reliability and efficiency of gene synthesis, making these tools more accessible to a broader range of end-users. These technological enhancements are expected to drive the market further, making gene synthesis more efficient and cost-effective.



    Synthetic biology in medical applications is becoming a pivotal area of interest within the gene synthesis tool market. This interdisciplinary field combines principles from biology and engineering to design and construct new biological parts and systems. In the medical sector, synthetic biology is being harnessed to develop innovative therapeutic solutions, such as engineered cells and gene circuits, which can be used to treat complex diseases. The ability to synthesize and assemble genetic components with precision is crucial for these advancements, enabling the creation of novel treatments that were previously unimaginable. As the integration of synthetic biology in medical applications continues to grow, it is expected to drive further demand for advanced gene synthesis tools, facilitating breakthroughs in personalized medicine and regenerative therapies.



    Regionally, North America currently dominates the gene synthesis tool market, driven by robust research and funding in the field of biotechnology and pharmaceuticals. However, Asia Pacific is anticipated to witness the highest growth rate during the forecast period, owing to the increasing investment in biotechnology research, the growth of the biopharmaceutical industry, and supportive government initiatives. European countries are also expected to show significant growth, supported by a strong academic research base and collaborations between research institutions and industry players.



    Product Type Analysis



    The gene synthesis tool market can be segmented by product type into custom gene synthesis, gene library synthesis, and others. Custom gene synthesis is expected to hold the largest market share, driven by the increasing need for tailor-made DNA sequences in research and therapeutic applications. Custom gene synthesis allows for the creation of specific gene sequences that can be used in various experimental setups, offering a high degree of flexibility an

  20. Synthetic Integrated Services Data

    • data.wprdc.org
    csv, html, pdf, zip
    Updated Jun 25, 2024
    Cite
    Allegheny County (2024). Synthetic Integrated Services Data [Dataset]. https://data.wprdc.org/dataset/synthetic-integrated-services-data
    Available download formats: html, csv(1375554033), zip(39231637), pdf
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    Allegheny County
    Description

    Motivation

    This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.

    This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.

    Collection

    The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.

    Preprocessing

    Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.

    For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
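
    The general workflow described above (fit a model to the confidential data, generate several candidate synthetic datasets, score each for utility and privacy, publish the best) can be sketched in a few lines. This is a toy illustration only, not the method used for this dataset: the records, the independent-marginals model, the exact-match privacy proxy, and the `privacy_weight` parameter are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical toy "confidential" records: (age_group, service). Not real data.
REAL = [("18-34", "housing"), ("18-34", "food"), ("35-64", "housing"),
        ("35-64", "health"), ("65+", "health"), ("65+", "food"),
        ("18-34", "housing"), ("35-64", "food")]


def fit_marginals(records):
    """Estimate an independent categorical distribution for each column."""
    return [Counter(col) for col in zip(*records)]


def sample(marginals, n, rng):
    """Draw n synthetic records, sampling each column independently."""
    cols = [rng.choices(list(m.keys()), weights=list(m.values()), k=n)
            for m in marginals]
    return list(zip(*cols))


def utility_gap(real, synth):
    """Mean total variation distance between column marginals (lower is better)."""
    gaps = []
    for real_col, synth_col in zip(zip(*real), zip(*synth)):
        r, s = Counter(real_col), Counter(synth_col)
        keys = set(r) | set(s)
        gaps.append(0.5 * sum(abs(r[k] / len(real_col) - s[k] / len(synth_col))
                              for k in keys))
    return sum(gaps) / len(gaps)


def exact_match_rate(real, synth):
    """Share of synthetic records identical to a real record (crude privacy proxy)."""
    real_set = set(real)
    return sum(rec in real_set for rec in synth) / len(synth)


def select_candidate(real, n_candidates=5, privacy_weight=0.5):
    """Generate candidates with different seeds and return the lowest-scoring one."""
    marginals = fit_marginals(real)
    scored = []
    for seed in range(n_candidates):
        synth = sample(marginals, len(real), random.Random(seed))
        score = utility_gap(real, synth) + privacy_weight * exact_match_rate(real, synth)
        scored.append((score, seed, synth))
    return min(scored)  # tuples compare by score first, then by seed


score, seed, synthetic = select_candidate(REAL)
print(f"selected candidate seed={seed} with combined score {score:.3f}")
```

    Real projects would use richer models and formal disclosure-risk metrics; the point here is only the generate, evaluate, and select loop.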

    Recommended Uses

    This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.

    Known Limitations/Biases

    Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.

    Feedback

    Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).

    Further Documentation and Resources

    1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
    2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
    3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
    4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
