Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
About the Datasets
This document summarizes the features and use cases of various datasets.
anthracite-org/kalo-opus-instruct-22k-no-refusal
Description: This dataset is a large collection of diverse instruction and response pairs, designed for use in training and evaluation. See the full description on the dataset page: https://huggingface.co/datasets/Kasimyildirim/Data-Synthesis-422K.
Recording environment : professional recording studio.
Recording content : general narrative sentences, interrogative sentences, etc.
Speakers : native speakers
Annotation features : word transcription, part-of-speech, phoneme boundaries, four-level accents, four-level prosodic boundaries.
Device : microphone
Languages : American English, British English, Japanese, French, Dutch, Cantonese, Canadian French, Australian English, Italian, New Zealand English, Spanish, Mexican Spanish
Application scenarios : speech synthesis
Accuracy rate:
- Word transcription: sentence accuracy rate of at least 99%.
- Part-of-speech annotation: sentence accuracy rate of at least 98%.
- Phoneme annotation: sentence accuracy rate of at least 98% (errors on voiced and swallowed phonemes are excluded, because their labelling is more subjective).
- Accent annotation: word accuracy rate of at least 95%.
- Prosodic boundary annotation: sentence accuracy rate of at least 97%.
- Phoneme boundary annotation: phoneme accuracy rate of at least 95% (boundary error within 5%).
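The thresholds above are minimum accuracy rates, most of them measured per sentence. A minimal sketch of how a sentence-level rate might be checked, assuming (this is not the vendor's stated procedure) that a sentence counts as correct only when its whole label sequence matches a reference annotation exactly. The names `THRESHOLDS`, `sentence_accuracy`, and `meets_spec` are hypothetical, and the word-level accent and phoneme-level boundary rates are not covered:

```python
from typing import Sequence

# Minimum sentence-level accuracy rates taken from the description above.
# Accent (word-level) and phoneme-boundary (phoneme-level) rates are not covered here.
THRESHOLDS = {
    "word_transcription": 0.99,
    "part_of_speech": 0.98,
    "phoneme": 0.98,
    "prosodic_boundary": 0.97,
}


def sentence_accuracy(predicted: Sequence[Sequence[str]],
                      reference: Sequence[Sequence[str]]) -> float:
    """Fraction of sentences whose label sequence matches the reference exactly.

    Assumption: one wrong label anywhere makes the whole sentence count as an error.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must contain the same number of sentences")
    if not reference:
        raise ValueError("at least one sentence is required")
    correct = sum(1 for p, r in zip(predicted, reference) if list(p) == list(r))
    return correct / len(reference)


def meets_spec(annotation: str, accuracy: float) -> bool:
    """True if the measured accuracy reaches the stated minimum for that annotation type."""
    return accuracy >= THRESHOLDS[annotation]


# Usage: 99 of 100 transcribed sentences match the reference.
reference = [["the", "cat", "sat"]] * 100
predicted = [["the", "cat", "sat"]] * 99 + [["the", "cap", "sat"]]
acc = sentence_accuracy(predicted, reference)
print(acc, meets_spec("word_transcription", acc))
```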
This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Paleoceanography. The data include reconstructed paleoceanographic parameters with a global geographic extent. The time period covered runs from 1950 to -50 calendar years before present (BP). See the metadata for parameter and study location details. Please cite this study when using the data.
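BP counts calendar years backward from the reference year 1950 CE (0 BP = 1950 CE), so negative BP values fall after 1950 and the coverage above corresponds to roughly year 0 through 2000 CE. A small conversion sketch; the function names are illustrative, and year 0 follows astronomical numbering (year 0 = 1 BCE):

```python
# "Before present" is defined relative to the reference year 1950 CE.
BP_REFERENCE_YEAR = 1950


def bp_to_ce(years_bp: float) -> float:
    """Convert calendar years BP to a CE year (astronomical numbering, year 0 = 1 BCE)."""
    return BP_REFERENCE_YEAR - years_bp


def ce_to_bp(year_ce: float) -> float:
    """Convert a CE year (astronomical numbering) to calendar years BP."""
    return BP_REFERENCE_YEAR - year_ce


# The study's coverage, 1950 BP to -50 BP, expressed as CE years.
start, end = bp_to_ce(1950), bp_to_ce(-50)
print(start, end)  # 0 2000
```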
The dataset used in the paper is a high-resolution image synthesis dataset, which consists of images generated using a latent diffusion model.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In biochemistry, eicosanoids are signaling molecules made by oxidation of twenty-carbon essential fatty acids (EFAs). They exert complex control over many bodily systems, mainly in inflammation or immunity, and act as messengers in the central nervous system. Source: Wikipedia. This pathway has been updated with information from LIPID MAPS > Eicosanoids. Metabolites and proteins from this pathway are coloured orange and drawn as rounded rectangles (a plain rectangle indicates that the node occurs only in the LIPID MAPS pathway). Reactions occurring in the LIPID MAPS pathways are coloured orange (a dashed line indicates that the reaction occurs only in the LIPID MAPS pathway). Proteins on this pathway have targeted assays available via the CPTAC Assay Portal.
https://market.us/privacy-policy/
Gene Synthesis Service Market size is expected to reach US$ 16 Billion by 2034, from US$ 2.1 Billion in 2024, growing at a CAGR of 22.5%.
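The growth rate quoted above is a compound annual growth rate, CAGR = (end / start)^(1 / years) - 1. A minimal sketch for checking such figures against their endpoints (the function name is illustrative):

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Gene synthesis service market: US$ 2.1 billion (2024) -> US$ 16 billion (2034).
rate = cagr(2.1, 16.0, 2034 - 2024)
print(f"{rate:.1%}")  # 22.5%
```

Because published rates are rounded to one decimal place, a recomputed CAGR will usually agree with the quoted figure only to within about 0.1 percentage point.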
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ULB ChocoFountainBxl sequence by LISA ULB
The test sequence "ULB ChocoFountainBxl" is provided by Daniele Bonatto, Sarah Fachada, Mehrdad Teratani and Gauthier Lafruit, members of the LISA department, EPB (Ecole Polytechnique de Bruxelles), ULB (Université Libre de Bruxelles), Belgium.
License
License Creative Commons 4.0 - CC BY 4.0
Terms of Use
Any publication or report using this sequence should cite the following references.
[1] Daniele Bonatto, Sarah Fachada, Mehrdad Teratani, Gauthier Lafruit, "ULB ChocoFountainBxl", Zenodo, 10.5281/zenodo.5960227, 2022.
@misc{bonatto_chocofountainbxl_2022,
title = {{ULB} {ChocoFountainBxl}},
author = {Bonatto, Daniele and Fachada, Sarah and Teratani, Mehrdad and Lafruit, Gauthier},
publisher = {Zenodo},
month = feb,
year = {2022},
doi = {10.5281/zenodo.5960227}
}
[2] A. Schenkel, D. Bonatto, S. Fachada, H.-L. Guillaume, and G. Lafruit, "Natural Scenes Datasets for Exploration in 6DOF Navigation", in 2018 International Conference on 3D Immersion (IC3D), Brussels, Belgium, Dec. 2018, pp. 1-8, doi: 10.1109/IC3D.2018.8657865.
@inproceedings{schenkel_natural_b_2018,
address = {Brussels, Belgium},
title = {Natural {Scenes} {Datasets} for {Exploration} in {6DOF} {Navigation}},
isbn = {978-1-5386-7590-8},
url = {https://doi.org/10.1109/IC3D.2018.8657865},
doi = {10.1109/IC3D.2018.8657865},
language = {en},
urldate = {2019-04-11},
booktitle = {2018 {International} {Conference} on {3D} {Immersion} ({IC3D})},
publisher = {IEEE},
author = {Schenkel, Arnaud and Bonatto, Daniele and Fachada, Sarah and Guillaume, Henry-Louis and Lafruit, Gauthier},
month = dec,
year = {2018},
pages = {1--8}
}
Production
Laboratory of Image Synthesis and Analysis, LISA department, Ecole Polytechnique de Bruxelles, Université Libre de Bruxelles, Belgium.
Content
This dataset contains a dynamic test scene captured with the acquisition system described in [2]: a 3x5 camera array with a baseline of 10 cm vertically and 15 cm horizontally.
We provide 97 frames of color-corrected [4] RGB textures (YUV420p10le format), captured with 15 Blackmagic Micro Studio Camera 4K units (3840x2160 pixels at 30 fps, cropped to 3712x2064).
We also provide corresponding depth maps (YUV420p16le format) estimated using MPEG's Immersive Video Depth Estimation (IVDE) [5] and refined using PDR [6].
The scene shows two actors interacting with objects that are difficult to render in view synthesis; in particular, it contains transparent, specular, and smooth-surfaced objects.
The videos were taken in a controlled light environment.
The views are arranged as follows:
v00 | v01 | v02 | v03 | v04 |
v10 | v11 | v12 | v13 | v14 |
v20 | v21 | v22 | v23 | v24 |
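The grid above can be indexed by row and column, with view vRC in row R and column C. A minimal sketch, using hypothetical helper names, of how the "plus" neighbours of a view and its nominal offset from v00 follow from the layout and the 15 cm / 10 cm baselines. How the provided RVS configurations handle edge and corner views is not stated, so this sketch simply returns the neighbours that exist; calibrated positions should be taken from `camera.json`:

```python
# Hypothetical helpers for the 3x5 grid: view vRC sits in row R, column C.
ROWS, COLS = 3, 5
H_BASELINE_CM, V_BASELINE_CM = 15.0, 10.0  # nominal spacing of the acquisition system


def view_name(row: int, col: int) -> str:
    return f"v{row}{col}"


def plus_neighbors(row: int, col: int) -> list[str]:
    """Up, down, left and right neighbours of (row, col) that exist in the grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [view_name(r, c) for r, c in candidates if 0 <= r < ROWS and 0 <= c < COLS]


def nominal_offset_cm(row: int, col: int) -> tuple[float, float]:
    """Nominal (horizontal, vertical) offset from v00; calibrated poses are in camera.json."""
    return col * H_BASELINE_CM, row * V_BASELINE_CM


print(plus_neighbors(1, 2))     # ['v02', 'v22', 'v11', 'v13']
print(plus_neighbors(0, 0))     # ['v10', 'v01']
print(nominal_offset_cm(2, 4))  # (60.0, 20.0)
```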
In addition to the images and their depth maps, an accurate camera calibration file is provided following the format of [8].
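The calibration uses the OMAF coordinate system (X forward, Y left, Z up) with rotations given as yaw, pitch, and roll. A minimal sketch for turning those angles into a rotation matrix and loading the poses, assuming right-handed rotations in degrees composed as R = Rz(yaw) Ry(pitch) Rx(roll), and assuming a top-level `cameras` list with `Name`, `Position`, and `Rotation` keys. Both the composition order and the key names are assumptions to verify against the RVS manual [8]:

```python
import json

import numpy as np


def rotation_from_ypr(yaw_deg: float, pitch_deg: float, roll_deg: float) -> np.ndarray:
    """3x3 rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in degrees.

    Assumption: right-handed rotations about Z (up), Y (left) and X (forward),
    composed in yaw-pitch-roll order; check sign conventions against the RVS manual.
    """
    y, p, r = np.radians([yaw_deg, pitch_deg, roll_deg])
    rz = np.array([[np.cos(y), -np.sin(y), 0.0],
                   [np.sin(y), np.cos(y), 0.0],
                   [0.0, 0.0, 1.0]])
    ry = np.array([[np.cos(p), 0.0, np.sin(p)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(p), 0.0, np.cos(p)]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(r), -np.sin(r)],
                   [0.0, np.sin(r), np.cos(r)]])
    return rz @ ry @ rx


def load_camera_poses(path: str) -> dict[str, tuple[np.ndarray, np.ndarray]]:
    """Map camera name -> (position, rotation matrix).

    Assumed keys: a top-level "cameras" list whose entries carry "Name",
    "Position" [x, y, z] and "Rotation" [yaw, pitch, roll] in degrees.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {
        cam["Name"]: (np.asarray(cam["Position"], dtype=float),
                      rotation_from_ypr(*cam["Rotation"]))
        for cam in data["cameras"]
    }
```

Under these assumptions a yaw of 90 degrees turns the forward axis (X) toward the left axis (Y).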
The dataset contains:
- a `camera.json` file in the OMAF coordinate system (camera position: X forward, Y left, Z up; rotation: yaw, pitch, roll) [9],
- a `view_synthesis_config.zip` archive containing configuration files for RVS [7,8] to synthesize every view from its 4 closest neighbors in a "plus" configuration,
- a `view_synthesis_results.zip` archive containing videos (scaled to 710x516) corresponding to the configuration files in `view_synthesis_config`, plus a multiview video displaying all the results merged together; the views were synthesized with RVS [7,8],
- `vXY_depth_3712x2064_yuv420p16le.zip` archives containing the depth maps of each view XY in yuv420p16le format,
- `vXY_texture_3712x2064_yuv420p10le.zip` archives containing the RGB textures of each view XY in yuv420p10le format.
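Both pixel formats are planar 4:2:0 with 2-byte little-endian samples (10-bit texture values occupy the low bits; depth uses the full 16 bits), so each 3712x2064 frame has a fixed byte size and can be addressed by offset. A minimal reading sketch, assuming the archives contain headerless raw `.yuv` streams; the file name in the usage comment and the helper names are hypothetical:

```python
import numpy as np


def plane_sizes(width: int, height: int) -> tuple[int, int]:
    """Samples in the luma plane and in each 4:2:0 chroma plane."""
    return width * height, (width // 2) * (height // 2)


def frame_size_bytes(width: int, height: int) -> int:
    """Bytes per frame: yuv420p10le and yuv420p16le both store 2-byte little-endian samples."""
    luma, chroma = plane_sizes(width, height)
    return 2 * (luma + 2 * chroma)


def read_frame(path: str, index: int, width: int = 3712, height: int = 2064):
    """Return the (Y, U, V) planes of frame `index` from a headerless planar YUV 4:2:0 file."""
    luma, chroma = plane_sizes(width, height)
    count = luma + 2 * chroma
    samples = np.fromfile(path, dtype="<u2", count=count,
                          offset=index * frame_size_bytes(width, height))
    if samples.size != count:
        raise ValueError(f"frame {index} is incomplete in {path}")
    y = samples[:luma].reshape(height, width)
    u = samples[luma:luma + chroma].reshape(height // 2, width // 2)
    v = samples[luma + chroma:].reshape(height // 2, width // 2)
    return y, u, v


# Hypothetical raw stream extracted from v00_texture_3712x2064_yuv420p10le.zip:
# y, u, v = read_frame("v00_texture_3712x2064_yuv420p10le.yuv", index=0)
print(frame_size_bytes(3712, 2064))  # 22984704
```

At 22,984,704 bytes per frame, a complete 97-frame stream for one view would occupy about 2.2 GB uncompressed.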
References and links
[4] A. Dziembowski, D. Mieloch, S. Różek and M. Domański, "Color Correction for Immersive Video Applications," in IEEE Access, vol. 9, pp. 75626-75640, 2021, doi: 10.1109/ACCESS.2021.3081870.
[5] D. Mieloch, O. Stankiewicz and M. Domański, "Depth Map Estimation for Free-Viewpoint Television and Virtual Navigation", IEEE Access, vol. 8, pp. 5760-5776, 2020, doi: 10.1109/ACCESS.2019.2963487.
[6] D. Mieloch, A. Dziembowski and M. Domański, "Depth Map Refinement for Immersive Video," in IEEE Access, vol. 9, pp. 10778-10788, 2021, doi: 10.1109/ACCESS.2021.3050554.
[7] D. Bonatto, S. Fachada, S. Rogge, A. Munteanu and G. Lafruit, "Real-Time Depth Video-Based Rendering for 6-DoF HMD Navigation and Light Field Displays," in IEEE Access, vol. 9, pp. 146868-146887, 2021, doi: 10.1109/ACCESS.2021.3123529.
[8] S. Fachada, B. Kroon, D. Bonatto, B. Sonneveldt, and G. Lafruit, "Reference View Synthesizer (RVS) 2.0 manual, [N17759]", July 2018.
[9] S. Fachada, D. Bonatto, M. Teratani, and G. Lafruit, "Intechopen - View Synthesis tool for VR Immersive Video", 2022.
Acknowledgments
[G1] EU project HoviTron, Grant Agreement No. 951989 on Interactive Technologies, Horizon 2020.
[G2] Innoviris, the Brussels Institute for Research and Innovation, Belgium, under contract No.: 2015-DS-39a/b & 2015-R-39c/d, 3DLicorneA.
[G3] Sarah Fachada is a Research Fellow of the Fonds de la Recherche Scientifique - FNRS, Belgium.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Selenocysteine, the 21st genetically encoded amino acid, is the major form of the antioxidant trace element selenium in the human body. In eukaryotes and archaea its synthesis proceeds through a phosphorylated intermediate in a tRNA-dependent fashion. The final step of selenocysteine formation is catalyzed by O-phosphoseryl-tRNA:selenocysteinyl-tRNA synthase (SEPSECS) that converts phosphoseryl-tRNA(Sec) to selenocysteinyl-tRNA(Sec).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Wenhao97/gpt4o-mini-context-synthesis dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The potential of medical image analysis with neural networks is limited by the restricted availability of extensive datasets. Incorporating synthetic training data is one approach to overcoming this shortcoming, as synthetic data offer accurate annotations and unlimited data size. We evaluated eleven CycleGANs for the synthesis of computed tomography (CT) images based on XCAT body phantoms.
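A CycleGAN (Zhu et al., 2017) trains two generators, G mapping one domain to the other (here, phantom images to CT images) and F mapping back, using adversarial losses plus a cycle-consistency term that penalizes F(G(x)) drifting away from x. The eleven variants evaluated in the study are not described here; the sketch below shows only the standard L1 cycle term, with toy callables standing in for the trained generators:

```python
import numpy as np


def cycle_consistency_loss(x, y, g, f) -> float:
    """Mean absolute cycle error: |F(G(x)) - x| + |G(F(y)) - y|.

    g: generator mapping domain X (e.g. phantom images) to domain Y (e.g. CT images).
    f: generator mapping domain Y back to domain X.
    """
    forward = np.mean(np.abs(f(g(x)) - x))
    backward = np.mean(np.abs(g(f(y)) - y))
    return float(forward + backward)


# Toy "generators" on random arrays; in a CycleGAN these are trained networks.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64, 64))
y = rng.normal(size=(4, 64, 64))
print(cycle_consistency_loss(x, y, g=lambda a: 2 * a, f=lambda b: b / 2))  # 0.0
```

A pair of generators that invert each other scores zero, which is why the term discourages mappings that lose the anatomy of the input even when the output looks realistic.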
https://www.factmr.com/privacy-policy
The global oligonucleotide synthesis market is approximated at US$ 6.76 billion in 2024 and is foreseen to expand at a CAGR of 10.9% to reach US$ 19.01 billion by the end of 2034.
Report Attributes | Details |
---|---|
Oligonucleotide Synthesis Market Size (2024E) | US$ 6.76 Billion |
Forecasted Market Value (2034F) | US$ 19.01 Billion |
Global Market Growth Rate (2024 to 2034) | 10.9% CAGR |
South Korea Market Value (2034F) | US$ 795.3 Million |
Key Companies Profiled | |
Country-wise Analysis
Attribute | United States |
---|---|
Market Value (2024E) | US$ 2.16 Billion |
Growth Rate (2024 to 2034) | 12.7% CAGR |
Projected Value (2034F) | US$ 7.15 Billion |
Attribute | South Korea |
---|---|
Market Value (2024E) | US$ 269 Million |
Growth Rate (2024 to 2034) | 11.4% CAGR |
Projected Value (2034F) | US$ 795.3 Million |
Category-wise Analysis
Attribute | Research |
---|---|
Segment Value (2024E) | US$ 2.68 Billion |
Growth Rate (2024 to 2034) | 11.7% CAGR |
Projected Value (2034F) | US$ 8.09 Billion |
Attribute | Reagents & Consumables |
---|---|
Segment Value (2024E) | US$ 3.43 Billion |
Growth Rate (2024 to 2034) | 11.1% CAGR |
Projected Value (2034F) | US$ 9.86 Billion |
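Each projected value in the tables above follows from compounding its 2024 value at the stated CAGR for ten years, value_2034 = value_2024 x (1 + CAGR)^10. A minimal consistency-check sketch; because the tables quote rates rounded to 0.1%, recomputed values agree with the reported ones only approximately:

```python
def project(value: float, annual_rate: float, years: int) -> float:
    """Compound a starting value forward: value * (1 + rate) ** years."""
    return value * (1.0 + annual_rate) ** years


# (row, 2024 value, CAGR, reported 2034 value), all values in US$ billion.
TABLE = [
    ("Global", 6.76, 0.109, 19.01),
    ("United States", 2.16, 0.127, 7.15),
    ("South Korea", 0.269, 0.114, 0.7953),
    ("Research", 2.68, 0.117, 8.09),
    ("Reagents & Consumables", 3.43, 0.111, 9.86),
]

for name, base, rate, reported in TABLE:
    projected = project(base, rate, 2034 - 2024)
    gap = (projected - reported) / reported
    print(f"{name:24s} projected {projected:6.2f}  reported {reported:6.2f}  gap {gap:+.1%}")
```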
https://www.mordorintelligence.com/privacy-policy
The Gene Synthesis Market is Segmented by Synthesis Method (Chemical Oligonucleotide Synthesis and Gene Assembly [PCR-Mediated and Ligation-Mediated]), Service Type (Antibody DNA Synthesis and More), Application (Gene and Cell Therapy Developments and More), End User (Biopharmaceutical Companies and More), and Geography (North America, Europe, Asia-Pacific, and More). The Market and Forecasts are Provided in Terms of Value (USD).
https://www.mordorintelligence.com/privacy-policy
The DNA Synthesis Market report segments the industry into By Product and Service (Instruments, Reagents and Consumables, DNA Synthesis Services), By Type (Oligonucleotide Synthesis, Gene Synthesis), By Application (Diagnostics, Therapeutics, Research and Development), By End User (Pharmaceutical and Biotechnology Companies, CROs and CDMOs, Academic and Research Institutes, and more), and Geography.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Data Description
We release the synthetic data generated using the method described in the paper Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models (ACL 2024 Findings). The external knowledge we use is based on LLM-generated topics and writing styles.
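The generation step conditions each prompt on external knowledge, here topics and writing styles produced by an LLM. A minimal sketch of that idea, in which the topic and style pools, the prompt wording, the task name, and the helper names are all hypothetical placeholders rather than the paper's actual templates:

```python
import random

# Hypothetical knowledge pools; in the paper these come from LLM-generated topics and styles.
TOPICS = ["medication side effects", "post-operative follow-up", "chronic kidney disease"]
STYLES = ["discharge summary", "patient forum post", "nursing note"]


def build_prompt(task: str, label: str, topic: str, style: str) -> str:
    """Assemble one generation prompt conditioned on a sampled topic and writing style."""
    return (
        f"You are generating synthetic clinical text for the task: {task}.\n"
        f"Write one example labelled '{label}'.\n"
        f"Topic: {topic}\n"
        f"Writing style: {style}\n"
        "Do not include any real patient identifiers."
    )


def sample_prompts(task: str, labels: list[str], n: int, seed: int = 0) -> list[str]:
    """Draw n prompts, sampling label, topic and style independently for diversity."""
    rng = random.Random(seed)
    return [
        build_prompt(task, rng.choice(labels), rng.choice(TOPICS), rng.choice(STYLES))
        for _ in range(n)
    ]


for prompt in sample_prompts("relation classification", ["positive", "negative"], n=2):
    print(prompt, end="\n\n")
```

Sampling topic and style independently for every prompt is what spreads the generated examples across content and register, rather than letting the model repeat its most typical output.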
Generated Datasets
The original train/validation/test data, and the generated synthetic training data are listed as follows. For each dataset, we generate 5000… See the full description on the dataset page: https://huggingface.co/datasets/ritaranx/clinical-synthetic-text-llm.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This time-series data synthesis pilot product includes data from 12 fixed ship-based time-series programs with a focus on biogeochemical essential ocean variables. Data used in this synthesis product were made possible with funding through the following:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A prostaglandin is any member of a group of lipid compounds that are derived enzymatically from fatty acids and have important functions in the animal body. Every prostaglandin contains 20 carbon atoms, including a 5-carbon ring. They are mediators and have a variety of strong physiological effects, such as regulating the contraction and relaxation of smooth muscle tissue.[1] Prostaglandins are not hormones but autocrine or paracrine factors, that is, locally acting messenger molecules. They differ from hormones in that they are not produced at a discrete site but in many places throughout the human body. Also, their target cells are present in the immediate vicinity of the sites of their secretion (of which there are many). The prostaglandins, together with the thromboxanes and prostacyclins, form the prostanoid class of fatty acid derivatives, a subclass of eicosanoids. Adapted from Gross, G et al. 2000, Society for Gynecologic Investigation; 7:88-95. Description adapted from Wikipedia. Proteins on this pathway have targeted assays available via the CPTAC Assay Portal.
Final study conducted on the REVIVE (Restoring Vibrant Villages and Environments) project in the Bale Zone of Ethiopia. Focus of the study is on project outcomes related to resilience, use and perceived value of the SAPARM (Satellite Assisted Pastoral Resource Management) intervention, project sustainability, application of the D-RISK process and other outcomes.
https://dataintelo.com/privacy-and-policy
The global gene synthesis tool market size was valued at USD 1.2 billion in 2023 and is projected to reach USD 3.8 billion by 2032, growing at a CAGR of 13.4% during the forecast period. This remarkable growth is driven by advancements in synthetic biology and the increasing demand for gene synthesis across various sectors, such as pharmaceuticals, biotechnology, and academic research. Improvements in gene editing technologies and the decreasing cost of DNA sequencing are also significant factors contributing to the market's expansion.
The growing field of synthetic biology is a major growth factor for the gene synthesis tool market. Synthetic biology involves the redesign and construction of new biological entities such as enzymes, genetic circuits, and cells, and gene synthesis tools are fundamental to these processes. The ability to design and assemble long DNA sequences accurately and efficiently has revolutionized various applications, including the development of new therapeutic approaches, the creation of genetically modified organisms, and the advancement of agricultural biotechnology. As the synthetic biology field expands, the demand for precise and efficient gene synthesis tools continues to surge.
Another driving force is the escalating investment in research and development by pharmaceutical and biotechnology companies. The pursuit of innovative therapies for diseases, including genetic disorders and cancers, relies heavily on gene synthesis tools for creating custom gene sequences tailored to specific research needs. Additionally, the increasing adoption of personalized medicine, which requires custom synthesis of patient-specific genes, further propels the market growth. With the continuous push for novel drug discovery and the expansion of biopharmaceutical pipelines, the gene synthesis tool market is set for substantial growth.
Technological advancements in gene synthesis methods and automation are also crucial contributors to the market's growth. The development of high-throughput synthesis platforms has significantly boosted the speed and accuracy of gene synthesis, reducing turnaround times and costs. Furthermore, innovations such as error-correction technologies and improved synthesis chemistries have enhanced the reliability and efficiency of gene synthesis, making these tools more accessible to a broader range of end-users. These technological enhancements are expected to drive the market further, making gene synthesis more efficient and cost-effective.
Synthetic biology in medical applications is increasingly becoming a pivotal area of interest within the gene synthesis tool market. This interdisciplinary field combines principles from biology and engineering to design and construct new biological parts and systems. In the medical sector, synthetic biology is being harnessed to develop innovative therapeutic solutions, such as engineered cells and gene circuits, which can be used to treat complex diseases. The ability to synthesize and assemble genetic components with precision is crucial for these advancements, enabling the creation of novel treatments that were previously unimaginable. As the integration of synthetic biology in medical applications continues to grow, it is expected to drive further demand for advanced gene synthesis tools, facilitating breakthroughs in personalized medicine and regenerative therapies.
Regionally, North America currently dominates the gene synthesis tool market, driven by robust research and funding in the field of biotechnology and pharmaceuticals. However, Asia Pacific is anticipated to witness the highest growth rate during the forecast period, owing to the increasing investment in biotechnology research, the growth of the biopharmaceutical industry, and supportive government initiatives. European countries are also expected to show significant growth, supported by a strong academic research base and collaborations between research institutions and industry players.
The gene synthesis tool market can be segmented by product type into custom gene synthesis, gene library synthesis, and others. Custom gene synthesis is expected to hold the largest market share, driven by the increasing need for tailor-made DNA sequences in research and therapeutic applications. Custom gene synthesis allows for the creation of specific gene sequences that can be used in various experimental setups, offering a high degree of flexibility an
This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.
This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.
The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.
Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
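The workflow described above (fit a model, generate several candidate datasets, keep the one that best balances utility and privacy) can be illustrated with a deliberately simple toy. This is not the project's method, which is documented in the technical report: the independent per-column model, the total-variation utility metric, the exact-match disclosure proxy, the weighting, and all function names are assumptions chosen for illustration:

```python
import random
from collections import Counter

Record = dict[str, str]


def fit_marginals(records: list[Record], columns: list[str]) -> dict[str, Counter]:
    """Toy model: an independent categorical distribution for each column."""
    return {col: Counter(r[col] for r in records) for col in columns}


def sample_records(marginals: dict[str, Counter], n: int, rng: random.Random) -> list[Record]:
    """Draw n synthetic records column by column from the fitted marginals."""
    out = []
    for _ in range(n):
        rec = {}
        for col, counts in marginals.items():
            values = list(counts)
            rec[col] = rng.choices(values, weights=[counts[v] for v in values], k=1)[0]
        out.append(rec)
    return out


def utility_loss(real: list[Record], synth: list[Record], columns: list[str]) -> float:
    """Mean total variation distance between real and synthetic column marginals."""
    total = 0.0
    for col in columns:
        cr, cs = Counter(r[col] for r in real), Counter(s[col] for s in synth)
        keys = cr.keys() | cs.keys()
        total += 0.5 * sum(abs(cr[k] / len(real) - cs[k] / len(synth)) for k in keys)
    return total / len(columns)


def exact_match_rate(real: list[Record], synth: list[Record]) -> float:
    """Share of synthetic records identical to some real record (a crude disclosure proxy)."""
    real_rows = {tuple(sorted(r.items())) for r in real}
    return sum(tuple(sorted(s.items())) in real_rows for s in synth) / len(synth)


def select_candidate(real, columns, n_candidates=5, privacy_weight=0.5, seed=0):
    """Generate several candidate datasets and keep the one with the lowest combined score."""
    marginals = fit_marginals(real, columns)
    best, best_score = None, float("inf")
    for i in range(n_candidates):
        synth = sample_records(marginals, len(real), random.Random(seed + i))
        score = utility_loss(real, synth, columns) + privacy_weight * exact_match_rate(real, synth)
        if score < best_score:
            best, best_score = synth, score
    return best, best_score
```

An independent-column model preserves each variable's distribution but not the relationships between variables, which is one reason real synthesis projects use richer models and more careful evaluation.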
For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.
Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.
Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).
1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.