65 datasets found
  1. Data sources used by companies for training AI models South Korea 2023

    • statista.com
    Updated Sep 19, 2024
    Cite
    Statista (2024). Data sources used by companies for training AI models South Korea 2023 [Dataset]. https://www.statista.com/statistics/1452822/south-korea-data-sources-for-training-artificial-intelligence-models/
    Dataset updated
    Sep 19, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2023 - Nov 2023
    Area covered
    South Korea
    Description

    As of 2023, customer data was the leading source of information used to train artificial intelligence (AI) models in South Korea, cited by nearly 70 percent of surveyed companies. About 62 percent said they used existing data within the company when training their AI models.

  2. Synthetic Data Generation Market Analysis North America, Europe, APAC,...

    • technavio.com
    Cite
    Synthetic Data Generation Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, Germany, UK, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/synthetic-data-generation-market-analysis
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global
    Description

    Synthetic Data Generation Market Size 2024-2028

    The synthetic data generation market size is forecast to increase by USD 2.88 billion at a CAGR of 60.02% between 2023 and 2028.

    The global synthetic data generation market is expanding steadily, driven by the growing need for privacy-compliant data solutions and advancements in AI technology. Key factors include the increasing demand for data to train machine learning models, particularly in industries like healthcare services and finance where privacy regulations are strict and the use of predictive analytics is critical, and the use of generative AI and machine learning algorithms, which create high-quality synthetic datasets that mimic real-world data without compromising security.
    This report provides a detailed analysis of the global synthetic data generation market, covering market size, growth forecasts, and key segments such as agent-based modeling and data synthesis. It offers practical insights for business strategy, technology adoption, and compliance planning. A significant trend highlighted is the rise of synthetic data in AI training, enabling faster and more ethical development of models. One major challenge addressed is the difficulty in ensuring data quality, as poorly generated synthetic data can lead to inaccurate outcomes.
    For businesses aiming to stay competitive in a data-driven global landscape, this report delivers essential data and strategies to leverage synthetic data trends and address quality challenges, ensuring they remain leaders in innovation while meeting regulatory demands.

    What will be the Size of the Market During the Forecast Period?

    Synthetic data generation offers a more time-efficient solution compared to traditional methods of data collection and labeling, making it an attractive option for businesses looking to accelerate their AI and machine learning projects. The market represents a promising opportunity for organizations seeking to overcome the challenges of data scarcity and privacy concerns while maintaining data diversity and improving the efficiency of their artificial intelligence and machine learning initiatives. By leveraging this technology, technology decision-makers can drive innovation and gain a competitive edge in their respective industries.

    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
      Healthcare and life sciences
      Retail and e-commerce
      Transportation and logistics
      IT and telecommunication
      BFSI and others

    Type
      Agent-based modelling
      Direct modelling

    Data
      Tabular Data
      Text Data
      Image & Video Data
      Others

    Offering Band
      Fully Synthetic Data
      Partially Synthetic Data
      Hybrid Synthetic Data

    Application
      Data Protection
      Data Sharing
      Predictive Analytics
      Natural Language Processing
      Computer Vision Algorithms
      Others

    Geography
      North America
        US
        Canada
        Mexico
      Europe
        Germany
        UK
        France
        Italy
      APAC
        China
        Japan
        India
      Middle East and Africa
      South America

    By End-user Insights

    The healthcare and life sciences segment is estimated to witness significant growth during the forecast period. In the thriving healthcare and life sciences sector, synthetic data generation is gaining significant traction as a cost-effective and time-efficient alternative to utilizing real-world data. This market segment's rapid expansion is driven by the increasing demand for data-driven insights and the importance of safeguarding sensitive information. One noteworthy application of synthetic data generation is in the realm of computer vision, specifically with geospatial imagery and medical imaging.

    For instance, in healthcare, synthetic data can be generated to replicate medical imaging, such as MRI scans and X-rays, for research and machine learning model development without compromising patient privacy. Similarly, in the field of physical security, synthetic data can be employed to enhance autonomous vehicle simulation, ensuring optimal performance and safety without the need for real-world data. By generating artificial datasets, organizations can diversify their data sources and improve the overall quality and accuracy of their machine learning models.

    The healthcare and life sciences segment was valued at USD 12.60 million in 2018 and showed a gradual increase during the forecast period.

    Regional Insights

    North America is estimated to contribute 36% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and drivers that shape the m

  3. U.S. AI Training Dataset Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Dec 11, 2024
    Cite
    Archive Market Research (2024). U.S. AI Training Dataset Market Report [Dataset]. https://www.archivemarketresearch.com/reports/us-ai-training-dataset-market-4957
    Available download formats: doc, ppt, pdf
    Dataset updated
    Dec 11, 2024
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    United States
    Variables measured
    Market Size
    Description

    The U.S. AI Training Dataset Market size was valued at USD 590.4 million in 2023 and is projected to reach USD 1880.70 million by 2032, exhibiting a CAGR of 18.0% during the forecast period. The U.S. AI training dataset market covers the generation, selection, and organization of datasets used to train artificial intelligence. These datasets contain the information that machine learning algorithms need to infer and learn from. Uses include the advancement and improvement of AI solutions in fields such as transport, medical analysis, language processing, and financial metrics. Applications include training models for tasks such as image classification, predictive modeling, and natural language interfaces. Emerging trends include a shift toward larger, higher-quality, more diverse, and annotated data to improve model efficiency; synthetic data generation to address data shortages; and attention to data confidentiality and ethical issues in dataset management. Furthermore, emerging technologies in artificial intelligence and machine learning are driving noticeable development in how datasets are built and used. Recent developments include: In February 2024, Google struck a deal worth USD 60 million per year with Reddit that will give the former real-time access to the latter’s data and use Google AI to enhance Reddit’s search capabilities. In February 2024, Microsoft announced around USD 2.1 billion investment in Mistral AI to expedite the growth and deployment of large language models. The U.S. giant is expected to underpin Mistral AI with Azure AI supercomputing infrastructure to provide top-notch scale and performance for AI training and inference workloads.

  4. Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035...

    • rootsanalysis.com
    Updated Oct 1, 2024
    Cite
    Roots Analysis (2024). Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035 [Dataset]. https://www.rootsanalysis.com/synthetic-data-generation-market
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Authors
    Roots Analysis
    License

    https://www.rootsanalysis.com/privacy.html

    Time period covered
    2021 - 2031
    Area covered
    Global
    Description

    The global synthetic data market size is projected to grow from USD 0.4 billion in the current year to USD 19.22 billion by 2035, representing a CAGR of 42.14% during the forecast period till 2035.
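
    The relationship between the start value, end value, and CAGR quoted above can be checked with the standard compound-growth formula. The sketch below is only an illustration: the function names `cagr` and `years_to_reach` are hypothetical helpers, not part of the report, and "the current year" is not stated in the listing, so the code derives the implied horizon rather than assuming one.

```python
import math


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def years_to_reach(start_value: float, end_value: float, rate: float) -> float:
    """Number of years needed to grow from start to end at a fixed annual rate."""
    return math.log(end_value / start_value) / math.log(1.0 + rate)


# Figures quoted in the listing: USD 0.4 billion growing to USD 19.22 billion
# at a CAGR of 42.14%. The horizon in years is derived, not given by the source.
implied_years = years_to_reach(0.4, 19.22, 0.4214)
print(f"Implied forecast horizon: {implied_years:.1f} years")
```

    Solving for the horizon instead of the rate avoids guessing the base year, and the rounded start value (USD 0.4 billion) means the result is only approximate.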

  5. Data center chip architecture used for AI training phase 2017-2025

    • statista.com
    Updated May 23, 2022
    Cite
    Data center chip architecture used for AI training phase 2017-2025 [Dataset]. https://www.statista.com/statistics/1104879/data-center-chip-architecture-for-ai-training/
    Dataset updated
    May 23, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017
    Area covered
    Worldwide
    Description

    As of November 2019, application-specific integrated circuits (ASICs) are forecast to have a growing share of training-phase artificial intelligence (AI) applications in data centers, making up a projected 50 percent by 2025. By comparison, the share of graphics processing units (GPUs) is projected to drop from 97 percent to 40 percent by that time.

    AI chips

    In order to provide greater security and efficiency, many data centers are overseeing the widespread implementation of artificial intelligence (AI) in their processes and systems. AI technologies and tasks require specialized AI chips that are more powerful and optimized for advanced machine learning (ML) algorithms, owing to an overall growth in data center chip revenues.

    The edge

    An interesting development for the data center industry is the rise of edge computing. IT infrastructure is moved into edge data centers, specialized facilities that are located nearer to end-users. The global edge data center market size is expected to reach 13.5 billion U.S. dollars in 2024, twice the size of the market in 2020, with experts suggesting that emerging technologies like 5G and IoT will contribute to this growth.

  6. Data sources used by public sector for training AI models South Korea 2022

    • statista.com
    Updated Feb 29, 2024
    Cite
    Statista (2024). Data sources used by public sector for training AI models South Korea 2022 [Dataset]. https://www.statista.com/statistics/1453708/south-korea-public-sector-ai-training-data/
    Dataset updated
    Feb 29, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Aug 19, 2022 - Oct 21, 2022
    Area covered
    South Korea
    Description

    According to a 2022 survey of the public sector in South Korea, more than 56 percent of respondents said they used non-customer in-house data for training artificial intelligence (AI) models. More than a third of the surveyed public organizations were using public data.

  7. Opinions on artificial intelligence's impact on jobs in the U.S. 2022, by...

    • statista.com
    Updated Feb 6, 2024
    Cite
    Statista (2024). Opinions on artificial intelligence's impact on jobs in the U.S. 2022, by age [Dataset]. https://www.statista.com/statistics/1357711/opinions-on-artificial-intelligence-impact-on-jobs-by-age-us/
    Dataset updated
    Feb 6, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022
    Area covered
    Worldwide, United States
    Description

    During a 2022 survey conducted in the United States, it was found that 18 percent of respondents thought that artificial intelligence will lead to there being many fewer jobs. By contrast, 25 percent of respondents aged between 30 and 44 years stated that AI will create many more jobs.

    Artificial intelligence

    Artificial intelligence (AI) is the ability of a computer or machine to mimic the competencies of the human mind, learning from previous experiences to understand and respond to language, decisions, and problems. Particularly, a large amount of data is often used to train AI into developing algorithms and skills. The AI ecosystem consists of machine learning (ML), robotics, artificial neural networks, and natural language processing (NLP). Nowadays, tech and telecom, financial services, healthcare, and pharmaceutical industries are prominent for AI adoption in companies.

    AI companies and startups

    More and more companies and startups are engaging in the artificial intelligence market, which is forecast to grow rapidly in the coming years. Examples of big tech firms are IBM, Microsoft, Baidu, and Tencent, with the last owning the highest number of AI and ML patent families, amounting to over nine thousand. Moreover, driven by the excitement for this new technology and by the large investments in it, the number of startups involved in the industry around the world has grown in recent years. For instance, in the United States, the New York company UiPath was the top-funded AI startup.

  8. Trojan Detection Software Challenge - object-detection-feb2023-train

    • catalog.data.gov
    • data.nist.gov
    Updated Mar 14, 2025
    Cite
    National Institute of Standards and Technology (2025). Trojan Detection Software Challenge - object-detection-feb2023-train [Dataset]. https://catalog.data.gov/dataset/trojan-detection-software-challenge-object-detection-feb2023-train
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Round 13 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of object detection AIs trained both on synthetic image data built from Cityscapes and the DOTA_v2 dataset. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 128 AI models using a small set of model architectures. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.

  9. CIFAKE: Real and AI-Generated Synthetic Images Dataset

    • paperswithcode.com
    Updated Mar 23, 2023
    Cite
    CIFAKE: Real and AI-Generated Synthetic Images Dataset [Dataset]. https://paperswithcode.com/dataset/cifake-real-and-ai-generated-synthetic-images
    Dataset updated
    Mar 23, 2023
    Description

    The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.

    CIFAKE is a dataset that contains 60,000 synthetically-generated images and 60,000 real images (collected from CIFAR-10). Can computer vision techniques be used to detect when an image is real or has been generated by AI?

    Dataset details

    The dataset contains two classes: REAL and FAKE. For REAL, we collected the images from Krizhevsky & Hinton's CIFAR-10 dataset. For the FAKE images, we generated the equivalent of CIFAR-10 with Stable Diffusion version 1.4. There are 100,000 images for training (50k per class) and 20,000 for testing (10k per class).
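
    As a quick consistency check, the per-class split sizes above can be tallied against the stated totals of 60,000 real and 60,000 synthetic images. This is a minimal sketch; the constant and function names are illustrative assumptions and do not come from the dataset itself.

```python
# Per-class split sizes as stated in the CIFAKE description.
CLASSES = ("REAL", "FAKE")
TRAIN_PER_CLASS = 50_000
TEST_PER_CLASS = 10_000


def split_totals(classes, train_per_class, test_per_class):
    """Return overall train, test, and combined image counts."""
    n_classes = len(classes)
    train = n_classes * train_per_class
    test = n_classes * test_per_class
    return {"train": train, "test": test, "total": train + test}


totals = split_totals(CLASSES, TRAIN_PER_CLASS, TEST_PER_CLASS)
per_class_total = TRAIN_PER_CLASS + TEST_PER_CLASS

print(totals)           # overall counts across both classes
print(per_class_total)  # images per class across train and test
```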

    References

    If you use this dataset, you must cite the following sources:

    Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

    Bird, J.J., Lotfi, A. (2023). CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. arXiv preprint arXiv:2303.14126.

    Real images are from Krizhevsky & Hinton (2009), fake images are from Bird & Lotfi (2023). The Bird & Lotfi study is a preprint currently available on arXiv and this description will be updated when the paper is published.

    License

    This dataset is published under the same MIT license as CIFAR-10:

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  10. Training dataset and results for geothermal exploration artificial...

    • gdr.openei.org
    • data.openei.org
    • +4more
    archive, data, image
    Updated Sep 1, 2020
    Cite
    Jim Moraga; Mahmut Cavur; H. Sebnem Duzgun; Hilal Soydan (2020). Training dataset and results for geothermal exploration artificial intelligence, applied to Brady Hot Springs and Desert Peak [Dataset]. http://doi.org/10.15121/1773692
    Available download formats: data, image, archive
    Dataset updated
    Sep 1, 2020
    Dataset provided by
    Geothermal Data Repository
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Geothermal Technologies Program (EE-4G)
    Colorado School of Mines
    Authors
    Jim Moraga; Mahmut Cavur; H. Sebnem Duzgun; Hilal Soydan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The submission includes the labeled datasets, as ESRI Grid files (.gri, .grd), used for training and classification results for our machine learning model:
      - brady_som_output.gri, brady_som_output.grd, brady_som_output.*
      - desert_som_output.gri, desert_som_output.grd, desert_som_output.*
    The data corresponds to two sites: Brady Hot Springs and Desert Peak, both located near Fallon, NV.

    Input layers include:
      - Geothermal: Labeled data (0: Non-geothermal; 1: Geothermal)
      - Minerals: Hydrothermal mineral alterations, as a result of spectral analysis using Chalcedony, Kaolinite, Gypsum, Hematite and Epsomite
      - Temperature: Land surface temperature (% of times a pixel was classified as "Hot" by K-Means)
      - Faults: Fault density with a 300 m radius
      - Subsidence: PSInSAR results showing subsidence displacement of more than 5 mm
      - Uplift: PSInSAR results showing subsidence displacement of more than 5 mm

    Also included are the results of the classification using Brady and Desert Peak to build 2 Convolutional Neural Networks. These were applied to the training site as well as the other site; the results are in GeoTiff format.
      - brady_classification: Results of classification of the Brady-trained model
      - desert_classification: Results of classification of the Desert Peak-trained model
      - b2d_classification: Results of classification of Desert Peak using the Brady-trained model
      - d2b_classification: Results of classification of Brady using the Desert Peak-trained model

  11. Artificial Intelligence (AI) in Corporate Training Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 12, 2025
    Cite
    Data Insights Market (2025). Artificial Intelligence (AI) in Corporate Training Report [Dataset]. https://www.datainsightsmarket.com/reports/artificial-intelligence-ai-in-corporate-training-1418633
    Available download formats: pdf, ppt, doc
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The artificial intelligence (AI) market in corporate training is rapidly growing, with a market size of USD 388.9 million in 2025 and a CAGR of 21.7% forecast for the period 2025-2033. The growth of this market is driven by the increasing adoption of AI technologies by businesses, the growing need for effective and personalized training, and the increasing availability of data. Key trends include the increasing use of machine learning and deep learning technologies, the development of intelligent tutoring systems, and the integration of AI into learning platforms and virtual facilitators. Among the key players in the AI market for corporate training are Amazon Web Services, Blackboard Inc., Blippar, Century Tech Limited, Cerevrum Inc., CheckiO, Pearson PLC, TrueShelf, Querium Corporation, Knewton, Cognii Inc., Google Inc., Microsoft Corporation, Nuance Communication Inc., IBM Corporation, Jenzabar Inc., Yuguan Information Technology LLC, Pixatel Systems, PleiQ Smart Toys SpA, and Quantum Adaptive Learning LLC. These companies offer a range of AI-powered solutions for corporate training, including learning platforms, virtual facilitators, intelligent tutoring systems, and content management systems.

  12. Opinions on artificial intelligence's impact on life in the U.S. 2022, by...

    • statista.com
    Updated Feb 6, 2024
    Cite
    Statista (2024). Opinions on artificial intelligence's impact on life in the U.S. 2022, by age [Dataset]. https://www.statista.com/statistics/1357551/opinions-on-artificial-intelligence-by-age-us/
    Dataset updated
    Feb 6, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022
    Area covered
    Worldwide, United States
    Description

    During a 2022 survey conducted in the United States, it was found that 26 percent of respondents thought that artificial intelligence will not make their lives either easier or harder. Moreover, 31 percent of respondents aged between 30 and 44 years stated that AI will make their lives much easier.

  13. Trojan Detection Software Challenge -...

    • datasets.ai
    • catalog.data.gov
    0, 19, 47
    Updated Feb 16, 2020
    Cite
    National Institute of Standards and Technology (2020). Trojan Detection Software Challenge - nlp-sentiment-classification-apr2021-train part2 [Dataset]. https://datasets.ai/datasets/trojan-detection-software-challenge-round-6-train-dataset-part2
    Available download formats: 47, 0, 19
    Dataset updated
    Feb 16, 2020
    Dataset authored and provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Round 6 Train Dataset part2. This is the training data used to construct and evaluate trojan detection software solutions. This data, generated at NIST, consists of natural language processing (NLP) AIs trained to perform text sentiment classification on English text. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 96 sentiment classification AI models using a small set of model architectures. The models were trained on text data drawn from product reviews. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.

  14. Data from: Appendices for Geothermal Exploration Artificial Intelligence...

    • catalog.data.gov
    • data.openei.org
    • +4more
    Updated Jan 20, 2025
    Cite
    Colorado School of Mines (2025). Appendices for Geothermal Exploration Artificial Intelligence Report [Dataset]. https://catalog.data.gov/dataset/appendices-for-geothermal-exploration-artificial-intelligence-report-46b5f
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Colorado School of Mines
    Description

    The Geothermal Exploration Artificial Intelligence uses machine learning to spot geothermal identifiers from land maps. This is done to remotely detect geothermal sites for energy uses. Such uses include enhanced geothermal system (EGS) applications, especially finding locations for viable EGS sites. This submission includes the appendices and reports formerly attached to the Geothermal Exploration Artificial Intelligence Quarterly and Final Reports. The appendices below include methodologies, results, and some data regarding what was used to train the Geothermal Exploration AI. The methodology reports explain how specific anomaly detection modes were selected for use with the Geo Exploration AI. This also includes how the detection mode is useful for finding geothermal sites. Some methodology reports also include small amounts of code. Results from these reports explain the accuracy of methods used for the selected sites (Brady Desert Peak and Salton Sea). Data from these detection modes can be found in some of the reports, such as the Mineral Markers Maps, but most of the raw data is included in the DOE Database, which includes the Brady, Desert Peak, and Salton Sea Geothermal Sites.

  15. IIITDMJ_Maize

    • ieee-dataport.org
    Updated Dec 20, 2023
    Cite
    Poornima Singh Thakur (2023). IIITDMJ_Maize [Dataset]. http://doi.org/10.21227/jrw1-md38
    Dataset updated
    Dec 20, 2023
    Dataset provided by
    IEEE Dataport
    Authors
    Poornima Singh Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The existing datasets lack the diversity required to train a model so that it performs equally well in real fields under varying environmental conditions. To address this limitation, we propose to collect a small amount of in-field data and use a GAN to generate synthetic data for training the deep learning network. To demonstrate the proposed method, a maize dataset 'IIITDMJ_Maize' was collected using a drone camera under different weather conditions, including both sunny and cloudy days. The recorded video was processed to sample image frames that were later resized to 224 x 224. Keeping some raw images intact for evaluation purposes, images were further processed to crop only the portion containing diseases and to select healthy plant images. With the help of agriculture experts, the raw and cropped images were subsequently categorized into four distinct classes: (a) common rust, (b) northern leaf blight, (c) gray leaf spot, and (d) healthy. In total, 416 images were collected and labeled. Further, 50 raw (un-cropped) images of each category were also selected for testing the model's performance.

  16. Trojan Detection Software Challenge - image-classification-sep2022-train

    • catalog.data.gov
    • data.nist.gov
    Updated Sep 30, 2023
    + more versions
    Cite
    National Institute of Standards and Technology (2023). Trojan Detection Software Challenge - image-classification-sep2022-train [Dataset]. https://catalog.data.gov/dataset/trojan-detection-software-challenge-image-classification-sep2022-train
    Explore at:
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Round 11 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of image classification AIs trained on synthetic image data built from Cityscapes. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 288 AI models using a small set of model architectures. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.

  17. Vision-Based Obstacle Detection on Rail Tracks

    • zenodo.org
    • data.niaid.nih.gov
    mp4, zip
    Updated May 18, 2023
    Cite
    Lorenzo De Donato; Valeria Vittorini; Francesco Flammini; Stefano Marrone (2023). Vision-Based Obstacle Detection on Rail Tracks [Dataset]. http://doi.org/10.5281/zenodo.7924875
    Explore at:
    Available download formats: mp4, zip
    Dataset updated
    May 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorenzo De Donato; Valeria Vittorini; Francesco Flammini; Stefano Marrone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Acknowledgement and Disclaimers

    These data are a product of a research activity conducted in the context of the RAILS (Roadmaps for AI integration in the raiL Sector) project. RAILS has received funding from the Shift2Rail Joint Undertaking (JU) under the European Union’s Horizon 2020 research and innovation programme under grant agreement n. 881782 Rails. The JU receives support from the European Union’s Horizon 2020 research and innovation program and the Shift2Rail JU members other than the Union.

    The information and views set out in this description are those of the author(s) and do not necessarily reflect the official opinion of Shift2Rail Joint Undertaking. The JU does not guarantee the accuracy of the data included in this dataset. Neither the JU nor any person acting on the JU’s behalf may be held responsible for the use which may be made of the information contained therein.

    This "dataset" has been created for scientific purposes only to study the potentials of Deep Learning (DL) approaches when used to analyse Video Data in order to detect possible obstacles on rail tracks and thus avoid collisions. The authors DO NOT ASSUME any responsibility for the use that other researchers or users will make of these data.

    Objectives of the Study

    RAILS defined some pilot case studies to develop Proofs-of-Concept (PoCs), which are conceived as benchmarks, with the aim of providing insight towards the definition of technology roadmaps that could support future research and/or the deployment of AI applications in the rail sector. In this context, the main objectives of the specific PoC "Vision-Based Obstacle Detection on Rail Tracks" were to investigate: i) solutions for the generation of synthetic data, suitable for the training of DL models; and ii) the potential of DL applications when it comes to detecting any kind of obstacles on rail tracks while exploiting video data from a single RGB camera.

    A Brief Overview of the Approach

    A multi-modular approach has been proposed to achieve the objectives mentioned above. The resulting architecture includes the following modules:

    • The Rails Detection Module (RDM) detects rail tracks. The output of the RDM is used by the ODM and ADM.
    • The Object Detection Module (ODM) detects obstacles whose type is known in advance.
    • The Anomaly Detection Module (ADM) identifies any possible anomaly on rail tracks. These include obstacles whose type is not known in advance.
    • The Obstacle Detection Module merges the outputs from the ODM and the ADM.
    • The Distance Estimation Module estimates the distance of objects and anomalies from the train.

    The research focused specifically on implementing the RDM-ADM pipeline. The object detection approaches that would be used to implement the ODM have already been widely investigated by the research community; by contrast, to the best of our knowledge, limited work has been done on anomaly detection in the rail domain. The RDM was realised by adopting a Semantic Segmentation approach based on U-Net, while the ADM was developed using a Vector-Quantized Variational Autoencoder trained in unsupervised mode. Further details can be found in the RAILS "Deliverable D2.3: WP2 Report on experimentation, analysis, and discussion of results".

    Steps to implement the RDM-ADM pipeline and description of shared Data

    The following list reports all the steps that have been performed to implement the RDM-ADM pipeline; the words in bold-italic refer to the files that are shared within this dataset:

    1. A Railway Scenario was generated in MathWorks' RoadRunner.
    2. A video (FreeTrackVideo) was recorded by simulating an RGB camera mounted in front of the train; no obstacles on rail tracks were considered in this phase.
    3. 2000 frames (FreeTrack2KFrames) were extracted from the aforementioned video. The video contains 4143 frames; however, only 2000 (every other frame, starting from the first one) were taken into account because of training time and GPU RAM constraints.
    4. Only 10% of the 2000 frames were manually labelled (i.e., 200 frames, a frame every 10 frames) by exploiting LabelMe; these frames were then subdivided into training and validation sets (InitialLabelledSet).
    5. Hence, a Semi-Automatic labelling algorithm was developed by leveraging self-training and transfer learning. This algorithm made it possible to label all the FreeTrack2KFrames starting from the InitialLabelledSet. The resulting labels can be found in FreeTrack2KLabels.
    6. Data Augmentation was then performed in order to introduce some variability into the dataset. Because of the same time and RAM constraints mentioned above, the FreeTrack2KFrames set of data was reduced further: 1600 frames were selected among the aforementioned 2000 and then 5 transformations (Bright, Dark, Rain, Shadow, and Sun Flare) were applied to obtain the dataset (FreeTrack16TrainSet, FreeTrack16ValSet, FreeTrack16TestSet) that was used to train, validate, and test the RDM.
    7. Once the RDM was trained, the FreeTrackVideo was processed to obtain the masked frames that were then used to build the dataset(s) to train, validate, and test the ADM. The ADM was studied by considering two different datasets: the Non-Anomaly Dataset (NAD), which basically contains all the frames of the FreeTrackVideo once processed by the RDM; and the Augmented Non-Anomaly Dataset (A-NAD), which contains 9000 frames, 1500 of which were extracted from the NAD, while the remaining 7500 were obtained by applying the same transformations mentioned above.
    8. Lastly, once both the RDM and the ADM were trained, the performance of the whole RDM-ADM pipeline was tested on the WithCarVideo, which depicts the same scenario as the FreeTrackVideo but also shows a car lying on the rail tracks (i.e., an obstacle).
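
    The frame counts quoted in the steps above can be checked with a short sketch. This is only an illustration of the arithmetic, not the authors' code: the helper `select_every_nth` and all variable names are hypothetical, capping the every-other-frame selection at the first 2000 frames is an assumption (the description does not say how the 2000 were chosen from the sampled frames), and reading the 7500 augmented A-NAD frames as 1500 base frames times 5 transformations is likewise an assumption that happens to match the stated totals.

```python
# Illustrative sketch only: helper and variable names are hypothetical,
# not taken from the RAILS dataset or its code.

TOTAL_VIDEO_FRAMES = 4143   # frames in FreeTrackVideo (from the description)
KEPT_FRAMES = 2000          # size of FreeTrack2KFrames (from the description)


def select_every_nth(indices, step, limit=None):
    """Keep every `step`-th item starting from the first, up to `limit` items."""
    picked = list(indices)[::step]
    return picked if limit is None else picked[:limit]


# Step 3: every other frame, starting from the first.
# Assumption: the selection is capped at the first 2000 sampled frames.
free_track_2k = select_every_nth(range(TOTAL_VIDEO_FRAMES), step=2, limit=KEPT_FRAMES)

# Step 4: one frame in every 10 of those is labelled by hand (10%).
initial_labelled = select_every_nth(free_track_2k, step=10)

# Step 7: A-NAD = 1500 frames from the NAD plus augmented copies.
# Assumption: one augmented copy per transformation per base frame.
TRANSFORMS = ["Bright", "Dark", "Rain", "Shadow", "Sun Flare"]
a_nad_base = 1500
a_nad_augmented = a_nad_base * len(TRANSFORMS)
a_nad_total = a_nad_base + a_nad_augmented

print(len(free_track_2k), len(initial_labelled), a_nad_augmented, a_nad_total)
```

    Under these assumptions the sketch reproduces the 2000 selected frames, the 200 manually labelled frames, and the 9000-frame A-NAD (1500 plus 7500) reported in the description.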

  18. The National Artificial Intelligence Research And Development Strategic Plan...

    • catalog.data.gov
    • datadiscoverystudio.org
    • +2 more
    Updated Oct 16, 2023
    Cite
    NCO NITRD (2023). The National Artificial Intelligence Research And Development Strategic Plan [Dataset]. https://catalog.data.gov/dataset/the-national-artificial-intelligence-research-and-development-strategic-plan
    Explore at:
    Dataset updated
    Oct 16, 2023
    Dataset provided by
    NCO NITRD
    Description

    Executive Summary: Artificial intelligence (AI) is a transformative technology that holds promise for tremendous societal and economic benefit. AI has the potential to revolutionize how we live, work, learn, discover, and communicate. AI research can further our national priorities, including increased economic prosperity, improved educational opportunities and quality of life, and enhanced national and homeland security. Because of these potential benefits, the U.S. government has invested in AI research for many years. Yet, as with any significant technology in which the Federal government has interest, there are not only tremendous opportunities but also a number of considerations that must be taken into account in guiding the overall direction of Federally-funded R&D in AI.

    On May 3, 2016, the Administration announced the formation of a new NSTC Subcommittee on Machine Learning and Artificial Intelligence, to help coordinate Federal activity in AI. This Subcommittee, on June 15, 2016, directed the Subcommittee on Networking and Information Technology Research and Development (NITRD) to create a National Artificial Intelligence Research and Development Strategic Plan. A NITRD Task Force on Artificial Intelligence was then formed to define the Federal strategic priorities for AI R&D, with particular attention on areas that industry is unlikely to address.

    This National Artificial Intelligence R&D Strategic Plan establishes a set of objectives for Federally-funded AI research, both research occurring within the government and Federally-funded research occurring outside of government, such as in academia. The ultimate goal of this research is to produce new AI knowledge and technologies that provide a range of positive benefits to society, while minimizing the negative impacts. To achieve this goal, this AI R&D Strategic Plan identifies the following priorities for Federally-funded AI research:

    • Strategy 1: Make long-term investments in AI research. Prioritize investments in the next generation of AI that will drive discovery and insight and enable the United States to remain a world leader in AI.
    • Strategy 2: Develop effective methods for human-AI collaboration. Rather than replace humans, most AI systems will collaborate with humans to achieve optimal performance. Research is needed to create effective interactions between humans and AI systems.
    • Strategy 3: Understand and address the ethical, legal, and societal implications of AI. We expect AI technologies to behave according to the formal and informal norms to which we hold our fellow humans. Research is needed to understand the ethical, legal, and social implications of AI, and to develop methods for designing AI systems that align with ethical, legal, and societal goals.
    • Strategy 4: Ensure the safety and security of AI systems. Before AI systems are in widespread use, assurance is needed that the systems will operate safely and securely, in a controlled, well-defined, and well-understood manner. Further progress in research is needed to address this challenge of creating AI systems that are reliable, dependable, and trustworthy.
    • Strategy 5: Develop shared public datasets and environments for AI training and testing. The depth, quality, and accuracy of training datasets and resources significantly affect AI performance. Researchers need to develop high-quality datasets and environments and enable responsible access to high-quality datasets as well as to testing and training resources.
    • Strategy 6: Measure and evaluate AI technologies through standards and benchmarks. Essential to advancements in AI are standards, benchmarks, testbeds, and community engagement that guide and evaluate progress in AI. Additional research is needed to develop a broad spectrum of evaluative techniques.
    • Strategy 7: Better understand the national AI R&D workforce needs. Advances in AI will require a strong community of AI researchers. An improved understanding of current and future R&D workforce demands in AI is needed to help ensure that sufficient AI experts are available to address the strategic R&D areas outlined in this plan.

    The AI R&D Strategic Plan closes with two recommendations:

    • Recommendation 1: Develop an AI R&D implementation framework to identify S&T opportunities and support effective coordination of AI R&D investments, consistent with Strategies 1-6 of this plan.
    • Recommendation 2: Study the national landscape for creating and sustaining a healthy AI R&D workforce, consistent with Strategy 7 of this plan.

  19. Trojan Detection Software Challenge -...

    • data.nist.gov
    • catalog.data.gov
    Updated May 14, 2021
    + more versions
    Cite
    Michael Paul Majurski (2021). Trojan Detection Software Challenge - nlp-named-entity-recognition-may2021-train [Dataset]. http://doi.org/10.18434/mds2-2407
    Explore at:
    Dataset updated
    May 14, 2021
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Michael Paul Majurski
    License

    https://www.nist.gov/open/license

    Description

    Round 7 Train Dataset. This is the training data used to construct and evaluate trojan detection software solutions. This data, generated at NIST, consists of natural language processing (NLP) AIs trained to perform named entity recognition (NER) on English text. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 192 named entity recognition AI models using a small set of model architectures. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.

  20. SynthCity Dataset - All Areas

    • rdr.ucl.ac.uk
    • figshare.com
    zip
    Updated Sep 11, 2019
    + more versions
    Cite
    David Griffiths; Jan Boehm (2019). SynthCity Dataset - All Areas [Dataset]. http://doi.org/10.5522/04/8850974.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 11, 2019
    Dataset provided by
    University College London
    Authors
    David Griffiths; Jan Boehm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high-quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks; however, the ability of models to generalise from synthetic data to real-world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exists, which, if proved effective, can be exploited. We therefore argue that research in this domain would be hugely useful. In this paper we present SynthCity, an open dataset to help aid research. SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Every point is labelled with one of nine categories. We generate our point cloud in a typical Urban/Suburban environment using the Blensor plugin for Blender. See our project website http://www.synthcity.xyz or paper https://arxiv.org/abs/1907.04758 for more information.
