Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Deep Learning data for the 2021 AMS AI Short course. Predictors are from the HREFv2 ensemble and labels from SPC storm reports. All data between 1 May 2017 and 31 August 2020.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A dataset tracking both the average and cumulative search interest in Generative AI and Large Language Models (respectively) across multiple regions, based on Google Trends; basic visualizations were created for it through code.
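As a rough illustration of the kind of basic visual mentioned above, here is a minimal plotting sketch; the CSV file name and column names (region, generative_ai_interest) are assumptions rather than the dataset's actual schema.

```python
# Minimal sketch: plot average Google Trends interest per region.
# Assumes a CSV export with columns "region" and "generative_ai_interest";
# both the file name and the column names are illustrative placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("generative_ai_trends_by_region.csv")  # hypothetical file name
avg = df.groupby("region")["generative_ai_interest"].mean().sort_values()

avg.plot(kind="barh", figsize=(8, 6), title="Average Google Trends interest: Generative AI")
plt.xlabel("Average interest (0-100)")
plt.tight_layout()
plt.show()
```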
Port of Los Angeles - Historic Tonnage Data, Short Tons (1920-1970)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
AI & Human Generated Text
I am using this dataset for AI text detection for https://exnrt.com.
Check the original dataset's GitHub repository here: https://github.com/panagiotisanagnostou/AI-GA
Description
The AI-GA dataset, short for Artificial Intelligence Generated Abstracts, comprises abstracts and titles. Half of these abstracts are generated by AI, while the remaining half are original. Primarily intended for research and experimentation in natural language… See the full description on the dataset page: https://huggingface.co/datasets/Ateeqq/AI-and-Human-Generated-Text.
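For quick experimentation, the dataset can be pulled from the Hugging Face Hub with the datasets library; the sketch below is a minimal example, and the actual split and column names should be checked against the dataset card rather than assumed.

```python
# Minimal sketch: load the AI-and-Human-Generated-Text dataset from the Hugging Face Hub.
# Split and column names are assumptions; consult the dataset card for the real schema.
from datasets import load_dataset

ds = load_dataset("Ateeqq/AI-and-Human-Generated-Text")
print(ds)                      # shows the available splits and their columns
first_split = next(iter(ds.values()))
print(first_split[0])          # inspect one example (e.g., title, abstract, label)
```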
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Short-Term Industry Employment Projections" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/65934d1b-d469-4c0b-a41f-a109b7211f89 on 11 February 2022.
--- Dataset description provided by original source is as follows ---
Short-term Industry Projections for a 2-year time horizon are produced for the State to provide individuals and organizations with an insight into future industry trends to make informed decisions on employment opportunities and organizational program development. Short-term projections are revised annually. Data are not available for geographies below the state level, including labor market regions. Data is based on second quarter averages and may be subject to seasonality. Detail may not add to summary lines due to suppression of confidential data.
--- Original source retains full ownership of the source dataset ---
https://scoop.market.us/privacy-policy
US tariffs on imported hardware, software, and cloud solutions have led to increased costs for companies in the AI in data science market. These price hikes have a direct impact on both solution providers and end-users, especially those relying on international suppliers for AI-driven software and cloud infrastructure.
Financial institutions, which are key adopters of AI technologies, face higher operational expenses, potentially slowing down the adoption of AI in data science. The increased cost of cloud-based deployments, which dominate the market, further exacerbates this issue, particularly for small to medium-sized enterprises (SMEs) that may find it difficult to absorb these increased expenses.
Get more detailed insights about US tariff impact @ https://market.us/report/ai-in-data-science-market/free-sample/
The market's growth trajectory in North America and other regions with high reliance on US-based solutions may be negatively affected in the short term due to these tariff-induced challenges.
https://dataintelo.com/privacy-and-policy
The global synthetic data software market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 7.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 22.4% during the forecast period. The growth of this market can be attributed to the increasing demand for data privacy and security, advancements in artificial intelligence (AI) and machine learning (ML), and the rising need for high-quality data to train AI models.
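As a quick sanity check on those headline figures, the implied end value follows from the standard CAGR relation over the nine years from 2023 to 2032 (the horizon is inferred from the text):

```python
# Sanity check: USD 1.2B growing at a 22.4% CAGR over the 9 years from 2023 to 2032.
start_value = 1.2          # USD billion, 2023
cagr = 0.224
years = 2032 - 2023        # 9 years
end_value = start_value * (1 + cagr) ** years
print(f"Implied 2032 market size: ~USD {end_value:.1f} billion")  # ~7.4, consistent with the ~7.5 projection
```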
One of the primary growth factors for the synthetic data software market is the escalating concern over data privacy and governance. With the rise of stringent data protection regulations like GDPR in Europe and CCPA in California, organizations are increasingly seeking alternatives to real data that can still provide meaningful insights without compromising privacy. Synthetic data software offers a solution by generating artificial data that mimics real-world data distributions, thereby mitigating privacy risks while still allowing for robust data analysis and model training.
Another significant driver of market growth is the rapid advancement in AI and ML technologies. These technologies require vast amounts of data to train models effectively. Traditional data collection methods often fall short in terms of volume, variety, and veracity. Synthetic data software addresses these limitations by creating scalable, diverse, and accurate datasets, enabling more effective and efficient model training. As AI and ML applications continue to expand across various industries, the demand for synthetic data software is expected to surge.
The increasing application of synthetic data software across diverse sectors such as healthcare, finance, automotive, and retail also acts as a catalyst for market growth. In healthcare, synthetic data can be used to simulate patient records for research without violating patient privacy laws. In finance, it can help in creating realistic datasets for fraud detection and risk assessment without exposing sensitive financial information. Similarly, in automotive, synthetic data is crucial for training autonomous driving systems by simulating various driving scenarios.
From a regional perspective, North America holds the largest market share due to its early adoption of advanced technologies and the presence of key market players. Europe follows closely, driven by stringent data protection regulations and a strong focus on privacy. The Asia Pacific region is expected to witness the highest growth rate owing to the rapid digital transformation, increasing investments in AI and ML, and a burgeoning tech-savvy population. Latin America and the Middle East & Africa are also anticipated to experience steady growth, supported by emerging technological ecosystems and increasing awareness of data privacy.
When examining the synthetic data software market by component, it is essential to consider both software and services. The software segment dominates the market as it encompasses the actual tools and platforms that generate synthetic data. These tools leverage advanced algorithms and statistical methods to produce artificial datasets that closely resemble real-world data. The demand for such software is growing rapidly as organizations across various sectors seek to enhance their data capabilities without compromising on security and privacy.
On the other hand, the services segment includes consulting, implementation, and support services that help organizations integrate synthetic data software into their existing systems. As the market matures, the services segment is expected to grow significantly. This growth can be attributed to the increasing complexity of synthetic data generation and the need for specialized expertise to optimize its use. Service providers offer valuable insights and best practices, ensuring that organizations maximize the benefits of synthetic data while minimizing risks.
The interplay between software and services is crucial for the holistic growth of the synthetic data software market. While software provides the necessary tools for data generation, services ensure that these tools are effectively implemented and utilized. Together, they create a comprehensive solution that addresses the diverse needs of organizations, from initial setup to ongoing maintenance and support. As more organizations recognize the value of synthetic data, the demand for both software and services is expected to rise, driving overall market growth.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a collection of data and code used in the article AI-Boosted ESG: Transforming Enterprise ESG Performance Through Artificial Intelligence. The hypotheses of this paper include: 1. AI can promote ESG performance; 2. AI can improve ESG performance by improving green technology innovation, labor employment quality and analyst attention, as well as by reducing the management expense rate; 3. The enhancement effect of AI on ESG performance is more obvious in large-scale enterprises, manufacturing enterprises and enterprises in the eastern region. This dataset includes the three relevant tests above, as well as the procedure code for several robustness tests, including changing the AI word-frequency statistics, using the multi-time-point difference-in-differences model, changing the model type to a Tobit model, lagging by one stage, and shortening the sample period. The data provided has been collated to a certain extent. If you need the specific original data or other related material, you can contact the corresponding author Jiayi Yu at yu_jiayi20@126.com.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains both AI-generated essays and human-written essays for training purposes. The challenge is to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM. The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.
The dataset contains more than 28,000 essays, both student-written and AI-generated.
Features:
1. text: the essay text
2. generated: the target label (0 = human-written essay, 1 = AI-generated essay)
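A minimal sketch of how this data might be loaded and used for a simple detection baseline, assuming a single CSV file named train_essays.csv with the two columns listed above (the file name is an assumption):

```python
# Minimal sketch: load the essays, check label balance, and fit a simple detection baseline.
# Assumes a CSV "train_essays.csv" with columns "text" and "generated" (file name is an assumption).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train_essays.csv")
print(df["generated"].value_counts())   # 0 = human-written, 1 = AI-generated

X_train, X_val, y_train, y_val = train_test_split(
    df["text"], df["generated"], test_size=0.2, stratify=df["generated"], random_state=42
)

baseline = make_pipeline(TfidfVectorizer(max_features=50_000), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
print("Validation accuracy:", baseline.score(X_val, y_val))
```

This TF-IDF plus logistic-regression pipeline is only a starting baseline; the competition setting would typically call for stronger text models.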
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is provided by AIxBlock, a unified platform for AI development and AI workflow automation. This dataset contains ~500k sentences in Chinese, making it a valuable resource for a wide range of language technology applications. All data has undergone quality assurance (QA) checks to ensure clarity, correctness, and natural phrasing. The dataset is well-suited for: Speech data generation (e.g., recording short audio clips lasting 8-30 seconds per sentence) Natural Language… See the full description on the dataset page: https://huggingface.co/datasets/AIxBlock/Chinese-short-sentences.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A special issue of Journal of eScience Librarianship was brought to my attention. The issue was on the topic of responsible AI in libraries and archives. I did a bit of distant reading against the issue, and outlined here are some of my take-aways. In short, AI is something to consider in Library Land, but not without some forethought.
Short Tandem Repeat DNA Internet DataBase is intended to benefit research and application of short tandem repeat DNA markers for human identity testing. Facts and sequence information on each STR system, population data, commonly used multiplex STR systems, PCR primers and conditions, and a review of various technologies for analysis of STR alleles have been included.
The Vocal Characterizer Dataset is a human nonverbal vocal sound dataset consisting of 56.7 hours of short clips from 1,419 speakers, crowdsourced from the general public in South Korea and validated by the AI data platform. The dataset also includes metadata such as age, sex, noise level, and quality of utterance. The 16 included classes of human nonverbal sounds are "teeth-chattering", "teeth-grinding", "tongue-clicking", "nose-blowing", "coughing", "yawning", "throat-clearing", "sighing", "lip-popping", "lip-smacking", "panting", "crying", "laughing", "sneezing", "moaning", and "screaming".
The dataset is the first of its kind worldwide in terms of its volume, the variety of nonverbal vocal cues it covers, and the diversity of its participants.
We expect that using this dataset will enable more precise detection of nonverbal vocal cues and a better understanding of human conversation.
We're ready to deliver further information, statistics, or samples upon request. Don't hesitate to reach out!
The dataset can be delivered either as the original wav files (44,100 Hz, 16-bit PCM, 1-channel) or as a single compressed h5 file (resampled to 16,000 Hz).
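For reference, a minimal sketch of loading one of the original wav clips and resampling it to the 16,000 Hz rate used for the h5 delivery; the file name below is a placeholder, not an actual file in the dataset.

```python
# Minimal sketch: load a 44,100 Hz wav clip and resample it to 16,000 Hz.
# "clip_0001.wav" is a placeholder file name, not a real file from the dataset.
import librosa
import soundfile as sf

audio, sr = librosa.load("clip_0001.wav", sr=16_000, mono=True)  # resamples on load
sf.write("clip_0001_16k.wav", audio, 16_000)
print(f"Resampled to 16,000 Hz, {len(audio) / 16_000:.2f} s of audio")
```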
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Debilitating hearing loss (HL) affects ~6% of the human population. Only 20% of the people in need of a hearing assistive device will eventually seek and acquire one. The number of people that are satisfied with their Hearing Aids (HAids) and continue using them in the long term is even lower. Understanding the personal, behavioral, environmental, or other factors that correlate with the optimal HAid fitting and with users' experience of HAids is a significant step in improving patient satisfaction and quality of life, while reducing societal and financial burden. In SMART BEAR we are addressing this need by making use of the capacity of modern HAids to provide dynamic logging of their operation and by combining this information with a large amount of information about the medical, environmental, and social context of each HAid user. We are studying hearing rehabilitation through a 12-month continuous monitoring of HL patients, collecting data, such as participants' demographics, audiometric and medical data, their cognitive and mental status, their habits, and preferences, through a set of medical devices and wearables, as well as through face-to-face and remote clinical assessments and fitting/fine-tuning sessions. Descriptive, AI-based analysis and assessment of the relationships between heterogeneous data and HL-related parameters will help clinical researchers to better understand the overall health profiles of HL patients, and to identify patterns or relations that may prove essential for future clinical trials. In addition, the future state and behavior of the patients (e.g., HAids satisfaction and HAids usage) will be predicted with time-dependent machine learning models to assist the clinical researchers in deciding on the nature of the interventions. Explainable Artificial Intelligence (XAI) techniques will be leveraged to better understand the factors that play a significant role in the success of a hearing rehabilitation program, constructing patient profiles. This paper is a conceptual one aiming to describe the upcoming data collection process and the proposed framework for providing a comprehensive profile for patients with HL in the context of the EU-funded SMART BEAR project. Such patient profiles can be invaluable in HL treatment, as they can help to identify the characteristics that make patients more prone to dropping out and stopping use of their HAids, to using their HAids sufficiently long during the day, and to being more satisfied by their HAids experience. They can also help decrease the number of needed remote sessions with their Audiologist for counseling and/or HAids fine-tuning, or the number of manual changes of the HAids program (as an indication of poor sound quality and bad adaptation of the HAids configuration to patients' real needs and daily challenges), leading to reduced healthcare cost.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Short-Term Occupational Employment Projections" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/9809172e-5193-4a6c-8a62-b886eda39c64 on 11 February 2022.
--- Dataset description provided by original source is as follows ---
Short-term Occupational Projections for a 2-year time horizon are produced for the State to provide individuals and organizations with an occupational outlook to make informed decisions on individual career and organizational program development. Short-term projections are revised annually. Data are not available for geographies below the state level, including labor market regions. Data is based on second quarter averages and may be subject to seasonality. Detail may not add to summary lines due to suppression of data because of confidentiality and/or quality.
--- Original source retains full ownership of the source dataset ---
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was used to train the CNN used in my graduation project. Wireless Chess Robotic Arm (WCRA-AI for short) is a robotic arm capable of playing chess (it can be controlled over the network as well). The CNN is used to read human moves from the physical board: it takes a single square's picture as input and gives one of three outputs (empty square, white piece, black piece). Since the initial board state is known, this is enough to detect the human chess move.
All pictures were taken with a 5mp Raspberry Pi camera with the legacy camera settings turned off.
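For illustration, a minimal sketch of a three-class square classifier along the lines described above; the input size (64x64 RGB) and the architecture are assumptions, not the project's actual network.

```python
# Minimal sketch of a three-class square classifier (empty / white piece / black piece).
# Input size (64x64 RGB) and the layer choices are assumptions, not the project's exact CNN.
from tensorflow.keras import layers, models

def build_square_classifier(input_shape=(64, 64, 3), num_classes=3):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),                       # normalize pixel values
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),   # empty, white, black
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```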
https://spdx.org/licenses/CC0-1.0.html
Creativity is core to being human. Generative AI, made readily available by powerful large language models (LLMs), holds promise for humans to be more creative by offering new ideas, or less creative by anchoring on generative AI ideas. We study the causal impact of generative AI ideas on the production of short stories in an online experiment where some writers obtained story ideas from an LLM. We find that access to generative AI ideas causes stories to be evaluated as more creative, better written, and more enjoyable, especially among less creative writers. However, generative AI-enabled stories are more similar to each other than stories by humans alone. These results point to an increase in individual creativity at the risk of losing collective novelty. This dynamic resembles a social dilemma: with generative AI, writers are individually better off, but collectively a narrower scope of novel content is produced. Our results have implications for researchers, policy-makers, and practitioners interested in bolstering creativity.
Methods
This dataset is based on a pre-registered, two-phase experimental online study. In the first phase of our study, we recruited a group of N=293 participants ("writers") who are asked to write a short, eight-sentence story. Participants are randomly assigned to one of three conditions: Human only, Human with 1 GenAI idea, and Human with 5 GenAI ideas. In our Human only baseline condition, writers are assigned the task with no mention of or access to GenAI. In the two GenAI conditions, we provide writers with the option to call upon a GenAI technology (OpenAI's GPT-4 model) to provide a three-sentence starting idea to inspire their own story writing. In one of the two GenAI conditions (Human with 5 GenAI ideas), writers can choose to receive up to five GenAI ideas, each providing a possibly different inspiration for their story. After completing their story, writers are asked to self-evaluate their story on novelty, usefulness, and several emotional characteristics. In the second phase, the stories composed by the writers are then evaluated by a separate group of N=600 participants ("evaluators"). Evaluators read six randomly selected stories without being informed about writers being randomly assigned to access GenAI in some conditions (or not). All stories are evaluated by multiple evaluators on novelty, usefulness, and several emotional characteristics. After disclosing to evaluators whether GenAI was used during the creative process, we ask evaluators to rate the extent to which ownership and hypothetical profits should be split between the writer and the AI. Finally, we elicit evaluators' general views on the extent to which they believe that the use of AI in producing creative output is ethical, how story ownership and hypothetical profits should be shared between AI creators and human creators, and how AI should be credited in the involvement of the creative output. The data was collected on the online study platform Prolific. The data was then cleaned, processed and analyzed with Stata. For the Writer Study, of the 500 participants who began the study, 169 exited the study prior to giving consent, 22 were dropped for not giving consent, and 13 dropped out prior to completing the study. Three participants in the Human only condition admitted to using GenAI during their story writing exercise and, as per our pre-registration, they were therefore dropped from the analysis, resulting in a total number of writers and stories of 293.
For the Evaluator Study, each evaluator was shown 6 stories (2 stories from each topic). The evaluations associated with the writers who did not complete the writer study and those in the Human only condition who acknowledged using AI to complete the story were dropped. Thus, there are a total of 3,519 evaluations of 293 stories made by 600 evaluators. Four evaluations remained for five evaluators, five evaluations remained for 71, and all six remained for 524 evaluators.
https://www.archivemarketresearch.com/privacy-policy
The global short-video content parsing software market is experiencing robust growth, driven by the explosive popularity of short-form video content across social media platforms and the increasing need for efficient content moderation and analysis. This market is projected to reach a value of $2.5 billion in 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 20% from 2025 to 2033. This significant expansion is fueled by several key factors. Firstly, the rise of AI-powered video analytics allows for automated tagging, transcription, and content moderation, significantly reducing manual workload and improving efficiency for businesses and platforms. Secondly, growing concerns around misinformation, harmful content, and copyright infringement are driving demand for sophisticated parsing software capable of identifying and addressing these issues. The market segmentation reveals strong performance across various sectors, with the government and public sector, industrial, and transport and logistics sectors leading the adoption of these technologies. Camera-based systems currently dominate the market, but server-based systems are gaining traction due to their scalability and ability to process large volumes of video data. The competitive landscape is highly dynamic, with established players like IBM, Siemens, and Honeywell International alongside specialized firms like Verint Systems and Avigilon competing for market share. Geographic growth is robust across North America, Europe, and the Asia-Pacific region, with China and the United States emerging as key markets. However, challenges remain, including the need for robust data security measures and the ongoing evolution of video formats and content creation techniques, requiring continuous software updates and adaptation. The long-term outlook remains positive, with continued technological advancements and growing regulatory pressures pushing the market toward further growth. The increasing sophistication of AI and machine learning algorithms will be crucial in enabling more accurate and efficient short-video content parsing, expanding the market's potential further in the coming years.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An understanding of the nature and function of human trust in artificial intelligence (AI) is fundamental to the safe and effective integration of these technologies into organizational settings. The Trust in Automation Scale is a commonly used self-report measure of trust in automated systems; however, it has not yet been subjected to comprehensive psychometric validation. Across two studies, we tested the capacity of the scale to effectively measure trust across a range of AI applications. Results indicate that the Trust in Automation Scale is a valid and reliable measure of human trust in AI; however, with 12 items, it is often impractical for contexts requiring frequent and minimally disruptive measurements. To address this limitation, we developed and validated a three-item version of the TIAS, the Short Trust in Automation Scale (S-TIAS). In two further studies, we tested the sensitivity of the S-TIAS to manipulations of the trustworthiness of an AI system, as well as the convergent validity of the scale and its capacity to predict intentions to rely on AI-generated recommendations. In both studies, the S-TIAS also demonstrated convergent validity and significantly predicted intentions to rely on the AI system in patterns similar to the TIAS. This suggests that the S-TIAS is a practical and valid alternative for measuring trust in automation and AI for the purposes of identifying antecedent factors of trust and predicting trust outcomes.