Executive Summary: Artificial intelligence (AI) is a transformative technology that holds promise for tremendous societal and economic benefit. AI has the potential to revolutionize how we live, work, learn, discover, and communicate. AI research can further our national priorities, including increased economic prosperity, improved educational opportunities and quality of life, and enhanced national and homeland security. Because of these potential benefits, the U.S. government has invested in AI research for many years. Yet, as with any significant technology in which the Federal government has an interest, there are not only tremendous opportunities but also a number of considerations that must be taken into account in guiding the overall direction of Federally-funded R&D in AI.

On May 3, 2016, the Administration announced the formation of a new NSTC Subcommittee on Machine Learning and Artificial Intelligence to help coordinate Federal activity in AI. On June 15, 2016, this Subcommittee directed the Subcommittee on Networking and Information Technology Research and Development (NITRD) to create a National Artificial Intelligence Research and Development Strategic Plan. A NITRD Task Force on Artificial Intelligence was then formed to define the Federal strategic priorities for AI R&D, with particular attention to areas that industry is unlikely to address.

This National Artificial Intelligence R&D Strategic Plan establishes a set of objectives for Federally-funded AI research, both research occurring within the government and Federally-funded research occurring outside of government, such as in academia. The ultimate goal of this research is to produce new AI knowledge and technologies that provide a range of positive benefits to society, while minimizing the negative impacts. To achieve this goal, this AI R&D Strategic Plan identifies the following priorities for Federally-funded AI research:

Strategy 1: Make long-term investments in AI research. Prioritize investments in the next generation of AI that will drive discovery and insight and enable the United States to remain a world leader in AI.

Strategy 2: Develop effective methods for human-AI collaboration. Rather than replace humans, most AI systems will collaborate with humans to achieve optimal performance. Research is needed to create effective interactions between humans and AI systems.

Strategy 3: Understand and address the ethical, legal, and societal implications of AI. We expect AI technologies to behave according to the formal and informal norms to which we hold our fellow humans. Research is needed to understand the ethical, legal, and social implications of AI, and to develop methods for designing AI systems that align with ethical, legal, and societal goals.

Strategy 4: Ensure the safety and security of AI systems. Before AI systems are in widespread use, assurance is needed that the systems will operate safely and securely, in a controlled, well-defined, and well-understood manner. Further progress in research is needed to address the challenge of creating AI systems that are reliable, dependable, and trustworthy.

Strategy 5: Develop shared public datasets and environments for AI training and testing. The depth, quality, and accuracy of training datasets and resources significantly affect AI performance. Researchers need to develop high-quality datasets and environments and enable responsible access to high-quality datasets as well as to testing and training resources.

Strategy 6: Measure and evaluate AI technologies through standards and benchmarks. Essential to advancements in AI are standards, benchmarks, testbeds, and community engagement that guide and evaluate progress in AI. Additional research is needed to develop a broad spectrum of evaluative techniques.

Strategy 7: Better understand the national AI R&D workforce needs. Advances in AI will require a strong community of AI researchers. An improved understanding of current and future R&D workforce demands in AI is needed to help ensure that sufficient AI experts are available to address the strategic R&D areas outlined in this plan.

The AI R&D Strategic Plan closes with two recommendations:

Recommendation 1: Develop an AI R&D implementation framework to identify S&T opportunities and support effective coordination of AI R&D investments, consistent with Strategies 1-6 of this plan.

Recommendation 2: Study the national landscape for creating and sustaining a healthy AI R&D workforce, consistent with Strategy 7 of this plan.
In 2022, global total corporate investment in artificial intelligence (AI) reached almost 92 billion U.S. dollars, a slight decrease from the previous year. Yearly AI investment saw a slight, temporary downturn in 2018. Private investments account for the bulk of total corporate AI investment. AI investment has increased more than sixfold since 2016, staggering growth for any market and a testament to the importance of AI development around the world.
What is Artificial Intelligence (AI)?
Artificial intelligence, for decades the subject of people’s imaginations and the main plot of science fiction movies, is no longer fiction but commonplace in people’s daily lives, whether they realize it or not. AI refers to the ability of a computer or machine to imitate the capacities of the human brain, often learning from previous experience to understand and respond to language, decisions, and problems. These AI capabilities, such as computer vision and conversational interfaces, have become embedded throughout various industries’ standard business processes.
AI investment and startups
The global AI market, valued at 142.3 billion U.S. dollars as of 2023, continues to grow, driven by the influx of investments it receives. This rapidly growing market is expected to expand from billions to trillions of U.S. dollars in size in the coming years. From 2020 to 2022, global investment in startups, and in AI startups in particular, increased by five billion U.S. dollars, nearly doubling previous levels, with much of the capital coming from private U.S. companies. The most recent top-funded AI businesses are all machine learning and chatbot companies, focusing on human-machine interaction.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accompanying material for the paper "The anatomy of Green AI technologies: structure, evolution, and impact" (2025).
The Green AI Patent Dataset comprises 63 326 unique U.S. patents that intersect environmental (“green”) technologies with artificial‐intelligence components, spanning from 1976 to 2023. It was assembled by combining:
PatentsView (USPTO) – U.S. patents (snapshot of January 2025) labelled under Cooperative Patent Classification classes Y02 and Y04S for climate‐change mitigation/adaptation and smart‐grid technologies.
Artificial Intelligence Patent Dataset (AIPD, 2023 update – most recent) – USPTO’s machine-learning–validated classification of AI-related patents (predict50_any_ai = 1). Reference: Pairolero, N. et al., "The Artificial Intelligence Patent Dataset (AIPD) 2023 Update," USPTO Economic Working Paper 2024-4, USPTO (2024). Available at https://www.uspto.gov/sites/default/files/documents/oce-aipd-2023.pdf.
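As a rough illustration of how such an intersection could be reproduced from the two sources, the sketch below joins a hypothetical PatentsView CPC extract with the AIPD predictions on patent_id and keeps green, AI-flagged patents; the file names and exact column layouts are assumptions for illustration, not the files used to build this dataset.

```python
import pandas as pd

# Hypothetical extracts from the two sources described above.
cpc = pd.read_csv("cpc_current.csv", usecols=["patent_id", "cpc_subclass"], dtype=str)
aipd = pd.read_csv("aipd_2023.csv", usecols=["patent_id", "predict50_any_ai"], dtype=str)

# "Green" patents: labelled under CPC classes Y02 (climate-change mitigation/adaptation)
# or Y04S (smart grids).
is_green = cpc["cpc_subclass"].str.startswith("Y02") | cpc["cpc_subclass"].str.startswith("Y04S")
green = cpc[is_green]

# AI patents: flagged by the AIPD machine-learning classifier.
ai = aipd[aipd["predict50_any_ai"] == "1"]

# Intersection of the two sets, one row per unique patent.
green_ai = green.merge(ai, on="patent_id", how="inner").drop_duplicates(subset="patent_id")
print(f"{len(green_ai)} unique green-AI patents")
```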
Variable | Description | Completeness (non-null count) |
---|---|---|
patent_id | Unique USPTO patent identifier. | 63 326 |
cpc_subclass | Subclasses of "green" CPC taxonomy Y02 / Y04S. Refer to the USPTO's website for more details: https://www.uspto.gov/web/patents/classification/cpc/html/cpc-Y.html | 63 326 |
patent_date | Grant date of the patent (YYYY-MM-DD). | 63 326 |
patent_title | Title of the patent. | 63 326 |
assignee | Disambiguated assignee organization name. | 59 479 |
country | Disambiguated assignee country. | 59 155 |
forward_citations | Number of times this patent is cited by later patents (forward citations). | 63 326 |
tech_domain | BERTopic-derived technology domain (integer 0–15; –1 marks outliers). | 62 337 |
real_value | Market‐value proxy associated with the patent, derived from the updated dataset of Kogan, L., Papanikolaou, D., Seru, A. & Stoffman, N. Technological innovation, resource allocation, and growth. The Q. J. Econ. 132, 665–712, DOI: 10.1093/qje/qjw040 (2017). | 26 306 |
Each patent was assigned to one of 16 topics (tech_domain), numbered 0–15 (with –1 for outliers). Below is the label, example keywords (with their topic cohesion scores), and the number of patents in each topic:
ID | Label | Top Keywords (score) | Count |
---|---|---|---|
0 | Data Processing & Memory Management | processing (0.516), computing (0.461), process (0.449), systems (0.443), memory (0.421) | 27 435 |
1 | Microgrid & Distributed Energy Systems | microgrid (0.487), electricity (0.421), utility (0.401), power (0.380), energy (0.370) | 5 378 |
2 | Vehicle Control & Autonomous Powertrains | vehicle (0.477), vehicles (0.468), control (0.416), driving (0.387), engine (0.386) | 3 747 |
3 | Irrigation & Agricultural Water Mgmt | irrigation (0.511), systems (0.431), flow (0.353), process (0.348), water (0.333) | 2 754 |
4 | Photovoltaic & Electrochemical Devices | semiconductor (0.518), photoelectric (0.509), electrodes (0.487), electrode (0.473), photovoltaic (0.470) | 2 599 |
5 | Clinical Microbiome & Therapeutics | microbiome (0.481), clinical (0.371), physiological (0.321), therapeutic (0.320), disease (0.314) | 2 286 |
6 | Combustion Engine Control | combustion (0.423), engine (0.373), control (0.342), fuel (0.338), ignition (0.318) | 2 179 |
7 | Battery Charging & Management | charging (0.485), charger (0.449), charge (0.425), battery (0.386), batteries (0.377) | 1 541 |
8 | HVAC & Thermal Regulation | hvac (0.515), heater (0.474), cooling (0.471), heating (0.464), evaporator (0.455) | 1 523 |
9 | Lighting & Illumination Systems | lighting (0.621), illumination (0.601), lights (0.545), brightness (0.526), light (0.488) | 1 219 |
10 | Exhaust & Emission Treatment | exhaust (0.464), catalytic (0.446), purification (0.444), catalyst (0.366), emissions (0.365) | 1 064 |
11 | Wind Turbine & Rotor Control | turbines (0.498), turbine (0.488), windmill (0.464), wind (0.418), rotor (0.300) | 988 |
12 | Aircraft Wing Aerodynamics & Control | wing (0.450), aircraft (0.448), wingtip (0.424), apparatus (0.423), aerodynamic (0.418) | 697 |
13 | Meteorological Radar & Weather Forecasting | radar (0.541), meteorological (0.511), weather (0.412), precipitation (0.391), systems (0.372) | 542 |
14 | Fuel Cell Systems & Electrodes | fuel (0.375), cell (0.313), systems (0.295), cells (0.291), controls (0.262) | 377 |
15 | Turbine Airfoils & Cooling | airfoils (0.584), airfoil (0.572), turbine (0.433), engine (0.333), axial (0.321) | 352 |
–1 | Outliers | – | 7 656 |
This Zenodo entry contains topic_modeling.ipynb, a fully documented Jupyter notebook containing Python code for uncovering latent themes in patent abstracts using BERTopic. It walks through text preprocessing (lowercasing, standard English stopwords plus “herein” and “invention,” tokenization, and boilerplate removal), embedding with the all-MiniLM-L6-v2 SentenceTransformer, dimensionality reduction via UMAP, clustering with HDBSCAN, and topic extraction through class-based TF-IDF. The script also executes a grid search over UMAP and HDBSCAN hyperparameters, computes UMass coherence and topic diversity for each configuration, and saves a CSV of evaluation metrics, enabling straightforward reproduction of our topic-modeling workflow.
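A minimal sketch of that pipeline, using the building blocks named above (all-MiniLM-L6-v2 embeddings, UMAP, HDBSCAN, and class-based TF-IDF), is shown below. The input file, column name, and hyperparameter values are placeholders, not the grid-searched settings from the notebook.

```python
import pandas as pd
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer
from umap import UMAP

# Placeholder input: a list of preprocessed patent abstracts (hypothetical file/column).
docs = pd.read_csv("green_ai_abstracts.csv")["abstract"].dropna().tolist()

# Standard English stopwords extended with patent boilerplate terms.
stop_words = list(ENGLISH_STOP_WORDS) + ["herein", "invention"]

topic_model = BERTopic(
    embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
    umap_model=UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                    metric="cosine", random_state=42),
    hdbscan_model=HDBSCAN(min_cluster_size=50, metric="euclidean",
                          cluster_selection_method="eom", prediction_data=True),
    vectorizer_model=CountVectorizer(stop_words=stop_words),  # feeds class-based TF-IDF
)

topics, probs = topic_model.fit_transform(docs)    # topic -1 marks outlier documents
print(topic_model.get_topic_info().head())         # topic sizes and top keywords
```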
Additional analyses, such as data cleaning, merging, aggregation, and the generation of summary tables and plots, were also performed but are not included here by default, as they consist of straightforward operations using standard open-source libraries (e.g., pandas, NumPy, matplotlib, and seaborn). The full code for these steps can be made available upon request.
This dataset contains the results of a survey conducted on undergraduate students enrolled in the 2nd and 3rd year of study at the Faculty of Cybernetics, Statistics and Economic Informatics. The survey was conducted online and distributed through social media groups. The aim of the survey was to gather insights into students' perceptions of the role of artificial intelligence in education.
Question 1: On a scale of 1 to 10, how informed do you think you are about the concept of artificial intelligence? (1-not informed at all, 10-extremely informed)
Question 2: What sources do you use to learn about the concept of artificial intelligence? -Internet -Books/Scientific papers (physical/online format) -Social media -Discussions with family/friends -I don't inform myself about AI
Question 3: Express your agreement or disagreement with the following statements: (Strongly Disagree, Partially Disagree, Neutral, Partially Agree, Fully Agree) 1. AI encourages dehumanization 2. Robots will replace people at work 3. AI helps to solve many problems in society (education, agriculture, medicine), managing time and dangerous situations more efficiently 4. AI will rule society
Question 4: Express your agreement or disagreement with the following statements: (Strongly Disagree, Partially Disagree, Neutral, Partially Agree, Fully Agree) 1. Machinery using AI is very expensive and resource intensive to build and maintain 2. AI will lead to a global economic crisis 3. AI will help global economic growth 4. AI leads to job losses
Question 5: When you think about AI do you feel: o Curiosity o Fear o Indifference o Trust
Question 6: In which areas do you think AI would have a big impact? -Education -Medicine -Agriculture -Constructions -Marketing -Public administration -Art
Question 7: On a scale of 1 to 10, how useful do you think AI would be in the educational process? (1- not useful at all, 10-extremely useful)
Question 8: What do you think is the main advantage that AI would have in the teaching process? o Teachers can be assisted by a virtual assistant for teaching lessons and answering students' questions immediately o More efficient management of teachers' time o More interactive and engaging lessons for students o Other
Question 9: What do you think is the main advantage that AI would have in the learning process? o Personalized lessons according to students' needs o Universal access for all students eager to learn, including those with special needs o More interactive and engaging lessons for students o Other
Question 10: What do you think is the main advantage that AI would have in the evaluation process? o Automation of exam grading o Fewer errors in grading system o Constant feedback from virtual assistants for each student o Other
Question 11: What do you think is the main disadvantage that AI would have in the educational process? o Lack of a relationship between students and teacher o Internet addiction o Rarer interactions between students and teachers o Loss of information caused by possible system failure
Question 12: What is your gender? o Female o Male
Question 13: What is your year of study? o Year 2 o Year 3
Question 14: What is your major? o Economic Cybernetics o Statistics and Economic Forecasting o Economic Informatics
Question 15: Did you pass all your exams? o Yes o No
Question 16: What is your GPA for your last year of study? (Note that grades are from 1 to 10 in Romania) o 5.0-5.4 o 5.5-5.9 o 6.0-6.4 o 6.5-6.9 o 7.0-7.4 o 7.5-7.9 o 8.0-8.4 o 8.5-8.9 o 9.0-9.4 o 9.5-10
Turnover data by fiscal year for the City of Tempe compared to the seven market cities, which include Chandler, Gilbert, Glendale, Mesa, Phoenix, Peoria, and Scottsdale. There are two totals, one with and one without retirees. Please note that the Valley Benchmark Cities’ annual average is unavailable for FY 2020/2021 due to a gap in data collection during that year. Please note that corrections were made to the data, including historic data, due to additional review and research on the data on 10/2/2024. This page provides data for the Employee Turnover performance measure. The performance measure dashboard is available at 5.07 Employee Turnover.
Additional Information
Source: Department Reports
Contact: Lawrence La Victoire
Contact E-Mail: lawrence_lavictoire@tempe.gov
Data Source Type: Excel
Preparation Method: Extracted from PeopleSoft; requested data from other cities is entered manually into a spreadsheet and calculations are conducted to determine percent of turnover per fiscal year
Publish Frequency: Annually
Publish Method: Manual
Data Dictionary
This EnviroAtlas dataset shows the employment rate, or the percent of the population aged 16-64 who have worked in the past 12 months. The employment rate is a measure of the percent of the working-age population who are employed. It is an indicator of the prevalence of unemployment, which economists often use to assess labor market conditions. It is a widely used metric to evaluate the sustainable development of communities (NRC, 2011; UNECE, 2009). This dataset is based on the American Community Survey 5-year data for 2008-2012. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.
One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products or improve existing ones, and to optimise the supply chain, in order to meet the changing needs of customers.
This dataset includes information about 7 of MEVGAL’s products [1]. The published data will help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry and eventually helping the industry make informed decisions regarding product development, pricing, and market strategies in IoT business environments. The dataset can also be used to study the impact of various external factors on the dairy market, such as economic, environmental, and technological factors, and to help assess the current state of the dairy industry and identify potential opportunities for growth and development.
2. Citation
Please cite the following papers when using this dataset:
3. Dataset Modalities
The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each of the different files include the logistics for that product on a daily basis for three years, from 2020 to 2022.
3.1 Data Collection
The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.
The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.
Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.
It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.
The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).
File | Period | Number of Samples (days) |
---|---|---|
product 1 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
product 1 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
product 1 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
product 2 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
product 2 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
product 2 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
product 3 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
product 3 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
product 3 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
product 4 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
product 4 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
product 4 2022.xlsx | 01/01/2022–31/12/2022 | 364 |
product 5 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
product 5 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
product 5 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
product 6 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
product 6 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
product 6 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
product 7 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
product 7 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
product 7 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
3.2 Dataset Overview
The following table enumerates and explains the features included across all of the included files.
Feature | Description | Unit |
---|---|---|
Day | Day of the month | - |
Month | Month | - |
Year | Year | - |
daily_unit_sales | Daily sales: the number of products, measured in units, sold on that specific day | units |
previous_year_daily_unit_sales | Previous year's sales: the number of products, measured in units, sold on that specific day the previous year | units |
percentage_difference_daily_unit_sales | The percentage difference between the two values above | % |
daily_unit_sales_kg | The amount of products, measured in kilograms, sold on that specific day | kg |
previous_year_daily_unit_sales_kg | Previous year's sales: the amount of products, measured in kilograms, sold on that specific day the previous year | kg |
percentage_difference_daily_unit_sales_kg | The percentage difference between the two values above | % |
daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | % |
previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned, the previous year | % |
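As a rough illustration of how the percentage-difference columns relate to the unit-sales columns above, the sketch below recomputes one of them from a loaded file; the exact formula (current vs. previous year, relative to the previous year) is an assumption inferred from the feature names, and the file name is one of those listed in Section 3.1.

```python
import pandas as pd

# One of the published Excel files (requires openpyxl for .xlsx).
df = pd.read_excel("product 1 2020.xlsx")

# Assumed definition: (current year - previous year) / previous year, in percent.
recomputed = (
    (df["daily_unit_sales"] - df["previous_year_daily_unit_sales"])
    / df["previous_year_daily_unit_sales"] * 100
)

# Compare against the published column to validate the interpretation.
max_gap = (recomputed - df["percentage_difference_daily_unit_sales"]).abs().max()
print(f"Largest deviation from the published column: {max_gap:.4f} percentage points")
```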
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Startup Success Prediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/manishkc06/startup-success-prediction on 28 January 2022.
--- Dataset description provided by original source is as follows ---
A startup or start-up is a company or project begun by an entrepreneur to seek, develop, and validate a scalable economic model. While entrepreneurship refers to all new businesses, including self-employment and businesses that never intend to become registered, startups refer to new businesses that intend to grow large beyond the solo founder. Startups face high uncertainty and have high rates of failure, but a minority of them do go on to be successful and influential. Some startups become unicorns: privately held startup companies valued at over US$1 billion. [Source of information: Wikipedia]
Startups play a major role in economic growth. They bring new ideas, spur innovation, and create employment, thereby moving the economy forward. There has been exponential growth in startups over the past few years. Predicting the success of a startup allows investors to find companies that have the potential for rapid growth, thereby allowing them to be one step ahead of the competition.
The objective is to predict whether a startup which is currently operating turns into a success or a failure. The success of a company is defined as the event that gives the company's founders a large sum of money through the process of M&A (Merger and Acquisition) or an IPO (Initial Public Offering). A company would be considered as failed if it had to be shut down.
The data contains industry trends, investment insights and individual company information. There are 48 columns/features. Some of the features are:
--- Original source retains full ownership of the source dataset ---
This EnviroAtlas dataset portrays the commute time of workers to their workplace for each Census Block Group (CBG) during 2008-2012. Data were compiled from the Census ACS (American Community Survey) 5-year Summary Data. The commute time is the amount of travel time in minutes for workers to get from home to work. This value includes private vehicle use, carpooling, public transit, bicycling, or walking. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was compiled with the ultimate goal of developing non-invasive computer vision algorithms for assessing shrimp biometrics and biomass estimation. The main folder, labeled "DATASET," contains five sub-folders—DB1, DB2, DB3, DB4, and DB5—each filled with images of shrimps. Additionally, each sub-folder is accompanied by an Excel file that includes manually measured data for the shrimps pictured. The files are named respectively: DB1_INDUSTRIAL_FARM_1, DB2_INDUSTRIAL_FARM_2_C1, DB3_INDUSTRIAL_FARM_2_C2, DB4_ACADEMIC_POND_S1, and DB5_ACADEMIC_POND_S2.
Here’s a detailed description of the contents of each sub-folder and its corresponding Excel file:
1) DB1 includes 490 PNG images of 22 shrimps taken from one pond at an industrial farm. The associated Excel file, DB1_INDUSTRIAL_FARM_1, contains columns for: SAMPLE: Reflecting the number of individual shrimps (22 entries or rows). LENGTH (cm): Measuring from the rostrum (near the eyes) to the start of the tail. WEIGHT (g): Recorded using a scale. COMPLETE SHRIMP IMAGES: Indicates if at least one full-body image is available (1) or not (0).
2) DB2 consists of 2002 PNG images of 58 shrimps. The Excel file, DB2_INDUSTRIAL_FARM_2_C1, includes: SAMPLE: Number of shrimps (58 entries or rows). CEPHALOTHORAX (cm): Width measured at the middle. LENGTH (cm) and WEIGHT (g): Similar measurements as DB1. COMPLETE SHRIMP IMAGES: Presence (1) or absence (0) of full-body images.
3) DB3 contains 1719 PNG images of 50 shrimps, with its Excel file, DB3_INDUSTRIAL_FARM_2_C2, documenting: SAMPLE: Number of shrimps (50 entries or rows). Measurements and categories identical to DB2.
4) DB4 encompasses 635 PNG images of 20 shrimps, detailed in the Excel file DB4_ACADEMIC_POND_S1. This includes: SAMPLE: Number of shrimps (20 entries or rows). CEPHALOTHORAX (cm), LENGTH (cm), WEIGHT (g), and COMPLETE SHRIMP IMAGES: Documented as in other datasets.
5) DB5 includes 661 PNG images of 20 shrimps, with DB5_ACADEMIC_POND_S2 as the corresponding Excel file. The file mirrors the structure and measurements of DB4.
The images in each folder are named "sm_n", where m is the shrimp sample number and n is the picture number for that shrimp. This carefully structured dataset provides comprehensive biometric data on shrimps, facilitating the development of algorithms aimed at non-invasive measurement techniques. This will likely be pivotal in enhancing the precision of biomass estimation in aquaculture farming, utilizing advanced statistical morphology analysis and machine learning techniques.
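A small sketch of how the "sm_n" naming convention could be used to link each image back to its row in the corresponding Excel file is shown below; the folder layout and column name follow the description above, while the exact regular expression and the assumption that the SAMPLE column holds the sample number m are illustrative.

```python
import re
from pathlib import Path

import pandas as pd

# Hypothetical paths following the dataset description.
images_dir = Path("DATASET/DB1")
measurements = pd.read_excel("DB1_INDUSTRIAL_FARM_1.xlsx")  # columns: SAMPLE, LENGTH (cm), ...

# File names follow "sm_n": m = shrimp sample number, n = picture number, e.g. "s3_12.png".
pattern = re.compile(r"s(\d+)_(\d+)\.png$", re.IGNORECASE)

records = []
for img in sorted(images_dir.glob("*.png")):
    match = pattern.search(img.name)
    if match:
        records.append({"file": img.name,
                        "SAMPLE": int(match.group(1)),
                        "picture": int(match.group(2))})

images = pd.DataFrame(records)
# Join each image to the manual measurements of its shrimp (assumes SAMPLE holds m).
linked = images.merge(measurements, on="SAMPLE", how="left")
print(linked.head())
```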
This table contains data on the number of licensed day care center slots (facility capacity) per 1,000 children aged 0-5 years in California, its regions, counties, cities, towns, and census tracts. The table contains 2015 data, and includes type of facility (day care center or infant center). Access to child care has become a critical support for working families. Many working families find high-quality child care unaffordable, and the increasing cost of child care can be crippling for low-income families and single parents. These barriers can impact parental choices of child care. Increased availability of child care facilities can positively impact families by providing more choices of child care in terms of price and quality. Estimates for this indicator are provided for the total population, and are not available by race/ethnicity. More information on the data table and a data dictionary can be found in the Data and Resources section. The licensed day care centers table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity. The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health. The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture and others. Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. More information on HCI can be found here: https://www.cdph.ca.gov/Programs/OHE/CDPH%20Document%20Library/Accessible%202%20CDPH_Healthy_Community_Indicators1pager5-16-12.pdf
The format of the licensed day care centers table is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID, and indicator definition). Some of these variables may contain the same value for all observations.
Investigator(s): Bureau of Justice Statistics The National Survey of Prosecutors is a survey of chief prosecutors in state court systems. A chief prosecutor is an official, usually locally elected and typically with the title of district attorney or county attorney, who is in charge of a prosecutorial district made up of one or more counties, and who conducts or supervises the prosecution of felony cases in a state court system. Prosecutors in courts of limited jurisdiction, such as municipal prosecutors, were not included in the survey. The survey's purpose was to obtain detailed descriptive information on prosecutors' offices, as well as information on their policies and practices. Years Produced: Every 4 to 5 years.
The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.
What is Big data?
Big data is a term that refers to the kind of data sets that are too large or too complex for traditional data processing applications. It is defined as having one or some of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.
Big data analytics
Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.
We indicate how likely a piece of content is to be computer generated or human written. Content: any text in English or Spanish, from a single sentence to articles thousands of words long.
Data uniqueness: we use custom built and trained NLP algorithms to assess human effort metrics that are inherent in text content. We focus on what's in the text, not metadata such as publication or engagement. Our AI algorithms are co-created by NLP & journalism experts. Our datasets have all been human-reviewed and labeled.
Dataset: CSV containing URL and/or body text, with attributed scoring as an integer and model confidence as a percentage. We ignore metadata such as author, publication, date, word count, shares, and so on, to provide a clean and maximally unbiased assessment of how much human effort has been invested in content. Our data is provided in CSV/RSS/JSON format; one row = one scored article.
Integrity indicators provided as integers on a 1–5 scale. We also have custom models with 35 categories that can be added on request.
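As an illustration of the one-row-per-article format described above, the snippet below reads such a CSV and filters on the integrity score; the column names are assumptions, since the exact header layout is not specified here.

```python
import pandas as pd

# Hypothetical column names for the one-row-per-article CSV described above.
df = pd.read_csv("scored_articles.csv")  # e.g. columns: url, body_text, score, confidence

# Keep articles with a high integrity score (1-5 scale) and confident model output.
high_integrity = df[(df["score"] >= 4) & (df["confidence"] >= 80)]
print(f"{len(high_integrity)} of {len(df)} articles scored 4 or 5")
```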
Data sourcing: public websites, crawlers, scrapers, and other partnerships where available. We can generally assess content both behind and without paywalls. We source from ~4,000 news outlets; examples include Bloomberg, CNN, and the BBC. Countries: all English-speaking markets worldwide, including English-language content from non-English-majority regions such as Germany, Scandinavia, and Japan. Also available in Spanish on request.
Use cases: assessing the implicit integrity and reliability of an article. There is a correlation between integrity and human value: we have shown that articles scoring highly on our scales show increased, sustained, ongoing end-user engagement. Clients also use this to assess journalistic output and publication relevance, and to create datasets of 'quality' journalism.
Overtone provides a range of qualitative metrics for journalistic, newsworthy and long-form content. We find, highlight and synthesise content that shows added human effort and, by extension, added human value.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
UnFake - https://github.com/negativenagesh/UnFake. Check out the repo and give it a star.
Welcome to the UnFake Deepfake Detection Dataset, a meticulously curated collection of approximately 76,000 images designed to advance research in deepfake detection. This dataset is the backbone of the "UnFake" project, a pioneering platform aimed at identifying AI-generated or manipulated images, with a focus on protecting users of platforms like Unsplash from misinformation, legal risks, and reputational harm. With the rise of deepfake technology blurring the lines between reality and fabrication, this dataset provides a robust resource for training and evaluating models to distinguish real images from their AI-crafted counterparts.
The dataset comprises real images scraped from Unsplash (approximately 76,000). It primarily focuses on human faces and bodies, spanning a wide range of categories to ensure diversity and real-world applicability.
Dataset Composition
Total Images: ~76,000
Sources: Unsplash: ~76,000 real images scraped via the Unsplash API, representing high-quality, royalty-free photography.
Pose & Composition: Includes close-up shots and headshot/portrait-style images.
Purpose
This dataset was created to address the growing challenge of deepfake proliferation on platforms hosting billions of images, such as Unsplash. With over 5 million photos and 13 billion monthly impressions (source: Unsplash), Unsplash is a vital resource for designers, marketers, educators, and more. However, the lack of transparency about image authenticity poses risks: misinformation, copyright violations, defamation, and more. This dataset powers the "UnFake" platform, integrating deepfake detection into the image-downloading process, and is now shared with the Kaggle community to foster innovation in AI-driven media authentication.
Key Features
Size: ~76K images, offering a substantial volume for training and testing.
Diversity: Broad representation across ethnicity, age, and facial characteristics.
Real-World Relevance: Sourced from Unsplash, a widely-used platform, plus synthetic deepfakes mimicking modern AI techniques.
Potential Use Cases
Deepfake Detection Research: Train and benchmark convolutional neural networks (e.g., EfficientNet-B7, as used in UnFake) to classify real vs. fake images.
Media Authentication: Develop tools to verify image authenticity on stock photo platforms.
AI Ethics & Security: Study the implications of deepfake technology and build countermeasures.
Educational Projects: Use in academic settings to explore computer vision and AI.
Dataset Structure
Format: Images are stored as standard image files (e.g., JPEG/PNG).
Directories: Organized into real and fake subfolders for ease of use.
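Given the real/fake subfolder layout described above, one common way to consume the dataset is through torchvision's ImageFolder; the sketch below is a generic loading example with an assumed root path and standard EfficientNet-style preprocessing, not the actual UnFake training code.

```python
import torch
from torchvision import datasets, transforms

# Standard preprocessing for an ImageNet-pretrained classifier such as EfficientNet.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Assumed layout: <root>/real/*.jpg and <root>/fake/*.jpg, as described above.
dataset = datasets.ImageFolder(root="unfake_dataset", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True, num_workers=2)

print(dataset.classes)            # class names inferred from folder names, e.g. ['fake', 'real']
images, labels = next(iter(loader))
print(images.shape, labels[:8])   # torch.Size([32, 3, 224, 224]) and a batch of 0/1 labels
```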
How It Was Built
Real Images: Scraped from Unsplash using its official API, focusing on human-centric photos.
Acknowledgments
Unsplash: For providing a rich source of real-world images via their API.
License
This dataset is released under the Apache License, Version 2.0, allowing free use, modification, and distribution. See the LICENSE file for details.
Get Started
Download the dataset, explore the diversity of real images, and join the fight against manipulated media! Check out the UnFake GitHub repo - https://github.com/negativenagesh/UnFake - for the full project, including code and model details.
A one-year seismic hazard forecast for the Central and Eastern United States, based on induced and natural earthquakes, has been produced by the U.S. Geological Survey. The model assumes that earthquake rates calculated from several different time windows will remain relatively stationary and can be used to forecast earthquake hazard and damage intensity for the year 2018. This assessment is the first step in developing an operational earthquake forecast for the CEUS, and the analysis could be revised with updated seismicity and model parameters. Consensus input models consider alternative earthquake catalog durations, smoothing parameters, maximum magnitudes, and ground motion estimates, and represent uncertainties in earthquake occurrence and diversity of opinion in the science community. Near some areas of active induced earthquakes, hazard is higher than in the 2014 USGS National Seismic Hazard Model (NSHM) by more than a factor of 3; the 2014 NSHM did not consider induced earthquakes. In some areas, previously observed induced earthquakes have stopped, so the seismic hazard reverts back to the 2014 NSHM. This data set represents the results of calculations of hazard curves for a grid of points with a spacing of 0.05 degrees in latitude and longitude. This particular data set is for horizontal spectral response acceleration for 1.0-second period with a 1 percent probability of exceedance in 1 year. The data are for the Western United States and are based on the long-term 2014 National Seismic Hazard Model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Traffic counter’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/61a025bea95e6e49a64ed7a7 on 17 January 2022.
--- Dataset description provided by original source is as follows ---
Bordeaux Métropole has installed a network of sensors to assess the state of traffic in its territory. This dataset represents, as point features, the geolocation of these sensors on the pavement. It provides information on the identifier and type of counter. There are two types of counters: all-vehicle counters connected to the central station of the Bordeaux Métropole Circulation service via the "Gertrude" system, and SIREDO stations that make it possible to distinguish light vehicles from heavy goods vehicles and to record speeds.
For "BOUCLE" (inductive loop) sensors, a "real time" data record is available at a 5-minute time step. For "SIREDO" sensors, the operating system does not currently allow real-time data to be retrieved on the open data portal.
The 5-minute data history has also been available since mid-October 2020.
These data correspond to the raw counts from the sensors. They do not take into account any a posteriori corrections made in the event of a specific failure or problem with a sensor. Publications produced by Bordeaux Métropole from reliable data sets may therefore differ from the raw 5-minute data.
The sensors are magnetic loops embedded in the roadway that count road vehicles passing over them, so they can be disrupted by external events or by works on public space.
Via the Bordeaux Métropole WebServices, it is possible to:
This dataset is available in an additional format:
Download in AutoCAD DWG format
This dataset is refreshed every 3 minutes. Be careful: for performance reasons, this dataset (Table, Map, Analysis and Export tabs) may be updated less frequently than the source, so a deviation may exist. We also invite you to use our web services (see the Webservices BM tab) to retrieve the freshest data.
--- Original source retains full ownership of the source dataset ---
On an island largely devoid of native vertebrate seed dispersers, we monitored forest succession for seven years following ungulate exclusion from a 5-hectare area and adjacent plots with ungulates still present. The study site was in northern Guam on Andersen Air Force Base (13°37’N, 144°51’E) and situated on a coralline limestone plateau. We established 22 plots and six 0.25-m2 subplots to measure trees and understory canopy. Data were collected in February or March, during the dry season from 2005-2011.
Caeli can provide this data through an API, dashboard, real-time geo map, or via datasets (.csv). In addition, all of this data is available in daily, monthly, and annual formats. The data can be delivered in various spatial resolutions starting from 0.001 degrees latitude and longitude (WGS 84), which roughly converts to 100 x 100 meters.
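As a rough check on the stated resolution, the snippet below converts 0.001 degrees into metres; note that the east-west distance shrinks with latitude, so the 100 x 100 m figure is an approximation that holds best near the equator.

```python
import math

def deg_to_metres(deg: float, latitude_deg: float) -> tuple[float, float]:
    """Approximate ground distance covered by `deg` degrees of latitude and longitude."""
    metres_per_deg_lat = 111_320.0                                       # roughly constant
    metres_per_deg_lon = 111_320.0 * math.cos(math.radians(latitude_deg))
    return deg * metres_per_deg_lat, deg * metres_per_deg_lon

print(deg_to_metres(0.001, latitude_deg=0))    # ~ (111.3 m, 111.3 m) at the equator
print(deg_to_metres(0.001, latitude_deg=52))   # ~ (111.3 m, 68.5 m) at 52 degrees north
```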
The Caeli datasets are often used for creating and validating various models and for training machine learning algorithms. We’ll allow you to specify your state or country, your preferred timeframe, resolution, and pollutant. Based on this information we’ll compile a reliable dataset. The measurements in the dataset can be used to determine the air quality of a region for a specific period of time. Additionally, your composite dataset can also serve strategy and reporting purposes, such as ESG strategy, TCFD, SFDR, and sustainable decision making. The price of the dataset is based on the size of the area, the resolution chosen, and the number of years.
Additional information about particulate matter (PM2.5, PM10): Particulate matter (PM) refers to tiny particles suspended in the air that can be inhaled into the respiratory system. PM is classified by size, with PM2.5 and PM10 referring to particles that are 2.5 micrometers and 10 micrometers in diameter, respectively. PM2.5 particles are particularly harmful because they are small enough to pass through the respiratory system and enter the bloodstream, where they can cause a variety of health problems. PM2.5 and PM10 are often used as indicators of air quality, with higher concentrations of these particles in the air being associated with increased risk of respiratory and cardiovascular diseases.
Are you interested in the pollutant particulate matter (PM2.5, PM10), or would you like to gather more information about our services? Please do not hesitate to contact us: www.caeli.space
Sector coverage: Financial | Energy | Government | Agricultural | Health | Shipping.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Selected Video Facsimile/Slot Machine Data from Foxwoods and Mohegan Sun Casinos’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/16616a80-923f-4991-90de-ed11bdf3d67f on 27 January 2022.
--- Dataset description provided by original source is as follows ---
Mohegan Sun Footnotes:
(1) Monthly contributions are due to the State by the 15th of the following month.
(2) Mohegan Sun did not include the value of eBonus credits redeemed by patrons at slot machines in its video facsimile devices Win amounts; however, the value of eBonus credits wagered was included in the reported Handle. In addition, please be advised that the Casino Hold % column amounts may be understated and the Payout % column amounts may be overstated as a result of this.
(3) From July 1, 2009 to June 30, 2012, if the aggregate amount of eBonus coupons or credits actually played on the Mohegan Tribe's Video Facsimiles during a particular month exceeded 5.5% of "gross operating revenues" for that month, the Mohegan Tribe paid to the State an amount equal to twenty-five percent (25%) of such excess face amount of eBonus coupons or credits used in such calendar month (the "eBonus Contribution"). Beginning on July 1, 2012, and for all months thereafter, the aggregate amount threshold for determining the eBonus Contribution increased from 5.5% to 11% of "gross operating revenues."
(4) The value of eBonus free slot play credits redeemed during February 2009 totaled $1,910,268; however, it was determined that eBonus credits redeemed were overstated by $1,460,390 for January 2008 through January 2009. February 2009 is adjusted by this amount. March 2009 was adjusted by an additional $8,139.
(5) During fiscal year 2010 the Mohegan Tribe and the State of Connecticut settled a dispute regarding the proper treatment of eBonus for the period November 2007 through June 2009. As a result of this settlement, the State of Connecticut received $5,727,731, including interest.
(6) For fiscal years 2007/2008 and 2008/2009, Poker Pro Electronic Table Rake Amounts of $401,309 and $42,188, respectively, were included in the calculation to determine the amount of Slot Machine Contributions to the State of Connecticut.
(7) The Mohegan Sun Casino officially opened on Saturday, October 12, 1996. On October 8-10, video facsimile/slot machines were available for actual play during pre-opening charitable gaming nights.
(8) Beginning with the month of May 2001, Mohegan Sun Casino reports video facsimile/slot machine win on an accrual basis, reflecting data captured and reported by an on-line slot accounting system. Reports were previously prepared on a cash basis, based on the coin and currency removed from the machines on each gaming day.
(9) The cumulative Win amount total should be reduced by $1,452,341.21 to correct for an over-reporting of slot revenues for prior periods related to errors in the accrual carry-forward of estimated cash on floor.
Foxwoods Footnotes:
(1) Monthly contributions are due to the State by the 15th of the following month.
(2) The operation of the video facsimile/slot machines began at Foxwoods on January 16, 1993.
(3) Foxwoods did not include the value of Free Play coupons redeemed by patrons at slot machines in its video facsimile devices Win amounts; however, the value of Free Play coupons wagered was included in the reported Handle. In addition, please be advised that the Casino Hold % column amounts may be understated and the Payout % column amounts may be overstated as a result of this.
(4) From July 1, 2009 to June 30, 2012, if the aggregate amount of Free Play coupons or credits actually played on the Mashantucket Pequot Tribe's Video Facsimiles during a particular month exceeded 5.5% of "gross operating revenues" for that month, the Mashantucket Pequot Tribe paid to the State an amount equal to twenty-five percent (25%) of such excess face amount of Free Play coupons or credits used in such calendar month (the "Free Play Contribution"). Beginning on July 1, 2012, and for all months thereafter, the aggregate amount threshold for determining the Free Play Contribution increased from 5.5% to 11% of "gross operating revenues."
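Footnote (3) for Mohegan Sun and footnote (4) for Foxwoods describe the same threshold formula; a small worked sketch of that calculation is below, using illustrative figures rather than reported ones.

```python
def coupon_contribution(coupons_played: float, gross_operating_revenues: float,
                        threshold: float = 0.055, rate: float = 0.25) -> float:
    """25% of the face amount of coupons/credits played above the monthly threshold."""
    allowance = threshold * gross_operating_revenues
    excess = max(0.0, coupons_played - allowance)
    return rate * excess

# Illustrative month (not reported figures): $60M gross revenues, $4M in coupons played.
# Allowance = 0.055 * 60,000,000 = 3,300,000; excess = 700,000; contribution = 175,000.
print(coupon_contribution(4_000_000, 60_000_000))   # 175000.0
```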
--- Original source retains full ownership of the source dataset ---