Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our most comprehensive database of AI models, containing over 800 models that are state of the art, highly cited, or otherwise historically notable. It tracks key factors driving machine learning progress and includes over 300 training compute estimates.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
MOSTLY AI Prize Dataset
This repository contains the dataset used in the MOSTLY AI Prize competition.
About the Competition
Generate the BEST tabular synthetic data and win 100,000 USD in cash. The competition runs for 50 days: May 14 - July 3, 2025. It features two independent synthetic data challenges that you can join separately:
- The FLAT DATA Challenge
- The SEQUENTIAL DATA Challenge
For each challenge, generate a dataset with the same size and structure as… See the full description on the dataset page: https://huggingface.co/datasets/mostlyai/mostlyaiprize.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CAI-Synthetic Model
Overview
The CAI-Synthetic Model is a large language model designed to understand and respond to complex questions. This model has been fine-tuned on a synthetic dataset from Mostly AI, allowing it to engage in a variety of contexts with reliable responses. It is designed to perform well in diverse scenarios.
Base Model and Fine-Tuning
Base Model: Google/Gemma-7b
Fine-Tuning Adapter: LoRA Adapter
Synthetic Dataset: Mostly AI Synthetic… See the full description on the dataset page: https://huggingface.co/datasets/InnerI/CAI-synthetic-10k.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is an instruction dataset for fine-tuning models to answer row completion prompts efficiently. See https://github.com/mostly-ai/datallm for more.
Overview
This dataset consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 06th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.
Data Collection
We used Social Tracker, along with the social networks' APIs, to gather most of the collections. For a minor part, we used Twint. In both cases, we provided keywords related to the event to receive the data. Note that, in procedures like this one, usually only a small fraction of the collected data is actually related to the event and useful for further forensic analysis.
Content
We have data from 34 events, and for each of them we provide the following files:
- items_full.csv: Contains links to every social media post that was collected.
- images.csv: Lists the images collected. In some files there is a field called "ItemUrl" that refers to the social network post (e.g., a tweet) that mentions that media item.
- video.csv: URLs of YouTube videos that were gathered about the event.
- video_tweet.csv: Contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content. In turn, the link to a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.
- description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it.
In fact, most of the collections do not have all the files above, because our collection procedure changed over the course of this work.
Events
We divided the events into six groups:
- Fire (14 events): Devastating fire is the main issue of the event, so most of the informative pictures show flames or burned constructions.
- Collapse (5 events): Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire).
- Shooting (5 events): Likely images of guns and police officers. Little or no destruction of the environment.
- Demonstration (7 events): Large crowds of people on the streets. Some incident may have occurred during the demonstration, but in most cases the demonstration is the actual event.
- Collision (1 event): Traffic collision. Pictures of damaged vehicles in an urban landscape. There may be images of victims on the street.
- Flood (2 events): Events that range from fierce rain to a tsunami. Many pictures depict water.
Media Content
Due to the social networks' terms of use, we do not make the collected texts, images and videos publicly available. However, additional media content related to one or more events can be requested by contacting the authors.
https://www.enterpriseappstoday.com/privacy-policy
Google Gemini Statistics: In 2023, Google unveiled Gemini, billed as the most powerful AI model to date and positioned ahead of ChatGPT-4. Gemini comes in three model sizes, each suited to tasks of different complexity. According to Google Gemini Statistics, these models can understand and solve complex problems across a wide range of subjects. Google has also said it will develop AI in a way that shows how helpful it can be in daily routines. We hope the next generation won't become fully dependent on such technologies; otherwise, we will lose all of our natural talent!
Editor's Choice
- Google Gemini can follow natural and engaging conversations.
- Gemini Ultra scores 90.0% on the MMLU benchmark, which tests knowledge and problem-solving across subjects including history, physics, math, law, ethics, and medicine.
- If you ask Gemini what to do with your raw material, it can suggest ideas as text or images based on the given input.
- Gemini has outperformed ChatGPT-4 in the majority of tests.
- According to the report, this LLM is unique because it can process multiple types of data at the same time, including video, images, computer code, and text.
- Google calls this stage of development "The Gemini Era", underlining how significant it considers AI for improving daily life.
- Google Gemini can talk like a real person.
- Gemini Ultra is the largest model and can solve extremely complex problems.
- Gemini models are trained on multilingual and multimodal datasets.
- Gemini Ultra has also outperformed GPT-4V on the MMMU benchmark, with the following results: Art and Design (74.2), Business (62.7), Health and Medicine (71.3), Humanities and Social Science (78.3), and Technology and Engineering (53.0).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Which Social Media Millennials Care About Most?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/which-social-media-millennials-care-about-moste on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This data was collected by Whatsgoodly, a millennial social polling company.
It was published by Breitbart on 3/17/17.
Link to article here: http://www.breitbart.com/tech/2017/03/17/report-snapchat-is-most-important-social-network-among-millennials/
This dataset was created by Adam Halper and contains around 500 samples along with Segment Type, Count, technical information and other features such as:
- Segment Description
- Answer
- and more.
- Analyze Percentage in relation to Question
- Study the influence of Segment Type on Count
If you use this dataset in your research, please credit Adam Halper
--- Original source retains full ownership of the source dataset ---
This dataset contains ranks and counts for the top 25 baby names by sex for live births that occurred in California (by occurrence) based on information entered on birth certificates.
The College Scorecard is designed to increase transparency, putting the power in the hands of the public — from those choosing colleges to those improving college quality — to see how well different schools are serving their students.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘CDD46 - Population Usually Resident and Present in the State who Speak a Language other than English or Irish at Home’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/a0cbdfca-e70d-4275-b4c7-e1f65a5c1487 on 16 January 2022.
--- Dataset description provided by original source is as follows ---
Population Usually Resident and Present in the State who Speak a Language other than English or Irish at Home
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AI-powered technologies are increasingly being developed for educational purposes to contribute to students' academic performance and overall better learning outcomes. This exploratory review uses the PRISMA approach to describe how the effectiveness of AI-driven technologies is being measured, as well as the roles attributed to teachers, and the theoretical and practical contributions derived from the interventions. Findings from 48 articles highlighted that learning outcomes were more aligned with the optimization of AI systems, mostly nested in a computer science perspective, and did not consider teachers in an active role in the research. Most studies proved to be atheoretical and practical contributions were limited to enhancing the design of the AI system. We discuss the importance of developing complementary research designs for AI-powered tools to be integrated optimally into education.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘CD365 - Usually Resident and Present Population Aged 15 Years and Over Who Speak a Language Other than English or Irish at Home’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/feb0166e-42b9-4ee4-a5b6-a83f6dfdd3bb on 19 January 2022.
--- Dataset description provided by original source is as follows ---
Usually Resident and Present Population Aged 15 Years and Over Who Speak a Language Other than English or Irish at Home
--- Original source retains full ownership of the source dataset ---
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset of the most popular text-to-image prompts.
Dataset Details
Dataset Description
Curated by: kazimir.ai
Funded by [optional]: [More Information Needed]
Shared by [optional]: https://kazimir.ai
License: apache-2.0
Dataset Sources [optional]
Repository: [More Information Needed]
Paper [optional]: [More Information Needed]
Demo [optional]: [More Information Needed]
Uses
Free to use.
Dataset Structure
CSV file… See the full description on the dataset page: https://huggingface.co/datasets/Kazimir-ai/text-to-image-prompts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Most common main languages’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/14f0e561-78a5-4f47-8ccf-9ae93d37e990-stadt-zurich on 16 January 2022.
--- Dataset description provided by original source is as follows ---
The 50 most common languages among the permanent resident population aged 15 and over in the city of Zurich. The analysis is based on the pooled target-person dataset of the structural survey. Period: 2017 to 2019.
--- Original source retains full ownership of the source dataset ---
The Arena Cove, California Forecast Grids provides bathymetric data strictly for tsunami inundation modeling with the Method of Splitting Tsunami (MOST) model. MOST is a suite of numerical simulation codes capable of simulating three processes of tsunami evolution: generation, transoceanic propagation, and inundation of dry land. Tsunami waves are computationally propagated across a set of three nested grids (A, B, and C), each of which is successively finer in resolution, moving from offshore to onshore. Nearshore details are resolved to the point that model output can be directly compared with tide gauge observations and can provide estimates of wave arrival time, wave amplitude and simulation of wave inundation onto dry land. A Grid Resolution: 60 arc-sec. B Grid Resolution: 24 arc-sec in x direction and 18 arc-sec in y direction. C Grid Resolution: 2 arc-sec in x direction and 1.5 arc-sec in y direction.
The Florence, Oregon Forecast Model Grids provides bathymetric data strictly for tsunami inundation modeling with the Method of Splitting Tsunami (MOST) model. MOST is a suite of numerical simulation codes capable of simulating three processes of tsunami evolution: generation, transoceanic propagation, and inundation of dry land. Tsunami waves are computationally propagated across a set of three nested grids (A, B, and C), each of which is successively finer in resolution, moving from offshore to onshore. Nearshore details are resolved to the point that model output can be directly compared with tide gauge observations and can provide estimates of wave arrival time, wave amplitude and simulation of wave inundation onto dry land. A Grid Resolution: 72 arc-sec. B Grid Resolution: 12 arc-sec. C Grid Resolution: 1.8 arc-sec in the x direction and 1.2 arc-sec in the y direction.
The Port San Luis, California Forecast Model Grids provides bathymetric data strictly for tsunami inundation modeling with the Method of Splitting Tsunami (MOST) model. MOST is a suite of numerical simulation codes capable of simulating three processes of tsunami evolution: generation, transoceanic propagation, and inundation of dry land. Tsunami waves are computationally propagated across a set of three nested grids (A, B, and C), each of which is successively finer in resolution, moving from offshore to onshore. Nearshore details are resolved to the point that model output can be directly compared with tide gauge observations and can provide estimates of wave arrival time, wave amplitude and simulation of wave inundation onto dry land. A Grid Resolution: 120 arc-sec. B Grid Resolution: 17.9 arc-sec. C Grid Resolution: 2 arc-sec.
The Port Orford, Oregon Forecast Model Grids provides bathymetric data strictly for tsunami inundation modeling with the Method of Splitting Tsunami (MOST) model. MOST is a suite of numerical simulation codes capable of simulating three processes of tsunami evolution: generation, transoceanic propagation, and inundation of dry land. Tsunami waves are computationally propagated across a set of three nested grids (A, B, and C), each of which is successively finer in resolution, moving from offshore to onshore. Nearshore details are resolved to the point that model output can be directly compared with tide gauge observations and can provide estimates of wave arrival time, wave amplitude and simulation of wave inundation onto dry land. A Grid Resolution: 72 arc-sec. B Grid Resolution: 12 arc-sec. C Grid Resolution: 2 arc-sec.
The Eureka, California Forecast Grids provides bathymetric data strictly for tsunami inundation modeling with the Method of Splitting Tsunami (MOST) model. MOST is a suite of numerical simulation codes capable of simulating three processes of tsunami evolution: generation, transoceanic propagation, and inundation of dry land. Tsunami waves are computationally propagated across a set of three nested grids (A, B, and C), each of which is successively finer in resolution, moving from offshore to onshore. Nearshore details are resolved to the point that model output can be directly compared with tide gauge observations and can provide estimates of wave arrival time, wave amplitude and simulation of wave inundation onto dry land. A Grid Resolution: 72 arc-sec. B Grid Resolution: 18 arc-sec. C Grid Resolution: 2 arc-sec.
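The grid resolutions in the forecast grid descriptions above are given in arc-seconds, which can be hard to picture as ground distances. The following is a minimal sketch of a rough conversion to metres, assuming a spherical-Earth approximation in which one arc-minute of latitude is about one nautical mile (1852 m). The function name `arcsec_to_meters` and the Arena Cove latitude of roughly 38.9 degrees north are assumptions made for illustration and are not part of the dataset descriptions.

```python
import math

# Approximate length of one arc-second of latitude in metres:
# one arc-minute of latitude is about one nautical mile (1852 m).
METERS_PER_ARCSEC = 1852.0 / 60.0


def arcsec_to_meters(arcsec, latitude_deg, axis):
    """Convert a grid spacing in arc-seconds to an approximate distance in metres.

    axis="y" is the north-south spacing; axis="x" is the east-west spacing,
    which shrinks with the cosine of the latitude.
    """
    if axis == "y":
        return arcsec * METERS_PER_ARCSEC
    if axis == "x":
        return arcsec * METERS_PER_ARCSEC * math.cos(math.radians(latitude_deg))
    raise ValueError("axis must be 'x' or 'y'")


# Assumption: approximate latitude of Arena Cove, used only for illustration.
ARENA_COVE_LAT = 38.9

# (x, y) resolutions in arc-seconds, as listed for the Arena Cove grids.
arena_cove_grids = {"A": (60.0, 60.0), "B": (24.0, 18.0), "C": (2.0, 1.5)}

for name, (dx, dy) in arena_cove_grids.items():
    x_m = arcsec_to_meters(dx, ARENA_COVE_LAT, "x")
    y_m = arcsec_to_meters(dy, ARENA_COVE_LAT, "y")
    print(f"Grid {name}: about {x_m:.0f} m (x) by {y_m:.0f} m (y)")
```

Under these assumptions the A grid cells are on the order of a kilometre or more across, while the C grid cells are on the order of tens of metres, which is what lets the finest grid resolve nearshore detail.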
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘CD611 - Population Aged One Year and Over Usually Resident and Present in the State who Lived Outside the State for One Year or More’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/0be1dc2c-4346-4799-8eaa-a27024aef61b on 19 January 2022.
--- Dataset description provided by original source is as follows ---
Population Aged One Year and Over Usually Resident and Present in the State who Lived Outside the State for One Year or More
--- Original source retains full ownership of the source dataset ---