17 datasets found

smol
huggingface.co
Updated Mar 28, 2025
Cite
Google (2025). smol [Dataset]. https://huggingface.co/datasets/google/smol
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 28, 2025
Dataset authored and provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SMOL

SMOL (Set for Maximal Overall Leverage) is a collection of professional translations into 221 Low-Resource Languages, for the purpose of training translation models, and otherwise increasing the representations of said languages in NLP and technology. Please read the SMOL Paper and the GATITOS Paper for a much more thorough description! There are four resources in this directory:

SmolDoc: document-level translations into 100 languages
SmolSent: sentence-level translations into… See the full description on the dataset page: https://huggingface.co/datasets/google/smol.
Benchmark dataset for small and narrow rectangular object detection from...
ieee-dataport.org
Updated May 18, 2022
Cite
zhonghua hong (2022). Benchmark dataset for small and narrow rectangular object detection from Google Earth imagery [Dataset]. https://ieee-dataport.org/documents/benchmark-dataset-small-and-narrow-rectangular-object-detection-google-earth-imagery
Dataset updated
May 18, 2022
Authors
zhonghua hong
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The benchmark dataset consists of 2
google_mt5-small-details
huggingface.co
Updated Jul 30, 2025
+ more versions
Cite
Open LLM Leaderboard (2025). google_mt5-small-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/google_mt5-small-details
Dataset updated
Jul 30, 2025
Dataset authored and provided by
Open LLM Leaderboard
Description
Dataset Card for Evaluation run of google/mt5-small

Dataset automatically created during the evaluation run of model google/mt5-small. The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results. An additional configuration… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/google_mt5-small-details.
Google Small Business
library.nwosu.edu
Updated Apr 24, 2025
Cite
null (2025). Google Small Business [Dataset]. https://library.nwosu.edu/business/market
Dataset updated
Apr 24, 2025
Authors
null
License
https://www.youtube.com/t/terms
Description
Tips from Google about marketing a small business online.
Small towns in Italy with the most Google searches per month 2023
statista.com
Updated Jul 23, 2025
Cite
Statista (2025). Small towns in Italy with the most Google searches per month 2023 [Dataset]. https://www.statista.com/statistics/1262452/most-popular-small-towns-italy/
Dataset updated
Jul 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
Italy
Description
A May 2024 study analyzed the small towns in Italy with a population of under **** thousand with the highest average monthly number of Google searches in 2023. Based on the analysis, *** Sicilian destinations, Favignana and San Vito Lo Capo, recorded the highest figure, each with an average of ****** monthly Google searches in 2023. Portofino in Liguria followed in the ranking, with ****** monthly Google searches on average that year.
Small Object Dataset
academictorrents.com
bittorrent
Updated Jun 6, 2017
Cite
Zheng Ma and Lei Yu and Antoni B. Chan (2017). Small Object Dataset [Dataset]. https://academictorrents.com/details/8e751c111cf90123374b5f0cf61e6af9f5e5231e
Explore at:
Available download formats: bittorrent (5858609)
Dataset updated
Jun 6, 2017
Dataset authored and provided by
Zheng Ma and Lei Yu and Antoni B. Chan
License
https://academictorrents.com/nolicensespecified
Description
Images of small objects for small instance detections. Currently four object types are available. We collect four datasets of small objects from images/videos on the Internet (e.g. YouTube or Google).
Fly Dataset: contains 600 video frames with an average of 86 ± 39 flies per frame (648×72 @ 30 fps). 32 images are used for training (1:6:187) and 50 images for testing (301:6:600).
Honeybee Dataset: contains 118 images with an average of 28 ± 6 honeybees per image (640×480). The dataset is divided evenly for training and test sets. Only the first 32 images are used for training.
Fish Dataset: contains 387 frames of video with an average of 56 ± 9 fish per frame (300×410 @ 30 fps). 32 images are used for training (1:3:94) and 65 for testing (193:3:387).
Seagull Dataset: contains three high-resolution images (624×964) with an average of 866 ± 107 seagulls per image. The first image is used for training, and the res
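The frame selections quoted in parentheses, such as 1:6:187, read like start:step:stop ranges with an inclusive stop, as in MATLAB. That reading is an assumption; the description does not define the notation. The sketch below expands each range under that assumption and counts the selected frames, so the result can be compared with the image counts stated for the Fly and Fish datasets. The helper name `matlab_range` is hypothetical.

```python
from typing import Dict, List, Tuple


def matlab_range(start: int, step: int, stop: int) -> List[int]:
    """Expand start:step:stop with an inclusive stop (assumed MATLAB-style)."""
    return list(range(start, stop + 1, step))


# Ranges copied from the description; the split names are illustrative only.
splits: Dict[str, Tuple[int, int, int]] = {
    "fly_train": (1, 6, 187),
    "fly_test": (301, 6, 600),
    "fish_train": (1, 3, 94),
    "fish_test": (193, 3, 387),
}

# Number of frames each range selects under the inclusive-stop reading.
counts = {name: len(matlab_range(*spec)) for name, spec in splits.items()}
print(counts)
```

Under this reading the four ranges select 32, 50, 32 and 65 frames, which matches the counts given in the description.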
google_flan-t5-small-details
huggingface.co
Updated Jul 30, 2025
+ more versions
Cite
Open LLM Leaderboard (2025). google_flan-t5-small-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/google_flan-t5-small-details
Dataset updated
Jul 30, 2025
Dataset authored and provided by
Open LLM Leaderboard
Description
Dataset Card for Evaluation run of google/flan-t5-small

Dataset automatically created during the evaluation run of model google/flan-t5-small. The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results. An additional… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/google_flan-t5-small-details.
Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Explore at:
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
Moldova (Republic of), Bangladesh, Canada, Isle of Man, British Indian Ocean Territory, Andorra, Taiwan, Northern Mariana Islands, Nepal, Tunisia
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

Owler: Gain valuable business insights and competitive intelligence.

AngelList: Receive fresh startup data transformed into actionable insights.

CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.

Craft.co: Make data-informed business decisions with Craft.co's company datasets.

Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Table_1_Does “Dr. Google” improve discussion and decisions in small animal...
figshare.com
docx
Updated Jun 19, 2024
+ more versions
Cite
Svenja Springer; Thomas Bøker Lund; Sandra A. Corr; Peter Sandøe (2024). Table_1_Does “Dr. Google” improve discussion and decisions in small animal practice? Dog and cat owners use of internet resources to find medical information about their pets in three European countries.docx [Dataset]. http://doi.org/10.3389/fvets.2024.1417927.s002
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fvets.2024.1417927.s002
Dataset updated
Jun 19, 2024
Dataset provided by
Frontiers
Authors
Svenja Springer; Thomas Bøker Lund; Sandra A. Corr; Peter Sandøe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Modern dog and cat owners increasingly use internet resources to obtain information on pet health issues. While access to online information can improve owners’ knowledge of patient care and inform conversations with their veterinarian during consultations, there is also a risk that owners will misinterpret online information or gain a false impression of current standards in veterinary medicine. This in turn can cause problems or tensions, for example if the owner delays consulting their veterinarian about necessary treatment, or questions the veterinarian’s medical advice. Based on an online questionnaire aimed at dog and cat owners in Austria, Denmark and the United Kingdom (N = 2117) we investigated the use of internet resources to find veterinary medical information, the type of internet resources that were used, and whether owner beliefs explain how often they used the internet to find medical information about their pet. Approximately one in three owners reported that they never used internet resources prior to (31.7%) or after (37.0%) a consultation with their veterinarian. However, when owners do make use of the internet, our results show that they were more likely to use it before than after the consultation. The most common internet resources used by owners were practice websites (35.0%), veterinary association websites (24.0%), or ‘other’ websites providing veterinary information (55.2%). Owners who believe that the use of internet resources enables them to have a more informed discussion with their veterinarians more often use internet resources prior to a consultation, whereas owners who believed that internet resources help them to make the right decision for their animal more often use internet resources after a consultation. The results suggest that veterinarians should actively ask pet owners if they use internet resources, and what resources they use, in order to facilitate open discussion about information obtained from the internet. 
Given that more than a third of pet owners use practice websites, the findings also suggest that veterinarians should actively curate their own websites where they can post information that they consider accurate and trustworthy.
Google: global corporate demography 2014-2024, by gender
statista.com
Updated Oct 28, 2024
Cite
Statista (2024). Google: global corporate demography 2014-2024, by gender [Dataset]. https://www.statista.com/statistics/311800/google-employee-gender-global/
Dataset updated
Oct 28, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
As of January 2024, the majority of Google employees worldwide, almost 66 percent, were male. The distribution of male and female employees at Google has not changed much in recent years. In 2014 the share of female employees at Google was 30.6 percent. By 2021 this number had increased by only 3 percent. Considering that the total number of Google employees increased greatly between 2007 and 2020, the share of female employees has seen a rather small increase.

Google as a company
Google is a diverse internet company that provides a wide range of digital products and services. In 2022, the company’s global revenue was over 279 billion U.S. dollars. Most of its revenue, around 305 billion U.S. dollars, was from advertising. Among its services, the most popular ones are YouTube and Google Play.

Male and female employees at tech companies
Google is not the only tech company with a lower number of female employees. This pattern can be seen in other big tech companies too. In 2019, in a ranking of 20 leading tech companies worldwide, only 23andMe had more than a 50 percent share of female employees. The majority of tech companies in the ranking have far more male than female employees.
Google Workspace Business Tool Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 5, 2025
Cite
Market Research Forecast (2025). Google Workspace Business Tool Report [Dataset]. https://www.marketresearchforecast.com/reports/google-workspace-business-tool-27470
Explore at:
Available download formats: pdf, ppt, doc
Dataset updated
Mar 5, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Google Workspace Business Tools market, encompassing applications like Gmail, Docs, Sheets, and Drive, is experiencing robust growth fueled by the increasing adoption of cloud-based solutions and the rising demand for collaborative work environments. The market's expansion is driven by several key factors, including the enhanced productivity and efficiency offered by integrated tools, the accessibility provided by mobile and web interfaces, and the growing need for secure data storage and sharing. While precise market sizing data is not provided, considering the extensive market penetration of Google Workspace and the overall growth in the SaaS (Software as a Service) market, a reasonable estimate for the 2025 market value would be in the range of $10 billion to $15 billion, potentially reaching $20 billion by 2030. This estimate considers factors such as the robust growth in cloud computing, the increasing number of businesses adopting digital workspaces, and the global expansion of internet connectivity. Growth is primarily driven by adoption among small and medium-sized enterprises (SMEs), given Google Workspace's competitive pricing and ease of use compared to more complex enterprise solutions. However, large enterprises contribute significantly to the overall market value due to their higher purchasing power and complex business needs that Google Workspace addresses with its advanced features and integrations. The market faces some challenges, including competition from established players like Microsoft 365 and Salesforce, as well as security concerns related to data breaches and privacy. However, Google's continuous innovation, ongoing improvements to security protocols, and strategic partnerships are mitigating these risks. Future growth will likely be driven by further integration with other Google services, the expansion of AI-powered features, and increasing demand for tailored solutions for specific industries. 
This will solidify Google Workspace's position as a leading provider of collaborative business tools and further expand its market share. Regionally, North America and Europe will continue to dominate the market, owing to high levels of digitalization and adoption of cloud technologies. However, rapid growth is anticipated in the Asia-Pacific region driven by increasing internet penetration and economic growth in emerging markets.

Global Google Business View Market Research Report: By Service Type...

wiseguyreports.com

Updated Aug 10, 2024

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Google Business View Market Research Report: By Service Type (Photography, Virtual Tours, 360-Degree Images, Floor Plans), By Business Size (Small Businesses, Medium-Sized Businesses, Large Enterprises), By Industry (Hospitality, Retail, Healthcare, Education, Real Estate) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/google-business-view-market

Dataset updated

Aug 10, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 8, 2024

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	4.62 (USD Billion)
MARKET SIZE 2024	5.14 (USD Billion)
MARKET SIZE 2032	12.2 (USD Billion)
SEGMENTS COVERED	Service Type, Business Size, Industry, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	Rising Adoption of Digital Marketing; Technological Advancements; Virtual Reality Integration; Growing Popularity of 3D Virtual Tours; Increased Focus on Customer Engagement
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Yuneec, 3D Robotics, Sony, Matterport, Capture3D, Autel Robotics, Skyline Multimedia, FlyCAM, DJI, Parrot, Aeryon Labs, DroneDeploy, Pix4D, GoPro
MARKET FORECAST PERIOD	2025 - 2032
KEY MARKET OPPORTUNITIES	1. Expanding ecommerce industry; 2. Growing demand for virtual tours; 3. VR/AR integration opportunities; 4. Personalized customer experiences
COMPOUND ANNUAL GROWTH RATE (CAGR)	11.39% (2025 - 2032)
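As a rough consistency check on the table above, the growth rate can be recomputed from the two market sizes with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch assumes growth is compounded over the eight years from the 2024 market size to the 2032 market size; the report does not state which base year it used, so that choice is an assumption, and the function name `cagr` is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in USD billion, copied from the table above.
size_2024 = 5.14
size_2032 = 12.2

# Assumption: eight compounding periods, 2024 -> 2032.
implied_percent = cagr(size_2024, size_2032, years=8) * 100
print(f"Implied CAGR: {implied_percent:.2f}%")
```

Under that assumption the implied rate comes out at about 11.4%, close to the listed 11.39%; the small gap is plausibly due to rounding in the published market sizes.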

Leading search engine providers used by SMEs in the U.S. 2016
statista.com
Updated Nov 25, 2016
Cite
Statista (2016). Leading search engine providers used by SMEs in the U.S. 2016 [Dataset]. https://www.statista.com/statistics/642282/us-search-engine-providers-used-by-small-to-medium-sized-enterprises/
Dataset updated
Nov 25, 2016
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 16, 2016 - Nov 21, 2016
Area covered
United States
Description
This statistic shows the leading search engine providers used by small to medium-sized enterprise (SME) owners in the United States in order to be found more quickly as of *************. During the Statista survey conducted in *************, ** percent of responding SME owners said that they had paid or were considering paying Google in order to be found more quickly in their search engine.
flan-t5-small-embed-refinedweb
huggingface.co
Updated Jun 5, 2023
+ more versions
Cite
maxine (2023). flan-t5-small-embed-refinedweb [Dataset]. https://huggingface.co/datasets/crumb/flan-t5-small-embed-refinedweb
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 5, 2023
Authors
maxine
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
All of the data together is around 41GB. It's the last hidden states of 131,072 samples from refinedweb padded/truncated to 512 tokens on the left, fed through google/flan-t5-small. Structure:
{
"encoding": List, shaped (512, 512) aka (tokens, d_model),
"text": String, the original text that was encoded,
"attention_mask": List, binary mask to pass to your model with encoding to not attend to pad tokens
}

just a tip, you cannot load this with the RAM in the free ver of google colab, not… See the full description on the dataset page: https://huggingface.co/datasets/crumb/flan-t5-small-embed-refinedweb.
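To illustrate how the three fields described above fit together, here is a minimal sketch that mean-pools the per-token vectors of one record while skipping padded positions. The field names come from the description; the `Record` type, the `masked_mean` helper, and the tiny 3×2 toy record (standing in for the real 512×512 encoding) are assumptions made purely for illustration, not part of the dataset or its tooling.

```python
from typing import List, TypedDict


class Record(TypedDict):
    """One row, following the field names in the dataset description."""
    encoding: List[List[float]]   # (tokens, d_model)
    text: str                     # original text that was encoded
    attention_mask: List[int]     # 1 = real token, 0 = pad token


def masked_mean(record: Record) -> List[float]:
    """Average the token vectors whose attention_mask entry is non-zero."""
    rows = [
        vec
        for vec, keep in zip(record["encoding"], record["attention_mask"])
        if keep
    ]
    if not rows:
        raise ValueError("attention_mask selects no tokens")
    d_model = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(d_model)]


# Toy record: padding sits on the left, as the description says.
toy: Record = {
    "encoding": [[0.0, 0.0], [1.0, 3.0], [3.0, 5.0]],
    "text": "toy example",
    "attention_mask": [0, 1, 1],
}
pooled = masked_mean(toy)
print(pooled)
```

Mean pooling is only one way to reduce the hidden states to a single vector; it is used here because it makes the role of the attention mask easy to see.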
speech_commands
huggingface.co
tensorflow.org
+1 more
+ more versions
Cite
Google, speech_commands [Dataset]. https://huggingface.co/datasets/google/speech_commands
Dataset authored and provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a set of one-second .wav audio files, each containing a single spoken English word or background noise. These words are from a small set of commands, and are spoken by a variety of different speakers. This data set is designed to help train simple machine learning models. This dataset is covered in more detail at https://arxiv.org/abs/1804.03209.

Version 0.01 of the data set (configuration "v0.01") was released on August 3rd 2017 and contains 64,727 audio files.

In version 0.01 thirty different words were recorded: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow".

In version 0.02 more words were added: "Backward", "Forward", "Follow", "Learn", "Visual".

In both versions, ten of them are used as commands by convention: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go". Other words are considered to be auxiliary (in the current implementation this is marked by a True value of the "is_unknown" feature). Their function is to teach a model to distinguish core words from unrecognized ones.

The _silence_ class contains a set of longer audio clips that are either recordings or a mathematical simulation of noise.
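The split between core commands and auxiliary words described above can be sketched in a few lines. The word lists are copied from the description; the function `is_unknown` is a hypothetical helper that mirrors the meaning of the "is_unknown" feature and is not the dataset's own code.

```python
# The ten words used as commands by convention.
CORE_COMMANDS = {
    "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go",
}

# All thirty words recorded in version 0.01.
V001_WORDS = [
    "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go",
    "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
    "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin",
    "Sheila", "Tree", "Wow",
]

# Words added in version 0.02.
V002_EXTRA = ["Backward", "Forward", "Follow", "Learn", "Visual"]


def is_unknown(word: str) -> bool:
    """Return True for auxiliary words, i.e. anything outside the core set."""
    return word not in CORE_COMMANDS


auxiliary = [w for w in V001_WORDS + V002_EXTRA if is_unknown(w)]
print(len(V001_WORDS), len(auxiliary))
```

A classifier trained on this data would typically keep the ten core labels and fold every auxiliary word into a single "unknown" class, which is the role the description assigns to them.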
kokborok
huggingface.co
Updated Jun 2, 2025
Cite
dbr (2025). kokborok [Dataset]. https://huggingface.co/datasets/sdmy/kokborok
Dataset updated
Jun 2, 2025
Authors
dbr
Description
Kokborok Digitalisation Project

The Kokborok Digitalisation Project is an initiative to curate and enhance parallel data for the Kokborok-English language pair. This project builds upon the SMOL dataset by Google, available on Hugging Face, and involves modifying and correcting it to better reflect the nuances of the local Kokborok dialect.

From the Author

"Language is a living, breathing entity—constantly evolving, shaping cultures, and connecting generations. When we… See the full description on the dataset page: https://huggingface.co/datasets/sdmy/kokborok.

Global Generative Ai For Business Market Research Report: By Application...

wiseguyreports.com

Updated Aug 10, 2024

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Generative Ai For Business Market Research Report: By Application (Content and media generation, Product and prototype design, Marketing and advertising, Data analysis and insights, Customer service and engagement), By Type (Text-based, Image-based, Audio-based, Video-based, Multi-modal), By Industry (Healthcare, Financial services, Manufacturing, Retail, Technology), By Deployment Model (Cloud-based, On-premise, Hybrid), By End User (Large enterprises, Small and medium-sized businesses (SMBs), Independent professionals) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/generative-ai-for-business-market

Dataset updated

Aug 10, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 8, 2024

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	34.07 (USD Billion)
MARKET SIZE 2024	39.85 (USD Billion)
MARKET SIZE 2032	139.6 (USD Billion)
SEGMENTS COVERED	Application, Type, Industry, Deployment Model, End User, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	Growing demand for personalized content; Increasing use of AI-powered tools in businesses; Advancements in generative AI technology; Government initiatives to promote AI adoption; Partnerships and collaborations between tech companies
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Microsoft, Google, OpenAI, Meta Platforms, BigScience, Teradata, Adobe, Tencent, IBM, Alibaba, C3.ai, Baidu, Salesforce, Amazon, NVIDIA
MARKET FORECAST PERIOD	2025 - 2032
KEY MARKET OPPORTUNITIES	Content Creation; Marketing Automation; Sales Optimization; Product Development; Customer Service
COMPOUND ANNUAL GROWTH RATE (CAGR)	16.97% (2025 - 2032)
