The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global AI Training Dataset Market size will be USD 2962.4 million in 2025. It will expand at a compound annual growth rate (CAGR) of 28.60% from 2025 to 2033.
North America held the largest market share, more than 37% of the global revenue, with a market size of USD 1096.09 million in 2025 and will grow at a compound annual growth rate (CAGR) of 26.4% from 2025 to 2033.
Europe accounted for a market share of over 29% of the global revenue, with a market size of USD 859.10 million.
APAC held a market share of around 24% of the global revenue with a market size of USD 710.98 million in 2025 and will grow at a compound annual growth rate (CAGR) of 30.6% from 2025 to 2033.
South America has a market share of more than 3.8% of the global revenue, with a market size of USD 112.57 million in 2025 and will grow at a compound annual growth rate (CAGR) of 27.6% from 2025 to 2033.
Middle East had a market share of around 4% of the global revenue and was estimated at a market size of USD 118.50 million in 2025 and will grow at a compound annual growth rate (CAGR) of 27.9% from 2025 to 2033.
Africa had a market share of around 2.20% of the global revenue and was estimated at a market size of USD 65.17 million in 2025 and will grow at a compound annual growth rate (CAGR) of 28.3% from 2025 to 2033.
The Data Annotation category is the fastest-growing segment of the AI Training Dataset Market.
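The regional figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the standard compound-growth formula (value × (1 + rate)^years); the function names are illustrative and do not come from the report, and the 2033 projection is simply what the quoted rate implies, not a figure the report states.

```python
# Figures copied from the report text above (USD million, 2025).
GLOBAL_2025 = 2962.4
REGIONS_2025 = {
    "North America": 1096.09,
    "Europe": 859.10,
    "APAC": 710.98,
    "South America": 112.57,
    "Middle East": 118.50,
    "Africa": 65.17,
}


def regional_shares(regions, total):
    """Return each region's share of the global total as a fraction."""
    return {name: value / total for name, value in regions.items()}


def project(value, cagr, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return value * (1.0 + cagr) ** years


shares = regional_shares(REGIONS_2025, GLOBAL_2025)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Illustrative projection: the 2025 global figure compounded at 28.60%
# for the eight years from 2025 to 2033.
projected_2033 = project(GLOBAL_2025, 0.2860, 2033 - 2025)
print(f"Implied 2033 size: USD {projected_2033:,.1f} million")
```

The regional sizes add up to the global figure to within rounding, which suggests the listed shares are rounded versions of these ratios.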
Market Dynamics of AI Training Dataset Market
Key Drivers for AI Training Dataset Market
Government-Led Open Data Initiatives Fueling AI Training Dataset Market Growth
In recent years, government-initiated open data efforts have strongly driven the development of the AI Training Dataset Market by offering affordable, high-quality datasets that are vital for training sound AI models. For instance, the U.S. government's drive for openness and innovation can be seen in portals such as Data.gov, which provides an enormous collection of datasets from many industries, including healthcare, finance, and transportation. Such datasets are basic building blocks for constructing AI applications and training models on real-world data. In the same way, the platform data.gov.uk, run by the U.K. government, offers ample datasets to aid AI research and development, creating an environment that is supportive of technological growth. By releasing such information into the public domain, governments not only enhance transparency but also encourage innovation in the AI industry, resulting in greater demand for training datasets and helping to drive the market's growth.
India's IndiaAI Datasets Platform Accelerates AI Training Dataset Market Growth
India's upcoming launch of the IndiaAI Datasets Platform in January 2025 is likely to greatly increase the AI Training Dataset Market. The project, which is part of the government's ₹10,000 crore IndiaAI Mission, will establish an open-source repository similar to platforms such as HuggingFace to enable developers to create, train, and deploy AI models. The platform will collect datasets from central and state governments and private sector organizations to provide a wide and rich data pool. Through improved access to high-quality, non-personal data, the platform is filling an important requirement for high-quality datasets for training AI models, thus driving innovation and development in the AI industry. This public initiative reflects India's determination to become a global AI hub, offering the infrastructure required to facilitate startups, researchers, and businesses in creating cutting-edge AI solutions. The initiative not only simplifies data access but also creates a model for public-private partnerships in AI development.
Restraint Factor for the AI Training Dataset Market
Data Privacy Regulations Impeding AI Training Dataset Market Growth
Strict data privacy laws are coming up as a major constraint in the AI Training Dataset Market since governments across the globe are establishing legislation to safeguard personal data. In the European Union, explicit consent for using personal data is required under the General Data Protection Regulation (GDPR), reducing the availability of datasets for training AI. Likewise, the data protection regulator in Brazil ordered Meta and others to stop the use of Brazilian personal data in training AI models due to dangers to individuals' funda...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for dwb2023/gdelt-event-2025-v4
This dataset contains global event records from the GDELT (Global Database of Events, Language, and Tone) Project for May 1 - 11, capturing real-world events and their characteristics across the globe through news media coverage.
Dataset Details
Dataset Description
The GDELT Event Database is a comprehensive repository of human societal-scale behavior and beliefs across all countries of the world, connecting every… See the full description on the dataset page: https://huggingface.co/datasets/dwb2023/gdelt-event-2025-v4.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Earth population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Earth. The dataset can be used to understand the population distribution of Earth by age. For example, using this dataset, we can identify the largest age group in Earth.
Key observations
The largest age group in Earth, TX was 10 to 14 years, with a population of 102 (10.89%), according to the ACS 2019-2023 5-Year Estimates. The smallest age group in Earth, TX was 85 years and over, with a population of 4 (0.43%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Earth Population by Age. You can refer to the same here
https://dataintelo.com/privacy-and-policy
The global AI training dataset market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 6.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 20.5% from 2024 to 2032. This substantial growth is driven by the increasing adoption of artificial intelligence across various industries, the necessity for large-scale and high-quality datasets to train AI models, and the ongoing advancements in AI and machine learning technologies.
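As a sanity check on the figures quoted above, the growth rate implied by two endpoint values can be computed directly. This is a minimal sketch, assuming nine compounding years between the 2023 valuation and the 2032 projection; the report quotes its CAGR over 2024 to 2032, so the base-year convention here is an assumption, and the function name is illustrative.

```python
def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# USD 1.2 billion (2023) to USD 6.5 billion (2032): nine compounding years.
rate = implied_cagr(1.2, 6.5, 2032 - 2023)
print(f"Implied CAGR 2023-2032: {rate:.2%}")
```

Under that assumption the implied rate lands close to the quoted 20.5%; small differences are expected because the endpoint values are described as approximate.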
One of the primary growth factors in the AI training dataset market is the exponential increase in data generation across multiple sectors. With the proliferation of internet usage, the expansion of IoT devices, and the digitalization of industries, there is an unprecedented volume of data being generated daily. This data is invaluable for training AI models, enabling them to learn and make more accurate predictions and decisions. Moreover, the need for diverse and comprehensive datasets to improve AI accuracy and reliability is further propelling market growth.
Another significant factor driving the market is the rising investment in AI and machine learning by both public and private sectors. Governments around the world are recognizing the potential of AI to transform economies and improve public services, leading to increased funding for AI research and development. Simultaneously, private enterprises are investing heavily in AI technologies to gain a competitive edge, enhance operational efficiency, and innovate new products and services. These investments necessitate high-quality training datasets, thereby boosting the market.
The proliferation of AI applications in various industries, such as healthcare, automotive, retail, and finance, is also a major contributor to the growth of the AI training dataset market. In healthcare, AI is being used for predictive analytics, personalized medicine, and diagnostic automation, all of which require extensive datasets for training. The automotive industry leverages AI for autonomous driving and vehicle safety systems, while the retail sector uses AI for personalized shopping experiences and inventory management. In finance, AI assists in fraud detection and risk management. The diverse applications across these sectors underline the critical need for robust AI training datasets.
As the demand for AI applications continues to grow, the role of AI data resource services becomes increasingly vital. These services provide the necessary infrastructure and tools to manage, curate, and distribute datasets efficiently. By leveraging them, organizations can ensure that their AI models are trained on high-quality and relevant data, which is crucial for achieving accurate and reliable outcomes. Such services act as a bridge between raw data and AI applications, streamlining the process of data acquisition, annotation, and validation. This not only enhances the performance of AI systems but also accelerates the development cycle, enabling faster deployment of AI-driven solutions across various sectors.
Regionally, North America currently dominates the AI training dataset market due to the presence of major technology companies and extensive R&D activities in the region. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid technological advancements, increasing investments in AI, and the growing adoption of AI technologies across various industries in countries like China, India, and Japan. Europe and Latin America are also anticipated to experience significant growth, supported by favorable government policies and the increasing use of AI in various sectors.
The data type segment of the AI training dataset market encompasses text, image, audio, video, and others. Each data type plays a crucial role in training different types of AI models, and the demand for specific data types varies based on the application. Text data is extensively used in natural language processing (NLP) applications such as chatbots, sentiment analysis, and language translation. As the use of NLP is becoming more widespread, the demand for high-quality text datasets is continually rising. Companies are investing in curated text datasets that encompass diverse languages and dialects to improve the accuracy and efficiency of NLP models.
Image data is critical for computer vision application
The Global 15x15 Minute Grids of the Downscaled GDP Based on the Special Report on Emissions Scenarios (SRES) B2 Scenario, 1990 and 2025, are geospatial distributions of Gross Domestic Product (GDP) per unit area (GDP densities). These global grids were generated using the Country-level GDP and Downscaled Projections Based on the SRES B2 Scenario, 1990-2100 data set, and CIESIN's Gridded Population of the World, Version 2 (GPWv2) data set as the base map. First, the GDP per capita was developed at the country level for 1990 and 2025. Then the gridded GDP was developed within each country by applying the GDP per capita to each grid cell of the GPW, under the assumption that the GDP per capita was uniform within a country. This data set is produced and distributed by the Columbia University Center for International Earth Science Information Network (CIESIN).
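The downscaling step described above can be illustrated with a toy grid. This is only a sketch of the stated idea (cell GDP = cell population × national GDP per capita); the function name, the grid, and the per-capita value are hypothetical and are not taken from the CIESIN data set.

```python
def downscale_gdp(gdp_per_capita, cell_population):
    """Spread a country's GDP over grid cells in proportion to population.

    Assumes, as the description states, that GDP per capita is uniform
    within the country, so each cell gets population * GDP per capita.
    """
    return [[gdp_per_capita * pop for pop in row] for row in cell_population]


# Hypothetical 2x3 population grid for one country (people per cell).
population = [
    [1000, 0, 250],
    [4000, 500, 0],
]
gdp_grid = downscale_gdp(gdp_per_capita=2000.0, cell_population=population)

# The cell values sum back to the national total (population * GDP per capita).
national_gdp = sum(sum(row) for row in gdp_grid)
print(gdp_grid)
print(national_gdp)
```

Turning these cell totals into GDP densities would additionally require dividing each cell by its area, which varies with latitude on a 15-minute grid and is not shown here.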
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive, up-to-date information about the top 100 Software-as-a-Service (SaaS) companies globally as of 2025. It includes detailed financial metrics, company fundamentals, and operational data that are crucial for market research, competitive analysis, investment decisions, and academic studies.
Key Features
Use Cases
Industries Covered
Enterprise Software (CRM, ERP, HR)
Developer Tools & DevOps
Cybersecurity
Data Analytics & Business Intelligence
Marketing & Sales Technology
Financial Technology
Communication & Collaboration
E-commerce Platforms
Design & Creative Tools
Infrastructure & Cloud Services
Why This Dataset?
The SaaS industry has grown to over $300 billion globally, with companies achieving unprecedented valuations and growth rates. This dataset captures the current state of the industry leaders, providing insights into what makes successful SaaS companies tick.
Sources/Proof of Data: Data Sources
The data has been meticulously compiled from multiple authoritative sources:
Company Financial Reports (Q4 2024 - Q1 2025)
Official earnings releases and investor relations documents
SEC filings for public companies
Investment Databases
Crunchbase, PitchBook, and CB Insights for funding data
Venture capital and private equity announcements
Market Research Reports
Gartner, Forrester, and IDC industry analyses
SaaS Capital Index and valuation reports
Industry Publications
TechCrunch, Forbes, Wall Street Journal coverage
Company press releases and official announcements
Product Review Platforms
G2 Crowd ratings and reviews
Capterra and GetApp user feedback
Data Verification
Cross-referenced across multiple sources for accuracy
Updated with latest available information as of May 2025
Validated against official company statements where available
Comprehensive dataset covering Amazon Prime availability across 27 countries, including launch dates, pricing, and regional benefit differences
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.
The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.
WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.
Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
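The two processing steps named above, URL extraction with a regular expression and SHA-256 hashing of IDs, might look roughly like the following. This is a hedged sketch, not the authors' pipeline: the pattern, the function names, and the choice to hash the bare ID string (no salt, UTF-8 encoded) are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical pattern: matches http(s) links to any Wikipedia language
# subdomain, including the mobile "m." variant.
WIKI_URL = re.compile(
    r"https?://([a-z\-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s)\]>]+)",
    re.IGNORECASE,
)


def extract_wikipedia_links(text):
    """Return (language_subdomain, article_title) pairs found in text."""
    return [(m.group(1).lower(), m.group(2)) for m in WIKI_URL.finditer(text)]


def anonymize_id(raw_id):
    """Hash an identifier with SHA-256 and return the hex digest."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


sample = (
    "See https://en.wikipedia.org/wiki/Alan_Turing and "
    "[this](https://de.m.wikipedia.org/wiki/Berlin) too"
)
print(extract_wikipedia_links(sample))
print(anonymize_id("t3_example"))  # made-up ID, for illustration only
```

A pattern this simple stops at a closing parenthesis so that markdown links parse cleanly, which also truncates article titles that themselves contain parentheses; handling such cases is presumably part of the "extensive processing" the description refers to.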
We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.
The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942
Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.
posts
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost. |
post_id | TEXT | Unique identifier for the Reddit post. |
created_at | TIMESTAMP | The timestamp when the post was created. |
updated_at | TIMESTAMP | The timestamp when the post was last updated. |
language_code | TEXT | The language code of the post. |
score | INTEGER | The score (upvotes minus downvotes) of the post. |
upvote_ratio | REAL | The ratio of upvotes to total votes. |
gildings | INTEGER | Number of awards (gildings) received by the post. |
num_comments | INTEGER | Number of comments on the post. |
comments
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
post_id | TEXT | The ID of the Reddit post the comment belongs to. |
parent_id | TEXT | The ID of the parent comment (if a reply). |
comment_id | TEXT | Unique identifier for the comment. |
created_at | TIMESTAMP | The timestamp when the comment was created. |
last_modified_at | TIMESTAMP | The timestamp when the comment was last modified. |
score | INTEGER | The score (upvotes minus downvotes) of the comment. |
upvote_ratio | REAL | The ratio of upvotes to total votes for the comment. |
gilded | INTEGER | Number of awards (gildings) received by the comment. |
postlinks
Column Name | Type | Description |
---|---|---|
post_id | TEXT | Unique identifier for the Reddit post. |
end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the Reddit post. |
final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
final_url | TEXT | The final URL after redirections. |
redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |
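To show how the posts and postlinks tables fit together, here is a small sketch using Python's built-in sqlite3 module. It recreates only a few of the columns listed above in an in-memory database and fills them with made-up rows; the actual database file, its engine, and its contents are not assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (post_id TEXT, subreddit_id TEXT, score INTEGER, num_comments INTEGER);
    CREATE TABLE postlinks (post_id TEXT, final_url TEXT, final_valid INTEGER, in_title INTEGER);
    """
)
# Made-up rows purely for illustration; real IDs in the dataset are hashed.
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 120, 14),
    ("p2", "s1", 3, 0),
])
conn.executemany("INSERT INTO postlinks VALUES (?, ?, ?, ?)", [
    ("p1", "https://en.wikipedia.org/wiki/Alan_Turing", 1, 0),
    ("p2", "https://en.wikipedia.org/wiki/Alan_Turing", 1, 1),
    ("p2", "https://en.wikipedia.org/wiki/Broken", 0, 0),
])

# Count valid links per final URL and sum the scores of the posts sharing them.
rows = conn.execute(
    """
    SELECT l.final_url, COUNT(*) AS n_posts, SUM(p.score) AS total_score
    FROM postlinks AS l
    JOIN posts AS p ON p.post_id = l.post_id
    WHERE l.final_valid = 1
    GROUP BY l.final_url
    ORDER BY total_score DESC
    """
).fetchall()
print(rows)
```

Joining on post_id and filtering on final_valid keeps only links that resolved after redirection, which is usually the sensible starting point for attention analyses.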
commentlinks
Column Name | Type | Description |
---|---|---|
comment_id | TEXT | Unique identifier for the Reddit comment. |
end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the comment. |
final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final |
The Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG) is a high-resolution geography data set amalgamated from three databases in the public domain: World Vector Shorelines (WVS), CIA World Data Bank II (WDBII), and Atlas of the Cryosphere (AC). The WVS is our basis for shorelines except for Antarctica, while the WDBII is the basis for lakes, although there are instances where differences in coastline representations necessitated adding WDBII islands to GSHHG. The WDBII source also provides all political borders and rivers. The addition of AC since 2.3.0 allows us to offer two choices for Antarctica coastlines: Ice-front or Grounding line. These are encoded as levels 5 and 6, respectively, and users of GSHHG can choose which set to use. GSHHG data have undergone extensive processing and should be free of internal inconsistencies such as erratic points and crossing segments. The shorelines are constructed entirely from hierarchically arranged closed polygons. A modified version of GSHHG is used by GMT, the Generic Mapping Tools. Starting with version 2.2.2, GSHHG has been released under the GNU Lesser General Public License. NCEI decommissioned the Global Self-consistent, Hierarchical, High-resolution Geography Database in May 2025 with no further updates. Comments and questions may be sent to: ncei.info@noaa.gov.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains version 3.0 (March 2025 release) of the Global Fishing Watch apparent fishing effort dataset. Data is available for 2012-2024 and based on positions of >190,000 unique automatic identification system (AIS) devices on fishing vessels, of which up to ~96,000 are active in a given year. Fishing vessels are identified via a machine learning model, vessel registry databases, and manual review by GFW and regional experts. Vessel time is measured in hours, calculated by assigning to each AIS position the amount of time elapsed since the previous AIS position of the vessel. The time is counted as apparent fishing hours if the GFW fishing detection model - a neural network machine learning model - determines the vessel is engaged in fishing behavior during that AIS position.
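The time-accounting rule described above can be sketched for a single vessel as follows. This is an illustration of the stated rule only: the boolean flags stand in for the output of the GFW fishing detection model, the track is hypothetical, and giving the first position zero hours is an assumption, since the description does not say how it is handled.

```python
from datetime import datetime


def apparent_fishing_hours(positions):
    """Sum hours attributed to positions flagged as fishing.

    positions: list of (timestamp, is_fishing) tuples for ONE vessel,
    sorted by time. Each position is assigned the time elapsed since the
    previous position; the first position has no predecessor and is
    assigned zero hours here (an assumption).
    """
    total = 0.0
    for (prev_time, _), (curr_time, is_fishing) in zip(positions, positions[1:]):
        elapsed_hours = (curr_time - prev_time).total_seconds() / 3600.0
        if is_fishing:
            total += elapsed_hours
    return total


# Hypothetical track; the is_fishing flags stand in for the model's output.
track = [
    (datetime(2024, 1, 1, 0, 0), False),
    (datetime(2024, 1, 1, 1, 0), True),
    (datetime(2024, 1, 1, 1, 30), True),
    (datetime(2024, 1, 1, 3, 30), False),
]
print(apparent_fishing_hours(track))
```

Note that the elapsed time is credited to the later position of each pair, so the two-hour gap before the final, non-fishing position is not counted.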
Data are spatially binned into grid cells that measure 0.01 or 0.1 degrees on a side; the coordinates defining each cell are provided in decimal degrees (WGS84) and correspond to the lower-left corner. Data are available in the following formats:
The fishing effort dataset is accompanied by a table of vessel information (e.g. gear type, flag state, dimensions).
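The spatial binning described above, where each cell is identified by its lower-left corner, can be sketched as below. The function name and the `cells_per_degree` parameter are hypothetical; the values 10 and 100 mirror the 10th and 100th degree resolutions the dataset offers.

```python
import math


def cell_lower_left(lat, lon, cells_per_degree):
    """Return the lower-left corner (lat, lon) of the cell containing a point.

    cells_per_degree is 10 for 0.1-degree cells or 100 for 0.01-degree cells.
    The corner is derived from an integer cell index, so a point always maps
    to the corner at or below it in both latitude and longitude.
    """
    lat_index = math.floor(lat * cells_per_degree)
    lon_index = math.floor(lon * cells_per_degree)
    return lat_index / cells_per_degree, lon_index / cells_per_degree


print(cell_lower_left(12.345, -45.678, 10))
print(cell_lower_left(12.345, -45.678, 100))
```

Using `math.floor` rather than truncation matters for negative coordinates: a longitude of -45.678 belongs to the cell whose lower-left corner is further west, not the one closer to zero.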
Fishing effort and vessel presence data are available as .csv files in daily formats. Files for each year are stored in separate .zip files. A README.txt and schema.json file is provided for each dataset version and contains the table schema and additional information. There is also a README-known-issues-v3.txt file outlining some of the known issues with the version 3 release.
Files are named according to the following convention:
Daily file format:
[fleet/mmsi]-daily-csvs-[100/10]-v3-[year].zip
[fleet/mmsi]-daily-csvs-[100/10]-v3-[date].csv
Monthly file format:
fleet-monthly-csvs-10-v3-[year].zip
fleet-monthly-csvs-10-v3-[date].csv
Fishing vessel format: fishing-vessels-v3.csv
README file format: README-[fleet/mmsi/fishing-vessels/known-issues]-v3.txt
File identifiers:
[fleet/mmsi]: Data by fleet (flag and geartype) or by MMSI
[100/10]: 100th or 10th degree resolution
[year]: Year of data included in .zip file
[date]: Date of data included in .csv files. For monthly data, [date] corresponds to the first date of the month
Examples: fleet-daily-csvs-100-v3-2020.zip; mmsi-daily-csvs-10-v3-2020-01-10.csv; fishing-vessels-v3.csv; README-fleet-v3.txt; fleet-monthly-csvs-10-v3-2024.zip; fleet-monthly-csvs-10-v3-2024-08-01.csv
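The naming convention above is regular enough to generate file names programmatically, for example when scripting downloads. The helpers below are a hypothetical sketch that only assembles strings following the documented pattern; they do not check that a given combination actually exists in the release.

```python
from datetime import date


def zip_name(grouping, period, resolution, year):
    """Build a yearly .zip name, e.g. fleet-daily-csvs-100-v3-2020.zip."""
    return f"{grouping}-{period}-csvs-{resolution}-v3-{year}.zip"


def csv_name(grouping, period, resolution, day):
    """Build a .csv name; for monthly data pass the first day of the month."""
    return f"{grouping}-{period}-csvs-{resolution}-v3-{day.isoformat()}.csv"


print(zip_name("fleet", "daily", 100, 2020))
print(csv_name("mmsi", "daily", 10, date(2020, 1, 10)))
print(csv_name("fleet", "monthly", 10, date(2024, 8, 1)))
```

The three printed names reproduce examples given in the convention itself, which is a quick way to confirm the helpers follow it.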
For an overview of how GFW turns raw AIS positions into estimates of fishing hours, see this page.
The models used to produce this dataset were developed as part of this publication: D.A. Kroodsma, J. Mayorga, T. Hochberg, N.A. Miller, K. Boerder, F. Ferretti, A. Wilson, B. Bergman, T.D. White, B.A. Block, P. Woods, B. Sullivan, C. Costello, and B. Worm. "Tracking the global footprint of fisheries." Science 361.6378 (2018). Model details are available in the Supplementary Materials.
The README-known-issues-v3.txt file describing this dataset's specific caveats can be downloaded from this page. We highly recommend that users read this file in full.
The README-mmsi-v3.txt file, the README-fleet-v3.txt file, and the README-fishing-vessels-v3.txt files are downloadable from this page and contain the data description for (respectively) the fishing hours by MMSI dataset, the fishing hours by fleet dataset, and the vessel information file. These readmes contain key explanations about the gear types and flag states assigned to vessels in the dataset.
The file name structure for the data files is available below on this page, and the file schema can be downloaded from this page.
A FAQ describing the updates in this version and the differences between this dataset and the data available from the GFW Map and APIs is available here.
The apparent fishing hours dataset is intended to allow users to analyze patterns of fishing across the world’s oceans at temporal scales as fine as daily and at spatial scales as fine as 0.1 or 0.01 degree cells. Fishing hours can be separated out by gear type, vessel flag and other characteristics of vessels such as tonnage.
Potential applications for this dataset are broad. We offer suggested use cases to illustrate its utility. The dataset can be integrated as a static layer in multi-layered analyses, allowing researchers to investigate relationships between fishing effort and other variables, including biodiversity, tracking, and environmental data, as defined by their research objectives.
A few example questions that these data could be used to answer:
What flag states have fishing activity in my area of interest?
Do hotspots of longline fishing overlap with known migration routes of sea turtles?
How does fishing time by trawlers change by month in my area of interest? Which seasons see the most trawling hours and which see the least?
This global dataset estimates apparent fishing effort in hours. The dataset is based on publicly available information and statistical classifications which may not fully capture the nuances of local fishing practices. While we manually review the dataset at a global scale and in a select set of smaller test regions to check for issues, given the scale of the dataset we are unable to manually review every fleet in every region. We recognize the potential for inaccuracies and encourage users to approach regional analyses with caution, utilizing their own regional expertise to validate findings. We welcome your feedback on any regional analysis at research@globalfishingwatch.org to enhance the dataset's accuracy.
Caveats relating to known sources of inaccuracy as well as interpretation pitfalls to avoid are described in the README-known-issues-v3.txt file available for download from this page. We highly recommend that users read this file in full. The issues described include:
Data from 2024 should be considered provisional, as vessel classifications may change as more data from 2025 becomes available.
MMSI is used in this dataset as the vessel identifier. While MMSI is intended to serve as the unique AIS identifier for an individual vessel, this does not always hold in practice.
The Maritime Identification Digits (MID), the first 3 digits of MMSI, are the only source of information on vessel flag state when the vessel does not appear on a registry. The MID may be entered incorrectly, obscuring information about an MMSI’s flag state.
AIS reception is not consistent across all areas and changes over time.
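As noted in the caveats above, the Maritime Identification Digits are the first three digits of the MMSI. A minimal sketch of pulling them out is shown below; the function name is hypothetical, the nine-digit check reflects the usual MMSI length rather than anything stated in this description, and mapping a MID to a flag state would need a separate lookup table that is not included here.

```python
def maritime_identification_digits(mmsi):
    """Return the first three digits (the MID) of a nine-digit MMSI string."""
    text = str(mmsi)
    if len(text) != 9 or not text.isdigit():
        raise ValueError(f"expected a nine-digit MMSI, got {mmsi!r}")
    return text[:3]


# Made-up MMSI used only to show the slicing.
print(maritime_identification_digits("123456789"))
```

Rejecting malformed values up front is a small guard against the mis-entered identifiers the caveats describe, though it cannot detect an MMSI that is well-formed but wrong.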
Query using SQL in the Global Fishing Watch public BigQuery dataset: global-fishing-watch.fishing_effort_v3
Download the entire dataset from the Global Fishing Watch Data Download Portal (https://globalfishingwatch.org/data-download/datasets/public-fishing-effort)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for PRODUCER PRICES reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GOLD RESERVES reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for HOUSING STARTS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Black Earth town population by age cohort (Children: under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort along with its percentage of the total population of Black Earth town. The dataset can be used to understand how the population is distributed across children, the working population, and the senior population, for example when analyzing dependency ratios, housing requirements, ageing, or migration patterns.
Key observations
The largest age group was 18 to 64 years, with a population of 276 (66.67% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age cohorts:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Black Earth town Population by Age.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for EXISTING HOME SALES reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) of the United States was worth 29184.89 billion US dollars in 2024, according to official data from the World Bank. The GDP value of the United States represents 27.49 percent of the world economy. This dataset provides United States GDP actual values, historical data, forecasts, charts, statistics, an economic calendar, and news.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gold rose to 3,377.18 USD/t.oz on August 26, 2025, up 0.30% from the previous day. Over the past month, the price of gold has risen 1.88%, and it is up 33.74% compared with the same time last year, according to trading on a contract for difference (CFD) that tracks the benchmark market for this commodity. Gold: values, historical data, forecasts, and news, updated in August 2025.
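The percentage figures quoted for gold relate the current price to an earlier reference price. The sketch below shows that arithmetic, assuming each figure is a simple percentage change, (current - previous) / previous. The function names are hypothetical, and because the published percentages are rounded, the implied reference prices are approximations rather than values taken from the dataset.

```python
def pct_change(current, previous):
    """Simple percentage change from previous to current."""
    return (current - previous) * 100.0 / previous


def implied_previous(current, pct):
    """Reference price implied by a current price and a quoted percentage change."""
    return current / (1.0 + pct / 100.0)


price = 3377.18  # USD/t.oz on August 26, 2025, as quoted in the text

# Quoted changes from the text: previous day, past month, past year.
for label, pct in [("previous day", 0.30), ("one month ago", 1.88), ("one year ago", 33.74)]:
    print(f"{label}: about {implied_previous(price, pct):.2f} USD/t.oz")
```

Applying `pct_change` to the current price and an implied reference price returns the quoted percentage, which is a quick consistency check on the inversion.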
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Hispanic or Latino population of Blue Earth City township. It includes the distribution of that population by ancestry, as identified by the Census Bureau. The dataset can be used to understand the origins of the Hispanic or Latino population of Blue Earth City township.
Key observations
Among the Hispanic population in Blue Earth City township, regardless of race, the largest group is of Mexican origin, with a population of 26 (100% of the total Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Origins for the Hispanic or Latino population include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Blue Earth City township Population by Race & Ethnicity.