The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the five years to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high; growth exceeded earlier expectations because the COVID-19 pandemic increased demand, as more people worked and learned from home and made greater use of home entertainment options.

Storage capacity also growing
Only a small share of this newly created data is kept: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
https://www.gesis.org/en/institute/data-usage-terms
At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.
https://www.gnu.org/licenses/gpl-3.0.html
This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.
It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.
Each Wordle answer tends to produce a spike in search volume on the day it appears, and the spike is larger when the word is rare.
This dataset supports exploration of that relationship between word rarity and search interest.
| Column | Description |
|---|---|
| date | Date of the Wordle puzzle |
| word | Correct 5-letter Wordle answer |
| game | Wordle game number |
| wordfreq_commonality | Normalized frequency score using Python's wordfreq library |
| subtlex_commonality | Normalized frequency score using the SUBTLEX-US dataset |
| trend_day_global | Google search interest on the day (global, all categories) |
| trend_avg_200_global | 200-day average search interest (global, all categories) |
| trend_day_language | Search interest on Wordle day (Language Resources category) |
| trend_avg_200_language | 200-day average search interest (Language Resources category) |
Notes:
- All trend values are relative (0–100 scale, per Google Trends).
- Commonality scores are derived from the wordfreq Python library and the SUBTLEX-US dataset.
- Search trend data were collected with the pytrends library.
- An analysis using this data is available in the accompanying blog post.
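For readers who want to reproduce scores like these, the sketch below shows one plausible pipeline using the wordfreq and pytrends libraries noted above. The example word, dates, and the divide-by-8 normalization are illustrative assumptions, not the dataset author's exact method.

```python
# Sketch: reproducing commonality and trend scores like the columns above.
# Requires: pip install wordfreq pytrends pandas
# The normalization and dates below are assumptions for illustration only.
from wordfreq import zipf_frequency
from pytrends.request import TrendReq

word = "crane"  # example Wordle answer

# zipf_frequency returns a log-scale score, roughly 0-8; dividing by 8
# gives a crude 0-1 normalization in the spirit of wordfreq_commonality.
commonality = zipf_frequency(word, "en") / 8.0

# pytrends (an unofficial Google Trends client) returns relative search
# interest on Google's 0-100 scale; timeframes under ~9 months yield daily data.
pytrends = TrendReq(hl="en-US")
pytrends.build_payload([word], timeframe="2022-01-01 2022-07-31")
interest = pytrends.interest_over_time()

trend_on_day = interest.loc["2022-03-10", word]   # hypothetical puzzle date
trend_avg = interest[word].tail(200).mean()       # cf. trend_avg_200_global
print(commonality, trend_on_day, trend_avg)
```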
This statistic shows the daily digital data engagement interactions per person worldwide from 2010 to 2025. The average number of data interactions per connected person per day is expected to increase dramatically, from *** interactions per day in 2010 to almost ************* interactions per day by 2025.
Updated daily between 3:00 pm and 5:00 pm. Data are updated daily in the early afternoon and reflect laboratory results reported to the Washington State Department of Health as of midnight the day before. Data for previous dates will be updated as new results are entered, interviews are conducted, and data errors are corrected. Many people test positive but do not require hospitalization, so counts of positive cases do not necessarily indicate levels of demand at local hospitals. Reporting of test results to the Washington State Department of Health may be delayed by several days; counts will be updated when data are available. Only positive or negative test results are reflected in the counts; tests with pending or inconclusive results, or that were not performed, are excluded.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset collects job offers gathered by web scraping and filtered according to specific keywords, locations, and times. It gives users rich and precise search capabilities to uncover the best working arrangement for them. With the information collected, users can explore options that match their personal situation, skill set, and preferences for location and schedule. The columns provide detailed information on job titles, employer names, locations, and time frames, as well as other useful parameters, so you can make an informed choice about your next career opportunity.
This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:
Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.
Next, consider where the job is located: the Ubicació (Location) column tells you where each posting is from, so make sure it is somewhere that suits your needs.
Finally, consider when the position is available: the Temps_Oferta (Time frame) column indicates when each posting was made and whether it is a full-time, part-time, or casual/temporary role, so check that it meets your requirements before applying.
Additionally, if details such as hours per week or further schedule information are important criteria, that information is provided in the Horari (Schedule) column. Once all three criteria have been ticked off (keywords, location, and time frame), look at the Empresa (Company Name) and Nom_Oferta (Offer Name) columns to get an idea of who would be employing you should you land the gig.
All these pieces of data together should give any motivated individual what they need to find an optimal working arrangement. Keep hunting, and good luck!
- Machine learning can be used to group job offers, making it easier to identify similarities and differences between them. This could allow users to target their search for a working arrangement more precisely.
- The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better-informed decisions about their career options and goals.
- It may also provide insight into the local job market, enabling companies and employers to identify potential new opportunities or trends that may previously have gone unnoticed.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication (No Copyright). You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.
File: web_scraping_information_offers.csv

| Column name | Description |
|:---|:---|
| Nom_Oferta | Name of the job offer. (String) |
| Empresa | Company offering the job. (String) |
| Ubicació | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String) |
| Horari | Schedule of the job offer. (String) |
We can offer the news data in two formats:
1) News flow: all news flow for our company coverage, including articles and tweets.
2) ESG Incidents: highlights any pressing issues that companies are facing in the news.
Our system executes around 100,000 searches per day across the internet. We search specific websites deemed to be high-quality and informationally additive for news about our whole company coverage.
These include:
- Mainstream publications like Reuters, CNN, CNBC, NBC News, etc.
- NGO websites such as Ethical Consumer and Anti-Slavery International
- Investigative journalist websites like MLex
- National papers like the Japan Times
- Trade publications like Insurance Journal
- Sustainability publications like Edie.net
Each article that we download goes through rigorous processing. This includes cleaning the body of the article and adding its metadata e.g., the date that it was published.
We then calculate our proprietary “relevance” scores, which determine how relevant the article is to the company, its CEO, its biggest Insider, and its biggest Outsider.
Natural Language Processing (NLP) techniques are used to calculate the similarity and sentiment scores for each article for each news topic.
We use Twitter’s API to download the latest tweets from Thought Leader Accounts. We track over 100 Thought Leaders such as Ceres and Science Based Targets.
These tweets are then searched to see if any of our company coverage is mentioned.
Afterwards, the same processing and calculation steps are followed as for the news articles.
ESG Incidents is the second news feed that we display for users. It is designed to show any pressing issues that a company is facing in the news in real-time.
To get ESG Incidents outputs we follow these steps:
1. Choose a time period of news to look at, e.g., 3 months.
2. For each news topic (we have around 50), pick out the article(s) that have the highest relevance to a company and the highest similarity score over that time period. We multiply these two scores together to calculate an “Incidence Score”.
3. Calculate how many times that news topic has come up in the news over the chosen time period, as a proportion of the total articles for that company.
We are then able to see emerging trends and incidents for a particular company over a time period and also have the ability to see the most relevant articles for each news topic. This allows investors to see any emerging incidents or scandals for a company in real-time.
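As a rough illustration of those steps, here is a hedged pandas sketch; the column names (date, topic, relevance, similarity) are hypothetical stand-ins for the vendor's proprietary schema and scoring.

```python
# Hypothetical sketch of the ESG Incidents aggregation described above.
import pandas as pd

def esg_incidents(articles: pd.DataFrame, months: int = 3) -> pd.DataFrame:
    """articles: one row per article, with columns
    ['date', 'topic', 'relevance', 'similarity'] (illustrative names)."""
    cutoff = articles["date"].max() - pd.DateOffset(months=months)
    window = articles[articles["date"] >= cutoff].copy()

    # Step 2: Incidence Score = relevance x similarity; keep the top-scoring
    # article per news topic within the window.
    window["incidence_score"] = window["relevance"] * window["similarity"]
    top = window.loc[window.groupby("topic")["incidence_score"].idxmax()]

    # Step 3: each topic's share of all articles for the company
    # over the chosen time period.
    share = window["topic"].value_counts(normalize=True).rename("topic_share")
    return top.merge(share, left_on="topic", right_index=True)
```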
This dataset is historical only and ends at 5/7/2021. For more information, please see http://dev.cityofchicago.org/open%20data/data%20portal/2021/05/04/covid-19-testing-by-person.html. The recommended alternative dataset for similar data beyond that date is https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test/gkdw-2tgv.
This is the source data for some of the metrics available at https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html.
For all datasets related to COVID-19, see https://data.cityofchicago.org/browse?limitTo=datasets&sortBy=alpha&tags=covid-19.
This dataset contains counts of people tested for COVID-19 and their results. This dataset differs from https://data.cityofchicago.org/d/gkdw-2tgv in that each person is in this dataset only once, even if tested multiple times. In the other dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear in that dataset more than once on the same day unless he/she had both a positive and not-positive test.
Only Chicago residents are included based on the home address as provided by the medical provider.
Molecular (PCR) and antigen tests are included, and only one test is counted for each individual. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.
Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. Chicago Department of Public Health (CDPH) does not receive all not-positive results.
Demographic data are more complete for those who test positive; care should be taken when calculating percentage positivity among demographic groups.
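To make that caveat concrete, the toy pandas example below (with invented column names) shows how excluding records with missing demographics can inflate within-group positivity.

```python
# Toy illustration of the demographic-completeness caveat; columns invented.
import pandas as pd

df = pd.DataFrame({
    "race_ethnicity": ["Group A", "Group A", None, None, "Group B"],
    "result": ["positive", "not positive", "not positive",
               "not positive", "positive"],
})

# Overall positivity uses every person: 2/5 = 0.40.
overall = df["result"].eq("positive").mean()

# Within-group rates silently drop the rows with missing demographics,
# which here are disproportionately not-positive, so rates are biased upward.
known = df.dropna(subset=["race_ethnicity"])
by_group = known["result"].eq("positive").groupby(known["race_ethnicity"]).mean()
print(overall, by_group, sep="\n")
```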
All data are provisional and subject to change. Information is updated as additional details are received.
Data Source: Illinois National Electronic Disease Surveillance System
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Facebook is fast approaching 3 billion monthly active users. That is about 36% of the world's entire population logging in and using Facebook at least once a month.
How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, daily time spent with social media in the U.S. was just 2 hours and 16 minutes.

Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with friends and keep up with current events.

Global impact of social media
Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased political polarization, and heightened everyday distractions.
The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis forecasting competition. The M4 dataset consists of time series of yearly, quarterly, monthly and other (weekly, daily and hourly) data, which are divided into training and test sets. The minimum numbers of observations in the training sets are 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series. The participants were asked to produce the following numbers of forecasts beyond the available data they had been given: six for yearly, eight for quarterly, 18 for monthly, 13 for weekly, and 14 and 48 forecasts respectively for the daily and hourly series.
The M4 dataset was created by selecting a random sample of 100,000 time series from the ForeDeCk database. The selected series were then scaled to prevent negative observations and values lower than 10, thus avoiding possible problems when calculating various error measures. The scaling was performed by simply adding a constant to the series so that their minimum value was equal to 10 (29 occurrences across the whole dataset). In addition, any information that could possibly lead to the identification of the original series was removed so as to ensure the objectivity of the results. This included the starting dates of the series, which did not become available to the participants until the M4 had ended.
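A minimal sketch of that scaling step, assuming plain NumPy arrays (the organizers' actual preprocessing code is not reproduced here):

```python
# Minimal sketch: shift a series so its minimum is exactly 10, as described
# above. Series whose minimum is already >= 10 are left unchanged.
import numpy as np

def scale_to_min_10(series: np.ndarray) -> np.ndarray:
    offset = max(0.0, 10.0 - series.min())
    return series + offset

raw = np.array([-4.0, 0.0, 7.5, 12.0])
scaled = scale_to_min_10(raw)     # -> [10. 14. 21.5 26.]
assert scaled.min() == 10.0
```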
Factori's AI & ML training data is thoroughly tested and reviewed to ensure that what you receive on your end is of the best quality.
Integrate the comprehensive AI & ML training data provided by Grepsr and develop a superior AI & ML model.
Whether you're training algorithms for natural language processing, sentiment analysis, or any other AI application, we can deliver comprehensive datasets tailored to fuel your machine learning initiatives.
Enhanced Data Quality: We have rigorous data validation processes and conduct quality assurance checks to guarantee the integrity and reliability of the training data you use to develop AI & ML models.
Gain a competitive edge, drive innovation, and unlock new opportunities by leveraging the power of tailored Artificial Intelligence and Machine Learning training data with Factori.
We offer web activity data of users that are browsing popular websites around the world. This data can be used to analyze web behavior across the web and build highly accurate audience segments based on web activity for targeting ads based on interest categories and search/browsing intent.
Web Data Reach: Our reach data represents the total count of records available within various categories and comprises attributes such as Country, Anonymous ID, IP address, and Search Query.
Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).
Data Attributes: Anonymous_id, IDType, Timestamp, Estid, Ip, userAgent, browserFamily, deviceType, Os, Url_metadata_canonical_url, Url_metadata_raw_query_params, refDomain, mappedEvent, Channel, searchQuery, Ttd_id, Adnxs_id, Keywords, Categories, Entities, Concepts
DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data are published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains case and test data by date of sample submission; the death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data are suppressed.

COVID-19 test results are reported by date of specimen collection, including total, positive, negative, and indeterminate results for molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests. Test results may be reported several days after the result. Data are incomplete for the most recent days; data from previous dates are routinely updated. Records with a null date field summarize tests reported that were missing the date of collection. Starting in July 2020, this dataset is updated every weekday.
The Southern Great Plains 1997 (SGP97) Hydrology Experiment originated from an interdisciplinary investigation, "Soil Moisture Mapping at Satellite Temporal and Spatial Scales" (PI: Thomas J. Jackson, USDA Agricultural Research Service, Beltsville, MD) selected under the NASA Research Announcement 95-MTPE-03. The region selected for investigation is the best instrumented site for surface soil moisture, hydrology and meteorology in the world. This includes the USDA/ARS Little Washita Watershed, the USDA/ARS facility at El Reno, Oklahoma, the ARM/CART central facility, as well as the Oklahoma Mesonet. The National Climatic Data Center (NCDC) Summary of the Day Co-operative Dataset is one of several surface datasets provided for the Southern Great Plains (SGP) 1997 project. This NCDC Co-operative Observer (COOP) dataset contains data from sixty-two stations for the SGP 1997 time period (18 June 1997 through 18 July 1997) and in the SGP 1997 domain (approximately 97W to 99W longitude and 34.5N to 37N latitude). The primary thrust of the cooperative observing program is the recording of 24-hour precipitation amounts, but approximately 55% of the stations also record maximum and minimum temperatures. The observations are for the 24-hour period ending at the time of observation. Observer convenience or special program needs mean that observing times vary from station to station. However, the vast majority of observations are taken near either 7:00 AM or 7:00 PM local time. The NCDC Summary of the Day Co-operative Dataset (TD-3200) contains eight metadata parameters and fifteen data parameters and flags. The metadata parameters describe the date/time, network, station and location at which the data were collected. All times are UTC. Data values are valid for the 24 hours preceding the time of observation. Resources in this dataset:Resource Title: GeoData catalog record. File Name: Web Page, url: https://geodata.nal.usda.gov/geonetwork/srv/eng/catalog.search#/metadata/SGP97COOP_jjm_2015-05-04_0918
This dataset package is focused on U.S. construction materials and three construction companies: Cemex, Martin Marietta, and Vulcan.
In this package, SpaceKnow tracks manufacturing and processing facilities for construction material products all over the US. By tracking these facilities, we are able to give you near-real-time data on spending on these materials, which helps to predict residential and commercial real estate construction and spending in the US.
The dataset includes 40 indices focused on asphalt, cement, concrete, and building materials in general. You can look forward to receiving country-level and regional data (activity in the North, East, West, and South of the country) and the aforementioned company data.
SpaceKnow uses satellite synthetic aperture radar (SAR) data to capture activity at building material manufacturing and processing facilities in the US.
Data is updated daily, has an average lag of 4-6 days, and history back to 2017.
The insights provide you with level and change data for refineries, storage, manufacturing, logistics, and employee parking-based locations.
SpaceKnow offers three delivery options: CSV, API, and Insights Dashboard.
Available Indices
Companies:
- Cemex (CX): Construction Materials (covers all manufacturing facilities of the company in the US), Concrete, Cement (refinery and storage) indices, and aggregates
- Martin Marietta (MLM): Construction Materials (covers all manufacturing facilities of the company in the US), Concrete, Cement (refinery and storage) indices, and aggregates
- Vulcan (VMC): Construction Materials (covers all manufacturing facilities of the company in the US), Concrete, Cement (refinery and storage) indices, and aggregates
USA Indices:
- Aggregates USA
- Asphalt USA
- Cement USA
- Cement Refinery USA
- Cement Storage USA
- Concrete USA
- Construction Materials USA
- Construction Mining USA
- Construction Parking Lots USA
- Construction Materials Transfer Hub US
- Cement - Midwest, Northeast, South, West
- Cement Refinery - Midwest, Northeast, South, West
- Cement Storage - Midwest, Northeast, South, West
Why get SpaceKnow's U.S. Construction Materials Package?
Monitor Construction Market Trends: Near-real-time insights into the construction industry allow clients to understand and anticipate market trends better.
Track Company Performance: Monitor operational activities, such as the volume of sales.
Assess Risk: Use satellite activity data to assess the risks associated with investing in the construction industry.
Index Methodology Summary
Continuous Feed Index (CFI) is a daily aggregation of the area of metallic objects in square meters. There are two types of CFI indices: the CFI-R index gives the data in levels, showing how many square meters are covered by metallic objects (for example, employee cars at a facility); the CFI-S index gives the change in the data, showing how many square meters have changed within the locations between two consecutive satellite images.
How to interpret the data
SpaceKnow indices can be compared with related economic indicators or KPIs. If the economic indicator is reported monthly, perform a 30-day rolling sum and pick the last day of the month to compare with the indicator; each data point will then reflect approximately the sum for that month. If the indicator is reported quarterly, perform a 90-day rolling sum and pick the last day of the quarter; each data point will then reflect approximately the sum for that quarter.
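A hedged pandas sketch of that recipe, assuming a daily index stored as a Series with a DatetimeIndex (names are illustrative):

```python
# Sketch: convert a daily CFI series into approximate monthly values by
# taking a 30-day rolling sum and sampling the last day of each month.
import pandas as pd

def monthly_from_daily(daily: pd.Series) -> pd.Series:
    """daily: CFI values with a DatetimeIndex, one observation per day."""
    rolling = daily.rolling(window=30).sum()
    return rolling.resample("ME").last()  # "ME" = month end; use "M" on pandas < 2.2

# For a quarterly indicator, use window=90 and resample("QE").last().
```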
Where the data comes from
SpaceKnow brings you a data edge by applying machine learning and AI algorithms to synthetic aperture radar and optical satellite imagery. The company's infrastructure searches for and downloads new imagery every day, and computations on the data are completed in less than 24 hours.
In contrast to traditional economic data, which are released in monthly and quarterly terms, SpaceKnow data is high-frequency and available daily. It is possible to observe the latest movements in the construction industry with just a 4-6 day lag, on average.
The construction materials data help you to estimate the performance of the construction sector and the business activity of the selected companies.
Delivering high-quality data rests on successfully defining each location from which data are observed and extracted. All locations are thoroughly researched and validated by an in-house team of annotators and data analysts.
See below how our Construction Materials index performs against the US non-residential construction spending benchmark.
Each individual location is precisely defined to avoid noise in the data that may arise from traffic or seasonal changes in vegetation.
SpaceKnow uses radar imagery and its own unique algorithms, so the indices do not lose their significance in bad weather conditions such as rain or heavy clouds.
→ Reach out to get a free trial.
...
The Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. It includes the Top 25 stories and Top 25 Rising queries from Google Trends, made available as two separate BigQuery tables, with a new set of top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days and is accompanied by a rolling five-year window of historical data for 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1 TB/mo free tier of processing, meaning each user receives 1 TB of free BigQuery processing every month that can be used to run queries on this public dataset.
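As a hedged example, a query like the one below could be run from Python with the official BigQuery client library. The table path follows the public dataset's naming (bigquery-public-data.google_trends.top_terms), but verify the exact table and schema in the BigQuery console before relying on them.

```python
# Sketch: query the public Google Trends top terms from BigQuery.
# Requires: pip install google-cloud-bigquery db-dtypes, plus GCP credentials.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project; queries count
                            # against the 1 TB/month free processing tier
sql = """
    SELECT term, rank, week, dma_name
    FROM `bigquery-public-data.google_trends.top_terms`
    WHERE refresh_date = (SELECT MAX(refresh_date)
                          FROM `bigquery-public-data.google_trends.top_terms`)
    ORDER BY rank
    LIMIT 25
"""
print(client.query(sql).to_dataframe())
```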
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Google Search Trends: Economic Measures: Mortgage Loan data was reported at 10.000 Score in 14 May 2025. This records a decrease from the previous number of 12.000 Score for 13 May 2025. Google Search Trends: Economic Measures: Mortgage Loan data is updated daily, averaging 10.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 47.000 Score in 21 Apr 2023 and a record low of 0.000 Score in 14 Feb 2023. Google Search Trends: Economic Measures: Mortgage Loan data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Spain – Table ES.Google.GT: Google Search Trends: by Categories.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Eritrea Google Search Trends: Computer & Electronics: Apple data was reported at 0.000 Score in 15 May 2025. This stayed constant from the previous number of 0.000 Score for 14 May 2025. Eritrea Google Search Trends: Computer & Electronics: Apple data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 15 May 2025, with 1262 observations. The data reached an all-time high of 100.000 Score in 19 Apr 2025 and a record low of 0.000 Score in 15 May 2025. Eritrea Google Search Trends: Computer & Electronics: Apple data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Eritrea – Table ER.Google.GT: Google Search Trends: by Categories.
As of the third quarter of 2024, internet users in South Africa spent more than **** hours and ** minutes online per day, ranking first among the regions worldwide. Brazil followed, with roughly **** hours of daily online usage. In the same period, Japan registered the lowest number of daily hours spent online, with users in the country spending an average of over **** hours per day using the internet. The data include daily time spent online on any device.

Social media usage
In recent years, social media has become integral to internet users' daily lives, with users spending an average of *** minutes daily on social media activities. In April 2024, global social network penetration reached **** percent, highlighting its widespread adoption. Among the various platforms, YouTube stands out, with over *** billion monthly active users, making it one of the most popular social media platforms.

YouTube's global popularity
In 2023, the keyword "YouTube" ranked among the most popular search queries on Google, highlighting the platform's immense popularity. YouTube generated most of its traffic through mobile devices, with about 98 billion visits. This popularity was particularly evident in the United Arab Emirates, where YouTube penetration reached approximately **** percent, the highest in the world.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset artifact contains the intermediate datasets from pipeline executions necessary to reproduce the results of the paper. We share this artifact in hopes of providing a starting point for other researchers to extend the analysis on notebooks, discover more about their accessibility, and offer solutions to make data science more accessible. The scripts needed to generate these datasets and analyse them are shared in the GitHub repository for this work.
The dataset contains large files totaling approximately 60 GB, so please exercise caution when extracting the data from the compressed files.
The dataset contains files that can take a significant amount of script run time to generate or reproduce.
Dataset Contents
We briefly summarize the included files in our dataset. Please refer to the documentation for specific information about the structure of the data in these files, the scripts to generate them, and runtimes for various parts of our data processing pipeline.
epoch_9_loss_0.04706_testAcc_0.96867_X_resnext101_docSeg.pth: We share this model file, originally provided by Jobin et al., to enable the classification of figures found in our dataset. Please place this into the model/ directory.
model-results.csv: This file contains results from the classification performed on the figures found in the notebooks in our dataset.
Performing this classification may take up to a day.
a11y-scan-dataset.zip: This archive contains two files and results in datasets of approximately 60GB when extracted. Please ensure that you have sufficient disk space to uncompress this zip archive. The archive contains:
a11y/a11y-detailed-result.csv: This dataset contains the accessibility scan results from the scans run on the 100k notebooks across themes.
The detailed result file can be really large (> 60 GB) and can be time-consuming to construct.
a11y/a11y-aggregate-scan.csv: This file is an aggregate of the detailed results, containing the number of each type of error found in each notebook (see the sketch after this file list).
This file is also shared outside the compressed directory.
errors-different-counts-a11y-analyze-errors-summary.csv: This file contains the counts of errors that occur in notebooks across different themes.
nb_processed_cell_html.csv: This file contains metadata corresponding to each cell extracted from the html exports of our notebooks.
nb_first_interactive_cell.csv: This file contains the necessary metadata to compute the first interactive element, as defined in our paper, in each notebook.
nb_processed.csv: This file contains the data produced by processing the notebooks, including the number of images, imports, languages, and cell-level information.
processed_function_calls.csv: This file contains information about the notebooks and the various imports and function calls used within them.
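A hypothetical sketch of deriving the aggregate scan file from the detailed results, assuming chunked reads of the very large CSV and illustrative column names (notebook_id, error_type):

```python
# Hypothetical sketch: build a11y-aggregate-scan.csv from the >60 GB
# detailed scan file by counting each error type per notebook.
# Column names are illustrative; see the repository docs for the schema.
from functools import reduce
import pandas as pd

chunks = pd.read_csv(
    "a11y/a11y-detailed-result.csv",
    usecols=["notebook_id", "error_type"],  # assumed column names
    chunksize=1_000_000,                    # stream to bound memory use
)
counts = reduce(
    lambda a, b: a.add(b, fill_value=0),
    (chunk.groupby(["notebook_id", "error_type"]).size() for chunk in chunks),
)
aggregate = counts.unstack(fill_value=0)    # one row per notebook
aggregate.to_csv("a11y/a11y-aggregate-scan.csv")
```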