This dataset consists of the Surface Ocean CO2 Atlas Version 2022 (SOCATv2022) data product files. The ocean absorbs one quarter of the global CO2 emissions from human activity. The community-led Surface Ocean CO2 Atlas (www.socat.info) is key for the quantification of ocean CO2 uptake and its variation, now and in the future. SOCAT version 2022 has quality-controlled in situ surface ocean fCO2 (fugacity of CO2) measurements from ships, moorings, and autonomous and drifting surface platforms for the global oceans and coastal seas from 1957 to 2021. The main synthesis and gridded products contain 33.7 million fCO2 values with an estimated accuracy of better than 5 μatm. A further 6.4 million fCO2 sensor values with an estimated accuracy of 5 to 10 μatm are separately available. During quality control, marine scientists assign a flag to each data set, as well as WOCE flags of 2 (good), 3 (questionable) or 4 (bad) to individual fCO2 values. Data sets are assigned flags of A and B for an estimated accuracy of better than 2 μatm, flags of C and D for an accuracy of better than 5 μatm and a flag of E for an accuracy of better than 10 μatm. Bakker et al. (2016) describe the quality control criteria used in SOCAT versions 3 to 2022. Quality control comments for individual data sets can be accessed via the SOCAT Data Set Viewer (www.socat.info). All data sets whose quality has been deemed acceptable have been made public. The main SOCAT synthesis files and the gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Access to data sets with an estimated accuracy of 5 to 10 µatm (flag of E) and fCO2 values with flags of 3 and 4 is via additional data products and the Data Set Viewer (Table 8 in Bakker et al., 2016). SOCAT publishes a global gridded product with a 1° longitude by 1° latitude resolution. 
A second product with a higher resolution of 0.25° longitude by 0.25° latitude is available for the coastal seas. The gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Gridded products are available monthly, per year and per decade. Two powerful, interactive, online viewers, the Data Set Viewer and the Gridded Data Viewer (www.socat.info), enable investigation of the SOCAT synthesis and gridded data products. SOCAT data products can be downloaded. Matlab code is available for reading these files. Ocean Data View also provides access to the SOCAT data products (www.socat.info). SOCAT data products are discoverable, accessible and citable. The SOCAT Data Use Statement (www.socat.info) asks users to generously acknowledge the contribution of SOCAT scientists by invitation to co-authorship, especially for data providers in regional studies, and/or reference to relevant scientific articles. The SOCAT website (www.socat.info) provides a single access point for online viewers, downloadable data sets, the Data Use Statement, a list of contributors and an overview of scientific publications on and using SOCAT. Automation of data upload and initial data checks allows annual releases of SOCAT from version 4 onwards. SOCAT is used for quantification of ocean CO2 uptake and ocean acidification and for evaluation of climate models and sensor data. SOCAT products have informed the annual Global Carbon Budget since 2013. The annual SOCAT releases by the SOCAT scientific community are a Voluntary Commitment for United Nations Sustainable Development Goal 14.3 (Reduce Ocean Acidification) (#OceanAction20464). More broadly the SOCAT releases contribute to UN SDG 13 (Climate Action) and SDG 14 (Life Below Water), and to the UN Decade of Ocean Science for Sustainable Development. Hundreds of peer-reviewed scientific publications and high-impact reports cite SOCAT. 
The SOCAT community-led synthesis product is a key step in the value chain based on in situ inorganic carbon measurements of the oceans, which provides policy makers with critical information on ocean CO2 uptake in climate negotiations. The need for accurate knowledge of global ocean CO2 uptake and its (future) variation makes sustained funding of in situ surface ocean CO2 observations imperative.
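The inclusion rule described above (data set flags A to D combined with a WOCE flag of 2 for the main synthesis and gridded products) can be expressed as a simple filter. This is an illustrative sketch only: the record type and the field names `dataset_id`, `dataset_flag`, `woce_flag` and `fco2_uatm` are hypothetical and are not the column names of the actual SOCAT files.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Upper bound of the estimated accuracy (in µatm) per data set flag, as described above.
ACCURACY_UATM = {"A": 2.0, "B": 2.0, "C": 5.0, "D": 5.0, "E": 10.0}
MAIN_PRODUCT_FLAGS = {"A", "B", "C", "D"}  # better than 5 µatm
WOCE_GOOD = 2


@dataclass
class FCO2Value:
    # Hypothetical record layout, not the SOCAT file format.
    dataset_id: str
    dataset_flag: str  # "A" to "E"
    woce_flag: int     # 2 (good), 3 (questionable) or 4 (bad)
    fco2_uatm: float


def in_main_product(v: FCO2Value) -> bool:
    """True if the value meets the criteria of the main synthesis and gridded products."""
    return v.dataset_flag in MAIN_PRODUCT_FLAGS and v.woce_flag == WOCE_GOOD


def split_products(values: Iterable[FCO2Value]) -> Tuple[List[FCO2Value], List[FCO2Value]]:
    """Split values into the main product and the rest (flag E data sets, WOCE 3 and 4)."""
    main, additional = [], []
    for v in values:
        (main if in_main_product(v) else additional).append(v)
    return main, additional
```

A value from a flag E data set, or any value with a WOCE flag of 3 or 4, ends up in the second list, which corresponds to the material reachable only via the additional products and the Data Set Viewer.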
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Amazon Product Reviews Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/amazon-product-reviews-datasete on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains 30K records of product reviews from amazon.com.
This dataset was created by PromptCloud and DataStock
This dataset contains the following:
Total Records Count: 43729
Domain Name: amazon.com
Date Range: 01st Jan 2020 - 31st Mar 2020
File Extension: CSV
Available Fields:
-- Uniq Id,
-- Crawl Timestamp,
-- Billing Uniq Id,
-- Rating,
-- Review Title,
-- Review Rating,
-- Review Date,
-- User Id,
-- Brand,
-- Category,
-- Sub Category,
-- Product Description,
-- Asin,
-- Url,
-- Review Content,
-- Verified Purchase,
-- Helpful Review Count,
-- Manufacturer Response
We wouldn't be here without the help of our in-house teams at PromptCloud and DataStock, who have put their heart and soul into this project, as into all our other projects. We want to provide the best quality data and we will continue to do so.
The inspiration for these datasets came from research. Reviews matter to everybody across the globe, so we decided to create this dataset to show how user reviews help companies improve their products.
This dataset was created by PromptCloud and includes Billing Uniq Id, Verified Purchase, technical information and other features such as Crawl Timestamp, Manufacturer Response and more.
- Analyze Helpful Review Count in relation to Sub Category
- Study the influence of Review Date on Product Description
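As a starting point for the first suggestion, the sketch below computes the mean Helpful Review Count per Sub Category using only the Python standard library. It assumes the CSV header uses the field names listed above and that Helpful Review Count holds plain numbers; rows with a blank or non-numeric count are skipped. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict


def helpful_by_subcategory(csv_text: str) -> Dict[str, float]:
    """Mean 'Helpful Review Count' per 'Sub Category' (column names assumed from the field list)."""
    counts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("Helpful Review Count") or "").strip()
        sub = (row.get("Sub Category") or "").strip()
        if not raw or not sub:
            continue  # skip rows with a missing count or sub category
        try:
            counts[sub].append(float(raw))
        except ValueError:
            continue  # skip non-numeric counts
    return {sub: mean(values) for sub, values in counts.items()}
```

To use it on the full file, read the CSV into a string (for example with `open(path, encoding="utf-8").read()`) and pass it in; comparing the resulting means across sub categories gives a first view of where reviews are considered most helpful.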
If you use this dataset in your research, please credit PromptCloud
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recently, the economy of Guangdong province has ranked first in the country while maintaining good growth momentum. Predicting the Gross Domestic Product (GDP) of Guangdong province is therefore an important issue: by predicting GDP, it is possible to analyze whether the economy of Guangdong province can maintain high-quality growth. Hence, to accurately forecast the economy of Guangdong, this paper proposes an Elman neural network combined with a wavelet function. The wavelet function not only enhances the forecasting ability of the Elman neural network but also improves its convergence speed. Experimental results indicate that our model forecasts the regional economy well, with a forecast accuracy of 0.971. In terms of forecast precision and errors, our model outperforms the competing models. Moreover, our model achieves strong forecast results for both individual economic indicators and multiple economic indicators, which suggests that it does not depend on specific scenarios in regional economic forecasting. We also find that investment in education has a major positive impact on regional economic development in Guangdong province, and that the two are positively correlated. Experimental results also show that the training time of our model does not grow exponentially as the data volume increases. Consequently, we propose that our model is suitable for the prediction of large-scale datasets. Additionally, we demonstrate that using a wavelet function yields greater gains in forecast accuracy and training cost than using complex network architectures. Moreover, using a wavelet function can simplify the design of complex network architectures, reducing the number of training parameters of neural networks.
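The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea, assuming a single hidden layer: an Elman network feeds a copy of its previous hidden state (the context layer) back into the hidden layer, and a wavelet network replaces the usual sigmoid activation with a translated and dilated mother wavelet. The Morlet wavelet, the class name `WaveletElman` and all parameter names are assumptions for illustration, not the paper's model, and training is omitted.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def morlet(x: float) -> float:
    """Morlet mother wavelet, a common choice in wavelet networks (assumed here)."""
    return math.cos(1.75 * x) * math.exp(-0.5 * x * x)


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class WaveletElman:
    """Hypothetical Elman network with wavelet hidden units.

    h_t = psi((W_in x_t + W_ctx h_{t-1} - shift) / scale)   (element-wise)
    y_t = W_out h_t + out_bias
    """

    def __init__(self, w_in: Matrix, w_ctx: Matrix, w_out: Matrix,
                 shift: List[float], scale: List[float], out_bias: List[float]):
        self.w_in, self.w_ctx, self.w_out = w_in, w_ctx, w_out
        self.shift, self.scale, self.out_bias = shift, scale, out_bias
        self.context = [0.0] * len(w_ctx)  # context layer starts at zero

    def step(self, x: Sequence[float]) -> List[float]:
        z_in = matvec(self.w_in, x)
        z_ctx = matvec(self.w_ctx, self.context)
        hidden = [morlet((zi + zc - b) / a)
                  for zi, zc, b, a in zip(z_in, z_ctx, self.shift, self.scale)]
        self.context = hidden  # store the hidden state for the next time step
        return [y + c for y, c in zip(matvec(self.w_out, hidden), self.out_bias)]

    def run(self, xs: Sequence[Sequence[float]]) -> List[List[float]]:
        return [self.step(x) for x in xs]
```

In a GDP setting, each input vector would hold the economic indicators for one period and the output the forecast for the next; the translation (`shift`) and dilation (`scale`) parameters would be learned along with the weights.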
This dataset consists of the Surface Ocean CO2 Atlas Version 2021 (SOCATv2021) data product files. The Surface Ocean CO2 Atlas (SOCAT) documents the increase in surface ocean CO2 (carbon dioxide), a critical measure as the oceans are taking up one quarter of the global CO2 emissions from human activity. SOCAT version 2021 has 30.6 million quality-controlled surface ocean fCO2 (fugacity of CO2) observations with an estimated accuracy of better than 5 μatm and a WOCE flag of 2 (good) from 1957 to 2020 for the global oceans and coastal seas. In addition, 2.1 million values with an estimated accuracy of 5 to 10 μatm are available. During quality control, marine scientists assign a flag to each data set, as well as WOCE flags of 2 (good), 3 (questionable) or 4 (bad) to individual fCO2 values. Data sets are assigned flags of A and B for an estimated accuracy of better than 2 μatm, flags of C and D for an accuracy of better than 5 μatm and a flag of E for an accuracy of better than 10 μatm. Bakker et al. (2016) describe the quality control criteria used in SOCAT versions 3 to 2021. Quality control comments for individual data sets can be accessed via the SOCAT Data Set Viewer (www.socat.info). All data sets whose quality has been deemed acceptable have been made public. The main SOCAT synthesis files and the gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Access to data sets with an estimated accuracy of 5 to 10 µatm (flag of E) and fCO2 values with flags of 3 and 4 is via additional data products and the Data Set Viewer (Table 8 in Bakker et al., 2016). SOCAT publishes a global gridded product with a 1° longitude by 1° latitude resolution. A second product with a higher resolution of 0.25° longitude by 0.25° latitude is available for the coastal seas. 
The gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Gridded products are available monthly, per year and per decade. Two powerful, interactive, online viewers, the Data Set Viewer and the Gridded Data Viewer (www.socat.info), enable investigation of the SOCAT synthesis and gridded data products. SOCAT data products can be downloaded. Matlab code is available for reading these files. Ocean Data View also provides access to the SOCAT data products (www.socat.info). SOCAT data products are discoverable, accessible and citable. The SOCAT Data Use Statement asks users to generously acknowledge the contribution of SOCAT scientists by invitation to co-authorship, especially for data providers in regional studies, and/or reference to relevant scientific articles. The SOCAT website (www.socat.info) provides a single access point for online viewers, downloadable data sets, the Data Use Statement, a list of contributors and an overview of scientific publications on and using SOCAT. Automation of data upload and initial data checks allows annual releases of SOCAT from version 4 onwards. SOCAT-based data products are used for quantification of the ocean carbon sink, to estimate ocean acidification, for evaluation of biogeochemical sensor data and to evaluate climate models (CMIP). Since 2013, SOCAT products have informed the annual Global Carbon Budget. The annual SOCAT releases are made by the SOCAT scientific community as a Voluntary Commitment for United Nations Sustainable Development Goal 14.3 (Reduce Ocean Acidification) (#OceanAction20464). More broadly the SOCAT releases contribute to UN SDG 13 (Climate Action) and SDG 14 (Life Below Water), and to the UN Decade of Ocean Science for Sustainable Development. Hundreds of peer-reviewed scientific publications and high-impact reports cite SOCAT. 
The SOCAT community-led synthesis product is a key step in the value chain based on in situ inorganic carbon measurements of the oceans, which provides policy makers with essential information on ocean CO2 uptake in climate negotiations. The global need for accurate knowledge of ocean CO2 uptake and its variation (including ocean acidification) makes sustained funding for in situ surface ocean CO2 observations imperative.
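Gridded products such as the 1° and 0.25° fields described above bin individual fCO2 values into latitude-longitude cells per time period. The sketch below shows the basic binning for monthly means, assuming input tuples of (year, month, longitude, latitude, fCO2). It simply averages all values in a cell; the official SOCAT gridding may weight values differently (for example per cruise), so this is not a reproduction of the SOCAT products, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def cell_index(lon: float, lat: float, res: float = 1.0) -> Tuple[int, int]:
    """(i, j) index of the res x res cell containing (lon, lat), counted from 180°W and 90°S."""
    lon = (lon + 180.0) % 360.0 - 180.0  # wrap longitude into [-180, 180)
    i = math.floor((lon + 180.0) / res)
    j = min(math.floor((lat + 90.0) / res), round(180.0 / res) - 1)  # lat = 90 goes in the top row
    return i, j


def grid_monthly_mean(obs: Iterable[Tuple[int, int, float, float, float]],
                      res: float = 1.0) -> Dict[Tuple[int, int, int, int], float]:
    """Plain mean fCO2 per (year, month, i, j) cell from (year, month, lon, lat, fco2) tuples."""
    cells = defaultdict(list)
    for year, month, lon, lat, fco2 in obs:
        i, j = cell_index(lon, lat, res)
        cells[(year, month, i, j)].append(fco2)
    return {key: mean(values) for key, values in cells.items()}
```

Passing `res=0.25` gives the finer coastal grid; yearly or decadal means follow by dropping the month (or grouping years) in the key.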
The Surface Ocean CO2 Atlas (SOCAT) is a synthesis activity by the international marine carbon research community and has more than 100 contributors worldwide. SOCAT provides access to synthesis and gridded, quality controlled, observational products of surface ocean fCO2 (fugacity of carbon dioxide) for the global oceans and coastal seas. SOCAT version 5 has 21.5 million, in situ, surface ocean fCO2 measurements with an accuracy of better than 5 μatm and a WOCE flag of 2 (good) from 1957 to 2016. Calibrated sensor data with an accuracy of better than 10 μatm are also available. During quality control, marine scientists assign a flag to each dataset, as well as WOCE flags of 2 (good), 3 (questionable) or 4 (bad) to individual fCO2 values. Datasets are assigned flags of A and B for an accuracy of better than 2 μatm, flags of C and D for an accuracy of better than 5 μatm and a flag of E for an accuracy of better than 10 μatm. Bakker et al. (2016) describe the quality control criteria used in SOCAT versions 3 to 5. Quality control comments for individual datasets can be accessed via the SOCAT Data Set Viewer (www.socat.info). All datasets whose quality has been deemed acceptable have been made public. The main SOCAT synthesis files and the gridded products contain all datasets with flags of A to D and fCO2 values with a flag of 2. Access to datasets with a flag of E and fCO2 values with flags of 3 and 4 is via additional data products and the Data Set Viewer (Table 8 in Bakker et al., 2016). SOCAT publishes a global gridded product with a 1° longitude by 1° latitude resolution. A second product with a higher resolution of 0.25° longitude by 0.25° latitude is available for the coastal seas. Gridded products are available monthly, per year and per decade. Two powerful, interactive, online viewers, the Data Set Viewer and the Gridded Data Viewer, enable investigation of the SOCAT synthesis and gridded data products. SOCAT data products can be downloaded. 
MatLab code is available for reading these files. Ocean Data View also provides access to the SOCAT data products. SOCAT data products are discoverable, accessible and citable. SOCAT versions 3 to 5 should be cited as Bakker et al., 2016 (until a publication on versions 4 and 5 is published). The SOCAT Fair Data Use Statement asks users to generously acknowledge the contribution of SOCAT scientists by invitation to co-authorship, especially for data providers in regional studies, and/or reference to relevant scientific articles. The SOCAT website (www.socat.info) provides a single access point for online viewers, downloadable datasets, the Fair Data Use Statement, a list of contributors and an overview of scientific publications on and using SOCAT. Automation of data upload and initial data checks allows annual releases of SOCAT from version 4 onward. SOCAT enables quantification of the ocean carbon sink and ocean acidification and evaluation of ocean biogeochemical models. More than 180 peer-reviewed scientific publications and high-impact reports cite SOCAT. SOCAT represents a milestone in research coordination, data access, biogeochemical and climate research and in informing policy.
This data set provides supply chain health commodity shipment and pricing data. Specifically, the data set identifies Antiretroviral (ARV) and HIV lab shipments to supported countries. In addition, the data set provides the commodity pricing and associated supply chain expenses necessary to move the commodities to countries for use. The dataset has similar fields to the Global Fund's Price, Quality and Reporting (PQR) data. PEPFAR and the Global Fund represent the two largest procurers of HIV health commodities. This dataset, when analyzed in conjunction with the PQR data, provides a more complete picture of global spending on specific health commodities. The data are particularly valuable for understanding ranges and trends in pricing as well as volumes delivered by country. The US Government believes this data will help stakeholders make better, data-driven decisions. Care should be taken to consider contextual factors when using the database. Conclusions related to costs associated with moving specific line items or products to specific countries and lead times by product/country will not be accurate.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Welcome! This is a Brazilian ecommerce public dataset of orders made at Olist Store. The dataset has information on 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and finally reviews written by customers. We also released a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.
This is real commercial data. It has been anonymised, and references to the companies and partners in the review text have been replaced with the names of Game of Thrones great houses.
We have also released a Marketing Funnel Dataset. You may join both datasets and see an order from Marketing perspective now!
Instructions on joining are available on this Kernel.
This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners. See more on our website: www.olist.com
After a customer purchases a product from Olist Store, a seller is notified to fulfill the order. Once the customer receives the product, or the estimated delivery date is due, the customer receives a satisfaction survey by email in which they can rate the purchase experience and write down some comments.
(Image: Example of a product listing on a marketplace)
The data is divided in multiple datasets for better understanding and organization. Please refer to the following data schema when working with it:
(Image: Data Schema)
We had previously released a classified dataset, but we removed it in Version 6. We intend to release it again as a new dataset with a new data schema. Until then, you may use the classified dataset available in Version 5 or earlier.
Here is some inspiration for possible outcomes from this dataset.
NLP:
This dataset offers an excellent environment for parsing the review text through its multiple dimensions.
Clustering:
Some customers didn't write a review. But why are they happy or mad?
Sales Prediction:
With purchase date information you'll be able to predict future sales.
Delivery Performance:
You will also be able to work through delivery performance and find ways to optimize delivery times.
Product Quality:
Discover which product categories are more prone to customer dissatisfaction.
Feature Engineering:
Create features from this rich dataset or attach some external public information to it.
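For the delivery performance idea, a minimal sketch is to compare each order's actual and estimated delivery dates. It assumes an orders table with the columns `order_status`, `order_delivered_customer_date` and `order_estimated_delivery_date` and timestamps in `YYYY-MM-DD HH:MM:SS` format; check these against the data schema before use. The function names are hypothetical.

```python
import csv
import io
from datetime import datetime
from typing import List

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def delivery_delays_days(csv_text: str) -> List[float]:
    """Actual minus estimated delivery time in days for delivered orders (positive = late)."""
    delays = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        delivered = (row.get("order_delivered_customer_date") or "").strip()
        estimated = (row.get("order_estimated_delivery_date") or "").strip()
        if row.get("order_status") != "delivered" or not delivered or not estimated:
            continue  # skip undelivered orders and rows with missing dates
        diff = datetime.strptime(delivered, TS_FORMAT) - datetime.strptime(estimated, TS_FORMAT)
        delays.append(diff.total_seconds() / 86400)
    return delays


def late_share(delays: List[float]) -> float:
    """Fraction of orders delivered after the estimated date."""
    return sum(d > 0 for d in delays) / len(delays) if delays else 0.0
```

Grouping these delays by customer location or seller, via the joins shown in the data schema, is a natural next step toward finding where delivery times can be optimized.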
Thanks to Olist for releasing this dataset.
The Advanced Microwave Scanning Radiometer 2 (AMSR2) instrument on the Global Change Observation Mission - Water 1 (GCOM-W1) provides global passive microwave measurements of terrestrial, oceanic, and atmospheric parameters for the investigation of global water and energy cycles. Near real-time (NRT) products are generated within 3 hours of the last observations in the file, by the Land Atmosphere Near real-time Capability for EOS (LANCE) at the AMSR Science Investigator-led Processing System (AMSR SIPS), which is collocated with the Global Hydrology Resource Center (GHRC) DAAC. The GCOM-W1 NRT AMSR2 Unified L2B Global Swath Ocean Products is a swath product containing global sea surface temperature over ocean, wind speed over ocean, water vapor over ocean and cloud liquid water over ocean, using resampled NRT Level-1R data provided by JAXA. This is the same algorithm that generates the corresponding standard science products in the AMSR SIPS. The NRT products are generated in HDF-EOS-5 augmented with netCDF-4/CF metadata and are available via HTTPS from the EOSDIS LANCE system at https://lance.nsstc.nasa.gov/amsr2-science/data/level2/ocean/. If data latency is not a primary concern, please consider using science quality products. Science products are created using the best available ancillary, calibration and ephemeris information. Science quality products are an internally consistent, well-calibrated record of the Earth's geophysical properties to support science. The AMSR SIPS produces AMSR2 standard science quality data products, and they are available at the NSIDC DAAC.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), these data may differ from the final release 2 to 3 months later. 
If this occurs, users are notified. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".
Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.
What Makes Our Data Unique?
Scale and Coverage:
- A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies.
- Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.
Rich Attributes for Training Models:
- Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights.
- Tailored for training models in NLP, recommendation systems, and predictive algorithms.
Compliance and Quality:
- Fully GDPR and CCPA compliant, providing secure and ethically sourced data.
- Extensive data cleaning and validation processes ensure reliability and accuracy.
Annotation-Ready:
- Pre-structured and formatted datasets that are easily ingestible into AI workflows.
- Ideal for supervised learning with tagging options such as entities, sentiment, or categories.
How Is the Data Sourced?
- Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques.
- Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets.
This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.
Primary Use Cases and Verticals
Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.
Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.
B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.
HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.
How This Product Fits Into Xverum’s Broader Data Offering
Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.
Why Choose Xverum?
- Experience and Expertise: A trusted name in structured web data with a proven track record.
- Flexibility: Datasets can be tailored for any AI/ML application.
- Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data.
- Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.
Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.
Contact us for sample datasets or to discuss your specific needs.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions of the itemsets they are most likely to purchase. I was given a dataset from a retailer; the transaction data covers all the transactions that have happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rules are most useful when you want to find associations between different objects in a set. They work by finding frequent patterns in a transaction database: they tell you which items customers frequently buy together, allowing the retailer to identify relationships between the items.
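The frequent-pattern search that association rule mining relies on is commonly done with the Apriori algorithm named in the title. The sketch below is a minimal pure-Python version: it counts itemsets level by level, joins frequent (k-1)-itemsets into k-item candidates, and prunes any candidate that has an infrequent subset. The original analysis is done in R, so this is an illustrative stand-in, not the author's code; the function name `apriori` is our own.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def apriori(transactions: Iterable[Iterable[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= b for b in baskets) / n

    current = {frozenset([item]) for b in baskets for item in b}  # 1-item candidates
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # Count candidates and keep those that reach the support threshold.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent
```

The pruning step is what makes Apriori efficient: any itemset with an infrequent subset cannot be frequent itself, so it is never counted.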
Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse and 8 bought both.
- Rule: bought Computer Mouse => bought Mat for Mouse
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mat for Mouse) = 0.80/0.09 ≈ 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Number of Attributes: 7
First, we need to load the required libraries. Below, I briefly describe each of them.
Next, we load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Next, we clean the data frame by removing missing values.
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
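Dropping rows with missing values and converting line items into transactions amounts to grouping item names by invoice number. A minimal Python sketch, assuming each row is a dict and that the invoice and item columns are called `BillNo` and `Itemname` (hypothetical here; adjust to the actual column names in Assignment-1_Data.xlsx). The helper names and the toy rows are made up for illustration.

```python
import math
from collections import defaultdict


def is_missing(value):
    """Treat None, empty strings and float NaN (as produced by pandas) as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def to_baskets(rows, invoice_key="BillNo", item_key="Itemname"):
    """Group line-item rows (dicts) into one set of item names per invoice,
    dropping rows whose invoice number or item name is missing."""
    baskets = defaultdict(set)
    for row in rows:
        invoice, item = row.get(invoice_key), row.get(item_key)
        if is_missing(invoice) or is_missing(item):
            continue
        baskets[invoice].add(item)
    return dict(baskets)


rows = [
    {"BillNo": "536365", "Itemname": "MOUSE"},
    {"BillNo": "536365", "Itemname": "MOUSE MAT"},
    {"BillNo": "536366", "Itemname": "MOUSE"},
    {"BillNo": "536366", "Itemname": float("nan")},
]
baskets = to_baskets(rows)
print(sorted((invoice, sorted(items)) for invoice, items in baskets.items()))
# [('536365', ['MOUSE', 'MOUSE MAT']), ('536366', ['MOUSE'])]
```

Using a set per invoice means an item bought twice on the same invoice counts once, which is what itemset support assumes.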
This dataset consists of the Surface Ocean CO2 Atlas (SOCAT) data product files. SOCAT is a synthesis activity by the international marine carbon research community and has more than 100 contributors worldwide. SOCAT provides access to synthesis and gridded, quality controlled, observational products of surface ocean fCO2 (fugacity of carbon dioxide) for the global oceans and coastal seas. SOCAT version 2020 has 28.2 million quality-controlled surface ocean fCO2 observations with an estimated accuracy of better than 5 μatm and a WOCE flag of 2 (good) from 1957 to 2020 for the global oceans and coastal seas. In addition, 2.3 million values with an estimated accuracy of 5 to 10 μatm are available. During quality control, marine scientists assign a flag to each data set, as well as WOCE flags of 2 (good), 3 (questionable) or 4 (bad) to individual fCO2 values. Data sets are assigned flags of A and B for an accuracy of better than 2 μatm, flags of C and D for an accuracy of better than 5 μatm and a flag of E for an accuracy of better than 10 μatm. Bakker et al. (2016) describe the quality control criteria used in SOCAT versions 3, 4, 5, 6, 2019 and 2020. Quality control comments for individual data sets can be accessed via the SOCAT Data Set Viewer. All data sets whose quality has been deemed acceptable have been made public. The main SOCAT synthesis files and the gridded products contain all data sets with an estimated accuracy of better than 5 µatm (flags of A to D) and fCO2 values with a WOCE flag of 2. Access to data sets with an estimated accuracy of 5 to 10 μatm (flag of E) and fCO2 values with flags of 3 and 4 is via additional data products and the Data Set Viewer (Table 8 in Bakker et al., 2016). SOCAT publishes a global gridded product with a 1° longitude by 1° latitude resolution. A second product with a higher resolution of 0.25° longitude by 0.25° latitude is available for the coastal seas. 
Gridded products are available monthly, per year and per decade. Two powerful, interactive, online viewers, the Data Set Viewer and the Gridded Data Viewer (www.socat.info), enable investigation of the SOCAT synthesis and gridded data products. SOCAT data products can be downloaded. Matlab code is available at www.socat.info for reading these files. Ocean Data View also provides access to the SOCAT data products (www.socat.info). SOCAT data products are discoverable, accessible and citable. SOCAT versions 3 to 2020 should be cited as Bakker et al., 2016 (until a publication on versions 4 to 2020 is published). The SOCAT Data Use Statement (www.socat.info) asks users to generously acknowledge the contribution of SOCAT scientists by invitation to co-authorship, especially for data providers in regional studies, and/or reference to relevant scientific articles. The SOCAT website (www.socat.info) provides a single access point for online viewers, downloadable data sets, the Data Use Statement, a list of contributors and an overview of scientific publications on and using SOCAT. Automation of data upload and initial data checks allows annual releases of SOCAT from version 4 onwards. SOCAT enables quantification of the ocean carbon sink and ocean acidification, as well as evaluation of sensor data and ocean biogeochemical models. More than 329 peer-reviewed scientific publications and 80 high-impact reports cite SOCAT. SOCAT represents a milestone in biogeochemical and climate research. SOCAT informs policy and high-profile climate negotiations. Maintenance and annual updates of the SOCAT product require sustained funding and community involvement.
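The selection rule for the main synthesis files (data set flag A to D and a WOCE flag of 2 on the individual value) can be written as a simple filter. This is a sketch only: the record layout and field names (`dataset_flag`, `woce_flag`, `fco2_uatm`) are hypothetical and do not reflect the column names of the actual SOCAT files.

```python
from dataclasses import dataclass

# Upper bound on estimated accuracy (μatm) implied by each data set flag.
DATASET_FLAG_ACCURACY_UATM = {"A": 2, "B": 2, "C": 5, "D": 5, "E": 10}
WOCE_GOOD = 2


@dataclass
class FCO2Value:
    dataset_flag: str  # data set QC flag, "A" to "E"
    woce_flag: int     # per-value WOCE flag: 2 good, 3 questionable, 4 bad
    fco2_uatm: float   # fCO2 in μatm


def in_main_synthesis(v: FCO2Value) -> bool:
    """Main synthesis / gridded products: data set flag A-D (better than
    5 μatm) and a WOCE flag of 2 on the individual value."""
    accuracy = DATASET_FLAG_ACCURACY_UATM.get(v.dataset_flag, float("inf"))
    return accuracy <= 5 and v.woce_flag == WOCE_GOOD


values = [
    FCO2Value("A", 2, 380.1),
    FCO2Value("D", 3, 395.0),  # questionable value: excluded
    FCO2Value("E", 2, 402.3),  # 5 to 10 μatm data set: excluded
    FCO2Value("C", 2, 371.8),
]
main = [v for v in values if in_main_synthesis(v)]
print([v.fco2_uatm for v in main])  # [380.1, 371.8]
```

Values excluded here are not discarded by SOCAT; as the description notes, they remain reachable through the additional data products and the Data Set Viewer.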
This dataset consists of the Surface Ocean CO2 Atlas (SOCAT) data product files. SOCAT is a synthesis activity by the international marine carbon research community and has more than 100 contributors worldwide. SOCAT provides access to synthesis and gridded, quality controlled, observational products of surface ocean fCO2 (fugacity of carbon dioxide) for the global oceans and coastal seas. SOCAT version 2019 has 25.7 million quality-controlled surface ocean fCO2 observations with an estimated accuracy of better than 5 μatm and a WOCE flag of 2 (good) from 1957 to 2019 for the global oceans and coastal seas. In addition, 1.7 million values with an estimated accuracy of 5 to 10 μatm are available. During quality control, marine scientists assign a flag to each data set, as well as WOCE flags of 2 (good), 3 (questionable) or 4 (bad) to individual fCO2 values. Data sets are assigned flags of A and B for an accuracy of better than 2 μatm, flags of C and D for an accuracy of better than 5 μatm and a flag of E for an accuracy of better than 10 μatm. Bakker et al. (2016) describe the quality control criteria used in SOCAT versions 3, 4, 5, 6 and 2019. Quality control comments for individual data sets can be accessed via the SOCAT Data Set Viewer. All data sets whose quality has been deemed acceptable have been made public. The main SOCAT synthesis files and the gridded products contain all data sets with an estimated accuracy of better than 5 µatm (flags of A to D) and fCO2 values with a WOCE flag of 2. Access to data sets with an estimated accuracy of 5 to 10 μatm (flag of E) and fCO2 values with flags of 3 and 4 is via additional data products and the Data Set Viewer (Table 8 in Bakker et al., 2016). SOCAT publishes a global gridded product with a 1° longitude by 1° latitude resolution. A second product with a higher resolution of 0.25° longitude by 0.25° latitude is available for the coastal seas. 
Gridded products are available monthly, per year and per decade. Two powerful, interactive, online viewers, the Data Set Viewer and the Gridded Data Viewer (www.socat.info), enable investigation of the SOCAT synthesis and gridded data products. SOCAT data products can be downloaded. Matlab code is available for reading these files. Ocean Data View also provides access to the SOCAT data products (www.socat.info). SOCAT data products are discoverable, accessible and citable. SOCAT versions 3 to 2019 should be cited as Bakker et al., 2016 (until a publication on versions 4 to 2019 is published). The SOCAT Fair Data Use Statement (www.socat.info) asks users to generously acknowledge the contribution of SOCAT scientists by invitation to co-authorship, especially for data providers in regional studies, and/or reference to relevant scientific articles. The SOCAT website (www.socat.info) provides a single access point for online viewers, downloadable data sets, the Fair Data Use Statement, a list of contributors and an overview of scientific publications on and using SOCAT. Automation of data upload and initial data checks allows annual releases of SOCAT from version 4 onwards. SOCAT enables quantification of the ocean carbon sink and ocean acidification, as well as evaluation of sensor data and ocean biogeochemical models. More than 260 peer-reviewed scientific publications and 80 high-impact reports cite SOCAT. SOCAT represents a milestone in biogeochemical and climate research. SOCAT informs policy and high-profile climate negotiations. Maintenance and annual updates of the SOCAT product require sustained funding and community involvement.
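To illustrate what a gridded product at 1° longitude by 1° latitude (or 0.25° by 0.25°) resolution means, the sketch below bins point observations by longitude/latitude cell and takes a plain unweighted mean per cell. This is an assumption-laden illustration, not the SOCAT gridding procedure, which is documented separately and may weight observations differently; the function `grid_mean` and the sample points are hypothetical.

```python
import math
from collections import defaultdict


def grid_mean(obs, res_deg=1.0):
    """Unweighted mean fCO2 per res_deg x res_deg cell.

    obs: iterable of (lon, lat, fco2) tuples in degrees and μatm.
    Returns {(i_lon, i_lat): mean}, where cell (i, j) spans
    [i*res_deg, (i+1)*res_deg) in longitude and [j*res_deg, (j+1)*res_deg) in latitude.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, fco2 in obs:
        cell = (math.floor(lon / res_deg), math.floor(lat / res_deg))
        sums[cell] += fco2
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


obs = [(-30.2, 45.7, 380.0), (-30.9, 45.1, 390.0), (-29.5, 45.5, 400.0)]
print(grid_mean(obs))  # {(-31, 45): 385.0, (-30, 45): 400.0}
print(grid_mean(obs, res_deg=0.25))  # finer cells, as for the coastal product
```

Monthly, yearly and decadal products would add a time key (for example, year and month) to the cell tuple; that extension is left out to keep the sketch short.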
Finding clean, high-quality B2B contact data shouldn't feel like going to the dentist. We make it easy for companies of all sizes, ranging from startups to enterprises globally to access high-quality B2B contact data, lead data, and business contact data for any company, any industry, and any job title.
Nymblr offers access to 140 million global verified B2B contacts with valid work emails, personal emails, work phones & direct dials, and social profiles. Our platform and API make it easy to access the highest-quality B2B Data, Business Contact Data, Lead Data, Work & Personal Email Data, and Phone data.
Easily access our data via API or directly in our platform which makes it fast and easy to search for B2B contacts and B2B leads using multiple filters, including:
- Job Title
- Seniority Level (C-Level/Owner, VP, Director, etc.)
- Job Department (Sales, Accounting, Marketing, Finance, etc.)
- Skills
- Company Name/Company Domain
- Company Industry
- Company SIC
- Company Revenue
- Company Size
- Location (Country, State, and City)
Contact us to get a free trial today! No commitments required.
This dataset comprises the mean and variance of the surface velocity field of the Gulf of Mexico, obtained from a large set of historical surface drifter data from the Gulf of Mexico (3770 trajectories spanning 28 years and more than a dozen data sources), which were uniformly processed, quality controlled, and assimilated into a spatially and temporally gridded dataset. A gridded product, called GulfFlow, is created by averaging all available data from the GulfDrifters dataset within quarter-degree spatial bins, and within overlapping month-long temporal bins having a semimonthly spacing. The dataset spans monthly time bins centered on July 16, 1992 through July 1, 2020, for a total of 672 overlapping time slices. Odd-numbered slices correspond to calendar months, while even-numbered slices run from halfway through one month to halfway through the following month. A higher spatial resolution version, GulfFlow-1/12 degree, is created in the identical way but using 1/12 degree bins instead of quarter-degree bins. In addition to the average velocities within each 3D bin, the count of sources contributing to each bin is also distributed, as is the subgridscale velocity variance. The count variable is a four-dimensional array of integers, the fourth dimension of which has length 45. This variable gives the number of hourly observations from each source dataset contributing to each three-dimensional bin. Values 1–15 are the count of velocity observations from drifters from each of the 15 experiments that are flagged as having retained their drogues, values 16–30 are for observations from drifters that are flagged as having lost their drogues, and values 31–45 are for observations from drifters of an unknown drogue status. In defining averaged quantities, we represent the velocity as a vector, $\mathbf{u} = [u\ v]^T$, where the superscript "T" denotes the transpose.
Let an overbar, $\overline{\mathbf{u}}$, denote an average over a spatial bin and over all times, while angled brackets, $\langle \mathbf{u} \rangle$, denote an average over a spatial bin and a particular temporal bin. Thus $\langle \mathbf{u} \rangle$ is a function of time while $\overline{\mathbf{u}}$ is not. We refer to $\langle \mathbf{u} \rangle$ as the local average, $\overline{\mathbf{u}}$ as the global average, and $\overline{\langle \mathbf{u} \rangle}$ as the double average. Given the inhomogeneity of the drifter data, it turns out that the global average is biased towards intensive but short-duration programs; hence the double average gives a much better representation of the true mean velocity field. The dataset includes the double average $\overline{\langle \mathbf{u} \rangle}$, the local covariance defined as

$$\boldsymbol{\varepsilon} = \left\langle (\mathbf{u} - \langle \mathbf{u} \rangle)(\mathbf{u} - \langle \mathbf{u} \rangle)^T \right\rangle,$$

and $\epsilon^2$, which is the trace of $\overline{\boldsymbol{\varepsilon}}$:

$$\epsilon^2 = \mathrm{tr}\{\overline{\boldsymbol{\varepsilon}}\}.$$
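The difference between the global and double averages, and the definition of $\epsilon^2$, can be made concrete with a small pure-Python sketch for a single spatial bin. Assumptions: each sample carries a single integer time-bin label (GulfFlow's time bins actually overlap, so a real observation can fall into two bins), and the overbar on time-dependent quantities is taken as an equally weighted mean over the occupied time bins. The function name `bin_statistics` is hypothetical.

```python
from collections import defaultdict


def mean(xs):
    return sum(xs) / len(xs)


def bin_statistics(obs):
    """obs: list of (time_bin, u, v) samples from ONE spatial bin.
    Returns (global average, double average, epsilon^2)."""
    by_time = defaultdict(list)
    for t, u, v in obs:
        by_time[t].append((u, v))

    # Global average: mean over every sample, ignoring time bins,
    # so heavily sampled time bins dominate.
    global_avg = (mean([u for _, u, _ in obs]), mean([v for _, _, v in obs]))

    local_avgs, local_covs = [], []
    for samples in by_time.values():
        mu = mean([u for u, _ in samples])
        mv = mean([v for _, v in samples])
        local_avgs.append((mu, mv))  # local average <u> for this time bin
        # Local covariance <(u - <u>)(u - <u>)^T> as a 2x2 nested list.
        cuu = mean([(u - mu) ** 2 for u, _ in samples])
        cvv = mean([(v - mv) ** 2 for _, v in samples])
        cuv = mean([(u - mu) * (v - mv) for u, v in samples])
        local_covs.append([[cuu, cuv], [cuv, cvv]])

    # Double average: equally weighted mean of the local averages.
    double_avg = (mean([a[0] for a in local_avgs]), mean([a[1] for a in local_avgs]))
    # Time-averaged covariance (overbar of epsilon) and its trace, epsilon^2.
    eps_bar = [[mean([c[i][j] for c in local_covs]) for j in range(2)] for i in range(2)]
    eps2 = eps_bar[0][0] + eps_bar[1][1]
    return global_avg, double_avg, eps2


# Nine samples from an intensive program in time bin 0, one sample in bin 1.
obs = [(0, 1.0, 0.0)] * 9 + [(1, 0.0, 0.0)]
print(bin_statistics(obs))  # ((0.9, 0.0), (0.5, 0.0), 0.0)
```

In the example, the nine samples from one intensive time bin dominate the global average (0.9), whereas the double average gives each time bin equal weight (0.5), which is the bias the text describes.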
The data are distributed in two separate netCDF files, one for each grid resolution.
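Because the 45 entries along the fourth dimension of `count` are three consecutive groups of 15 experiments (drogued, undrogued, unknown drogue status), an index can be decoded with integer division. A minimal sketch; `decode_count_index` is a hypothetical helper using the 1-based numbering of the description, so a 0-based array index `i` from a netCDF reader corresponds to `k = i + 1`.

```python
DROGUE_STATUS = ("drogued", "undrogued", "unknown")
N_EXPERIMENTS = 15


def decode_count_index(k):
    """Map a 1-based index k (1..45) along the fourth dimension of `count`
    to (experiment number 1..15, drogue status)."""
    if not 1 <= k <= len(DROGUE_STATUS) * N_EXPERIMENTS:
        raise ValueError(f"count index out of range: {k}")
    group, offset = divmod(k - 1, N_EXPERIMENTS)
    return offset + 1, DROGUE_STATUS[group]


for k in (1, 15, 16, 31, 45):
    print(k, decode_count_index(k))
```

Summing `count` over the first 15, middle 15 or last 15 entries then gives per-bin totals for each drogue status.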
Here is the article describing this dataset:
Lilly, J. M. and P. Pérez-Brunius (2021). A gridded surface current product for the Gulf of Mexico from consolidated drifter measurements. Earth System Science Data, 13: 645–669. https://doi.org/10.5194/essd-13-645-2021.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied:
MrVBF is a topographic index designed to identify areas of deposited material at a range of scales based on the observations that valley bottoms are low and flat relative to their surroundings and that large valley bottoms are flatter than smaller ones. Zero values indicate erosional terrain with values 1 and larger indicating progressively larger areas of deposition. There is some evidence that MrVBF values correlate with depth of deposited material.
The 3 second resolution product was generated from the 1 second MrVBF product and masked by the 3 second water body and ocean mask datasets.
Data quality
Lineage:
Source data
1. The 1 second MrVBF product
2. 3 second resolution SRTM water body and ocean mask datasets
MrVBF calculation
The MrVBF method is described in Gallant and Dowling (2003). It is based on slope and position in landscape (ranking within a 3- or 6-cell circular window) calculated from the original DEM and progressively generalised DEMs. The algorithm used to create this product is version 6g-a5, which is slightly different to that in the original paper.
Each value of MrVBF is associated with a particular scale and slope threshold. For each successive value the slope threshold halves and the scale triples. At each scale a location is considered erosional if it is high (ranked above the majority of the surrounding cells) or steep (slope greater than the threshold), and depositional otherwise. The largest value takes precedence at each location, so a value of zero indicates the site is considered to be erosional at all scales.
Value  Threshold slope (%)  Resolution (approx.)  Interpretation
0      -                    30 m                  Erosional
1      16                   30 m                  Small hillside deposit
2      8                    30 m                  Narrow valley floor
3      4                    90 m
4      2                    270 m                 Valley floor
5      1                    800 m                 Extensive valley floor
6      0.5                  2.4 km
7      0.25                 7.2 km                Depositional basin
8      0.125                22 km
9      0.0625               66 km                 Extensive depositional basin
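The crisp rule in the description (the slope threshold halving with each value, a location being erosional at a scale if it is high or steep, and the largest depositional value taking precedence) can be sketched as follows. This simplifies the published method (Gallant and Dowling, 2003), which is more involved: here the per-scale slopes and "high" rankings are taken as inputs rather than computed from progressively generalised DEMs, and the function names are hypothetical.

```python
def slope_threshold(value, first_threshold=16.0):
    """Slope threshold (%) for MrVBF value >= 1: 16 % at value 1, halving per step."""
    return first_threshold / 2 ** (value - 1)


def classify(is_high, slopes):
    """Simplified, crisp MrVBF value for one location.

    is_high[k-1]: True if the location ranks above most surrounding cells at scale k.
    slopes[k-1]:  slope (%) at scale k.
    A location is erosional at scale k if it is high or steeper than the
    threshold, depositional otherwise; the largest depositional k wins,
    and 0 means erosional at every scale.
    """
    value = 0
    for k, (high, slope) in enumerate(zip(is_high, slopes), start=1):
        steep = slope > slope_threshold(k)
        if not (high or steep):
            value = k
    return value


print([slope_threshold(k) for k in range(1, 10)])
# [16.0, 8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625]
print(classify([False] * 9, [1.0] * 9))  # 5: steeper than the threshold from value 6 on
```

The printed thresholds reproduce the slope column of the table above.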
The 3 second version of MrVBF was derived from the 1 second MrVBF using the median value in each 3 x 3 group of 1 second cells.
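The 3 x 3 median aggregation can be sketched in pure Python. Assumptions: the grid is a list of equal-length rows, incomplete blocks at the right and bottom edges are simply dropped, and no-data handling is ignored; the production processing may treat edges and masked cells differently. `median_3x3` is a hypothetical name.

```python
from statistics import median


def median_3x3(grid):
    """Aggregate a 2-D grid (list of equal-length rows) to one value per
    3 x 3 block, taking the block median. Incomplete edge blocks are dropped."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        [
            median(grid[r + dr][c + dc] for dr in range(3) for dc in range(3))
            for c in range(0, n_cols - n_cols % 3, 3)
        ]
        for r in range(0, n_rows - n_rows % 3, 3)
    ]


fine = [
    [0, 1, 2, 9, 9, 9],
    [3, 4, 5, 9, 9, 9],
    [6, 7, 8, 0, 0, 9],
]
print(median_3x3(fine))  # [[4, 9]]
```

A median, unlike a mean, keeps a single outlying 1 second cell (such as the two zeros in the second block) from shifting the 3 second value.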
MrVBF has been used with topographic wetness index (TWI) to predict soil depths; see McKenzie, Gallant and Gregory (2003) for details.
Positional accuracy:
The horizontal positional error is the same as for the raw SRTM 1 second data, with 90% of tested locations within 7.2 m for Australia. See Rodriguez et al. (2006) for more information.
Attribute accuracy:
MrVBF is designed to represent the degree of deposition within a landscape dominated by colluvial and alluvial processes and the index has been found helpful for various purposes in much of Australia. There are several circumstances where it can be misleading:
* In areas dominated by aeolian transport the assumptions behind the method are not met, and dunes (for example) are represented as erosional features with MrVBF = 0.
* Alluvial and colluvial fans are depositional but stand above their surroundings so often appear as erosional features with MrVBF = 0.
The characteristics of the SRTM data from which DEM-S has been derived mean that there are often small raised areas within otherwise flat landscapes that may be artefacts; MrVBF will often have a small value or 0 on those features.
Logical consistency:
This product is a consistent representation of terrain based on the information in DEM-S.
Completeness:
The MrVBF product covers the same area as the source DEM-S, which is virtually all of continental Australia and near coastal islands. Some tiles containing parts of mainland or pieces of islands were not supplied at 1 second resolution and are therefore missing, see DEM-S metadata for details.
CSIRO (2000) Multi-resolution Valley Bottom Flatness MrVBF at three second resolution CSIRO 20000211. Bioregional Assessment Source Dataset. Viewed 12 December 2018, http://data.bioregionalassessments.gov.au/dataset/7dfc93bb-62f3-40a1-8d39-0c0f27a83cb3.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Here we used remote sensing data from multiple sources (time series of Landsat and Sentinel images) to map the impervious surface area (ISA) at five-year intervals from 1990 to 2015, and then converted the results into a standardized dataset of the built-up area for 433 Chinese cities with 300,000 inhabitants or more, which were listed in the United Nations (UN) World Urbanization Prospects (WUP) database (including Mainland China, Hong Kong, Macao and Taiwan). We employed a range of spectral indices to generate the 1990–2015 ISA maps in urban areas based on remotely sensed data acquired from multiple sources. In this process, various types of auxiliary data were used to create the desired products for urban areas through manual segmentation of peri-urban and rural areas, together with reference to several freely available products of urban extent derived from ISA data using automated urban–rural segmentation methods. After that, following the well-established rules adopted by the UN, we converted the 1990–2015 ISA maps in urban areas into standardized built-up area products conforming to the definition of urban agglomeration area (UAA). Finally, we implemented data postprocessing to guarantee the spatial accuracy and temporal consistency of the final product. The standardized urban built-up area dataset (SUBAD–China) introduced here is the first product using the same definition of UAA adopted by the WUP database for 433 county and higher-level cities in China.
Comparisons with contemporary data produced by the National Bureau of Statistics of China, the World Bank and UN-Habitat indicate that our results have high spatial accuracy and good temporal consistency, and thus can be used to characterize the process of urban expansion in China. The SUBAD–China contains 2,598 vector files in shapefile format, covering all of China's cities listed in the WUP database with populations over 300,000, across different urban sizes and income levels. Alongside it, we also provide the distribution of validation points for the 1990–2010 ISA products of these 433 Chinese cities in shapefile format, and the confusion matrices between classified and reference data for different time periods as a Microsoft Excel Open XML Spreadsheet (XLSX) file. Furthermore, the standardized built-up area products for these cities will be regularly updated and refined to ensure the quality of their spatiotemporal coverage and accuracy. The production of this dataset, together with population counts derived from the WUP database, will close some of the data gaps in the calculation of SDG 11.3.1 and benefit other downstream applications that combine analysis of the spatial and socio-economic domains in urban areas.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
Abstract
This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied. This dataset provides a userguide and setup information relating to accessing the Geoscience Australia 1 second SRTM Digital Elevation Model (DEM), for visualisation and analysis using ESRI ArcMap and ArcCatalog. The 1 second DSM, DEM, DEM-S and DEM-H are national elevation data products derived from the Shuttle Radar Topography Mission (SRTM) data. The SRTM data is not suitable for routine application due to various artefacts and noise. The data has been treated with several processes to produce more usable products:
* A cleaned digital surface model (DSM): a regular grid representing ground surface topography as well as other features including vegetation and man-made structures.
* A bare-earth digital elevation model (DEM): a regular grid representing ground surface topography and, where possible, excluding other features such as vegetation and man-made structures.
* A smoothed digital elevation model (DEM-S): a DEM based on the bare-earth DEM that has been adaptively smoothed to reduce random noise typically associated with the SRTM data in low relief areas.
* A hydrologically enforced digital elevation model (DEM-H): a DEM based on DEM-S that has had drainage lines imposed and been further smoothed using the ANUDEM interpolation software.
The last product, a hydrologically enforced DEM, is most similar to the DEMs commonly in use around Australia, such as the GEODATA 9 Second DEM and the 25 m resolution DEMs produced by State and Territory agencies from digitised topographic maps.
For any analysis where surface shape is important, one of the smoothed DEMs (DEM-S or DEM-H) should be used. DEM-S is preferred for shape and vertical accuracy and DEM-H for hydrological connectivity. The DSM is suitable if you want to see the vegetation as well as the land surface height. There are few cases where DEM is the best data source, unless access to a less processed product is necessary. The 1 second DEM (in its various incarnations) has quite different characteristics to DEMs derived by interpolation from topographic data. Those DEMs are typically quite smooth and are based on fairly accurate but sparse source data, usually contours and spot heights supplemented by drainage lines. The SRTM data is derived from radar measurements that are dense (there is essentially a measurement at almost every grid cell) but noisy. Version 1.0 of the DSM was released in early 2009 and version 1.0 of the DEM was released in late 2009. Version 1.0 of the DEM-S was released in July 2010 and version 1.0 of the hydrologically enforced DEM-H was released in October 2011. These products provide substantial improvements in the quality and consistency of the data relative to the original SRTM data, but are not free from artefacts. Improved products will be released over time. The 3 second products were derived from the 1 second data and version 1.0 was released in August 2010. Future releases of these products will occur when the 1 second products have been improved. At this stage there is no 3 second DEM-H product, which requires re-interpolation with drainage enforcement at that resolution.
Dataset History
The following datasets were used to derive this version of the 1 second DEM products:
Source data
1. SRTM 1 second Version 2 data (Slater et al., 2006), supplied by Defence Imagery and Geospatial Organisation (DIGO) as 813 1 x 1 degree tiles. Data were produced by NASA from radar data collected by the Shuttle Radar Topography Mission in February 2000.
2. GEODATA 9 second DEM Version 3 (Geoscience Australia, 2008) used to fill voids.
3. SRTM Water Body Data (SWBD) shapefile accompanying the SRTM data (Slater et al., 2006). This defines the coastline and larger inland waterbodies for the DEM and DSM.
4. Vegetation masks and water masks applied to the DEM to remove vegetation.
Full metadata, methodologies and lineage descriptions can be found in the PDF userguide within this dataset. Further information can be found at http://www.ga.gov.au/metadata-gateway/metadata/record/gcat_72759
Dataset Citation
Geoscience Australia (2011) Geoscience Australia, 1 second SRTM Digital Elevation Model (DEM). Bioregional Assessment Source Dataset. Viewed 10 December 2018, http://data.bioregionalassessments.gov.au/dataset/9a9284b6-eb45-4a13-97d0-91bf25f1187b.
The Advanced Microwave Scanning Radiometer 2 (AMSR2) instrument on the Global Change Observation Mission - Water 1 (GCOM-W1) provides global passive microwave measurements of terrestrial, oceanic, and atmospheric parameters for the investigation of global water and energy cycles. Near real-time (NRT) products are generated within 3 hours of the last observations in the file, by the Land Atmosphere Near real-time Capability for EOS (LANCE) at the AMSR Science Investigator-led Processing System (AMSR SIPS), which is collocated with the Global Hydrology Resource Center (GHRC) DAAC. The GCOM-W1 NRT AMSR2 Unified Global Swath Surface Precipitation GSFC Profiling Algorithm is a swath product containing global rain rate and type, calculated by the GPROF 2017 V2R rainfall retrieval algorithm using resampled NRT Level-1R data provided by JAXA. This is the same algorithm that generates the corresponding standard science products in the AMSR SIPS. The NRT products are generated in HDF-EOS-5 augmented with netCDF-4/CF metadata and are available via HTTPS from the EOSDIS LANCE system at https://lance.nsstc.nasa.gov/amsr2-science/data/level2/rain/. If data latency is not a primary concern, please consider using science quality products. Science products are created using the best available ancillary, calibration and ephemeris information. Science quality products are an internally consistent, well-calibrated record of the Earth's geophysical properties to support science. The AMSR SIPS produces AMSR2 standard science quality data products, and they are available at the NSIDC DAAC.
The SOCAT community-led synthesis product is a key step in the value chain based on in situ inorganic carbon measurements of the oceans, which provides policy makers with critical information on ocean CO2 uptake in climate negotiations. The need for accurate knowledge of global ocean CO2 uptake and its (future) variation makes sustained funding of in situ surface ocean CO2 observations imperative.