100+ datasets found

i
Analytics
ieee-dataport.org
Updated Jun 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuriy Syerov (2025). Analytics [Dataset]. https://ieee-dataport.org/documents/social-media-big-dataset-research-analytics-prediction-and-understanding-global-climate
Explore at:
Dataset updated
Jun 17, 2025
Authors
Yuriy Syerov
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
trends
d
Employee Data | The Largest Dataset Of Active Profiles | Global / 1B Records...
datarade.ai
.json
Updated Apr 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Avanteer (2025). Employee Data | The Largest Dataset Of Active Profiles | Global / 1B Records / Updated Daily [Dataset]. https://datarade.ai/data-products/employee-data-the-largest-dataset-of-active-profiles-glob-avanteer
Explore at:
.jsonAvailable download formats
Dataset updated
Apr 19, 2025
Dataset authored and provided by
Avanteer
Area covered
State of, Anguilla, Tunisia, Bulgaria, Pitcairn, United Arab Emirates, Gambia, Nicaragua, Maldives, Fiji
Description
//// 🌍 Avanteer Employee Data ////

The Largest Dataset of Active Global Profiles 1B+ Records | Updated Daily | Built for Scale & Accuracy

Avanteer’s Employee Data offers unparalleled access to the world’s most comprehensive dataset of active professional profiles. Designed for companies building data-driven products or workflows, this resource supports recruitment, lead generation, enrichment, and investment intelligence — with unmatched scale and update frequency.

//// 🔧 What You Get ////

1B+ active profiles across industries, roles, and geographies

Work history, education history, languages, skills and multiple additional datapoints.

AI-enriched datapoints include: Gender Age Normalized seniority Normalized department Normalized skillset MBTI assessment

Daily updates, with change-tracking fields to capture job changes, promotions, and new entries.

Flexible delivery via API, S3, or flat file.

Choice of formats: raw, cleaned, or AI-enriched.

Built-in compliance aligned with GDPR and CCPA.

//// 💡 Key Use Cases ////

✅ Smarter Talent Acquisition Identify, enrich, and engage high-potential candidates using up-to-date global profiles.

✅ B2B Lead Generation at Scale Build prospecting lists with confidence using job-related and firmographic filters to target decision-makers across verticals.

✅ Data Enrichment for SaaS & Platforms Supercharge ATS, CRMs, or HR tech products by syncing enriched, structured employee data through real-time or batch delivery.

✅ Investor & Market Intelligence Analyze team structures, hiring trends, and senior leadership signals to discover early-stage investment opportunities or evaluate portfolio companies.

//// 🧰 Built for Top-Tier Teams Who Move Fast ////

Zero duplicate, by design

<300ms API response time

99.99% guaranteed API uptime

Onboarding support including data samples, test credits, and consultations

Advanced data quality checks

//// ✅ Why Companies Choose Avanteer ////

➔ The largest daily-updated dataset of global professional profiles

➔ Trusted by sales, HR, and data teams building at enterprise scale

➔ Transparent, compliant data collection with opt-out infrastructure baked in

➔ Dedicated support with fast onboarding and hands-on implementation help

////////////////////////////////

Empower your team with reliable, current, and scalable employee data — all from a single source.
open-pii-masking-500k-ai4privacy
kaggle.com
dataverse.harvard.edu
Updated Mar 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michael Anthony (2025). open-pii-masking-500k-ai4privacy [Dataset]. https://www.kaggle.com/datasets/mikedoes/open-pii-masking-500k-ai4privacy
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 17, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Michael Anthony
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
🌍 World's largest open dataset for privacy masking 🌎

The dataset is useful to train and evaluate models to remove personally identifiable and sensitive information from text, especially in the context of AI assistants and LLMs.

Dataset Analytics 📊 - ai4privacy/open-pii-masking-500k-ai4privacy

p5y Data Analytics

Total Entries: 580,227

Total Tokens: 19,199,982

Average Source Text Length: 17.37 words

Total PII Labels: 5,705,973

Number of Unique PII Classes: 20 (Open PII Labelset)

Unique Identity Values: 704,215

Language Distribution Analytics

**Number of Unique Languages**: 8 | Language | Count | Percentage | |--------------------|----------|------------| | English (en) 🇺🇸🇬🇧🇨🇦🇮🇳 | 150,693 | 25.97% | | French (fr) 🇫🇷🇨🇭🇨🇦 | 112,136 | 19.33% | | German (de) 🇩🇪🇨🇭 | 82,384 | 14.20% | | Spanish (es) 🇪🇸 🇲🇽 | 78,013 | 13.45% | | Italian (it) 🇮🇹🇨🇭 | 68,824 | 11.86% | | Dutch (nl) 🇳🇱 | 26,628 | 4.59% | | Hindi (hi)* 🇮🇳 | 33,963 | 5.85% | | Telugu (te)* 🇮🇳 | 27,586 | 4.75% | *these languages are in experimental stages

Region Distribution Analytics

**Number of Unique Regions**: 11 | Region | Count | Percentage | |-----------------------|----------|------------| | Switzerland (CH) 🇨🇭 | 112,531 | 19.39% | | India (IN) 🇮🇳 | 99,724 | 17.19% | | Canada (CA) 🇨🇦 | 74,733 | 12.88% | | Germany (DE) 🇩🇪 | 41,604 | 7.17% | | Spain (ES) 🇪🇸 | 39,557 | 6.82% | | Mexico (MX) 🇲🇽 | 38,456 | 6.63% | | France (FR) 🇫🇷 | 37,886 | 6.53% | | Great Britain (GB) 🇬🇧 | 37,092 | 6.39% | | United States (US) 🇺🇸 | 37,008 | 6.38% | | Italy (IT) 🇮🇹 | 35,008 | 6.03% | | Netherlands (NL) 🇳🇱 | 26,628 | 4.59% |

Machine Learning Task Analytics

| Split | Count | Percentage | |-------------|----------|------------| | **Train** | 464,150 | 79.99% | | **Validate**| 116,077 | 20.01% |

Usage

Option 1: Python terminal pip install datasets python from datasets import load_dataset dataset = load_dataset("ai4privacy/open-pii-masking-500k-ai4privacy")

Compatible Machine Learning Tasks:

Tokenclassification. Check out a HuggingFace's guide on token classification.

ALBERT, BERT, BigBird, BioGpt, BLOOM, BROS, CamemBERT, CANINE, ConvBERT, Data2VecText, DeBERTa, DeBERTa-v2, DistilBERT, ELECTRA, ERNIE, ErnieM, ESM, Falcon, FlauBERT, FNet, Funnel Transformer, GPT-Sw3, OpenAI GPT-2, GPTBigCode, GPT Neo, GPT NeoX, I-BERT, [LayoutLM](http...
AI Training Dataset Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). AI Training Dataset Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-ai-training-dataset-market
Explore at:
csv, pptx, pdfAvailable download formats
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
AI Training Dataset Market Outlook

The global AI training dataset market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 6.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 20.5% from 2024 to 2032. This substantial growth is driven by the increasing adoption of artificial intelligence across various industries, the necessity for large-scale and high-quality datasets to train AI models, and the ongoing advancements in AI and machine learning technologies.

One of the primary growth factors in the AI training dataset market is the exponential increase in data generation across multiple sectors. With the proliferation of internet usage, the expansion of IoT devices, and the digitalization of industries, there is an unprecedented volume of data being generated daily. This data is invaluable for training AI models, enabling them to learn and make more accurate predictions and decisions. Moreover, the need for diverse and comprehensive datasets to improve AI accuracy and reliability is further propelling market growth.

Another significant factor driving the market is the rising investment in AI and machine learning by both public and private sectors. Governments around the world are recognizing the potential of AI to transform economies and improve public services, leading to increased funding for AI research and development. Simultaneously, private enterprises are investing heavily in AI technologies to gain a competitive edge, enhance operational efficiency, and innovate new products and services. These investments necessitate high-quality training datasets, thereby boosting the market.

The proliferation of AI applications in various industries, such as healthcare, automotive, retail, and finance, is also a major contributor to the growth of the AI training dataset market. In healthcare, AI is being used for predictive analytics, personalized medicine, and diagnostic automation, all of which require extensive datasets for training. The automotive industry leverages AI for autonomous driving and vehicle safety systems, while the retail sector uses AI for personalized shopping experiences and inventory management. In finance, AI assists in fraud detection and risk management. The diverse applications across these sectors underline the critical need for robust AI training datasets.

As the demand for AI applications continues to grow, the role of Ai Data Resource Service becomes increasingly vital. These services provide the necessary infrastructure and tools to manage, curate, and distribute datasets efficiently. By leveraging Ai Data Resource Service, organizations can ensure that their AI models are trained on high-quality and relevant data, which is crucial for achieving accurate and reliable outcomes. The service acts as a bridge between raw data and AI applications, streamlining the process of data acquisition, annotation, and validation. This not only enhances the performance of AI systems but also accelerates the development cycle, enabling faster deployment of AI-driven solutions across various sectors.

Regionally, North America currently dominates the AI training dataset market due to the presence of major technology companies and extensive R&D activities in the region. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid technological advancements, increasing investments in AI, and the growing adoption of AI technologies across various industries in countries like China, India, and Japan. Europe and Latin America are also anticipated to experience significant growth, supported by favorable government policies and the increasing use of AI in various sectors.

Data Type Analysis

The data type segment of the AI training dataset market encompasses text, image, audio, video, and others. Each data type plays a crucial role in training different types of AI models, and the demand for specific data types varies based on the application. Text data is extensively used in natural language processing (NLP) applications such as chatbots, sentiment analysis, and language translation. As the use of NLP is becoming more widespread, the demand for high-quality text datasets is continually rising. Companies are investing in curated text datasets that encompass diverse languages and dialects to improve the accuracy and efficiency of NLP models.

Image data is critical for computer vision application
List of all the skills
kaggle.com
Updated Aug 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ask9 (2020). List of all the skills [Dataset]. https://www.kaggle.com/datasets/arbazkhan971/allskillandnonskill
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 28, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
ask9
Description
Context

which contains all the skills from linkedin ,Github and stackoverflow and all the skills from job descriptions across different platform like naukri ,indeed and monster.com

This is the World's Largest Collection of Dataset for skills which covers all the skills.

Content

What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
P
DISL Dataset
paperswithcode.com
Updated Mar 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gabriele Morello; Mojtaba Eshghie; Sofia Bobadilla; Martin Monperrus (2024). DISL Dataset [Dataset]. https://paperswithcode.com/dataset/disl
Explore at:
Dataset updated
Mar 24, 2024
Authors
Gabriele Morello; Mojtaba Eshghie; Sofia Bobadilla; Martin Monperrus
Description
DISL The full dataset report is available at: https://arxiv.org/abs/2403.16861

The DISL dataset features a collection of 514, 506 unique Solidity files that have been deployed to Ethereum mainnet. It caters to the need for a large and diverse dataset of real-world smart contracts. DISL serves as a resource for developing machine learning systems and for benchmarking software engineering tools designed for smart contracts.

Curated by: Gabriele Morello License: [MIT]

Instructions to explore the dataset ```python from datasets import load_dataset

Load the raw dataset dataset = load_dataset("ASSERT-KTH/DISL", "raw")

OR Load the decomposed dataset dataset = load_dataset("ASSERT-KTH/DISL", "decomposed")

number of rows and columns num_rows = len(dataset["train"]) num_columns = len(dataset["train"].column_names)

random row import random random_row = random.choice(dataset["train"])

random source code random_sc = random.choice(dataset["train"])['source_code'] print(random_sc) ```
MoreFixes: Largest CVE dataset with fixes
zenodo.org
data.niaid.nih.gov
zip
Updated Sep 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jafar Akhoundali; Jafar Akhoundali; Sajad Rahim Nouri; Sajad Rahim Nouri; Kristian F. D. Rietveld; Kristian F. D. Rietveld; Olga GADYATSKAYA; Olga GADYATSKAYA (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. http://doi.org/10.5281/zenodo.11199120
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.11199120
Dataset updated
Sep 25, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Jafar Akhoundali; Jafar Akhoundali; Sajad Rahim Nouri; Sajad Rahim Nouri; Kristian F. D. Rietveld; Kristian F. D. Rietveld; Olga GADYATSKAYA; Olga GADYATSKAYA
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In our work, we have designed and implemented a novel workflow with several heuristic methods to combine state-of-the-art methods related to CVE fix commits gathering. As a consequence of our improvements, we have been able to gather the largest programming language-independent real-world dataset of CVE vulnerabilities with the associated fix commits.
Our dataset containing 26,617 unique CVEs coming from 6,945 unique GitHub projects is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 31,883 unique commits that fixed those vulnerabilities. Compared to prior work, our dataset brings about a 397% increase in CVEs, a 295% increase in covered open-source projects, and a 480% increase in commit fixes.
Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used NVD(nvd.nist.gov) and Github Secuirty advisory Database as the main sources of our pipeline.

We release to the community a 14GB PostgreSQL database that contains information on CVEs up to January 24, 2024, CWEs of each CVE, files and methods changed by each commit, and repository metadata.
Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

`cvedataset-patches.zip` file contains fix patches, and `dump_morefixes_27-03-2024_19_52_58.sql.zip` contains a postgtesql dump of fixes, together with several other fields such as CVEs, CWEs, repository meta-data, commit data, file changes, method changed, etc.

MoreFixes data-storage strategy is based on CVEFixes to store CVE commits fixes from open-source repositories, and uses a modified version of Porspector(part of ProjectKB from SAP) as a module to detect commit fixes of a CVE. Our full methodology is presented in the paper, with the title of "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", which will be published in the Promise conference (2024).

For more information about usage and sample queries, visit the Github repository: https://github.com/JafarAkhondali/Morefixes

If you are using this dataset, please be aware that the repositories that we mined contain different licenses and you are responsible to handle any licesnsing issues. This is also the similar case with CVEFixes.

This product uses the NVD API but is not endorsed or certified by the NVD.

This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

To restore the dataset, you can use the docker-compose file available at the gitub repository. Dataset default credentials after restoring dump:

POSTGRES_USER=postgrescvedumper POSTGRES_DB=postgrescvedumper POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
LAS&T: Large Shape And Texture Dataset
zenodo.org
jpeg, zip
Updated May 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sagi Eppel; Sagi Eppel (2025). LAS&T: Large Shape And Texture Dataset [Dataset]. http://doi.org/10.5281/zenodo.15453634
Explore at:
jpeg, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.15453634
Dataset updated
May 26, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sagi Eppel; Sagi Eppel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Large Shape And Texture Dataset (LAS&T)

LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D with 650,000 images, based on real world shapes and textures.

Overview

The LAS&T Dataset aims to test the most basic aspect of vision in the most general way. Mainly the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when appearing on different objects, environments, and light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes, and textures), contains 3D parts that rely on physics-based scenes with realistic light materials and object simulation and abstract 2D parts. In addition, the real-world benchmark for 3D shapes.

Main Dataset webpage

The dataset contain four parts parts:

3D shape recognition and retrieval.

2D shape recognition and retrieval.

3D Materials recognition and retrieval.

2D Texture recognition and retrieval.

Each can be used independently for training and testing.

Additional assets are a set of 350,000 natural 2D shapes extracted from real-world images (SHAPES_COLLECTION_350k.zip)

3D shape recognition real-world images benchmark

The scripts used to generate and test the dataset are supplied as in SCRIPT** files.

Shapes Recognition and Retrieval:

For shape recognition the goal is to identify the same shape in different images, where the material/texture/color of the shape is changed, the shape is rotated, and the background is replaced. Hence, only the shape remains the same in both images. All files with 3D shapes contain samples of the 3D shape dataset. This is tested for 3D shapes/objects with realistic light simulation. All files with 2D shapes contain samples of the 2D shape dataset. Examples files contain images with examples for each set.

Main files:

Real_Images_3D_shape_matching_Benchmarks.zip contains real-world image benchmarks for 3D shapes.

3D_Shape_Recognition_Synthethic_GENERAL_LARGE_SET_76k.zip A Large number of synthetic examples 3D shapes with max variability can be used for training/testing 3D shape/objects recognition/retrieval.

2D_Shapes_Recognition_Textured_Synthetic_Resize2_GENERAL_LARGE_SET_61k.zip A Large number of synthetic examples for 2D shapes with max variability can be used for training/testing 2D shape recognition/retrieval.

SHAPES_2D_365k.zip 365,000 2D shapes extracted from real-world images saved as black and white .png image files.

File structure:

All jpg images that are in the exact same subfolder contain the exact same shape (but with different texture/color/background/orientation).

Textures and Materials Recognition and Retrieval

For texture and materials, the goal is to identify and match images containing the same material or textures, however the shape/object on which the material texture is applied is different, and so is the background and light.

This is done for physics-based material in 3D and abstract 2D textures.

3D_Materials_PBR_Synthetic_GENERAL_LARGE_SET_80K.zip A Large number of examples of 3D materials in physics grounded can be used for training or testing of material recognition/retrieval.

2D_Textures_Recogition_GENERAL_LARGE_SET_Synthetic_53K.zip

Large number of images of 2D texture in maximum variability of setting can be used for training/testing 2D textured recognition/retrieval.

File structure:

All jpg images that are in the exact same subfolder contain the exact same texture/material (but overlay on different objects with different background/and illumination/orientation).

Data Generation:

The images in the synthetic part of the dataset were created by automatically extracting shapes and textures from natural images and combining them in synthetic images. This created synthetic images that completely rely on real-world patterns, making extremely diverse and complex shapes and textures. As far as we know this is the largest and most diverse shape and texture recognition/retrieval dataset. 3D data was generated using physics-based material and rendering (blender) making the images physically grounded and enabling using the data to train for real-world examples. The scripts for generating the data are supplied in files with the world SCRIPTS* in them.

Real-world image data:

For 3D shape recognition and retrieval, we also supply a real-world natural image benchmark. With a variety of natural images containing the exact same 3D shape but made/coated with different materials and in different environments and orientations. The goal is again to identify the same shape in different images. The benchmark is available at: Real_Images_3D_shape_matching_Benchmarks.zip

File structure:

Files containing the word 'GENERAL_LARGE_SET' contains synthetic images that can be used for training or testing, the type of data (2D shapes, 3D shapes, 2D textures, 3D materials) that appears in the file name, as well as the number of images. Files containing MultiTests contain a number of different tests in which only a single aspect of the aspect of the instance is changed (for example only the background.) File containing "SCRIPTS" contain data generation testing scripts. Images containing "examples" are example of each test.

Shapes Collections

The file SHAPES_COLLECTION_350k.zip contains 350,000 2D shapes extracted from natural images and used for the dataset generation.

Evaluating and Testing

For evaluating and testing see: SCRIPTS_Testing_LVLM_ON_LAST_VQA.zip
This can be use to test leading LVLMs using api, create human tests, and in general turn the dataset into multichoice question images similar to the one in the paper.
i
DeepSense 6G: A Large-Scale Real-World Multi-Modal Sensing and Communication...
ieee-dataport.org
Updated Aug 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wireless Intelligence Lab (2023). DeepSense 6G: A Large-Scale Real-World Multi-Modal Sensing and Communication Dataset [Dataset]. https://ieee-dataport.org/documents/deepsense-6g-large-scale-real-world-multi-modal-sensing-and-communication-dataset
Explore at:
Dataset updated
Aug 18, 2023
Authors
Wireless Intelligence Lab
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
LiDAR
m
Dataset of development of business during the COVID-19 crisis
data.mendeley.com
narcis.nl
Updated Nov 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
Explore at:
Unique identifier
https://doi.org/10.17632/9vvrd34f8t.1
Dataset updated
Nov 9, 2020
Authors
Tatiana N. Litvinova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second full of pandemics), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. Dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Due to the fact that the data in the dataset are not ready-made numbers, but formulas, when adding and / or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
w
Dataset of books called The largest life-boats in the world : a history of...
workwithdata.com
Updated Apr 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of books called The largest life-boats in the world : a history of the 60ft Barnett class twin screw motor life-boats [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+largest+life-boats+in+the+world+%3A+a+history+of+the+60ft+Barnett+class+twin+screw+motor+life-boats
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
This dataset is about books. It has 1 row and is filtered where the book is The largest life-boats in the world : a history of the 60ft Barnett class twin screw motor life-boats. It features 7 columns including author, publication date, language, and book publisher.
Most popular database management systems worldwide 2024
statista.com
Updated Jun 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
Explore at:
Dataset updated
Jun 19, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jun 2024
Area covered
Worldwide
Description
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of 1244.08; MySQL and Microsoft SQL server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive. Database Management Systems As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMS are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
P
Data from: PolyU Dataset
paperswithcode.com
opendatalab.com
Updated Apr 6, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jun Xu; Hui Li; Zhetong Liang; David Zhang; Lei Zhang (2018). PolyU Dataset [Dataset]. https://paperswithcode.com/dataset/polyu-dataset
Explore at:
Dataset updated
Apr 6, 2018
Authors
Jun Xu; Hui Li; Zhetong Liang; David Zhang; Lei Zhang
Description
PolyU Dataset is a large dataset of real-world noisy images with reasonably obtained corresponding “ground truth” images. The basic idea is to capture the same and unchanged scene for many (e.g., 500) times and compute their mean image, which can be roughly taken as the “ground truth” image for the real-world noisy images. The rational of this strategy is that for each pixel, the noise is generated randomly larger or smaller than 0. Sampling the same pixel many times and computing the average value will approximate the truth pixel value and alleviate significantly the noise.
Largest Glaciers and Glacier Complexes in the World, Version 1 - Dataset -...
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Apr 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.nasa.gov (2025). Largest Glaciers and Glacier Complexes in the World, Version 1 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/largest-glaciers-and-glacier-complexes-in-the-world-version-1-75379
Explore at:
Dataset updated
Apr 1, 2025
Dataset provided by
NASAhttp://nasa.gov/
Area covered
World
Description
This data set provides a list of the three largest glaciers and glacier complexes in each of the 19 glacial regions of the world as defined by the Global Terrestrial Network for Glaciers. The data are provided in shapefile format with an outline for each of the largest ice bodies along with a number of attributes such as area in km2.
C
NCEI Standard Product: World Ocean Database (WOD)
data.cnra.ca.gov
catalog.data.gov
+1more
html
Updated May 9, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ocean Data Partners (2019). NCEI Standard Product: World Ocean Database (WOD) [Dataset]. https://data.cnra.ca.gov/dataset/ncei-standard-product-world-ocean-database-wod
Explore at:
htmlAvailable download formats
Dataset updated
May 9, 2019
Dataset authored and provided by
Ocean Data Partners
Description
The World Ocean Database (WOD) is the world's largest publicly available uniform format quality controlled ocean profile dataset. Ocean profile data are sets of measurements of an ocean variable vs. depth at a single geographic location within a short (minutes to hours) temporal period in some portion of the water column from the surface to the bottom. To be considered a profile for the WOD, there must be more than a single depth/variable pair. Multiple profiles at the same location from the same set of instruments is an oceanographic cast. Ocean variables in the WOD include temperature, salinity, oxygen, nutrients, tracers, and biological variables such as plankton and chlorophyll. Quality control procedures are documented and performed on each cast and the results are included as flags on each measurement. The WOD contains the data on the originally measured depth levels (observed) and also interpolated to standard depth levels to present a more uniform set of iso-surfaces for oceanographic and climate work.

The source of the WOD is more than 20,000 separate archived data sets contributed by institutions, project, government agencies, and individual investigators from the United States and around the world. Each data set is available in its original form in the National Centers for Environmental Information data archives. All data sets are converted to the same standard format, checked for duplication within the WOD, and assigned quality flags based on objective tests. Additional subjective flags are set upon calculation of ocean climatological mean fields which make up the World Ocean Atlas (WOA) series.

The WOD consists of periodic major releases and quarterly updates to those releases. Each major release is associated with a concurrent release of a WOA release, and contains final quality control flags used in the WOA, which includes manual as well as automated steps. Each quarterly update release includes additional historical and recent data and preliminary quality control. The latest major release was WOD 2018 (WOD18), which includes nearly 16 million oceanographic casts, from the second voyage of Captain Cook (1772) to the modern Argo floats (end of 2017).

The WOD presents data in netCDF ragged array format following the Climate and Forecast (CF) conventions for ease of use mindful of space limitations.
g
NCEI Standard Product: World Ocean Database (WOD) | gimi9.com
gimi9.com
Updated Dec 4, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). NCEI Standard Product: World Ocean Database (WOD) | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_ncei-standard-product-world-ocean-database-wod3
Explore at:
Dataset updated
Dec 4, 2024
Description
The World Ocean Database (WOD) is the world's largest publicly available uniform format quality controlled ocean profile dataset. Ocean profile data are sets of measurements of an ocean variable vs. depth at a single geographic location within a short (minutes to hours) temporal period in some portion of the water column from the surface to the bottom. To be considered a profile for the WOD, there must be more than a single depth/variable pair. Multiple profiles at the same location from the same set of instruments is an oceanographic cast. Ocean variables in the WOD include temperature, salinity, oxygen, nutrients, tracers, and biological variables such as plankton and chlorophyll. Quality control procedures are documented and performed on each cast and the results are included as flags on each measurement. The WOD contains the data on the originally measured depth levels (observed) and also interpolated to standard depth levels to present a more uniform set of iso-surfaces for oceanographic and climate work. The source of the WOD is more than 20,000 separate archived datasets contributed by institutions, project, government agencies, and individual investigators from the United States and around the world. Each dataset is available in its original form in the National Centers for Environmental Information data archives. All datasets are converted to the same standard format, checked for duplication within the WOD, and assigned quality flags based on objective tests. Additional subjective flags are set upon calculation of ocean climatological mean fields which make up the World Ocean Atlas (WOA) series. The WOD consists of periodic major releases and quarterly updates to those releases. Each major release is associated with a concurrent release of a WOA release, and contains final quality control flags used in the WOA, which includes manual as well as automated steps. Each quarterly update release includes additional historical and recent data and preliminary quality control. The latest major release was WOD 2018 (WOD18), which includes nearly 16 million oceanographic casts, from the second voyage of Captain Cook (1772) to the modern Argo floats (end of 2017). The WOD presents data in netCDF ragged array format following the Climate and Forecast (CF) conventions for ease of use mindful of space limitations.
Forecast revenue big data market worldwide 2011-2027
statista.com
Updated Feb 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
Explore at:
Dataset updated
Feb 13, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
The global big data market is forecasted to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the large big data market segment by 2027.

What is Big data?

Big data is a term that refers to the kind of data sets that are too large or too complex for traditional data processing applications. It is defined as having one or some of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.

Big data analytics

Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.
World Religion Project - Regional Religion Dataset
thearda.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Association of Religion Data Archives, World Religion Project - Regional Religion Dataset [Dataset]. http://doi.org/10.17605/OSF.IO/XZABV
Explore at:
Unique identifier
https://doi.org/10.17605/OSF.IO/XZABV
Dataset provided by
Association of Religion Data Archives
Dataset funded by
The University of California, Davis
The John Templeton Foundation
Description
The World Religion Project (WRP) aims to provide detailed information about religious adherence worldwide since 1945. It contains data about the number of adherents by religion in each of the states in the international system. These numbers are given for every half-decade period (1945, 1950, etc., through 2010). Percentages of the states' populations that practice a given religion are also provided. (Note: These percentages are expressed as decimals, ranging from 0 to 1, where 0 indicates that 0 percent of the population practices a given religion and 1 indicates that 100 percent of the population practices that religion.) Some of the religions (as detailed below) are divided into religious families. To the extent data are available, the breakdown of adherents within a given religion into religious families is also provided.

The project was developed in three stages. The first stage consisted of the formation of a religion tree. A religion tree is a systematic classification of major religions and of religious families within those major religions. To develop the religion tree we prepared a comprehensive literature review, the aim of which was (i) to define a religion, (ii) to find tangible indicators of a given religion of religious families within a major religion, and (iii) to identify existing efforts at classifying world religions. (Please see the original survey instrument to view the structure of the religion tree.) The second stage consisted of the identification of major data sources of religious adherence and the collection of data from these sources according to the religion tree classification. This created a dataset that included multiple records for some states for a given point in time. It also contained multiple missing data for specific states, specific time periods and specific religions. The third stage consisted of cleaning the data, reconciling discrepancies of information from different sources and imputing data for the missing cases.

The Regional Religion Dataset: The unit of analysis is the region, measured at five-year intervals. The Correlates of War regional breakdown is used with one modification: the Oceania category is added for Correlates of War nation numbers 900 and above.
Major Cities of The World
johnsnowlabs.com
csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Snow Labs, Major Cities of The World [Dataset]. https://www.johnsnowlabs.com/marketplace/ai-in-health-care-trends-and-challenges-in-2022/
Explore at:
csvAvailable download formats
Dataset authored and provided by
John Snow Labs
Area covered
World, World
Description
This dataset lists cities which consists of above 15,000 inhabitants. Each city is associated with its country and sub-country to reduce the number of ambiguities. Subcountry can be the name of a state (eg in the United Kingdom or the United States of America) or the major administrative section (eg "region" in "France").
c
The global Big Data market size is USD 40.5 billion in 2024 and will expand...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Apr 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cognitive Market Research (2025). The global Big Data market size is USD 40.5 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 12.9% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/big-data-market-report
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
Apr 9, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policyhttps://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global Big Data marketsize is USD 40.5 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 12.9% from 2024 to 2031. Market Dynamics of Big Data Market Key Drivers for Big Data Market Increasing demand for decision-making based on data - One of the main reasons the Big Data market is growing is due to the increasing demand for decision-making based on data. Organizations understand the strategic benefit of using data insights to make accurate and informed decisions in the current competitive scenario. This change marks a break from conventional decision-making paradigms as companies depend more and more on big data analytics to maximize performance, reduce risk, and open up prospects. Real-time processing, analysis, and extraction of actionable insights from large datasets enables businesses to react quickly to consumer preferences and market trends. The increasing need to maximize performance, reduce risk, and open up prospects is anticipated to drive the Big Data market's expansion in the years ahead. Key Restraints for Big Data Market The lack of integrator and interoperability poses a serious threat to the Big Data industry. The market also faces significant difficulties because of the realization of its full potential. Introduction of the Big Data Market Big data software is a category of software used for gathering, storing, and processing large amounts of heterogeneous, dynamic data produced by humans, machines, and other technologies. It is concentrated on offering effective analytics for extraordinarily massive datasets, which help the organization obtain a profound understanding by transforming the data into superior knowledge relevant to the business scenario. Additionally, the programmer assists in identifying obscure correlations, market trends, customer preferences, hidden patterns, and other valuable information from a wide range of data sets. Due to the widespread use of digital solutions in sectors such as finance, healthcare, BFSI, retail, agriculture, telecommunications, and media, data is increasing dramatically on a worldwide scale. Smart devices, soil sensors, and GPS-enabled tractors generate massive amounts of data. Large data sets, such as supply tracks, natural trends, optimal crop conditions, sophisticated risk assessment, and more, are analyzed in agriculture through the application of big data analytics.

Facebook

Twitter

Click to copy link

Link copied

Cite

Yuriy Syerov (2025). Analytics [Dataset]. https://ieee-dataport.org/documents/social-media-big-dataset-research-analytics-prediction-and-understanding-global-climate

Analytics

Prediction

Social Media Big Dataset for Research

and Understanding the Global Climate Change Trends

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Jun 17, 2025

Authors

Yuriy Syerov

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

trends

Clear search

Close search

Google apps

Main menu

Analytics

Employee Data | The Largest Dataset Of Active Profiles | Global / 1B Records...

open-pii-masking-500k-ai4privacy

🌍 World's largest open dataset for privacy masking 🌎

Dataset Analytics 📊 - ai4privacy/open-pii-masking-500k-ai4privacy

p5y Data Analytics

Language Distribution Analytics

Region Distribution Analytics

Machine Learning Task Analytics

Usage

Compatible Machine Learning Tasks:

AI Training Dataset Market Report | Global Forecast From 2025 To 2033

AI Training Dataset Market Outlook

Data Type Analysis

List of all the skills

Context

Content

Acknowledgements

Inspiration

DISL Dataset

MoreFixes: Largest CVE dataset with fixes

LAS&T: Large Shape And Texture Dataset

The Large Shape And Texture Dataset (LAS&T)

Overview

Main Dataset webpage

The dataset contain four parts parts:

Shapes Recognition and Retrieval:

Main files:

Real_Images_3D_shape_matching_Benchmarks.zip contains real-world image benchmarks for 3D shapes.

File structure:

Textures and Materials Recognition and Retrieval

File structure:

Data Generation:

Real-world image data:

File structure:

Shapes Collections

Evaluating and Testing

DeepSense 6G: A Large-Scale Real-World Multi-Modal Sensing and Communication...

Dataset of development of business during the COVID-19 crisis

Dataset of books called The largest life-boats in the world : a history of...

Most popular database management systems worldwide 2024

Data from: PolyU Dataset

Largest Glaciers and Glacier Complexes in the World, Version 1 - Dataset -...

NCEI Standard Product: World Ocean Database (WOD)

NCEI Standard Product: World Ocean Database (WOD) | gimi9.com

Forecast revenue big data market worldwide 2011-2027

World Religion Project - Regional Religion Dataset

Major Cities of The World

The global Big Data market size is USD 40.5 billion in 2024 and will expand...

Analytics

Prediction

Social Media Big Dataset for Research

and Understanding the Global Climate Change Trends