Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This mongodump contains four collections associated with http://dx.doi.org/10.1021/acscentsci.7b00064 :
reaction_examples/lowe_1976-2013_USPTOgrants - a collection of reaction SMILES extracted from USPTO grants by Daniel Lowe
reaction_examples/lowe_1976-2013_USPTOgrants_reactions - an incomplete collection of reactions extracted from USPTO grants by Daniel Lowe, containing some additional information about reagents/catalysts/solvents where known
askcos_transforms/lowe_refs_general_v3 - a collection of highly general reaction SMARTS strings extracted from the USPTO
smilesprediction/candidate_edits_8_9_16 - a collection of reaction examples with possible products enumerated, used as input for a machine learning model
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
sample_mflix.embedded_movies
This data set contains details on movies with genres of Western, Action, or Fantasy. Each document represents a single movie and includes information such as its title, release year, and cast. In addition, documents in this collection include a plot_embedding field containing embeddings created with OpenAI's text-embedding-ada-002 embedding model, which you can use with the Atlas Vector Search feature.
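To illustrate what a vector search over the plot_embedding field does under the hood, the sketch below computes cosine similarity between embedding vectors. The 4-dimensional vectors are toy stand-ins for the 1536-dimensional ada-002 embeddings; in Atlas the comparison is performed server-side by the search index, not in client code.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional stand-ins for the 1536-dimensional
# text-embedding-ada-002 vectors stored in `plot_embedding`.
query = [0.1, 0.3, 0.5, 0.1]
movie_a = [0.1, 0.3, 0.5, 0.1]   # same direction as the query
movie_b = [0.5, 0.1, 0.1, 0.3]

print(round(cosine_similarity(query, movie_a), 3))
print(round(cosine_similarity(query, movie_b), 3))
```

The movie whose plot embedding has the highest cosine similarity to the query embedding is the closest semantic match.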
Overview
This dataset offers a… See the full description on the dataset page: https://huggingface.co/datasets/MongoDB/embedded_movies.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The First Public Data Release (DR1) of Transient Host Exchange (THEx) Dataset
Paper describing the dataset: “Linking Extragalactic Transients and their Host Galaxy Properties: Transient Sample, Multi-Wavelength Host Identification, and Database Construction” (Qin et al. 2021)
The data release contains four compressed archives.
“BSON export” is a binary export of the “host_summary” collection, which is the “full version” of the dataset. The schema was presented in the Appendix section of the paper.
You need to set up a MongoDB server to use this version of the dataset. After setting up the server, you may import this BSON file into your local database as a collection using the mongorestore command.
You may find some useful tutorials for setting up the server and importing BSON files into your local database at:
https://docs.mongodb.com/manual/installation/
https://www.mongodb.com/basics/bson
Once you have imported this BSON snapshot into your local database, you can run common operations such as queries and aggregations. An official tutorial can be found at:
https://docs.mongodb.com/manual/tutorial/query-documents/
Other packages (e.g., pymongo for Python) and tools can also be used to perform these database operations.
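As a hedged sketch of what such operations look like, the snippet below builds a MongoDB filter document and an aggregation pipeline as plain Python dictionaries. The field names (transient_type, host.confirmed, host.catalog) are illustrative assumptions, not the actual THEx schema, which is documented in the appendix of the paper.

```python
# Hypothetical field names -- consult the paper's appendix for the real schema.
query = {"transient_type": "SN Ia", "host.confirmed": True}

# Count confirmed SN Ia hosts per external catalog, most common first.
pipeline = [
    {"$match": {"transient_type": "SN Ia"}},
    {"$group": {"_id": "$host.catalog", "n": {"$sum": 1}}},
    {"$sort": {"n": -1}},
]

# With a running server these would be passed to pymongo, e.g.:
#   db.host_summary.find(query)
#   db.host_summary.aggregate(pipeline)
```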
“JSON export” is a compressed archive of JSON files. Each file, named by the unique id and the preferred name of the event, contains complete host data of a single event. The data schema and contents are identical to the “BSON” version.
“NumPy export” contains a series of NumPy tables in “npy” format. There is a row-to-row correspondence across these files. Except for the “master table” (THEx-v8.0-release-assembled.npy), which contains all the columns, each file contains the host properties cross-matched in a single external catalog. The meta info and ancillary data are summarized in THEx-v8.0-release-assembled-index.npy.
There is also a THEx-v8.0-release-typerowmask.npy file, which has rows co-indexed with other files and columns named after each transient type. The “rowmask” file allows you to select a subset of events under a specific transient type.
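A minimal sketch of using the rowmask file follows. The structured arrays below are mock stand-ins (with invented field names) for the real files, which would be loaded with np.load(path); the boolean columns of the rowmask are co-indexed row-by-row with the master table.

```python
import numpy as np

# Mock stand-ins for THEx-v8.0-release-assembled.npy and the
# typerowmask file; field names here are illustrative only.
master = np.array([("ev1", 0.02), ("ev2", 0.05), ("ev3", 0.01)],
                  dtype=[("name", "U8"), ("redshift", "f8")])

# One boolean column per transient type, rows co-indexed with `master`.
rowmask = np.array([(True, False), (False, True), (True, False)],
                   dtype=[("SN_Ia", "?"), ("SN_II", "?")])

# Select the subset of events classified as SN Ia.
sn_ia = master[rowmask["SN_Ia"]]
print(sn_ia["name"])
```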
Note that in this version, we only include cataloged properties of the confirmed hosts or primary candidates. If the confirmed host (or primary candidate) cross-matched multiple sources in a specific catalog, we only use the representative source for host properties; properties of other cross-matched groups are not included. Finally, the table THEx-v8.0-release-MWExt.npy contains the calculated foreground extinction (in magnitudes) at host positions. These extinction values have not been applied to the magnitude columns in our dataset, so you must apply this correction yourself if desired.
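Since the release leaves magnitudes uncorrected, the correction is a simple subtraction of the extinction value (from THEx-v8.0-release-MWExt.npy) from the observed magnitude; the numbers below are made up for illustration.

```python
def deredden(observed_mag, extinction_mag):
    """Foreground extinction correction: m_corrected = m_observed - A."""
    return observed_mag - extinction_mag

# Illustrative values only: an observed magnitude of 18.40 with
# A = 0.25 mag of foreground extinction at the host position.
print(deredden(18.40, 0.25))
```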
“FITS export” includes the same individual tables as in “NumPy export”. However, the FITS standard limits the number of columns in a table. Therefore, we do not include the “master table” in “FITS export.”
Finally, in BSON and JSON versions, cross-matched groups (under the “groups” key) are ordered by the default ranking function. Even if the first group in this list (namely, the confirmed host or primary host candidate) is a mismatched or misidentified one, we keep it in its original position. The result of visual inspection, including our manual reassignments, has been summarized under the “vis_insp” key.
For NumPy and FITS versions, if we have manually reassigned the host of an event, the data presented in these tables are also updated accordingly. You may use the “case_code” column in the “index” file to find the result of visual inspection and manual reassignment, where the flags for this “case_code” column are summarized in case-code.txt. Generally, codes “A1” and “F1” are known and new hosts that passed our visual inspection, while codes “B1” and “G1” are mismatched known hosts and possibly misidentified new hosts that have been manually reassigned.
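The case_code filtering described above can be sketched as follows. The records are mock data; the real codes live in the "case_code" column of the index file and are documented in case-code.txt.

```python
# Mock index records; in practice these come from
# THEx-v8.0-release-assembled-index.npy.
events = [
    {"event": "ev1", "case_code": "A1"},  # known host, passed inspection
    {"event": "ev2", "case_code": "B1"},  # mismatched known host, reassigned
    {"event": "ev3", "case_code": "F1"},  # new host, passed inspection
]

# Keep only events whose host passed visual inspection.
passed = [e["event"] for e in events if e["case_code"] in {"A1", "F1"}]
print(passed)
```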
This workflow aims to efficiently integrate floral sample data from Excel files into a MongoDB database for botanical projects. It involves verifying and updating taxonomic information, importing georeferenced floral samples, converting data to JSON format, and uploading it to the database. This process ensures accurate taxonomy and enriches the database with comprehensive sample information, supporting robust data analysis and enhancing the project's overall dataset.
Background
Efficient management of flora sample data is essential in botanical projects, especially when integrating diverse information into a MongoDB database. This workflow addresses the challenge of incorporating floral samples, collected at various sampling points, into the MongoDB database. The database is divided into two segments: one storing taxonomic information and common characteristics of taxa, and the other containing georeferenced floral samples with relevant information. The workflow ensures that, upon importing new samples, taxonomic information is verified and updated, if necessary, before storing the sample data.
Introduction
In botanical projects, effective data handling is pivotal, particularly when incorporating diverse flora samples into a MongoDB database. This workflow focuses on importing floral samples from an Excel file into MongoDB, ensuring data integrity and taxonomic accuracy. The database is structured into taxonomic information and a collection of georeferenced floral samples, each with essential details about the collection location and the species' nativity. The workflow dynamically updates taxonomic records and stores new samples in the appropriate database sections, enriching the overall floral sample collection.
Aims
The primary aim of this workflow is to streamline the integration of floral sample data into the MongoDB database, maintaining taxonomic accuracy and enhancing the overall collection.
The workflow includes the following key components:
- Taxonomy Verification and Update: Checks and updates taxonomic information in the MongoDB database, ensuring accuracy before importing new floral samples.
- Georeferenced Sample Import: Imports floral samples from the Excel file, containing georeferenced information and additional sample details.
- JSON Transformation and Database Upload: Transforms the floral sample information from the Excel file into JSON format and uploads it to the appropriate sections of the MongoDB database.
Scientific Questions
- Taxonomy Verification Process: How effectively does the workflow verify and update taxonomic information before importing new floral samples?
- Georeferenced Sample Storage: How does the workflow handle the storage of georeferenced floral samples, considering collection location and species nativity?
- JSON Transformation Accuracy: How successful is the transformation of floral sample information from the Excel file into JSON format for MongoDB integration?
- Database Enrichment: How does the workflow contribute to enriching the taxonomic and sample collections in the MongoDB database, and how is this reflected in the overall project dataset?
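The Excel-to-JSON step of the workflow above can be sketched with pandas. The column names and values are assumptions for illustration; in the real workflow the frame would come from pd.read_excel("samples.xlsx") and the resulting documents would be inserted with pymongo's insert_many.

```python
import json
import pandas as pd

# Illustrative georeferenced samples; column names are assumptions.
# In practice: samples = pd.read_excel("samples.xlsx")
samples = pd.DataFrame([
    {"taxon": "Quercus ilex", "lat": 39.47, "lon": -0.38, "native": True},
    {"taxon": "Ailanthus altissima", "lat": 39.48, "lon": -0.37, "native": False},
])

# Serialize to JSON records, then parse back into plain Python dicts --
# the shape pymongo's collection.insert_many(docs) expects.
payload = samples.to_json(orient="records")
docs = json.loads(payload)
print(len(docs))
```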
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health on the one hand and behavioral patterns and psychological measurements on the other, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
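A minimal sketch of the CSV route follows. The column names are illustrative, not the actual LifeSnaps schema, and the in-memory string stands in for one of the per-granularity CSV files.

```python
import io
import pandas as pd

# Stand-in for one of the daily-granularity CSV exports;
# the real call would be pd.read_csv("fitbit_daily.csv").
csv_text = io.StringIO(
    "id,date,steps,sleep_minutes\n"
    "p01,2021-06-01,8421,412\n"
    "p01,2021-06-02,10233,388\n"
)
df = pd.read_csv(csv_text, parse_dates=["date"])
print(df.shape)
```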
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend using the raw, complete data by importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB, supplying the path to the corresponding BSON file. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit <path/to/fitbit.bson>
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema <path/to/sema.bson>
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys <path/to/surveys.bson>
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:
{
_id:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains supplementary figures, methods, and four supplementary tables: Table S1. Data sources used for generation of ComPIL database. Table S2. Adenovirus 5 proteins identified by a ComPIL search of a human HEK293 sample. Table S3. List of proteomes used for generation of the “46 proteomes” database. Table S4. Statistics summary of 3 technical replicates of 5 human fecal samples. (ZIP 1371 kb)
https://www.cognitivemarketresearch.com/privacy-policy
The global Data Base Management Systems market was valued at USD 50.5 billion in 2022 and is projected to reach USD 120.6 billion by 2030, registering a CAGR of 11.5% for the forecast period 2023-2030.
Factors Affecting Data Base Management Systems Market Growth
The growing inclination of organizations towards adopting advanced technologies such as cloud-based solutions favours the growth of the global DBMS market
Cloud-based database management system solutions give organizations the ability to scale their database infrastructure up or down as required. In a dynamic business environment, data volume can vary over time; the cloud allows organizations to allocate resources dynamically and systematically, ensuring optimal performance without underutilization. In addition, cloud-based solutions are cost-efficient: they eliminate the need for companies to invest in and maintain physical infrastructure and hardware, reducing both upfront capital expenditures and ongoing operational costs. Organizations can choose pay-as-you-go pricing models, paying only for the resources they consume, which makes the cloud a cost-efficient option for both smaller businesses and large enterprises. Moreover, cloud-based DBMS platforms usually come with management tools that streamline administrative tasks such as provisioning, backup, recovery, and monitoring, allowing IT teams to concentrate on strategic tasks rather than routine maintenance, thereby enhancing operational efficiency. Finally, cloud-based database management systems allow remote access and collaboration among teams irrespective of their physical locations, which is particularly valuable for today's distributed and remote workforces: authorized personnel can access and update data in real time, supporting collaboration and better decision-making. Owing to all of these factors, the rising adoption of advanced technologies like cloud-based DBMS is favouring market growth.
Availability of open-source solutions is likely to restrain the global data base management systems market growth
Open-source database management system solutions such as PostgreSQL, MongoDB, and MySQL offer strong functionality at minimal or no licensing cost, making them an attractive option for companies, especially start-ups and smaller businesses with limited budgets. Because these open-source solutions offer capabilities similar to many commercial DBMS offerings, organizations may opt for them to save costs. Open-source solutions also benefit from active developer communities that contribute to their development, enhancement, and maintenance; this collaborative environment supports continuous innovation and improvement, resulting in solutions that are competitive with commercial offerings in terms of performance and features. While open-source solutions create competition for the commercial DBMS market, commercial vendors continue to thrive by offering unique value propositions and addressing the needs of organizations that prioritize professional support, seamless integration into complex IT ecosystems, and advanced features.
Introduction of Data Base Management Systems
A Database Management System (DBMS) is software designed to organize and manage data in a structured manner. It allows users to create, modify, and query a database, and to manage the security and access controls for that database. A DBMS offers tools for creating and modifying data models, which define the structure and relationships of the data in a database. It is also responsible for storing and retrieving data from the database, and provides several methods for searching and querying the data. A DBMS additionally offers mechanisms to control concurrent access to the database, ensuring that multiple users can access the data safely at the same time. It provides tools to enforce security and data-integrity constraints, such as constraints on data values and access controls that restrict who can access the data. Finally, a DBMS provides mechanisms for backing up and recovering data when a system failure occurs.
https://dataintelo.com/privacy-and-policy
The global market size for non-relational databases is expected to grow from USD 10.5 billion in 2023 to USD 35.2 billion by 2032, registering a Compound Annual Growth Rate (CAGR) of 14.6% over the forecast period. This substantial growth is primarily driven by increasing demand for scalable, flexible database solutions capable of handling diverse data types and large volumes of data generated across various industries.
One of the significant growth factors for the non-relational databases market is the exponential increase in data generated globally. With the proliferation of Internet of Things (IoT) devices, social media platforms, and digital transactions, the volume of semi-structured and unstructured data is growing at an unprecedented rate. Traditional relational databases often fall short in efficiently managing such data types, making non-relational databases a preferred choice. For example, document-oriented databases like MongoDB allow for the storage of JSON-like documents, offering flexibility in data modeling and retrieval.
Another key driver is the increasing adoption of non-relational databases among enterprises seeking agile and scalable database solutions. The need for high-performance applications that can scale horizontally and handle large volumes of transactions is pushing businesses to shift from traditional relational databases to non-relational databases. This is particularly evident in sectors like e-commerce, where the ability to manage customer data, product catalogs, and transaction histories in real-time is crucial. Additionally, companies in the BFSI (Banking, Financial Services, and Insurance) sector are leveraging non-relational databases for fraud detection, risk management, and customer relationship management.
The advent of cloud computing and the growing trend of digital transformation are also significant contributors to the market growth. Cloud-based non-relational databases offer numerous advantages, including reduced infrastructure costs, scalability, and ease of access. As more organizations migrate their operations to the cloud, the demand for cloud-based non-relational databases is set to rise. Moreover, the availability of Database-as-a-Service (DBaaS) offerings from major cloud providers like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) is simplifying the deployment and management of these databases, further driving their adoption.
Regionally, North America holds the largest market share, driven by the early adoption of advanced technologies and the presence of major market players. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. The rapid digitalization, growing adoption of cloud services, and increasing investments in IT infrastructure in countries like China and India are propelling the demand for non-relational databases in the region. Additionally, the expanding e-commerce sector and the proliferation of smart devices are further boosting market growth in Asia Pacific.
The non-relational databases market is segmented into several types, including Document-Oriented Databases, Key-Value Stores, Column-Family Stores, Graph Databases, and Others. Each type offers unique functionalities and caters to specific use cases, making them suitable for different industry requirements. Document-Oriented Databases, such as MongoDB and CouchDB, store data in document format (e.g., JSON or BSON), allowing for flexible schema designs and efficient data retrieval. These databases are widely used in content management systems, e-commerce platforms, and real-time analytics applications due to their ability to handle semi-structured data.
Key-Value Stores, such as Redis and Amazon DynamoDB, store data as key-value pairs, providing extremely fast read and write operations. These databases are ideal for caching, session management, and real-time applications where speed is critical. They offer horizontal scalability and are highly efficient in managing large volumes of data with simple query requirements. The simplicity of the key-value data model and its performance benefits make it a popular choice for high-throughput applications.
Column-Family Stores, such as Apache Cassandra and HBase, store data in columns rather than rows, allowing for efficient storage and retrieval of large datasets. These databases are designed to handle massive amounts of data across distributed systems, making them suitable for use cases involving big data analytics and time-series data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Jira is an issue tracking system that supports software companies (among other types of companies) with managing their projects, community, and processes. This dataset is a collection of public Jira repositories downloaded from the internet using the Jira API V2. We collected data from 16 public Jira repositories containing 1822 projects and 2.7 million issues. Included in this data are historical records of 32 million changes, 9 million comments, and 1 million issue links that connect the issues in complex ways. This artefact repository contains the data as a MongoDB dump, the scripts used to download the data, the scripts used to interpret the data, and qualitative work conducted to make the data more approachable.
https://spdx.org/licenses/CC0-1.0.html
In Chapter 3 of my dissertation (tentatively titled "Becoming Users: Layers of People, Technology, and Power on the Internet"), I describe how online user activities are datafied and monetized in subtle and often obfuscated ways. The chapter focuses on Google’s reCAPTCHA, a popular implementation of a CAPTCHA challenge. A CAPTCHA, or “Completely Automated Public Turing test to tell Computers and Humans Apart,” is a simple task or challenge intended to differentiate between genuine human users and those who may be using software or other automated means to interact maliciously with a website, such as for spam, mass data scraping, or denial of service attacks. reCAPTCHA challenges are increasingly being hidden from direct view of the user, instead assessing our mouse movements, browsing patterns, and other data to evaluate the likelihood that we are “authentic” users. These hidden challenges raise the stakes of understanding our own construction as Users because they obfuscate practices of surveillance and the ways that our activities as users are commodified by large corporations (Pettis, 2023). By studying the specifics of how such data collection works—that is, how we’re called upon and situated as Users—we can make more informed decisions about how we engage with the contemporary internet. This data set contains metadata for the 214 reCAPTCHA elements that I encountered during my personal use of the Web for the period of one year (September 2022 through September 2023). Of these reCAPTCHAs, 137 were visible challenges—meaning that there was some indication of the presence of a reCAPTCHA challenge. The remaining 77 reCAPTCHAs were entirely hidden on the page. If I had not been running my browser extension, I would likely never have been aware of the use of a reCAPTCHA on the page. The data set also includes screenshots for 174 of the reCAPTCHAs. Screenshots that contain sensitive or private information have been excluded from public access.
Researchers can request access to these additional files by contacting Ben Pettis bpettis@wisc.edu. A browsable and searchable version of the data is also available at https://capturingcaptcha.com Methods I developed a custom Google Chrome extension which detects when a page contains a reCAPTCHA and prompts the user to save a screenshot or screen recording while also collecting basic metadata. During Summer 2022, I began work on this website to collate and present the screen captures that I save throughout the year. The purpose of collecting these examples of websites where reCAPTCHAs appear is to understand how this Web element is situated within websites and presented to users, along with sketching out the frequency of their use and on what kinds of websites. Given that I will only be collecting records of my own interactions with reCAPTCHAs, this will not be a comprehensive sample that I can generalize as representative of all Web users. Though my experiences of the reCAPTCHA will differ from those of any other person, this collection will nevertheless be useful for demonstrating how the interface element may be embedded within websites and presented to users. Following Niels Brügger’s descriptions of Web history methods, these screen capture techniques provide an effective way to preserve a portion of the Web as it was actually encountered by a person, as opposed to methods such as automated scraping. Therefore my dissertation offers a methodological contribution to Web historians by demonstrating a technique for identifying and preserving a representation of one Web element within a page, as opposed to focusing an analysis on a whole page or entire website. The browser extension is configured to store data in a cloud-based document database running in MongoDB Atlas. Any screenshots or video recordings are uploaded to a Google Cloud Storage bucket. Both the database and cloud storage bucket are private and are restricted from direct access. 
The data and screenshots are viewable and searchable at https://capturingcaptcha.com. This data set represents an export of the database as of June 10, 2024. After this date, it is possible that data collection will be resumed, causing more information to be displayed in the online website. The data was exported from the database to a single JSON file (lines format) using the mongoexport command line tool:
mongoexport --uri mongodb+srv://[database-url].mongodb.net/production --collection submissions --out captcha-out.json --username [databaseuser]
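A mongoexport file in lines format holds one JSON document per line, so it can be parsed without a MongoDB server. The sketch below uses invented field names purely for illustration; the actual submissions schema may differ.

```python
import io
import json

# Stand-in for the captcha-out.json export: one JSON document per line.
# Field names here are illustrative, not the actual submissions schema.
lines = io.StringIO(
    '{"url": "https://example.com/login", "visible": true}\n'
    '{"url": "https://example.com/form", "visible": false}\n'
)

records = [json.loads(line) for line in lines if line.strip()]
hidden = sum(1 for r in records if not r["visible"])
print(len(records), hidden)
```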
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the publication "What really changes when developers intend to improve their source code: A commit-level study of static metric value and static analysis warning changes".
It contains a random sample of 2533 commits from 54 Java Apache open source projects classified by two researchers into perfective, corrective and other changes (manual_labels.csv). Moreover, we include static source code metrics and static analysis warnings for the 2533 changes in al_changes_gt.csv.gz.
In addition, we include the full dataset of 125482 commits in all_changes_sebert.csv.gz with all metrics and automatic labels for every commit that was not manually labeled. The automatic labels were provided by a fine-tuned transformer model (BERT) pre-trained exclusively on software engineering data.
We also provide the fine-tuned version of the pre-trained model in seBERT_fine_tuned_commit_intent.tar.gz, as well as a snapshot of the SmartSHARK MongoDB database used to gather the raw data in smartshark_emse.agz.
The model can be tested live on the website accompanying the publication.
https://dataintelo.com/privacy-and-policy
The global NoSQL software market size was valued at approximately USD 6 billion in 2023 and is projected to reach around USD 20 billion by 2032, growing at a compound annual growth rate (CAGR) of 14% during the forecast period. This market is driven by the escalating need for operational efficiency, flexibility, and scalability in database management systems, particularly in enterprises dealing with vast amounts of unstructured data.
One of the primary growth factors propelling the NoSQL software market is the exponential increase in data volumes generated by various digital platforms, IoT devices, and social media. Traditional relational databases often struggle to handle this surge efficiently, prompting organizations to shift towards NoSQL databases that offer more flexibility and scalability. The ability to store and process large sets of unstructured data without needing a predefined schema makes NoSQL databases an attractive choice for modern businesses seeking agility and speed in data management.
Moreover, the proliferation of cloud computing services has significantly contributed to the growth of the NoSQL software market. Cloud-based NoSQL databases provide cost-effective, scalable, and easily accessible solutions for enterprises of all sizes. The pay-as-you-go pricing model and the capacity to scale resources based on demand have made NoSQL databases a preferred option for startups and large enterprises alike. The seamless integration of NoSQL databases with cloud infrastructure enhances operational efficiencies and reduces the complexities associated with database management.
Another critical driver is the increasing adoption of NoSQL databases in various industry verticals such as retail, BFSI, IT, and healthcare. These industries require robust data management solutions to handle large volumes of diverse data types. NoSQL databases, with their flexible data models and high performance, cater to these requirements efficiently. In the retail sector, for example, NoSQL databases are used to manage customer data, product catalogs, and transaction histories, enabling more personalized and efficient customer services.
Regionally, North America holds a significant share of the NoSQL software market due to the presence of major technology companies and a mature IT infrastructure. The rapid digital transformation across enterprises in the region, alongside substantial investments in big data analytics and cloud computing, further fuels market growth. Additionally, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, driven by the expanding IT sector, increased adoption of cloud services, and significant investments in digital technologies in countries like China and India.
Graph Databases Software has emerged as a crucial component in the landscape of NoSQL databases, particularly for applications that require understanding complex relationships between data entities. Unlike traditional databases that store data in tables, graph databases use nodes, edges, and properties to represent and store data, making them ideal for scenarios where relationships are as important as the data itself. This approach is particularly beneficial in fields such as social networking, where the ability to analyze connections between users can provide deep insights into social dynamics and influence patterns. As businesses increasingly seek to leverage data for competitive advantage, the demand for graph databases is expected to grow, driven by their ability to efficiently model and query interconnected data.
The NoSQL software market is segmented into various types, including Document-Oriented, Key-Value Store, Column-Oriented, and Graph-Based databases. Document-oriented databases, such as MongoDB, store data in JSON-like documents, offering flexibility in data modeling and ease of use. These databases are widely used for content management systems, e-commerce applications, and real-time analytics. Their ability to handle semi-structured data and scalability features make them a popular choice among developers and enterprises seeking agile database solutions.
Key-Value Store databases, such as Redis and Amazon DynamoDB, store data as a collection of key-value pairs, providing ultra-fast read and write operations. These databases are ideal for applications requiring high-speed data retrieval, such as caching and session management.