Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the age of digital transformation, scientific and social interest in data and data products is constantly on the rise. The quantity as well as the variety of digital research data is increasing significantly. This raises questions about the governance of such data: for example, how to store it so that it is presented transparently, freely accessible, and subsequently available for re-use in the context of good scientific practice. Research data repositories provide solutions to these issues.
Considering the variety of repository software, it is sometimes difficult to identify a fitting solution for a specific use case. This requires a detailed analysis of existing software. The presented table of requirements can serve as a starting point and decision-making guide for choosing the repository software best suited to your purposes. The table serves as supplementary material for the paper "How to choose a research data repository software? Experience report." (a persistent identifier for the paper will be added as soon as it is published).
https://dataintelo.com/privacy-and-policy
In 2023, the global data analytics software tools market size was estimated to be around USD 45 billion and is projected to reach approximately USD 120 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.5%. The growth of this market is driven by the increasing volume of data generated across various industries and the growing need for data-driven decision-making processes.
The expansion of data generation from diverse sources such as social media, IoT devices, and enterprise applications has fueled the need for advanced data analytics tools. Organizations are increasingly leveraging these tools to extract valuable insights from raw data, thus driving market growth. Additionally, advancements in machine learning and artificial intelligence are enhancing the capabilities of data analytics software, making them indispensable for businesses aiming to maintain a competitive edge. The integration of these advanced technologies enables more accurate predictive analytics and more efficient data management, further propelling market demand.
The rising adoption of cloud-based analytics solutions is another significant growth factor. Cloud computing provides scalable resources and flexibility, which are crucial for handling large datasets and performing complex analyses. This trend is particularly prevalent among small and medium-sized enterprises (SMEs) that benefit from the reduced upfront costs and operational efficiencies offered by cloud-based solutions. Moreover, the demand for real-time analytics is pushing organizations to adopt cloud services, as they offer the agility required to process data in real-time.
Another major driver is the increasing regulatory requirements across various industries. Compliance with data protection laws such as GDPR in Europe and CCPA in California necessitates robust data management and analytics capabilities. Organizations are investing in data analytics tools to ensure compliance and mitigate risks associated with data breaches. This compliance-driven need for advanced analytics capabilities is expected to significantly contribute to market growth over the forecast period.
From a regional outlook, North America is expected to dominate the market due to its early adoption of advanced technologies and the presence of major market players. Europe is also anticipated to show significant growth due to stringent data protection regulations and the rising adoption of analytics in various industries. The Asia Pacific region is projected to experience the highest CAGR during the forecast period, driven by the increasing digital transformation initiatives and the growing focus on big data analytics in countries like China and India.
The data analytics software tools market is segmented into software and services. The software segment includes various types of analytics tools such as data mining tools, predictive analytics tools, and business intelligence software. This segment is expected to hold the largest market share due to the increasing need for sophisticated data analysis techniques and the integration of artificial intelligence and machine learning technologies. The growing complexity of data sets and the demand for real-time analysis are also driving the adoption of advanced software solutions.
Within the software segment, business intelligence tools are gaining significant traction. These tools help organizations in strategic planning by providing actionable insights derived from data visualization and reporting. The widespread adoption of business intelligence tools can be attributed to their ability to enhance decision-making processes and improve operational efficiencies. Furthermore, the integration of AI and machine learning in business intelligence tools offers advanced analytics capabilities, making them vital for businesses looking to stay competitive.
The services segment comprises implementation services, consulting services, and support and maintenance services. The demand for these services is driven by the complexity involved in deploying and managing data analytics tools. Consulting services are increasingly sought after as organizations require expert advice to choose the right analytics solutions and integrate them seamlessly with their existing systems. Implementation services ensure that the analytics tools are correctly set up and configured to meet the organization's specific needs, while support and maintenance services provide ongoing assistance to resolve any issues that may arise.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Do you want to uncover the power of language through analysis? The Lince Dataset is the answer! An expansive collection of language technologies and data, this dataset can be utilized for a multitude of purposes. With six different languages to explore - Spanish, Hindi, Nepali, Spanish-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic (MSA-EA) - you are granted access to an enormous selection of language identification (LID), part-of-speech (POS) tagging, Named-Entity Recognition (NER), sentiment analysis (SA) data and much more. Train your models efficiently with machine learning to automatically detect and classify tasks such as POS tagging or NER in each language variety, or even build cross-lingual models between multiple languages if preferred! Push the boundaries with the Lince Dataset's unparalleled diversity. Dive into exploratory research within this feast for NLP connoisseurs and unlock hidden opportunities today!
How to use the dataset: Are you looking to unlock the potential of multilingual natural language processing (NLP) with the Lince Dataset? If so, you're in the right place! With six languages and training data for language identification (LID), part-of-speech (POS) tagging, Named-Entity Recognition (NER), sentiment analysis (SA) and more, this is one of the most comprehensive datasets for NLP today.
Understand what is included in this dataset: This dataset includes language technology data from six different languages: Spanish, Hindi, Nepali, Spanish-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic (MSA-EA). Each file is labelled according to its content - e.g. lid_msaea_test.csv contains test data for language identification (LID), with 5 columns containing words, part-of-speech tags, and sentiment analysis labels. A brief summary of each file's contents can be found when you pull this dataset up on Kaggle, or by running a function such as head() or describe(), depending on your software preferences.
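For a first pass, a few lines of pandas are enough; a minimal sketch, assuming the file sits in your working directory (on Kaggle the dataset is mounted under a different path):

```python
import pandas as pd

# Load one of the files named above; adjust the path to wherever the
# dataset was downloaded (the bare filename here is an assumption).
df = pd.read_csv("lid_msaea_test.csv")

print(df.columns.tolist())          # confirm which columns are present
print(df.head())                    # first few rows
print(df.describe(include="all"))   # per-column summary
```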
Decide What Kind Of Analysis You Want To Do: Once you are familiar with the type of data provided, decide which kind of model or analysis you want to build before diving into coding any algorithms for that task. For example, if you want to build a cross-lingual model for POS tagging, it is ideal to have training and validation sets from several languages, so the model can take advantage of multi-domain knowledge interchange between them during the training phase; selecting files such as pos_spaeng_train and pos_hineng_validation then comes into play. While designing your model architecture, make sure task-specific hyperparameters complement each other, and choose an appropriate feature-vector representation strategy, which helps improve performance.
Run Appropriate Algorithms On The Data Provided In The Dataset: Once you understand all the elements in front of you, you can start running appropriate algorithms, irrespective of the tools used, tuning your models with metrics like accuracy and F1 score. Once tuned, ensure the system works reliably by testing it on the unseen test set and confirming it produces the desired results. During optimization, hyperparameter tuning plays a significant role, depending on the algorithm chosen.
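To make the evaluation step concrete, the snippet below computes accuracy and macro F1 with scikit-learn; the two label sequences are placeholders standing in for a model's validation predictions, not real LinCE output:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder gold labels and predictions for a handful of tokens.
y_val = ["NOUN", "VERB", "NOUN", "ADP", "VERB"]
y_pred = ["NOUN", "VERB", "ADP", "ADP", "VERB"]

print("accuracy:", accuracy_score(y_val, y_pred))
print("macro F1:", f1_score(y_val, y_pred, average="macro"))

# Tune hyperparameters against the validation split only, then report
# final numbers once on the held-out test split.
```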
Research Ideas
- Developing a multilingual sentiment analysis system that can analyze sentiment in any of the six languages.
- Training a model to identify and classify named entities across multiple languages, such as identifying certain words for proper nouns or locations regardless of language or coding scheme.
- Developing an AI-powered cross-lingual translator that is able to effectively translate text from one language to another with minimal errors and maximum accuracy.
CC0
Original Data Source: LinCE (Linguistic Code-switching Evaluation)
https://dataintelo.com/privacy-and-policy
The global market size for Streaming Data Processing System Software was valued at approximately USD 9.5 billion in 2023 and is projected to reach around USD 23.8 billion by 2032, reflecting a compound annual growth rate (CAGR) of 10.8% over the forecast period. The surge in the need for real-time data processing capabilities, driven by the exponential growth of data from various sources such as social media, IoT devices, and enterprise data systems, is a significant growth factor for this market.
One of the primary growth drivers in this market is the increasing demand for real-time analytics across various industries. In a world where immediate decision-making can determine the success or failure of a business, organizations are increasingly turning to streaming data processing systems to gain instant insights from their data. This need for real-time information is particularly pronounced in sectors like finance, healthcare, and retail, where timely data can prevent fraud, improve patient outcomes, and optimize supply chains, respectively. Additionally, the proliferation of IoT devices generating massive amounts of data continuously requires robust systems for real-time data ingestion, processing, and analytics.
Another major factor contributing to the market's growth is technological advancements and innovations in big data and artificial intelligence. With improvements in machine learning algorithms, data mining, and in-memory computing, modern streaming data processing systems are becoming more efficient, scalable, and versatile. These advancements enable businesses to handle larger data volumes and more complex processing tasks, further driving the adoption of these systems. Moreover, open-source platforms and frameworks like Apache Kafka, Apache Flink, and Apache Storm are continually evolving, lowering the entry barriers for organizations looking to implement advanced streaming data solutions.
The increasing adoption of cloud-based solutions is also a significant growth factor for the streaming data processing system software market. Cloud platforms offer scalable, flexible, and cost-effective solutions for businesses, enabling them to handle variable workloads more efficiently. The shift to cloud-based systems is especially beneficial for small and medium enterprises (SMEs) that may lack the resources to invest in extensive on-premises infrastructure. Cloud service providers are also enhancing their offerings with integrated streaming data processing capabilities, making it easier for organizations to deploy and manage these systems.
Regionally, North America holds the largest market share for streaming data processing system software, driven by strong technological infrastructure, high cloud adoption rates, and significant investments in big data and AI technologies. The Asia Pacific region is also expected to witness substantial growth during the forecast period, primarily due to the rapid digital transformation initiatives, growing internet and smartphone penetration, and increasing adoption of IoT technologies across various industries. Europe, Latin America, and the Middle East & Africa are also contributing to the market growth, albeit at differing rates, each driven by region-specific factors and technological advancements.
The Streaming Data Processing System Software market is segmented by component into software and services. The software segment holds the lion's share of the market, driven by the increasing need for sophisticated tools that facilitate real-time data analytics and processing. These software solutions are designed to handle the complexities of streaming data, providing functionalities like data ingestion, real-time analytics, data integration, and visualization. The continuous evolution of software capabilities, enhanced by artificial intelligence and machine learning, is significantly contributing to market growth. Furthermore, the availability of various open-source tools and platforms has democratized access to advanced streaming data processing solutions, fostering innovation and adoption across different industry verticals.
The services segment, while smaller in comparison to software, plays a critical role in the overall ecosystem. Services include consulting, integration, maintenance, and support, which are essential for the successful implementation and operation of streaming data processing systems. Organizations often require expert guidance to navigate the complexities of deploying these systems, ensuring they are optimally configured.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data analysis tools help companies draw insights from customer data and uncover trends and patterns to make better business decisions. There is a wide range of online data analysis tools that can be used to perform basic or more advanced data analysis. Thanks to the development of no-code machine learning software, advanced data analysis is now easier than ever, allowing businesses to reap the benefits of huge amounts of unstructured data.
This paper aims to explain what data analysis is and its benefits, the types of data analysis, the available data analysis tools, and how to choose among them.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain analysed in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceeding Series (ICPS).
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
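As a minimal sketch of working with the per-victim JSON files, assuming one of the dataset directories has been unpacked locally (the directory name and the key inspection are illustrative; the bundled README.md files document the actual layout and schema):

```python
import json
from pathlib import Path

# Walk an unpacked dataset directory and load each per-victim JSON file.
# "MalwareInfectionSet" as a local folder name is an assumption.
for victim_file in Path("MalwareInfectionSet").rglob("*.json"):
    with open(victim_file, encoding="utf-8") as f:
        victim = json.load(f)
    # Inspect the available keys before relying on any particular field.
    print(victim_file.name, sorted(victim)[:5])
```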
MalwareInfectionSet We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits
Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
An industrial questionnaire survey in which a total of 33 practitioners in varying roles, from 18 companies, were tasked with comparing two decision models for asset selection.
The objective of the study was to evaluate which characteristics of decision models for asset selection determine industrial practitioners' preference for a model when given the choice between a decision model with high precision and a model with high speed.
The dataset was originally published in DiVA and moved to SND in 2024.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data analysis can be accurate and reliable only if the underlying assumptions of the used statistical method are validated. Any violations of these assumptions can change the outcomes and conclusions of the analysis. In this study, we developed Smart Data Analysis V2 (SDA-V2), an interactive and user-friendly web application, to assist users with limited statistical knowledge in data analysis, and it can be freely accessed at https://jularatchumnaul.shinyapps.io/SDA-V2/. SDA-V2 automatically explores and visualizes data, examines the underlying assumptions associated with the parametric test, and selects an appropriate statistical method for the given data. Furthermore, SDA-V2 can assess the quality of research instruments and determine the minimum sample size required for a meaningful study. However, while SDA-V2 is a valuable tool for simplifying statistical analysis, it does not replace the need for a fundamental understanding of statistical principles. Researchers are encouraged to combine their expertise with the software's capabilities to achieve the most accurate and credible results.
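As an illustration of the kind of assumption check SDA-V2 automates (this is not the application's own code), the sketch below tests normality first and then chooses between a parametric and a non-parametric test; the sample data is made up:

```python
from scipy import stats

group_a = [5.1, 4.9, 5.6, 5.0, 5.4, 4.8, 5.2]  # made-up measurements
group_b = [5.9, 6.1, 5.7, 6.3, 6.0, 5.8, 6.2]

# Validate the normality assumption before picking a test.
normal = (stats.shapiro(group_a).pvalue > 0.05
          and stats.shapiro(group_b).pvalue > 0.05)

if normal:
    result = stats.ttest_ind(group_a, group_b)      # parametric
else:
    result = stats.mannwhitneyu(group_a, group_b)   # non-parametric fallback
print(result)
```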
https://www.marketresearchforecast.com/privacy-policy
The Data Privacy Software Market was valued at USD 2.76 billion in 2023 and is projected to reach USD 25.26 billion by 2032, with an expected CAGR of 37.2% during the forecast period. This significant growth is driven by the increasing demand for compliance with privacy regulations, the growing awareness of data privacy rights, and the widespread adoption of cloud computing. The demand for comprehensive data privacy software solutions is further propelled by the rising instances of data breaches, the need to protect sensitive customer and employee data, and the escalating use of data analytics. Recent developments include: November 2023 - Protiviti India entered a partnership with Riskconnect to help companies in India bring all aspects of risk under one roof through an integrated risk management technology. July 2023 - TrustArc introduced a new TRUSTe EU-U.S. data privacy framework verification to help businesses transfer personal data from the EU to the U.S. in compliance with EU and GDPR laws. April 2023 - AvePoint and Tech Data expanded their partnership for providing Microsoft 365 data management solutions in Japan and Asia Pacific; the extended partnership will cover Indonesia, India, Vietnam, Malaysia, Singapore, and Hong Kong. January 2023 - Sourcepoint launched a solution, Vendor Trace, to offer enterprises a flexible evaluation of vendor behavior on their websites; with the help of Vendor Trace, users can isolate vulnerabilities in third-party advertising and marketing technologies and determine the responsible parties. September 2022 - BigID launched data deletion capabilities to minimize risk and accelerate compliance; the new capability permits enterprises to quickly and effectively delete sensitive and personal data across various data stores such as Google Drive, AWS, Teradata, and others. October 2022 - Securiti launched the first Data Control Cloud, which provides enterprises with key controls over data privacy, security, compliance, and governance; the new offering creates a combined layer of data intelligence and controls across various clouds, such as public clouds, private clouds, data clouds, and SaaS. March 2022 - AvePoint announced the addition of ransomware detection to its data protection capabilities; the new feature proactively identifies suspicious behavior within Microsoft's OneDrive while reducing disruption to collaboration and productivity, and also offers faster investigation, early event detection, and quicker restoration of backup data. Key drivers for this market are: Rising Adoption of IoT Devices to Aid Global Data Privacy Software Market Growth. Potential restraints include: Low Awareness and Insufficient Knowledge About Software Impede Industry Growth. Notable trends are: Integration of AI and ML to Surge Demand for Data Privacy Solutions.
According to a survey on software purchases conducted in Australia in 2022, end-user and data security was the most important factor when choosing a new software vendor, with 35 percent of respondents voting for it. The second most popular factor was ease of use.
https://www.marketreportanalytics.com/privacy-policy
The Management Decision market, valued at $6.55 billion in 2025, is projected to experience robust growth, driven by the increasing need for data-driven decision-making across diverse industries. A Compound Annual Growth Rate (CAGR) of 13.64% from 2025 to 2033 indicates a significant expansion of this market. Key drivers include the rising adoption of advanced analytics, business intelligence tools, and artificial intelligence (AI) for optimizing operations, improving efficiency, and gaining a competitive edge. The increasing volume and complexity of data, coupled with the pressure to make faster, more informed decisions, fuels the demand for sophisticated management decision support systems. Growth is further fueled by the transition to cloud-based solutions offering scalability, cost-effectiveness, and enhanced accessibility. While the market faces restraints such as the high initial investment costs associated with implementing these systems and the need for skilled professionals to manage them, the overall positive outlook remains strong, driven by technological advancements and the persistent demand for improved decision-making capabilities across sectors. The market is segmented by component (software and services), deployment type (on-premises and cloud), and end-user industry (BFSI, IT and Telecom, Healthcare, Retail, Manufacturing, and Others). The cloud segment is expected to dominate due to its flexibility and cost-effectiveness. The BFSI sector is a major contributor, owing to the significant reliance on data-driven decisions for risk management, fraud detection, and customer relationship management. However, the healthcare, retail, and manufacturing sectors are also showing rapid growth in adoption, indicating broad applicability and diverse market penetration. The competitive landscape includes established players like IBM, Oracle, and SAS, alongside innovative technology providers. Geographically, North America currently holds a significant market share, followed by Europe and Asia Pacific, with emerging markets in Latin America and the Middle East and Africa exhibiting promising growth potential. This dynamic market is poised for continued expansion, driven by ongoing technological innovation and the ever-increasing importance of evidence-based decision-making in today's business environment. Recent developments include: November 2022 - IBM introduced Business Analytics Enterprise, a more advanced version of the program allowing companies to acquire a thorough perspective of the data sources across their entire business. The program will assist in business intelligence planning, budgeting, reporting, forecasting, and dashboard capabilities. January 2022 - The cloud platform LambdaTest introduced Test Analytics, a solution to enable better decision-making. With the help of highly customized dashboards provided by LambdaTest Test Analytics, DevOps teams can monitor the status and effectiveness of testing across numerous LambdaTest product lines in a single view, enabling businesses to make better decisions. The Pennsylvania ed-tech company Frontline Education launched its HR Capital Analytics tool. The program will examine absenteeism trends and patterns, comprehend information about teacher candidates and open positions, determine staffing needs by position, quickly fill openings for substitutes, plan recruitment and hiring strategies, and share professional development opportunities.
The tool will increase administrators' ability to expedite decision-making and strategically plan. Key drivers for this market are: Increasing need for business agility which requires faster and efficient decision making, Increasing demand for Decision Analytics in BFSI sector to drive the market. Potential restraints include: Increasing need for business agility which requires faster and efficient decision making, Increasing demand for Decision Analytics in BFSI sector to drive the market. Notable trends are: BFSI Sector is Expected to Hold Significant Share.
This dataset contains raw data and processed data from the Dataverse Community Survey 2022. The main goal of the survey was to help the Global Dataverse Community Consortium (GDCC; https://dataversecommunity.global/) and the Dataverse Project (https://dataverse.org/) decide on what actions to take to improve the Dataverse software and the larger ecosystem of integrated tools and services as well as better support community members. The results from the survey may also be of interest to other communities working on software and services for managing research data. The survey was designed to map out the current status as well as the roadmaps and priorities of Dataverse installations around the world. The main target group for participating in the survey were the people/teams responsible for operating Dataverse installations around the world. A secondary target group were people/teams at organizations that are planning to deploy or considering deploying a Dataverse installation. There were 34 existing and planned Dataverse installations participating in the survey.
This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

How the metadata was downloaded

The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

How the files are organized

├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation).csv
│   ├── basic.csv
│   ├── contributor(citation).csv
│   ├── ...
│   └── topic_classification(citation).csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2022.10.02_17.11.19.zip
│   │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
│   │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
│   │   │   └── ...
│   │   └── metadatablocks_v5.6
│   │       ├── astrophysics_v5.6.json
│   │       ├── biomedical_v5.6.json
│   │       ├── citation_v5.6.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
│   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
│   ├── Arca_Dados_2022.10.02_17.44.35.zip
│   ├── ...
│   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
├── dataset_pids_from_most_known_dataverse_installations.csv
├── licenses_used_by_dataverse_installations.csv
└── metadatablocks_from_most_known_dataverse_installations.csv

This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories: the CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset.
For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files. The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files. The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ... Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.
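For readers who want the gist of that download step without opening the linked repository, here is a minimal sketch, assuming a CSV named installations.csv with the "hostname" and "apikey" columns described above; it queries each installation's standard Search API and is an illustration, not the actual script:

```python
import csv
import requests

# Read installation URLs and API tokens from the two-column CSV
# described above; "installations.csv" as a file name is an assumption.
with open("installations.csv", newline="") as f:
    for row in csv.DictReader(f):
        headers = {}
        if row.get("apikey"):  # only some installations require a token
            headers["X-Dataverse-key"] = row["apikey"]
        resp = requests.get(
            f"{row['hostname']}/api/search",  # standard Dataverse Search API
            params={"q": "*", "type": "dataset", "per_page": 10},
            headers=headers,
            timeout=30,
        )
        print(row["hostname"], resp.json()["data"]["total_count"])
```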
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
For creating, optimizing, and evaluating our statistical model, we used the Public Unified Bug Dataset for Java. It contains the data entries of 5 different public bug datasets (PROMISE, Eclipse Bug Dataset, Bug Prediction Dataset, Bugcatchers Bug Dataset, and GitHub Bug Dataset) in a unified manner. The dataset contains 47,618 Java classes altogether, of which 8,780 contain at least one bug, while 38,838 are bug-free. The total number of bugs recorded in the dataset is 17,365, which means that each buggy Java class contains 1.98 bugs on average (with a standard deviation of 2.39). Unfortunately, the PLS-DA implementation in PLS_Toolbox was too slow due to the tremendous amount of administrative calculations it performs. Therefore, we developed and used a much faster PLS-DA script independently of PLS_Toolbox. According to the literature, there is no obvious way to choose the fastest and most accurate algorithm. Thus, we had to find the right balance between speed and accuracy, and chose the bidiag2stab method for our implementation. Tuning the model parameters and finding the best possible classification required many model training runs, so a very fast PLS core implementation was essential. With our PLS-DA Matlab script, we generated a classification using a data split of 80% training, 10% validation, and 10% test sets.
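The paper's PLS-DA implementation is a custom Matlab script; purely as an illustration of the same setup, here is a sketch in Python using scikit-learn's PLSRegression on a dummy-coded class label with an 80/10/10 split. The data is synthetic, standing in for the Unified Bug Dataset's class metrics:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-class software metrics and bugginess labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(float)

# 80% training, 10% validation, 10% test, as in the study.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# PLS-DA = PLS regression on a dummy-coded label, thresholded at 0.5.
pls = PLSRegression(n_components=5).fit(X_train, y_train)
pred = (pls.predict(X_test).ravel() > 0.5).astype(float)
print("test accuracy:", (pred == y_test).mean())
```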
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains the underlying research data for the publication "High impact bug report identification with imbalanced learning strategies", and the full text is available from: https://ink.library.smu.edu.sg/sis_research/3702. In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on bugs that are highly impactful. In the literature, high-impact bugs are bugs which appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or which break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among thousands of bug reports in a bug tracking system is no easy feat. An automated technique that can identify high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Considering that only a small proportion of bugs are high-impact bugs, the identification of high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one particular imbalanced learning strategy and one particular classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms to conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants have different performances, and the best performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, outperform the F1-scores of the two state-of-the-art approaches by Thung et al. and by Garcia and Shihab. Supplementary code and data available from GitHub:
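To make the best surprise-bug variant concrete, here is a hedged sketch of SMOTE + KNN using the imbalanced-learn library; the TF-IDF text features and the toy bug reports are illustrative assumptions, not the paper's exact pipeline:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Toy bug reports; 1 marks a high-impact (surprise) bug. The class
# imbalance mirrors the setting the paper addresses.
reports = ["application crashes on startup", "typo in settings tooltip",
           "data loss after upgrade", "minor layout glitch on resize",
           "cosmetic color mismatch", "button label misaligned"]
labels = [1, 0, 1, 0, 0, 0]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),                     # text -> features
    ("smote", SMOTE(k_neighbors=1, random_state=0)),  # oversample minority
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
clf.fit(reports, labels)
print(clf.predict(["crash and data corruption when saving"]))
```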
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data are the foundation of science, and there is an increasing focus on how data can be reused and enhanced to drive scientific discoveries. However, most seemingly "open data" do not provide legal permissions for reuse and redistribution. The inability to integrate and redistribute our collective data resources blocks innovation and stymies the creation of life-improving diagnostic and drug selection tools. To help the biomedical research and research support communities (e.g. libraries, funders, repositories, etc.) understand and navigate the data licensing landscape, the (Re)usable Data Project (RDP) (http://reusabledata.org) assesses the licensing characteristics of data resources and how licensing behaviors impact reuse. We have created a ruleset to determine the reusability of data resources and have applied it to 56 scientific data resources (e.g. databases) to date. The results show significant reuse and interoperability barriers. Inspired by game-changing projects like Creative Commons, the Wikimedia Foundation, and the Free Software movement, we hope to engage the scientific community in the discussion regarding the legal use and reuse of scientific data, including the balance of openness and how to create sustainable data resources in an increasingly competitive environment.
Success.ai's B2B Contact Data and App Developer Data for Engineering Professionals Worldwide is a trusted resource for connecting with engineers and technical managers across industries and regions. This dataset draws from over 170 million verified professional profiles, ensuring you have access to high-quality contact data tailored to your business needs. From sales outreach to recruitment, Success.ai enables you to build meaningful relationships with engineering professionals at every level.
Why Choose Success.ai's Engineering Professionals Data?
- Data is AI-validated, ensuring 99% accuracy for your campaigns.
- Global Engineering Coverage: Includes engineers and technical managers from sectors like manufacturing, IT, construction, aerospace, automotive, and more. Regions covered include North America, Europe, Asia-Pacific, South America, and the Middle East.
- Real-Time Updates: Continuous updates ensure you stay connected to current roles and decision-makers in engineering.
- Compliance and Security: Fully adheres to GDPR, CCPA, and other global data privacy standards, ensuring legal and ethical use.
Data Highlights: - 170M+ Verified Professional Profiles: Comprehensive data from various industries, including engineering. - 50M Work Emails: Accurate and AI-validated for reliable communication. - 30M Company Profiles: Detailed insights to support targeted outreach. - 700M Global Professional Profiles: A rich dataset designed to meet diverse business needs.
Key Features of the Dataset: - Extensive Engineer Profiles: Covers various roles, including mechanical, software, civil, and electrical engineers, as well as engineering managers and directors. - Customizable Filters: Segment profiles by location, industry, job title, and company size for precise targeting. - AI-Powered Insights: Enriches profiles with contextual details to support personalization.
Strategic Use Cases:
- Reach technical decision-makers to accelerate your sales cycles.
- Recruitment and Talent Acquisition: Source skilled engineers and managers for specialized roles. Use updated profiles to connect with potential candidates effectively.
- Targeted Marketing Campaigns: Launch precision-driven marketing campaigns aimed at engineers and engineering teams. Personalize outreach with accurate and detailed contact data.
- Engineering Services and Solutions: Pitch your engineering tools, software, or consulting services to professionals who can benefit the most. Establish connections with managers who influence procurement decisions.
Why Success.ai Stands Out:
- Best Price Guarantee: Gain access to high-quality datasets at competitive prices.
- Flexible Integration Options: Choose between API access or downloadable formats for seamless integration into your systems.
- High Accuracy and Coverage: Benefit from AI-validated contact data for impactful results.
- Customizable Datasets: Filter and refine datasets to focus on specific engineering roles, industries, or regions.
- APIs for Enhanced Functionality:
Empower your business with B2B Contact Data for Engineering Professionals Worldwide from Success.ai. With verified work emails, phone numbers, and decision-maker profiles, you can confidently target engineers and managers in any sector.
Experience the Best Price Guarantee and unlock the potential of precise, AI-validated datasets. Contact us today and start connecting with engineering leaders worldwide!
No one beats us on price. Period.