100+ datasets found

6. Definitions and examples of the moves of the UPOCS genre
data.mendeley.com
Updated Nov 5, 2021
Cite
Author Anonym (2021). 6. Definitions and examples of the moves of the UPOCS genre [Dataset]. http://doi.org/10.17632/7yg2y4sdkn.1
Explore at:
Unique identifier
https://doi.org/10.17632/7yg2y4sdkn.1
Dataset updated
Nov 5, 2021
Authors
Author Anonym
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Definitions and examples of the moves of the UPOCS genre
2021 Methodological Summary And Definitions
catalog.data.gov
odgavaprod.ogopendata.com
Updated Sep 6, 2025
Cite
Substance Abuse and Mental Health Services Administration (2025). 2021 Methodological Summary And Definitions [Dataset]. https://catalog.data.gov/dataset/2021-methodological-summary-and-definitions
Explore at:
Dataset updated
Sep 6, 2025
Dataset provided by
Substance Abuse and Mental Health Services Administration: https://www.samhsa.gov/
Description
Use this summary report to properly interpret 2021 NSDUH estimates of substance use and mental health issues. The report accompanies the annual detailed tables and covers overall methodology, key definitions for measures and terms used in 2021 NSDUH reports and tables, and selected analyses of the measures and how they should be interpreted.

The report is organized into six chapters:

Introduction.

Description of the survey, including information about the sample design, data collection procedures, and key aspects of data processing such as development of the analysis weights. The report also includes methodological changes and related issues in the 2021 NSDUH due to COVID-19.

Technical details on the statistical methods and measurement, such as suppression criteria for unreliable estimates, statistical testing procedures, issues around selected substance use and mental health measures, and the impact of methodological changes on response rates.

Special topics related to prescription psychotherapeutic drugs.

A comparison between NSDUH and other sources of data on substance use and mental health issues, including data sources for populations outside the NSDUH target population.

A more in-depth view of special methodological issues for the 2021 NSDUH, including the results of special analyses that led SAMHSA to not compare estimates from 2021 to estimates from previous years.

An appendix covers key definitions used in NSDUH reports and tables.
Definitions of independent variables used in the statistical analysis.
datasetcatalog.nlm.nih.gov
figshare.com
Updated Feb 20, 2013
Cite
McConway, Kevin; Cameron, Robert; Bertorelle, Giorgio; Sattmann, Helmut; Cook, Laurence; Juan, Xavier; Anton, Christian; Fontaine, Benoît; Dodd, Mike; Skelton, Peter; Stalažs, Arturs; Féher, Zoltan; Schilthuizen, Menno; Rammul, Üllar; Oliveira, Cristina; Ożgo, Małgorzata; Pokryszko, Beata; Silvertown, Jonathan; Baur, Bruno; Bossdorf, Oliver; Sólymos, Péter; Correia, Maria; Worthington, Jenny; Gill, Eoin (2013). Definitions of independent variables used in the statistical analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001733024
Explore at:
Dataset updated
Feb 20, 2013
Authors
McConway, Kevin; Cameron, Robert; Bertorelle, Giorgio; Sattmann, Helmut; Cook, Laurence; Juan, Xavier; Anton, Christian; Fontaine, Benoît; Dodd, Mike; Skelton, Peter; Stalažs, Arturs; Féher, Zoltan; Schilthuizen, Menno; Rammul, Üllar; Oliveira, Cristina; Ożgo, Małgorzata; Pokryszko, Beata; Silvertown, Jonathan; Baur, Bruno; Bossdorf, Oliver; Sólymos, Péter; Correia, Maria; Worthington, Jenny; Gill, Eoin
Description
Definitions of independent variables used in the statistical analysis.

Social Media PII Disclosure Analyses

kaggle.com

zip

Updated Jul 30, 2024

Cite

Eidan Rosado (2024). Social Media PII Disclosure Analyses [Dataset]. https://www.kaggle.com/datasets/edyvision/social-media-pii-disclosure-analyses

Explore at:

Available download formats: zip (29813203 bytes)

Dataset updated

Jul 30, 2024

Authors

Eidan Rosado

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

Privacy vs. Social Capital: Social Media PII Disclosure Analyses

This data was collected and analyzed as part of a study on PII disclosures in social media conversations with special attention to influencer characteristics in the interactions in the dissertation titled Privacy vs. Social Capital: Examining Information Disclosure Patterns within Social Media Influencer Networks and the research paper titled Unveiling Influencer-Driven Personal Data Sharing in Social Media Discourse.

Each study phase is different, with X (Twitter) data used in the pilot analysis and Reddit data used in the main study. Both folders will have the analyzed_posts and cluster summary csv files broken down by collection (either based on trend or collection date).

Note: Raw data is not made available in these datasets due to the nature of the study and to protect the original authors.

Notable Data Elements

Post Data

Column name	Type	Description
Node ID	UUID	Unique identifier for post (replaces original platform identifier)
User ID	UUID	Unique identifier assigned for user (replaces original platform identifier)
Cluster Name	Str	Composite ID for subgraph using collection name and subgraph index
Influence Power	Float	Eigenvector centrality
Influencer Tier	Str	Categorical label calculated by follower count
Collection Name	Str	Trend collection assigned based on search query
Hashtags	Set(str)	The set of hashtags included in the node
PII Disclosed	Bool	Whether or not PII was disclosed
PII Detected	Set(str)	The detected token types in post
PII Risk Score	Float	The PII score for all tokens in a post
Is Comment	Bool	Whether or not the post is a comment or reply
Is Text Starter	Bool	Whether or not the post has text content
Community	Str	The group, community, channel, etc. associated with
Timestamp	Timestamp	Creation timestamp (provided by social media API)
Time Elapsed	Int	Time elapsed (seconds) from original influencer’s post

Cluster Data

Column Name	Type	Description
Cluster Name	Str	Composite ID for subgraph using collection name and subgraph index
Influencer Tiers Frequencies	List[dict]	Frequency of influencer tiers of all users in the cluster
Top Influence Power Score	Float	Eigenvector centrality of top influencer
Top Influencer Tier	Str	Size tier of top influencer
Collection Name	Str	Trend collection assigned based on search query.
Hashtags	Set(str)	The set of hashtags included in the cluster
PII Detection Frequencies	List[dict]	The detected token types in post with frequencies
Node Count	Int	Count of all nodes in the influencer cluster
Node Disclosures	Int	Count of all nodes with mean_risk_score > 1*
Disclosure Ratio	Float	Sum of nodes with confirmed disclosed PII divided by overall cluster size (count of nodes in the cluster)
Mean Risk Score	Float	The mean risk score for an entire network cluster
Median Risk Score	Float	The median risk score for an entire network cluster
Min Risk Score	Float	The min risk score for an entire network cluster
Max Risk Score	Float	The max risk score for an entire network cluster
Time Span	Float	Total Time Elapsed
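
The relationship between the post-level and cluster-level columns above can be illustrated with a minimal sketch. The rows, the cluster name `trendA_0`, and the function `summarize_cluster` are hypothetical and are not taken from the dataset. The sketch also assumes that the Disclosure Ratio is computed from the `PII Disclosed` flag, which the table does not state explicitly.

```python
from statistics import mean, median

# Hypothetical post rows: (cluster_name, pii_disclosed, pii_risk_score).
# These values are invented for illustration only.
posts = [
    ("trendA_0", True, 2.0),
    ("trendA_0", False, 0.0),
    ("trendA_0", True, 1.5),
    ("trendA_0", False, 0.5),
]


def summarize_cluster(rows):
    """Aggregate post-level rows into cluster-level summary fields."""
    scores = [score for _, _, score in rows]
    node_count = len(rows)
    # Assumption: a node counts as a disclosure when its PII Disclosed flag is set.
    disclosures = sum(1 for _, disclosed, _ in rows if disclosed)
    return {
        "node_count": node_count,
        "disclosure_ratio": disclosures / node_count,
        "mean_risk_score": mean(scores),
        "median_risk_score": median(scores),
        "min_risk_score": min(scores),
        "max_risk_score": max(scores),
    }


summary = summarize_cluster(posts)
print(summary)
```

In practice the rows would be grouped by Cluster Name first, with one summary produced per group.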

Tabular statistical summay of data analysis - Calawah River Riverscape Study...
catalog.data.gov
s.cnmilf.com
Updated May 24, 2025
Cite
(Point of Contact, Custodian) (2025). Tabular statistical summay of data analysis - Calawah River Riverscape Study [Dataset]. https://catalog.data.gov/dataset/tabular-statistical-summay-of-data-analysis-calawah-river-riverscape-study3
Explore at:
Dataset updated
May 24, 2025
Dataset provided by
(Point of Contact, Custodian)
Area covered
Calawah River
Description
The objective of this study was to identify the patterns of juvenile salmonid distribution and relative abundance in relation to habitat correlates. It is the first dataset of its kind because the entire river was snorkeled by one person in multiple years. During two consecutive summers, we completed a census of juvenile salmonids and stream habitat across a stream network. We used the data to test the ability of habitat models to explain the distribution of juvenile coho salmon (Oncorhynchus kisutch), young-of-the-year (age 0) steelhead (Oncorhynchus mykiss), and steelhead parr (= age 1) for a network consisting of several different sized streams. Our network-scale models, which included five stream habitat variables, explained 27%, 11%, and 19% of the variation in the density of juvenile coho salmon, age 0 steelhead, and steelhead parr, respectively. We found weak to strong levels of spatial auto-correlation in the model residuals (Moran's I values ranging from 0.25 - 0.71). Explanatory power of base habitat models increased substantially and the level of spatial auto-correlation decreased with sequential inclusion of variables accounting for stream size, year, stream, and reach location. The models for specific streams underscored the variability that was implied in the network-scale models. Associations between juvenile salmonids and individual habitat variables were rarely linear and ranged from negative to positive, and the variable accounting for location of the habitat within a stream was often more important than any individual habitat variable. The limited success in predicting the summer distribution and density of juvenile coho salmon and steelhead with our network-scale models was apparently related to variation in the strength and shape of fish-habitat associations across and within streams and years. Summary of statistical analysis of the Calawah Riverscape data. NOAA was not involved and did not pay for the collection of this data. 
This data represents the statistical analysis carried out by Martin Liermann as a NOAA employee.
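
As a rough illustration of the spatial autocorrelation statistic mentioned in the description, Moran's I for values x_i with spatial weights w_ij is (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The sketch below is not the study's code. The residual values and the adjacency rule (each reach neighbors only the reach immediately upstream and downstream) are assumptions chosen for illustration.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square matrix of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den


# Hypothetical model residuals for five consecutive reaches along one stream.
residuals = [1.0, 2.0, 3.0, 4.0, 5.0]

# Assumed weights: 1 for directly adjacent reaches, 0 otherwise.
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]

i_stat = morans_i(residuals, weights)
print(i_stat)
```

A positive value indicates that neighboring reaches tend to have similar residuals; values near zero indicate little spatial structure.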
Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx
frontiersin.figshare.com
docx
Updated Jan 3, 2024
Cite
Ethan M. Meyers (2024). Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx [Dataset]. http://doi.org/10.3389/fninf.2023.1275903.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fninf.2023.1275903.s001
Dataset updated
Jan 3, 2024
Dataset provided by
Frontiers Media: http://www.frontiersin.org/
Authors
Ethan M. Meyers
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Neural decoding is a powerful method to analyze neural activity. However, the code needed to run a decoding analysis can be complex, which can present a barrier to using the method. In this paper we introduce a package that makes it easy to perform decoding analyses in the R programing language. We describe how the package is designed in a modular fashion which allows researchers to easily implement a range of different analyses. We also discuss how to format data to be able to use the package, and we give two examples of how to use the package to analyze real data. We believe that this package, combined with the rich data analysis ecosystem in R, will make it significantly easier for researchers to create reproducible decoding analyses, which should help increase the pace of neuroscience discoveries.
The definitions of slums and favelas and its implication on population data:...
scielo.figshare.com
jpeg
Updated May 30, 2023
Cite
Alfredo Pereira de Queiroz Filho (2023). The definitions of slums and favelas and its implication on population data: a content analysis approach [Dataset]. http://doi.org/10.6084/m9.figshare.7506944.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.7506944.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELO: http://www.scielo.org/
Authors
Alfredo Pereira de Queiroz Filho
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: This article aimed to discuss the different definitions of slums and favelas and their implication on population data. The definitions discussed were extracted from research related to the United Nations Human Settlements Programme (UN-Habitat) and the Instituto Brasileiro de Geografia e Estatística (IBGE). The data manipulation was performed according to the content analysis (CA) approach. The quantification performed with Iramuteq software was based on word frequency and factorial correspondence analysis (FCA). Qualitative and quantitative analyses highlighted two major differences: in the object characterization (area, building, or both) and in the qualification type (legal aspects, construction standards, infrastructure deficiency, land property, population density, geographic references, and residents typing). With the high number of qualifications and diverse content, the population data aggregate different information, making comparisons less accurate. This imprecision tends to expand with the growth of the area and of the number of countries analyzed.
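
The word-frequency step of a content analysis can be sketched in a few lines. This is only an illustration and does not reproduce the Iramuteq workflow. The two definition texts and the keys `source_a` and `source_b` are invented placeholders, not the actual UN-Habitat or IBGE wording.

```python
import re
from collections import Counter

# Invented placeholder definitions; NOT the actual UN-Habitat or IBGE texts.
definitions = {
    "source_a": "An area with inadequate housing and lacking basic infrastructure.",
    "source_b": "A group of housing units occupying land owned by others, lacking infrastructure.",
}


def word_frequencies(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


freqs = {name: word_frequencies(text) for name, text in definitions.items()}

# Terms that appear in both definitions hint at shared qualification criteria.
shared = set(freqs["source_a"]) & set(freqs["source_b"])
print(sorted(shared))
```

A fuller analysis would remove stop words and lemmatize tokens before comparing sources.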
Introduction to Time Series Analysis for Hydrologic Data
hydroshare.org
hydroshare.cuahsi.org
zip
Updated Jan 29, 2021
Cite
Gabriela Garcia; Kateri Salk (2021). Introduction to Time Series Analysis for Hydrologic Data [Dataset]. https://www.hydroshare.org/resource/ee2a4c2151f24115a12e34d4d22d96fe
Explore at:
Available download formats: zip (1.1 MB)
Dataset updated
Jan 29, 2021
Dataset provided by
HydroShare
Authors
Gabriela Garcia; Kateri Salk
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Oct 1, 1974 - Jan 27, 2021
Area covered
Description
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.

Introduction

Time series are a special class of dataset, where a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its most simple, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will be covering a few simple applications of time series analysis in these lessons.

Opportunities

Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:

Has there been an increasing or decreasing trend in the response variable over time?

Can we forecast conditions in the future?

Challenges

Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are:

Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).

Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.

Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend. For example, we can assume that summer temperatures are higher.

Heteroscedasticity: The variance of the time series is not constant over time.

Covariance: The covariance of the time series is not constant over time. Many of these models assume that the variance and covariance are constant over time; a violation of the constant-variance assumption is heteroscedasticity.
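
Two of the challenges above, data gaps and autocorrelation, can be illustrated with a minimal sketch. The lesson itself refers to R packages; Python is used here only for illustration. The observation days, the values, and both function names are hypothetical. Linear interpolation is one simple way to fill gaps, not necessarily the method used in the lesson.

```python
def interpolate_to_regular(times, values, step):
    """Linearly interpolate irregular observations onto an evenly spaced grid."""
    grid = []
    t = times[0]
    while t <= times[-1]:
        grid.append(t)
        t += step
    out = []
    k = 0
    for g in grid:
        # Advance to the pair of observations that brackets grid point g.
        while times[k + 1] < g:
            k += 1
        t0, t1 = times[k], times[k + 1]
        v0, v1 = values[k], values[k + 1]
        out.append(v0 + (v1 - v0) * (g - t0) / (t1 - t0))
    return grid, out


def lag1_autocorrelation(series):
    """Correlation between each point and the one before it."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


# Hypothetical record with a gap: no measurements on days 3 and 4.
days = [0, 1, 2, 5, 6]
obs = [1.0, 2.0, 3.0, 6.0, 7.0]

grid, regular = interpolate_to_regular(days, obs, step=1)
r1 = lag1_autocorrelation(regular)
print(grid, regular, r1)
```

A lag-1 value well above zero signals that consecutive measurements are not independent, which is the autocorrelation caveat described above.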

Learning Objectives

After successfully completing this notebook, you will be able to:

Choose appropriate time series analyses for trend detection and forecasting

Discuss the influence of seasonality on time series analysis

Interpret and communicate results of time series analyses
Data from: THE ADVANCED ANALYTICS JUMPSTART: DEFINITION, PROCESS MODEL, BEST...
scielo.figshare.com
jpeg
Updated Jun 1, 2023
Cite
Jeremy Rose; Mikael Berndtsson; Gunnar Mathiason; Peter Larsson (2023). THE ADVANCED ANALYTICS JUMPSTART: DEFINITION, PROCESS MODEL, BEST PRACTICES [Dataset]. http://doi.org/10.6084/m9.figshare.5862411.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.5862411.v1
Dataset updated
Jun 1, 2023
Dataset provided by
SciELO: http://www.scielo.org/
Authors
Jeremy Rose; Mikael Berndtsson; Gunnar Mathiason; Peter Larsson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT Companies are encouraged by the big data trend to experiment with advanced analytics and many turn to specialist consultancies to help them get started where they lack the necessary competences. We investigate the program of one such consultancy, Advectas - in particular the advanced analytics Jumpstart. Using qualitative techniques including semi structured interviews and content analysis we investigate the nature and value of the Jumpstart concept through five cases in different companies. We provide a definition, a process model and a set of thirteen best practices derived from these experiences, and discuss the distinctive qualities of this approach.
2022 Methodological Summary And Definitions
data.virginia.gov
gimi9.com
html
Updated Sep 6, 2025
Cite
Substance Abuse and Mental Health Services Administration (2025). 2022 Methodological Summary And Definitions [Dataset]. https://data.virginia.gov/dataset/2022-methodological-summary-and-definitions
Explore at:
Available download formats: html
Dataset updated
Sep 6, 2025
Dataset provided by
Substance Abuse and Mental Health Services Administration: https://www.samhsa.gov/
Description
Use this summary report to properly interpret 2022 NSDUH estimates related to substance use, mental health, and treatment. The report accompanies the annual detailed tables and covers overall methodology, key definitions for measures and terms used in 2022 NSDUH reports and tables, and selected analyses of the measures and how they should be interpreted.

The report is organized into five chapters:

Introduction.

Description of the survey, including information about the sample design, data collection procedures and questionnaire changes, and key aspects of data processing such as development of the analysis weights.

Technical details on the statistical methods and measurement, such as suppression criteria for unreliable estimates, statistical testing procedures, revised estimates for 2021 to account for data collection mode, and issues around selected substance use and mental health measures.

Special topics related to prescription psychotherapeutic drugs.

Description of other sources of data on substance use and mental health issues in the United States, including data sources for populations outside the NSDUH target population.

An appendix covers key definitions used in NSDUH reports and tables.
2019 Methodological Summary and Definitions
gimi9.com
data.virginia.gov
Cite
2019 Methodological Summary and Definitions [Dataset]. https://gimi9.com/dataset/data-gov_2019-methodological-summary-and-definitions/
Explore at:
Description
Use this summary report to properly interpret 2019 NSDUH estimates of substance use and mental health issues. The report accompanies the annual detailed tables and covers overall methodology, key definitions for measures and terms used in 2019 NSDUH reports and tables, and selected analyses of the measures and how they should be interpreted.

The report is organized into five chapters:

Introduction.

Description of the survey, including information about the sample design, data collection procedures, and key aspects of data processing such as development of the analysis weights.

Technical details on the statistical methods and measurement, such as suppression criteria for unreliable estimates, statistical testing procedures, issues around data accuracy, and measurement issues for selected substance use and mental health measures.

Special topics related to prescription psychotherapeutic drugs.

A comparison between NSDUH and other sources of data on substance use and mental health issues, including data sources for populations outside the NSDUH target population.

An appendix covers key definitions used in NSDUH reports and tables.
Conceptualization of public data ecosystems
data.niaid.nih.gov
Updated Sep 26, 2024
Cite
Anastasija, Nikiforova; Martin, Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
Explore at:
Dataset updated
Sep 26, 2024
Dataset provided by
University of Tartu
University of Hradec Králové
Authors
Anastasija, Nikiforova; Martin, Lnenicka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

This dataset is being made public to act as supplementary data both for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems", Telematics and Informatics, and for the Systematic Literature Review component that informs the study.

Description of the data in this data set

PublicDataEcosystem_SLR provides the structure of the protocol

Spreadsheet #1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies.

Spreadsheet #2 provides the protocol structure.

Spreadsheet #3 provides the filled protocol for relevant studies.

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information

Descriptive Information

Article number

A study number, corresponding to the study number assigned in an Excel worksheet

Complete reference

The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

Year of publication

The year in which the study was published.

Journal article / conference paper / book chapter

The type of the paper, i.e., journal article, conference paper, or book chapter.

Journal / conference / book

The journal, conference, or book in which the paper is published.

DOI / Website

A link to the website where the study can be found.

Number of words

The number of words in the study.

Number of citations in Scopus and WoS

The number of citations of the paper in Scopus and WoS digital libraries.

Availability in Open Access

Availability of a study in the Open Access or Free / Full Access.

Keywords

Keywords of the paper as indicated by the authors (in the paper).

Relevance for our study (high / medium / low)

What is the relevance level of the paper for our study

Approach- and research design-related information

Objective / Aim / Goal / Purpose & Research Questions

The research objective and established RQs.

Research method (including unit of analysis)

The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

Study’s contributions

The study’s contribution as defined by the authors

Qualitative / quantitative / mixed method

Whether the study uses a qualitative, quantitative, or mixed methods approach?

Availability of the underlying research data

Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?

Period under investigation

Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

Use of theory / theoretical concepts / approaches? If yes, specify them

Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

Quality-related information

Quality concerns

Whether there are any quality concerns (e.g., limited information about the research methods used)?

Public Data Ecosystem-related information

Public data ecosystem definition

How is the public data ecosystem, or any equivalent term (mostly infrastructure), defined in the paper? If an alternative term is used, what is the public data ecosystem called in the paper?

Public data ecosystem evolution / development

Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

What constitutes a public data ecosystem?

What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

Components and relationships

What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

Stakeholders

What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

Actors and their roles

What actors does the public data ecosystem involve? What are their roles?

Data (data types, data dynamism, data categories etc.)

What data does the public data ecosystem cover (is intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.

Processes / activities / dimensions, data lifecycle phases

What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

Level (if relevant)

What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

Other elements or relationships (if any)

What other elements or relationships does the public data ecosystem consist of?

Additional comments

Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

New papers

Does the study refer to any other potentially relevant papers?

Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

Format of the file: .xls, .csv (for the first spreadsheet only), .docx

Licenses or restrictions: CC-BY

For more info, see README.txt
Dictionary of Algorithms and Data Structures (DADS)
gimi9.com
data.nist.gov
Cite
Dictionary of Algorithms and Data Structures (DADS) [Dataset]. https://gimi9.com/dataset/data-gov_dictionary-of-algorithms-and-data-structures-dads/
Explore at:
Description
The Dictionary of Algorithms and Data Structures (DADS) is an online, publicly accessible dictionary of generally useful algorithms, data structures, algorithmic techniques, archetypal problems, and related definitions. In addition to brief definitions, some entries have links to related entries, links to implementations, and additional information. DADS is meant to be a resource for the practicing programmer, although students and researchers may find it a useful starting point. DADS has fundamental entries in areas such as theory, cryptography and compression, graphs, trees, and searching, for instance, Ackermann's function, quick sort, traveling salesman, big O notation, merge sort, AVL tree, hash table, and Byzantine generals. DADS also has index pages that list entries by area and by type. Currently DADS does not include algorithms particular to business data processing, communications, operating systems or distributed algorithms, programming languages, AI, graphics, or numerical analysis.
Statistical analyses.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Aug 8, 2024
Cite
Menze, Bjoern H.; Schmitz, Désirée A.; Li, Hongwei Bran; Kümmerli, Rolf; Wechsler, Tobias (2024). Statistical analyses. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001429748
Explore at:
Dataset updated
Aug 8, 2024
Authors
Menze, Bjoern H.; Schmitz, Désirée A.; Li, Hongwei Bran; Kümmerli, Rolf; Wechsler, Tobias
Description
The zebrafish Danio rerio has become a popular model host to explore disease pathology caused by infectious agents. A main advantage is its transparency at an early age, which enables live imaging of infection dynamics. While multispecies infections are common in patients, the zebrafish model is rarely used to study them, although the model would be ideal for investigating pathogen-pathogen and pathogen-host interactions. This may be due to the absence of an established multispecies infection protocol for a defined organ and the lack of suitable image analysis pipelines for automated image processing. To address these issues, we developed a protocol for establishing and tracking single and multispecies bacterial infections in the inner ear structure (otic vesicle) of the zebrafish by imaging. Subsequently, we generated an image analysis pipeline that involved deep learning for the automated segmentation of the otic vesicle, and scripts for quantifying pathogen frequencies through fluorescence intensity measures. We used Pseudomonas aeruginosa, Acinetobacter baumannii, and Klebsiella pneumoniae, three of the difficult-to-treat ESKAPE pathogens, to show that our infection protocol and image analysis pipeline work both for single pathogens and pairwise pathogen combinations. Thus, our protocols provide a comprehensive toolbox for studying single and multispecies infections in real-time in zebrafish.
Lifestyle_and_Health_Risk_Prediction_Dataset
kaggle.com
zip
Updated Oct 23, 2025
Zahra Nusrat (2025). Lifestyle_and_Health_Risk_Prediction_Dataset [Dataset]. https://www.kaggle.com/datasets/zahranusrat/lifestyle-and-health-risk-prediction-dataset
Explore at:
Available download formats: zip (61147 bytes)
Dataset updated
Oct 23, 2025
Authors
Zahra Nusrat
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
🧩 About Dataset

This dataset provides a detailed collection of information related to [your topic], offering valuable insights for data analysis, visualization, and model development. It consists of multiple features such as [list of important columns], which capture various dimensions of the subject in a structured and measurable way.

The purpose of this dataset is to support exploratory data analysis (EDA) and predictive modeling by allowing users to identify trends, patterns, and relationships among variables. It can serve as a foundation for building machine learning models, performing statistical studies, or generating data-driven visual reports.

Researchers, data enthusiasts, and students can use this dataset to enhance their analytical understanding, practice preprocessing techniques, and improve their ability to draw meaningful conclusions from real-world data.

Additionally, this dataset can be explored to uncover correlations, test hypotheses, and visualize behavioral or performance patterns. Its clean structure and well-defined variables make it suitable for both beginners learning EDA and experienced professionals developing predictive insights.
Supporting Data for Method Assessment for Non-Targeted Analyses (MANTA)...
data.nist.gov
datasets.ai
Updated May 24, 2021
Benjamin Place (2021). Supporting Data for Method Assessment for Non-Targeted Analyses (MANTA) Program: Interlaboratory Study 1 Results [Dataset]. http://doi.org/10.18434/mds2-2412
Explore at:
Unique identifier
https://doi.org/10.18434/mds2-2412, https://identifiers.org/ark:/88434/mds2-2412
Dataset updated
May 24, 2021
Dataset provided by
National Institute of Standards and Technology, http://www.nist.gov/
Authors
Benjamin Place
License
https://www.nist.gov/open/license
Description
Supporting data for the results of Interlaboratory Study 1 of the Method Assessment for Non-Targeted Analyses (MANTA) program. The datasets include the chemical compound descriptions, laboratory mean responses, and the tools for the principal components analysis of the datasets. In addition, a Microsoft Excel file, which was given to all participants, allowed for the analysis of the metadata.
Exploratory Data Analysis (EDA) for COVIND-19
kaggle.com
zip
Updated Apr 8, 2024
Badea-Matei Iuliana (2024). Exploratory Data Analysis (EDA) for COVIND-19 [Dataset]. https://www.kaggle.com/datasets/mateiiuliana/exploratory-data-analysis-eda-for-covind-19
Explore at:
Available download formats: zip (26972 bytes)
Dataset updated
Apr 8, 2024
Authors
Badea-Matei Iuliana
Description
Description: The COVID-19 dataset used for this EDA project encompasses comprehensive data on COVID-19 cases, deaths, and recoveries worldwide. It includes information gathered from authoritative sources such as the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and national health agencies. The dataset covers global, regional, and national levels, providing a holistic view of the pandemic's impact.

Purpose: This dataset is instrumental in understanding the multifaceted impact of the COVID-19 pandemic through data exploration. It aligns perfectly with the objectives of the EDA project, aiming to unveil insights, patterns, and trends related to COVID-19. Here are the key objectives:

1. Data Collection and Cleaning:
• Gather reliable COVID-19 datasets from authoritative sources (such as WHO, CDC, or national health agencies).
• Clean and preprocess the data to ensure accuracy and consistency.
2. Descriptive Statistics:
• Summarize key statistics: total cases, recoveries, deaths, and testing rates.
• Visualize temporal trends using line charts, bar plots, and heat maps.
3. Geospatial Analysis:
• Map COVID-19 cases across countries, regions, or cities.
• Identify hotspots and variations in infection rates.
4. Demographic Insights:
• Explore how age, gender, and pre-existing conditions impact vulnerability.
• Investigate disparities in infection rates among different populations.
5. Healthcare System Impact:
• Analyze hospitalization rates, ICU occupancy, and healthcare resource allocation.
• Assess the strain on medical facilities.
6. Economic and Social Effects:
• Investigate the relationship between lockdown measures, economic indicators, and infection rates.
• Explore behavioral changes (e.g., mobility patterns, remote work) during the pandemic.
7. Predictive Modeling (Optional):
• If data permits, build simple predictive models (e.g., time series forecasting) to estimate future cases.

Data Sources: The primary sources of the COVID-19 dataset include the Johns Hopkins CSSE COVID-19 Data Repository, Google Health’s COVID-19 Open Data, and the U.S. Economic Development Administration (EDA). These sources provide reliable and up-to-date information on COVID-19 cases, deaths, testing rates, and other relevant variables. Additionally, GitHub repositories and platforms like Medium host supplementary datasets and analyses, enriching the available data resources.

Data Format: The dataset is available in various formats, such as CSV and JSON, facilitating easy access and analysis. Before conducting the EDA, the data underwent preprocessing steps to ensure accuracy and consistency. Data cleaning procedures were performed to address missing values, inconsistencies, and outliers, enhancing the quality and reliability of the dataset.

License: The COVID-19 dataset may be subject to specific usage licenses or restrictions imposed by the original data sources. Proper attribution is essential to acknowledge the contributions of the WHO, CDC, national health agencies, and other entities providing the data. Users should adhere to any licensing terms and usage guidelines associated with the dataset.

Attribution: We acknowledge the invaluable contributions of the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), national health agencies, and other authoritative sources in compiling and disseminating the COVID-19 data used for this EDA project. Their efforts in collecting, curating, and sharing data have been instrumental in advancing our understanding of the pandemic and guiding public health responses globally.
The Canada Trademarks Dataset
zenodo.org
pdf, zip
Updated Jul 19, 2024
Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
Explore at:
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.4999655
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Jeremy Sheff
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Canada Trademarks Dataset

18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

Python and Stata Scripts (c) 2021 Jeremy Sheff

Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

Terms of Use:

As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

Details of Repository Contents:

This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

/csv: contains the .csv versions of the data files

/do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset

/dta: contains the .dta versions of the data files

/py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.
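
The storage advice above can be checked before starting a download. The following is a minimal sketch, not part of the repository's scripts: the function name `has_enough_space` and the constant `REQUIRED_BYTES` are hypothetical, and 70GB is interpreted here as 70 × 10^9 bytes.

```python
import shutil
from pathlib import Path

# Assumption: "70GB" is read as 70 * 10**9 bytes; adjust if a binary unit is intended.
REQUIRED_BYTES = 70 * 10**9


def has_enough_space(target_dir: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding target_dir has enough free space."""
    path = Path(target_dir)
    # Create the target directory in advance, as the description advises.
    path.mkdir(parents=True, exist_ok=True)
    return shutil.disk_usage(path).free >= required_bytes
```

A caller could run `has_enough_space("/path/to/target")` with their own directory and stop before launching the download if it returns `False`.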

The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
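
The "iterparse" in the script filenames suggests streaming XML parsing, which keeps memory use low on large archives. The sketch below illustrates that general technique with Python's standard library; it is not the repository's code. The function name `xml_to_csv`, the record tag, and the field names are hypothetical, and real CIPO XML may use namespaces and nested structures that this simple version does not handle.

```python
import csv
import xml.etree.ElementTree as ET


def xml_to_csv(xml_path, csv_path, record_tag, fields):
    """Stream records from an XML file and write one CSV row per record.

    record_tag and fields are hypothetical placeholders for whatever
    element names the real data dictionary defines.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(fields)  # header row
        # "end" events fire once an element and its children are fully parsed.
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == record_tag:
                # Missing child elements become empty strings.
                writer.writerow([elem.findtext(f, default="") for f in fields])
                elem.clear()  # drop the record's children to limit memory use
                count += 1
    return count
```

Processing each record as soon as its closing tag arrives, then clearing it, is what lets this approach work on files much larger than available memory.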

With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
Replication Data for: Upcoming issues, new methods: using Interactive...
data.mendeley.com
Updated Oct 18, 2021
Gustavo Behling (2021). Replication Data for: Upcoming issues, new methods: using Interactive Qualitative Analysis (IQA) in Management Research [Dataset]. http://doi.org/10.17632/kb76h5jtvw.1
Explore at:
Unique identifier
https://doi.org/10.17632/kb76h5jtvw.1
Dataset updated
Oct 18, 2021
Authors
Gustavo Behling
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data refer to the paper “Upcoming issues, new methods: using Interactive Qualitative Analysis (IQA) in Management Research”. This article is a guide to the application of the IQA method in management research, and the available files are:
1. 1-Affinities, definitions, and cards produced by focus group.docx: all cards, affinities and definitions created by focus group session.docx
2. 2-Step-by-step - Analysis procedures.docx: detailed data analysis procedures.docx
3. 3-Axial Coding Tables – Individual Interviews.docx: detailed axial coding procedures.docx
4. 4-Theoretical Coding Table – Individual Interviews.docx: detailed theoretical coding procedures.docx
OYO hotel dataset
kaggle.com
zip
Updated Feb 4, 2025
JIS College of Engineering (2025). OYO hotel dataset [Dataset]. https://www.kaggle.com/datasets/jiscecseaiml/oyo-hotel-dataset
Explore at:
Available download formats: zip (75756 bytes)
Dataset updated
Feb 4, 2025
Authors
JIS College of Engineering
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Overview
The OYO Hotel Rooms Dataset provides comprehensive data on hotel room listings from OYO, covering various attributes related to pricing, amenities, and customer ratings. This dataset is valuable for researchers, data scientists, and machine learning practitioners interested in hospitality analytics, price prediction, customer satisfaction analysis, and clustering-based insights.

Data Source
The dataset has been collected from publicly available OYO hotel listings and includes structured information for analysis.

Features
The dataset consists of multiple attributes that define each hotel room, including:

Hotel Name: The name of the hotel property.
City: The location where the hotel is situated.
Room Type: Category of the room (e.g., Standard, Deluxe, Suite).
Price (INR): The cost per night in Indian Rupees.
Discounted Price: The price after applying discounts.
Rating: The customer rating for the hotel (out of 5).
Reviews: The number of customer reviews.
Amenities: A list of available facilities such as WiFi, AC, Breakfast, Parking, etc.
Latitude & Longitude: Geolocation details for mapping and spatial analysis.

Potential Use Cases
Price Prediction: Using regression models to predict hotel room pricing.
Customer Sentiment Analysis: Analyzing ratings and reviews to understand customer satisfaction.
Market Segmentation: Clustering hotels based on price, rating, and location.
Recommendation Systems: Building personalized hotel recommendations.

File Format
OYO_HOTEL_ROOMS.xlsx (Excel format) – Contains structured tabular data.

Acknowledgment
This dataset is intended for academic and research purposes. The data is sourced from publicly available hotel listings and does not contain any personally identifiable information.
