100+ datasets found

Basic Functions of the Numerical Structure of Scientific Data
zenodo.org
Updated Jun 3, 2025
Citation
Alexander Ivanovich Khripkov (2025). Basic Functions of the Numerical Structure of Scientific Data [Dataset]. http://doi.org/10.5281/zenodo.8137903
Unique identifier
https://doi.org/10.5281/zenodo.8137903
Dataset updated
Jun 3, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alexander Ivanovich Khripkov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Thematic meanings of numerical definitions of subject data in various fields of science lead to the manipulation of the digital codes of known physical, chemical, biological, genetic, and other quantities. In principle, every scientific justification contains, to some degree, a quantitative or qualitative characteristic of comparison or content. Thus the language of natural numbers, like mathematical operations, can be accompanied by any definition in any terminology. The author deliberately avoids the established terms of the main scientific fields; in this text, the numbers speak for themselves. Every ordering or composition of the complex numerical structures presented here has its own logical meaning, and every paradox of numerical combinations is an algorithm of the real values of numbers.
Data Sets for "The tensor t-function: a definition for functions of third-order tensors"
explore.openaire.eu
Updated Nov 22, 2019
Citation
Kathryn Lund (2019). Data Sets for "The tensor t-function: a definition for functions of third-order tensors" [Dataset]. http://doi.org/10.5281/zenodo.6420777
Unique identifier
https://doi.org/10.5281/zenodo.6420777
Dataset updated
Nov 22, 2019
Authors
Kathryn Lund
Description
MATLAB data sets used for numerical tests in K. Lund, The tensor t-function: a definition for functions of third-order tensors, Numerical Linear Algebra with Applications, 27 (3), e2288, 2020. https://doi.org/10.1002/nla.2288 The data and associated code were originally published on GitLab (https://gitlab.com/katlund/bfomfom-main), ca. 2019. The code (drivers, test scripts, etc.) can still be found in the bfomfom repository.
815 Million Global Contact Data - B2B / Email / Mobile Phone / LinkedIn URL - RampedUp
datarade.ai
.json, .csv
Citation
RampedUp Global Data Solutions. 815 Million Global Contact Data - B2B / Email / Mobile Phone / LinkedIn URL - RampedUp [Dataset]. https://datarade.ai/data-products/global-contact-data-personal-and-professional-840-million-rampedup-global-data-solutions
Available download formats: .json, .csv
Dataset authored and provided by
RampedUp Global Data Solutions
Area covered
Greece, Haiti, Ireland, Pakistan, Sint Eustatius and Saba, Bolivia (Plurinational State of), Grenada, Uganda, Chad, United States Minor Outlying Islands
Description
Sign up for a free trial (7 days and 50 credits to test our quality and accuracy): https://rampedup.io/sign-up-%2F-log-in

These are the fields available within the RampedUp Global dataset.

CONTACT DATA:
Personal Email Address - We manage over 115 million personal email addresses.
Professional Email - We manage over 200 million professional email addresses.
Home Address - We manage over 20 million home addresses.
Mobile Phones - 65 million direct lines to decision makers.
Social Profiles - Individual Facebook, Twitter, and LinkedIn.
Local Address - We manage 65M locations for local office mailers, event-based marketing, or face-to-face sales calls.

JOB DATA:
Job Title - Standardized titles for ease of use and selection
Company Name - The contact's current employer
Job Function - The company department associated with the job role
Title Level - The level in the company associated with the job role
Job Start Date - Identify people new to their role as a potential buyer

EMPLOYER DATA:
Websites - Company Website, Root Domain, or Full Domain
Addresses - Standardized Address, City, Region, Postal Code, and Country
Phone - E.164 phone number with country code
Social Profiles - LinkedIn, CrunchBase, Facebook, and Twitter

FIRMOGRAPHIC DATA:
Industry - 420 classifications for categorizing the company’s main field of business
Sector - 20 classifications for categorizing company industries
4 Digit SIC Code - 239 classifications and their definitions
6 Digit NAICS - 452 classifications and their definitions
Revenue - Estimated revenue and bands from 1M to over 1B
Employee Size - Exact employee count and bands
Email Open Scores - Aggregated data at the domain level showing relationships between email opens and corporate domains
IP Address - Company-level IP addresses associated with domains from a DNS lookup

CONSUMER DATA:
Education - Alma Mater, Degree, Graduation Date
Skills - Accumulated skills associated with work experience
Interests - Known interests of contact
Connections - Number of social connections
Followers - Number of social followers

Download our data dictionary: https://rampedup.io/our-data
Data from: INTEGRAL BY WAY OF INFINITE PARTITIONS
figshare.com
pdf
Updated Jan 19, 2016
Citation
Tiago S. dos Reis (2016). INTEGRAL BY WAY OF INFINITE PARTITIONS [Dataset]. http://doi.org/10.6084/m9.figshare.1133791.v4
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.1133791.v4
Dataset updated
Jan 19, 2016
Dataset provided by
figshare
Authors
Tiago S. dos Reis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We propose a new form of integral that arises from infinite partitions, using upper and lower series instead of the finite upper and lower Darboux sums. We show that every Riemann integrable function, whether proper or improper, is integrable in the proposed sense and that both integrals have the same value. We also show that the Riemann integral and our integral are equivalent for bounded functions on bounded intervals.
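For illustration only, and not as the author's formal definition, the sketch below applies the idea of lower and upper series to the improper integral of f(x) = 1/√x on (0, 1], whose Riemann value is 2. It assumes a hypothetical geometric partition with points 1 = r^0 > r^1 > r^2 > ... accumulating at 0. Because f is decreasing, its infimum and supremum on each subinterval [r^(k+1), r^k] sit at the endpoints, and summing the geometric series gives 1 + √r for the lower series and 1 + 1/√r for the upper series, which bracket 2 and approach it as r → 1.

```python
import math


def f(x: float) -> float:
    """Integrand 1/sqrt(x); unbounded near 0, so its Riemann integral on (0, 1] is improper."""
    return 1.0 / math.sqrt(x)


def lower_upper_series(r: float, tol: float = 1e-15) -> tuple[float, float]:
    """Partial sums of the lower and upper series over the partition points r**k.

    The partition (hypothetical, chosen for illustration) is 1 = r**0 > r**1 > r**2 > ...,
    so every subinterval [r**(k+1), r**k] lies in (0, 1]. Because f is decreasing,
    inf f on a subinterval is f(r**k) and sup f is f(r**(k+1)).
    Summation stops once an upper-series term drops below `tol`; the remaining
    geometric tail is then negligible for the values of r used below.
    """
    lower = upper = 0.0
    k = 0
    while True:
        right, left = r ** k, r ** (k + 1)
        width = right - left
        lower += f(right) * width     # infimum times subinterval width
        upper_term = f(left) * width  # supremum times subinterval width
        upper += upper_term
        if upper_term < tol:
            break
        k += 1
    return lower, upper


for r in (0.5, 0.81, 0.99):
    low, up = lower_upper_series(r)
    print(f"r={r}: lower={low:.6f} (closed form {1 + math.sqrt(r):.6f}), "
          f"upper={up:.6f} (closed form {1 + 1 / math.sqrt(r):.6f})")
```

Even though f is unbounded, every term of both series is finite because the partition never reaches 0; this is the kind of situation in which an infinite partition can handle an improper integral directly.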
TxDOT Number of Through Lanes Data Dictionary
hub.arcgis.com
geoportal-mpo.opendata.arcgis.com
Updated Apr 24, 2025
Citation
Texas Department of Transportation (2025). TxDOT Number of Through Lanes Data Dictionary [Dataset]. https://hub.arcgis.com/documents/d6edcfa4df0b4add8d1d5671a620aa68
Dataset updated
Apr 24, 2025
Dataset authored and provided by
Texas Department of Transportation (http://txdot.gov/)
Description
Programmatically generated Data Dictionary document detailing the TxDOT Number of Through Lanes service.

The PDF contains service metadata and a complete list of data fields. For questions or issues related to the document, please contact the data owner of the service, who is identified in the PDF and in the Credits of this portal item.
Related links: TxDOT Number of Through Lanes Service URL; TxDOT Number of Through Lanes Portal Item
Data for Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve
deepblue.lib.umich.edu
Citation
Liemohn, Michael W; Adam, Joshua G; Ganushkina, Natalia Y, Data for Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve [Dataset]. http://doi.org/10.7302/2mcx-5749
Unique identifier
https://doi.org/10.7302/2mcx-5749
Dataset provided by
Deep Blue Data
Authors
Liemohn, Michael W; Adam, Joshua G; Ganushkina, Natalia Y
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Time period covered
Sep 20, 2013
Description
Many statistical tools have been developed to aid in the assessment of a numerical model’s quality at reproducing observations. Some of these techniques focus on the identification of events within the data set, times when the observed value is beyond some threshold value that defines it as a value of keen interest. An example of this is whether it will rain, in which events are defined as any precipitation above some defined amount. A method called the sliding threshold of observation for numeric evaluation (STONE) curve sweeps the event definition threshold of both the model output and the observations, resulting in the identification of threshold intervals for which the model does well at sorting the observations into events and nonevents. An excellent data-model comparison will have a smooth STONE curve, but the STONE curve can have wiggles and ripples in it. These features reveal clusters when the model systematically overestimates or underestimates the observations. This study establishes the connection between features in the STONE curve and attributes of the data-model relationship. The method is applied to a space weather example.
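A minimal sketch of the threshold sweep described above follows. It assumes, by analogy with an ROC curve, that each STONE-curve point pairs the probability of false detection (POFD) with the probability of detection (POD) from a 2x2 contingency table, and that an "event" is a value strictly above the threshold, with the same threshold applied to observations and model output. The function names and toy series are hypothetical; this is not the authors' code.

```python
import math
from collections.abc import Sequence


def contingency_counts(obs: Sequence[float], model: Sequence[float],
                       threshold: float) -> tuple[int, int, int, int]:
    """Hits, misses, false alarms and correct negatives for one event threshold.

    The same threshold defines events in both the observations and the model output.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for o, m in zip(obs, model):
        obs_event, model_event = o > threshold, m > threshold
        if obs_event and model_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif model_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def stone_curve(obs: Sequence[float], model: Sequence[float],
                thresholds: Sequence[float]) -> list[tuple[float, float, float]]:
    """(threshold, POFD, POD) for each threshold; NaN where a ratio is undefined."""
    points = []
    for t in thresholds:
        h, m, f, c = contingency_counts(obs, model, t)
        pod = h / (h + m) if h + m else math.nan    # probability of detection
        pofd = f / (f + c) if f + c else math.nan   # probability of false detection
        points.append((t, pofd, pod))
    return points


obs = [1, 3, 5, 7, 9]
model = [2, 3, 4, 8, 9]
for t, pofd, pod in stone_curve(obs, model, [2, 4, 6]):
    print(f"threshold={t}: POFD={pofd:.2f}, POD={pod:.2f}")
```

In this toy series POD dips at threshold 4 because the observed value 5 is paired with a model value of 4 and becomes a miss, a miniature version of the ripples the description links to model over- or underestimation.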
Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories
data.virginia.gov
data.transportation.gov
csv, json, rdf, xsl
Updated Jan 24, 2025
Citation
U.S. Department of Transportation (2025). Third Generation Simulation Data (TGSIM) I-90/I-94 Moving Trajectories [Dataset]. https://data.virginia.gov/dataset/third-generation-simulation-data-tgsim-i-90-i-94-moving-trajectories
Available download formats: csv, xsl, rdf, json
Dataset updated
Jan 24, 2025
Dataset provided by
Federal Highway Administration
Authors
U.S. Department of Transportation
Area covered
Interstate 90
Description
The main dataset is a 130 MB file of trajectory data (I90_94_moving_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for four distinct data collection “Runs” (I90_94_moving_RunX_with_lanes.png, where X equals 1, 2, 3, and 4). Associated centerline files are also provided for each “Run” (I-90-moving-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94moving.csv” for more details). The dataset defines six northbound lanes using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The northbound lanes are shown visually from left to right in I90_94_moving_lane1.png through I90_94_moving_lane6.png.

This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 2 ADAS-equipped vehicles (one at a time) northbound through the 4 km long segment at an altitude of 200 meters. Once a vehicle finished the segment, the helicopter would return to the beginning of the segment to follow the next SAE Level 2 ADAS-equipped vehicle to ensure continuous data collection. The segment was selected to study mandatory and discretionary lane changing and last-minute, forced lane-changing maneuvers. The segment has five off-ramps and three on-ramps to the right and one off-ramp and one on-ramp to the left. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a cloudy day.

As part of this dataset, the following files were provided:
I90_94_moving_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at 0.1-second intervals. Vehicle size (small or large), width, length, and whether the vehicle was one of the automated test vehicles ("yes" or "no") are provided along with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using a conversion factor of 1 pixel = 0.3 m.
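Because positions are reported in meters every 0.1 s, a vehicle's speed and acceleration values can be cross-checked with finite differences. The sketch below assumes one vehicle's positions along its direction of travel have already been extracted from I90_94_moving_final.csv into a list; the file's column names are not given in this description, so no CSV parsing is shown, and the sample positions are invented. The pixel helper only restates the 0.3 m per pixel factor.

```python
PIXEL_TO_METER = 0.3  # 1 pixel = 0.3 m, as stated in the dataset description
DT_SECONDS = 0.1      # trajectory sampling interval stated in the description


def pixels_to_meters(pixels: float) -> float:
    """Convert an image distance in pixels to meters."""
    return pixels * PIXEL_TO_METER


def finite_difference(values: list[float], dt: float = DT_SECONDS) -> list[float]:
    """Forward differences divided by dt: positions (m) -> speeds (m/s),
    or speeds (m/s) -> accelerations (m/s^2). The result is one element shorter."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Hypothetical positions (m) of one vehicle at consecutive 0.1 s samples.
positions = [0.0, 2.5, 5.0, 7.6, 10.3]
speeds = finite_difference(positions)
accelerations = finite_difference(speeds)
print([round(v, 2) for v in speeds], [round(a, 2) for a in accelerations])
```

Forward differences amplify positional noise, especially when differenced twice for acceleration, so small disagreements with the reported columns would not by themselves indicate an error in the data.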

I90_94_moving_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound lanes) for each run X.

I-90-moving-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and vertical locations in the reference image, respectively. The "ramp" columns define the type of roadway segment (0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments). In total, the centerline files define six northbound lanes.
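The ramp indicator lends itself to a simple lookup. This sketch assumes one lane's ramp column from a centerline file has already been read into a list of integers in centerline order (the list below is invented); it decodes the 0-3 codes given above and groups consecutive points into road sections. The function names are hypothetical.

```python
from itertools import groupby

# Codes as listed in the dataset description.
RAMP_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}


def decode_ramp(code: int) -> str:
    """Map a ramp indicator code to its label; reject undocumented codes."""
    if code not in RAMP_TYPES:
        raise ValueError(f"undocumented ramp code: {code!r}")
    return RAMP_TYPES[code]


def ramp_sections(codes: list[int]) -> list[tuple[str, int]]:
    """Collapse a lane's per-point ramp codes into (label, number_of_points) runs,
    preserving the order of points along the centerline."""
    return [(decode_ramp(code), len(list(run))) for code, run in groupby(codes)]


print(ramp_sections([0, 0, 1, 1, 1, 0, 2]))
```

Raising on unknown codes, rather than silently labeling them, makes it obvious if a file uses a value outside the documented 0-3 range.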

Annotation on Regions.zip, which includes images that visually map lanes (I90_9

Medical Service Study Area Data Dictionary

data.chhs.ca.gov

Updated Sep 5, 2024

Citation

Department of Health Care Access and Information (2024). Medical Service Study Area Data Dictionary [Dataset]. https://data.chhs.ca.gov/dataset/medical-service-study-area-data-dictionary

Available download formats: csv, geojson, html, zip, kml, ArcGIS GeoServices REST API

Dataset updated

Sep 5, 2024

Dataset provided by

CA Department of Health Care Access and Information

Authors

Department of Health Care Access and Information

Description

Field Name	Data Type	Description
Statefp	Number	US Census Bureau unique identifier of the state
Countyfp	Number	US Census Bureau unique identifier of the county
Countynm	Text	County name
Tractce	Number	US Census Bureau unique identifier of the census tract
Geoid	Number	US Census Bureau unique identifier of the state + county + census tract
Aland	Number	US Census Bureau defined land area of the census tract
Awater	Number	US Census Bureau defined water area of the census tract
Asqmi	Number	Area calculated in square miles from the Aland
MSSAid	Text	ID of the Medical Service Study Area (MSSA) the census tract belongs to
MSSAnm	Text	Name of the Medical Service Study Area (MSSA) the census tract belongs to
Definition	Text	Type of MSSA; possible values are urban, rural, and frontier.
TotalPovPop	Number	US Census Bureau total population of the census tract for whom poverty status is determined, taken from the 2020 ACS 5-year estimates (table S1701)
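Two fields in this dictionary are derived from others. A minimal sketch, assuming Aland follows the US Census Bureau TIGER/Line convention of square meters (the dictionary itself does not state the unit) and that tract codes are 6 digits: Geoid is the zero-padded concatenation of Statefp, Countyfp, and Tractce, and Asqmi is Aland divided by the number of square meters in a square mile. The function names and sample values are hypothetical.

```python
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2  # the international mile is exactly 1609.344 m


def build_geoid(statefp: int, countyfp: int, tractce: int) -> str:
    """Concatenate state (2 digits), county (3 digits) and tract (6 digits) codes,
    restoring leading zeros that are lost when the codes are stored as numbers."""
    return f"{statefp:02d}{countyfp:03d}{tractce:06d}"


def aland_to_sq_miles(aland_m2: float) -> float:
    """Land area in square miles, assuming Aland is given in square meters."""
    return aland_m2 / SQ_METERS_PER_SQ_MILE


print(build_geoid(6, 1, 400100))  # state 06, county 001, tract 400100
print(round(aland_to_sq_miles(5_000_000), 3))
```

Because Statefp, Countyfp, Tractce, and Geoid are typed as Number, leading zeros such as California's state code 06 can disappear when the file is loaded; treating Geoid as a zero-padded string avoids mismatches when joining to other census tables.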

Data from: Numerical ferromagnetic resonance experiments in nano-sized elements
rodare.hzdr.de
zip
Updated Dec 14, 2020
Citation
Wagner, Kai; Körber, Lukas; Stienen, Sven; Lindner, Jürgen; Farle, Michael; Kákay, Attila (2020). Numerical ferromagnetic resonance experiments in nano-sized elements [Dataset]. http://doi.org/10.14278/rodare.667
Available download formats: zip
Unique identifier
https://doi.org/10.14278/rodare.667
Dataset updated
Dec 14, 2020
Dataset provided by
Universität Duisburg-Essen
HZDR, TU Dresden
HZDR
Authors
Wagner, Kai; Körber, Lukas; Stienen, Sven; Lindner, Jürgen; Farle, Michael; Kákay, Attila
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the raw data for our paper "Numerical ferromagnetic resonance experiments in nano-sized elements", published in IEEE Magnetics Letters. It is organized in folders according to the figures in the paper. Each folder contains the experimental and numerical data, together with the MuMax3 definition files and any scripts used for evaluation.
Artificial Symbol Learning With Training - Experiment 2 Data analysis
repository.lboro.ac.uk
zip
Updated Jan 16, 2025
Citation
Camilla Gilmore; Matthew Inglis; Hanna Weiers (2025). Artificial Symbol Learning With Training - Experiment 2 Data analysis [Dataset]. http://doi.org/10.17028/rd.lboro.13645850.v1
Available download formats: zip
Unique identifier
https://doi.org/10.17028/rd.lboro.13645850.v1
Dataset updated
Jan 16, 2025
Dataset provided by
Loughborough University
Authors
Camilla Gilmore; Matthew Inglis; Hanna Weiers
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Zip file containing all data and analysis files for Experiment 2 in: Weiers, H., Inglis, M., & Gilmore, C. (under review). Learning artificial number symbols with ordinal and magnitude information.
Article abstract
The question of how numerical symbols gain semantic meaning is a key focus of mathematical cognition research. Some have suggested that symbols gain meaning from magnitude information, by being mapped onto the approximate number system, whereas others have suggested that symbols gain meaning from their ordinal relations to other symbols. Here we used an artificial symbol learning paradigm to investigate the effects of magnitude and ordinal information on number symbol learning. Across two experiments, we found that after either magnitude or ordinal training, adults successfully learned novel symbols and were able to infer their ordinal and magnitude meanings. Furthermore, adults were able to make relatively accurate judgements about, and map between, the novel symbols and non-symbolic quantities (dot arrays). Although both ordinal and magnitude training were sufficient to attach meaning to the symbols, we found beneficial effects on the ability to learn and make numerical judgements about novel symbols when small amounts of magnitude information for a subset of symbols were combined with ordinal information about the whole set. These results suggest that a combination of magnitude and ordinal information is a plausible account of the symbol learning process.
© The Authors
Trends in COVID-19 Cases and Deaths in the United States, by County-level Population Factors - ARCHIVED
data.cdc.gov
healthdata.gov
Updated Jun 6, 2023
Citation
CDC COVID-19 Response (2023). Trends in COVID-19 Cases and Deaths in the United States, by County-level Population Factors - ARCHIVED [Dataset]. https://data.cdc.gov/w/njmz-dpbc/tdwk-ruhb?cur=K0_qEbFad0O&from=gspC_chSyVH
Available download formats: tsv, xml, csv, json, application/rss+xml, application/rdf+xml
Dataset updated
Jun 6, 2023
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Authors
CDC COVID-19 Response
Area covered
United States
Description
Reporting of Aggregate Case and Death Count data was discontinued on May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.

The surveillance case definition for COVID-19, a nationally notifiable disease, was first described in a position statement from the Council of State and Territorial Epidemiologists, which was later revised. However, there is some variation in how jurisdictions implemented these case definitions. More information on how CDC collects COVID-19 case surveillance data can be found at FAQ: COVID-19 Data and Surveillance.

Aggregate Data Collection Process
Since the beginning of the COVID-19 pandemic, data were reported from state and local health departments through a robust process with the following steps:
Aggregate county-level counts were obtained indirectly, via automated overnight web collection, or directly, via a data submission process.
If more than one official county data source existed, CDC used a comprehensive data selection process comparing each official county data source to retrieve the highest case and death counts, unless otherwise specified by the state.
A CDC data team reviewed counts for congruency prior to integration and set up alerts to monitor for discrepancies in the data.
CDC routinely compiled these data and posted the finalized information on COVID Data Tracker.
County level data were aggregated to obtain state- and territory- specific totals.
Counting of cases and deaths is based on date of report and not on the date of symptom onset. CDC calculates rates in these data by using population estimates provided by the US Census Bureau Population Estimates Program (2019 Vintage).
COVID-19 aggregate case and death data are organized in a time series that includes the cumulative number of cases and deaths as reported by a jurisdiction on a given date. New case and death counts are calculated as the week-to-week change in the cumulative counts of cases and deaths reported (i.e., newly reported cases and deaths = cumulative number of cases/deaths reported this week minus the cumulative total reported the prior week).
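The week-to-week calculation and the per-100,000 rates can be sketched as follows. The counts and the population of 250,000 are invented, the function names are hypothetical, and the sketch ignores details such as CDC's retrospective corrections and the specific population estimate applied to each county.

```python
def weekly_new_counts(cumulative: list[int]) -> list[int]:
    """Newly reported counts: this week's cumulative total minus the prior week's.

    A negative value can appear if a jurisdiction revises its cumulative total downward.
    """
    return [current - previous for previous, current in zip(cumulative, cumulative[1:])]


def rate_per_100k(count: float, population: float) -> float:
    """Cases or deaths per 100,000 persons."""
    return count * 100_000 / population


# Hypothetical cumulative case counts reported on four consecutive weeks.
cumulative_cases = [1_200, 1_350, 1_500, 1_480]
new_cases = weekly_new_counts(cumulative_cases)
print(new_cases)
print([round(rate_per_100k(n, 250_000), 1) for n in new_cases])
```

Note that the first reported week has no prior week to subtract, so the weekly series is one element shorter than the cumulative series.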

This process was collaborative, with CDC and jurisdictions working together to ensure the accuracy of COVID-19 case and death numbers. County counts provided the most up-to-date numbers on cases and deaths by report date. Throughout data collection, CDC retrospectively updated counts to correct known data quality issues.

Description
This archived public use dataset focuses on the cumulative and weekly case and death rates per 100,000 persons within various sociodemographic factors across all states and their counties. All resulting data are expressed as rates calculated as the number of cases or deaths per 100,000 persons in counties meeting various classification criteria using the US Census Bureau Population Estimates Program (2019 Vintage).

Each county within jurisdictions is classified into multiple categories for each factor. All rates in this dataset are based on classification of counties by the characteristics of their population, not individual-level factors. This applies to each of the available factors observed in this dataset. Specific factors and their corresponding categories are detailed below.

Population-level factors
Each unique population factor is detailed below. Please note that the “Classification” column describes each of the 12 factors in the dataset, including a data dictionary describing what each numeric digit means within each classification. The “Category” column uses the numeric codes defined in the “Classification” column; depending on the factor, there are between 2 and 6 codes.

Metro vs. Non-Metro – “Metro_Rural”
Metro vs. Non-Metro classification type is an aggregation of the 6 National Center for Health Statistics (NCHS) Urban-Rural classifications, where “Metro” counties include Large Central Metro, Large Fringe Metro, Medium Metro, and Small Metro areas and “Non-Metro” counties include Micropolitan and Non-Core (Rural) areas.
1 – Metro, including “Large Central Metro, Large Fringe Metro, Medium Metro, and Small Metro” areas
2 – Non-Metro, including “Micropolitan, and Non-Core” areas

Urban/rural - “NCHS_Class”
Urban/rural classification type is based on the 2013 National Center for Health Statistics Urban-Rural Classification Scheme for Counties. Levels consist of:

1 Large Central Metro
2 Large Fringe Metro
3 Medium Metro
4 Small Metro
5 Micropolitan
6 Non-Core (Rural)
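The Metro_Rural factor is a pure aggregation of NCHS_Class, so the first can be derived from the second. A small sketch of that mapping, using the codes listed above (the function name is hypothetical):

```python
# NCHS 2013 Urban-Rural codes as listed in this dataset's documentation.
NCHS_CLASSES = {
    1: "Large Central Metro",
    2: "Large Fringe Metro",
    3: "Medium Metro",
    4: "Small Metro",
    5: "Micropolitan",
    6: "Non-Core (Rural)",
}


def metro_rural_code(nchs_class: int) -> int:
    """Collapse an NCHS_Class code into the Metro_Rural code: 1 = Metro, 2 = Non-Metro."""
    if nchs_class not in NCHS_CLASSES:
        raise ValueError(f"unknown NCHS class: {nchs_class!r}")
    return 1 if nchs_class <= 4 else 2


for code, label in NCHS_CLASSES.items():
    print(code, label, "->", "Metro" if metro_rural_code(code) == 1 else "Non-Metro")
```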

American Community Survey (ACS) data were used to classify counties based on their age, race/ethnicity, household size, poverty level, and health insurance status distributions. Cut points were generated by using tertiles and categorized as High, Moderate, and Low percentages. The classification “Percent non-Hispanic, Native Hawaiian/Pacific Islander” is only available for “Hawaii” due to low numbers in this category for other available locations. This limitation also applies to other race/ethnicity categories within certain jurisdictions, where no counties fall into a given category. The cut points for each ACS category are further detailed below:

Age 65 - “Age65”

1 Low (0-24.4%)
2 Moderate (>24.4%-28.6%)
3 High (>28.6%)

Non-Hispanic, Asian - “NHAA”

1 Low (<=5.7%)
2 Moderate (>5.7%-17.4%)
3 High (>17.4%)

Non-Hispanic, American Indian/Alaskan Native - “NHIA”

1 Low (<=0.7%)
2 Moderate (>0.7%-30.1%)
3 High (>30.1%)

Non-Hispanic, Black - “NHBA”

1 Low (<=2.5%)
2 Moderate (>2.5%-37%)
3 High (>37%)

Hispanic - “HISP”

1 Low (<=18.3%)
2 Moderate (>18.3%-45.5%)
3 High (>45.5%)

Population in Poverty - “Pov”

1 Low (0-12.3%)
2 Moderate (>12.3%-17.3%)
3 High (>17.3%)

Population Uninsured- “Unins”

1 Low (0-7.1%)
2 Moderate (>7.1%-11.4%)
3 High (>11.4%)

Average Household Size - “HH”

1 Low (1-2.4)
2 Moderate (>2.4-2.6)
3 High (>2.6)

Community Vulnerability Index Value - “CCVI”
COVID-19 Community Vulnerability Index (CCVI) scores, which range from 0 to 1, are from Surgo Ventures. Cut points were generated based on tertiles and categorized as:

1 Low Vulnerability (0.0-0.4)
2 Moderate Vulnerability (0.4-0.6)
3 High Vulnerability (0.6-1.0)

Social Vulnerability Index Value – “SVI”
Social Vulnerability Index (SVI) scores (vintage 2020), which also range from 0 to 1, are from CDC/ATSDR’s Geospatial Research, Analysis & Services Program. Cut points for CCVI and SVI scores were generated based on tertiles and categorized as:

1 Low Vulnerability (0-0.333)
2 Moderate Vulnerability (0.334-0.666)
3 High Vulnerability (0.667-1)
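The tertile-based categories can be sketched under two stated assumptions: tertile cut points are computed with Python's statistics.quantiles, whose default "exclusive" interpolation may not match the method CDC used, and values are assigned to codes with the ACS boundary convention listed above (Low up to and including the first cut point, High strictly above the second). The CCVI and SVI ranges are written with different boundary conventions, so those should be checked before reusing the rule. Function names are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def tertile_cut_points(values: list[float]) -> list[float]:
    """Two cut points splitting the values into tertiles.

    Uses statistics.quantiles with its default 'exclusive' method, which may
    differ from the method behind the published cut points.
    """
    return quantiles(values, n=3)


def classify(value: float, cut_points: list[float]) -> int:
    """Category code 1 (Low), 2 (Moderate) or 3 (High) for ascending cut points.

    Low is <= the first cut point, Moderate is > the first and <= the second,
    and High is > the second, matching the ACS ranges listed above.
    """
    return bisect_left(cut_points, value) + 1


AGE65_CUTS = [24.4, 28.6]  # published cut points for "Age65", in percent
for pct in (20.0, 24.4, 25.0, 28.6, 31.2):
    print(pct, classify(pct, AGE65_CUTS))

print(tertile_cut_points([1, 2, 3, 4, 5, 6, 7, 8]))
```

bisect_left counts how many cut points lie strictly below the value, which is exactly what makes a value equal to a cut point fall into the lower category.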
Conceptualization of public data ecosystems
data.niaid.nih.gov
Updated Sep 26, 2024
Citation
Lnenicka, Martin (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
Dataset updated
Sep 26, 2024
Dataset provided by
Lnenicka, Martin
Nikiforova, Anastasija
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and of how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems, develops a conceptual model of the elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generations of public data ecosystems, represented by six generations and called the Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

This dataset is being made public to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" (Telematics and Informatics) and for its Systematic Literature Review component that informs the study.

Description of the data in this data set

PublicDataEcosystem_SLR provides the structure of the protocol

Spreadsheet #1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies.

Spreadsheet #2 provides the protocol structure.

Spreadsheet #3 provides the filled protocol for relevant studies.

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information

Descriptive Information

Article number

A study number, corresponding to the study number assigned in an Excel worksheet

Complete reference

The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

Year of publication

The year in which the study was published.

Journal article / conference paper / book chapter

The type of the paper, i.e., journal article, conference paper, or book chapter.

Journal / conference / book

The journal, conference, or book in which the paper is published.

DOI / Website

A link to the website where the study can be found.

Number of words

The number of words in the study.

Number of citations in Scopus and WoS

The number of citations of the paper in Scopus and WoS digital libraries.

Availability in Open Access

Availability of a study in the Open Access or Free / Full Access.

Keywords

Keywords of the paper as indicated by the authors (in the paper).

Relevance for our study (high / medium / low)

What is the relevance level of the paper for our study?

Approach- and research design-related information

Objective / Aim / Goal / Purpose & Research Questions

The research objective and established RQs.

Research method (including unit of analysis)

The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

Study’s contributions

The study’s contribution as defined by the authors

Qualitative / quantitative / mixed method

Does the study use a qualitative, quantitative, or mixed-methods approach?

Availability of the underlying research data

Does the paper refer to the public availability of the underlying research data (e.g., transcriptions of interviews, collected data), or explain why these data are not openly shared?

Period under investigation

Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

Use of theory / theoretical concepts / approaches? If yes, specify them

Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

Quality-related information

Quality concerns

Are there any quality concerns (e.g., limited information about the research methods used)?

Public Data Ecosystem-related information

Public data ecosystem definition

How is the public data ecosystem, or any equivalent term (most often "infrastructure"), defined in the paper? If an alternative term is used, what is the public data ecosystem called in the paper?

Public data ecosystem evolution / development

Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

What constitutes a public data ecosystem?

What constitutes a public data ecosystem (components and relationships), i.e., its "FORM / OUTPUT" as presented in the paper (a general description, with more detailed answers in the additional questions that follow).

Components and relationships

What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

Stakeholders

What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

Actors and their roles

What actors does the public data ecosystem involve? What are their roles?

Data (data types, data dynamism, data categories etc.)

What data does the public data ecosystem cover (or is it intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic data, real-time data, streams), prevailing data categories / domains / topics etc.

Processes / activities / dimensions, data lifecycle phases

What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

Level (if relevant)

What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

Other elements or relationships (if any)

What other elements or relationships does the public data ecosystem consist of?

Additional comments

Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

New papers

Does the study refer to any other potentially relevant papers?

Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

Format of the file: .xls, .csv (for the first spreadsheet only), .docx

Licenses or restrictions: CC-BY

For more info, see README.txt
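The coding protocol above amounts to a fixed record per analysed paper, with two fields restricted to closed option lists. The following is a minimal Python sketch under our own assumptions, not the authors' tooling: the class names, field names and spreadsheet column keys (`doi`, `words`, `open_access`, `keywords`, `relevance`, `method`, `period`, `new_papers`) are hypothetical, only a subset of the protocol fields is modelled, and semicolon-separated keywords are assumed. The allowed values mirror "Relevance for our study (high / medium / low)" and "Qualitative / quantitative / mixed method".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Relevance(Enum):
    """Allowed values of "Relevance for our study (high / medium / low)"."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Method(Enum):
    """Allowed values of "Qualitative / quantitative / mixed method"."""
    QUALITATIVE = "qualitative"
    QUANTITATIVE = "quantitative"
    MIXED = "mixed"


@dataclass
class PaperCoding:
    """Hypothetical record for one analysed paper (a subset of the protocol fields)."""
    doi_or_website: str
    number_of_words: int
    open_access: bool
    keywords: List[str]
    relevance: Relevance
    method: Method
    period_under_investigation: Optional[str] = None
    new_papers: List[str] = field(default_factory=list)  # snowballing candidates


def parse_coding(row: Dict[str, str]) -> PaperCoding:
    """Build a record from one spreadsheet row.

    The column keys are assumptions, not the real spreadsheet headers.
    Enum(value) raises ValueError for values outside the protocol's options.
    """
    return PaperCoding(
        doi_or_website=row["doi"].strip(),
        number_of_words=int(row["words"]),
        open_access=row["open_access"].strip().lower() in {"yes", "true"},
        keywords=[k.strip() for k in row["keywords"].split(";") if k.strip()],
        relevance=Relevance(row["relevance"].strip().lower()),
        method=Method(row["method"].strip().lower()),
        period_under_investigation=(row.get("period") or "").strip() or None,
        new_papers=[p.strip() for p in row.get("new_papers", "").split(";") if p.strip()],
    )


# Hypothetical example row (placeholder DOI, not a real paper)
example = parse_coding({
    "doi": "https://doi.org/10.0000/example",
    "words": "8500",
    "open_access": "Yes",
    "keywords": "open data; data ecosystem",
    "relevance": "High",
    "method": "qualitative",
})
```

Using enums rather than free text means a typo such as "very high" in the relevance column fails loudly when the spreadsheet is loaded instead of silently creating a fourth category.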
Data from: Extensive theoretical/numerical comparative studies on H2 and...
tandf.figshare.com
txt
Updated May 30, 2023
Jung Hoon Kim; Tomomichi Hagiwara (2023). Extensive theoretical/numerical comparative studies on H2 and generalised H2 norms in sampled-data systems [Dataset]. http://doi.org/10.6084/m9.figshare.4206924.v3
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.4206924.v3
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Jung Hoon Kim; Tomomichi Hagiwara
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper is concerned with linear time-invariant (LTI) sampled-data systems (by which we mean sampled-data systems with LTI generalised plants and LTI controllers) and studies their H2 norms from the viewpoint of impulse responses, and their generalised H2 norms from the viewpoint of the induced norms from L2 to L∞. A new definition of the H2 norm of LTI sampled-data systems is first introduced from a standpoint intermediate between those of the two existing definitions. We then establish a unified treatment of the three definitions of the H2 norm through a matrix function G(τ) defined on the sampling interval [0, h). This paper next considers the generalised H2 norms, in which two types of L∞ norm of the output are considered: the temporal supremum magnitude under the spatial 2-norm and under the spatial ∞-norm of a vector-valued function. We further give a unified treatment of the generalised H2 norms through another matrix function F(θ), which is also defined on [0, h). Through a close connection between G(τ) and F(θ), some theoretical relationships between the H2 and generalised H2 norms are provided. Furthermore, appropriate extensions of the treatment of G(τ) and F(θ) to the closed interval [0, h] are discussed to facilitate numerical computations and comparisons of the H2 and generalised H2 norms. Through theoretical and numerical studies, it is shown that the two generalised H2 norms coincide with none of the three H2 norms of LTI sampled-data systems, even though all five definitions coincide with each other when single-output continuous-time LTI systems are considered as a special class of LTI sampled-data systems. To summarise, this paper clarifies that the five control performance measures are mutually related but also intrinsically different from each other.
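For the continuous-time special case mentioned at the end of the description, the standard textbook formulas make the coincidence easy to see. This is a sketch under our own assumptions, not the paper's sampled-data definitions based on G(τ) and F(θ). We assume a stable, strictly proper system Σ with realization (A, B, C), p outputs, impulse response g(t) = C e^{At} B, output z = Σw from zero initial state, and controllability Gramian P:

```latex
\begin{aligned}
A P + P A^{\mathsf T} + B B^{\mathsf T} &= 0, \qquad W := C P C^{\mathsf T} \in \mathbb{R}^{p \times p},\\
\|\Sigma\|_{H_2}^{2} &= \int_0^\infty \operatorname{tr}\big(g(t)\, g(t)^{\mathsf T}\big)\, dt = \operatorname{tr}(W),\\
\|\Sigma\|_{\mathrm{g},2} &= \sup_{\|w\|_{L_2} \le 1}\ \sup_{t \ge 0} \|z(t)\|_2 = \sqrt{\lambda_{\max}(W)},\\
\|\Sigma\|_{\mathrm{g},\infty} &= \sup_{\|w\|_{L_2} \le 1}\ \sup_{t \ge 0} \|z(t)\|_\infty = \sqrt{\max_{i} W_{ii}},\\
p = 1:\quad \operatorname{tr}(W) &= \lambda_{\max}(W) = W_{11} \ \Rightarrow\ \|\Sigma\|_{H_2} = \|\Sigma\|_{\mathrm{g},2} = \|\Sigma\|_{\mathrm{g},\infty}.
\end{aligned}
```

With several outputs, W is a p × p matrix, and trace, largest eigenvalue and largest diagonal entry generally differ. This is the continuous-time analogue of the distinction the paper studies.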
Third Generation Simulation Data (TGSIM) I-90/I-94 Stationary Trajectories
data.virginia.gov
data.transportation.gov
+1 more
csv, json, rdf, xsl
Updated Jan 24, 2025
U.S Department of Transportation (2025). Third Generation Simulation Data (TGSIM) I-90/I-94 Stationary Trajectories [Dataset]. https://data.virginia.gov/dataset/third-generation-simulation-data-tgsim-i-90-i-94-stationary-trajectories
Available download formats: xsl, rdf, csv, json
Dataset updated
Jan 24, 2025
Dataset provided by
Federal Highway Administration
Authors
U.S Department of Transportation
Area covered
Interstate 90
Description
The main dataset is a 304 MB file of trajectory data (I90_94_stationary_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) vehicles and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for six distinct data collection “Runs” (I90_94_Stationary_Run_X_ref_image.png, where X equals 1, 2, 3, 4, 5, and 6). Associated centerline files are also provided for each “Run” (I-90-stationary-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94Stationary.csv” for more details). The dataset defines six northbound lanes using these centerline files. Twelve different numerical IDs are used to define the six northbound lanes (1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, and 15) depending on the run. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. Lane IDs are provided in the reference images in red text for each data collection run (I90_94_Stationary_Run_X_ref_image_annotated.jpg, where X equals 1, 2, 3, 4, 5, and 6).
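The road-section indicator described above can be decoded with a small lookup table. This is a minimal sketch under stated assumptions. Only the codes 0 to 3 and their meanings come from the description. The indicator column name `ramp<lane_id>` and the sample rows are hypothetical; the real headers are documented in "TGSIM – Centerline Data Dictionary – I90_94Stationary.csv".

```python
import csv
import io
from collections import Counter

# Road-section codes exactly as listed in the dataset description
RAMP_CODES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}


def section_counts(csv_text: str, lane_id: int) -> Counter:
    """Count centerline points per road-section type for one lane.

    Assumes a hypothetical indicator column named f"ramp{lane_id}"; check the
    centerline data dictionary for the real column headers.
    """
    column = f"ramp{lane_id}"
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # lane not defined at this point (or column absent)
        counts[RAMP_CODES[int(float(value))]] += 1
    return counts


# Invented sample rows in the assumed layout (not real TGSIM data)
sample = "x1,y1,ramp1\n0.0,10.0,0\n0.3,10.1,0\n0.6,10.2,1\n0.9,10.3,3\n1.2,10.4,\n"
print(section_counts(sample, 1))
```

The helper skips empty cells rather than treating them as code 0, because the description says lane IDs vary by run, so a lane column may be blank where that lane is not defined.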

This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using the fixed location aerial videography approach with one high-resolution 8K camera mounted on a helicopter hovering over a short segment of I-94 focusing on the merge and diverge points in Chicago, IL. The altitude of the helicopter (approximately 213 meters) enabled the camera to capture 1.3 km of highway driving and a major weaving section in each direction (where I-90 and I-94 diverge in the northbound direction and merge in the southbound direction). The segment has two off-ramps and two on-ramps in the northbound direction. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (4:00 PM-6:00 PM CT) on a cloudy day. During this period, two SAE Level 2 ADAS-equipped vehicles drove through the segment, entering the northbound direction upstream of the target section, exiting the target section on the right through I-94, and attempting to perform a total of three lane-changing maneuvers (if safe to do so). These vehicles are indicated in the dataset.

As part of this dataset, the following files were provided:
I90_94_stationary_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data sampled every 0.1 seconds. Vehicle type, width, and length are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using a conversion factor of 1 pixel = 0.3 m.

I90_94_Stationary_Run_X_ref_image.png are the aerial reference images that define the geographic region for each run X.

I-90-stationary-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines for each Run X. The "x" and "y" columns represent the horizontal and ve
United States US: Business Enterprise Sector: Number of Researchers: Total
ceicdata.com
Updated Oct 24, 2023
CEICdata.com (2023). United States US: Business Enterprise Sector: Number of Researchers: Total [Dataset]. https://www.ceicdata.com/en/united-states/number-of-researchers-and-personnel-on-research-and-development-oecd-member-annual
Dataset updated
Oct 24, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2008 - Dec 1, 2021
Area covered
United States
Description
US: Business Enterprise Sector: Number of Researchers: Total data was reported at 1,466,769 persons in 2022, an increase from 1,454,417 persons in 2021. The series is updated yearly, with a median of 1,014,000 persons over Dec 2008 to 2022 (11 observations). It reached an all-time high of 1,466,769 persons in 2022 and a record low of 950,000 persons in 2010. The series remains active in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s United States – Table US.OECD.MSTI: Number of Researchers and Personnel on Research and Development: OECD Member: Annual.
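The year-over-year change quoted above can be checked with a few lines of arithmetic. It works out to an increase of 12,352 persons, about 0.85%. This minimal sketch uses only the figures stated in the description. `pct_change` is our own helper name, not part of any CEIC or OECD tooling.

```python
# Figures quoted in the CEIC description (persons)
researchers_2021 = 1_454_417
researchers_2022 = 1_466_769  # also the all-time high
record_low_2010 = 950_000


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


yoy = pct_change(researchers_2021, researchers_2022)
since_low = pct_change(record_low_2010, researchers_2022)
print(f"2021 to 2022: {researchers_2022 - researchers_2021:+,} persons ({yoy:.2f}%)")
print(f"2010 low to 2022 high: {since_low:.1f}%")
```

Given the methodological breaks listed in the notes below (for example, the changed basis of the total researchers figure as of 2022), such changes compare reported values and are not necessarily like-for-like.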
For the United States, some respondents revised their reporting practices and eliminated expenditures that did not meet the definition of R&D during the 2023 BERD data collection. This has resulted in a meaningful decrease in estimated U.S. R&D performance compared with the 2023 R&D performance that would have been estimated under the reporting practices respondents used in 2022 and earlier. From 2021 onwards, changes to the US BERD survey questionnaire allowed for more exhaustive identification of acquisition costs for ‘identifiable intangible assets’ used for R&D. This has resulted in a substantial increase in reported R&D capital expenditure within BERD. In the business sector, funds from the rest of the world, previously included in business-financed BERD, are available separately from 2008. From 2006 onwards, GOVERD includes state government intramural performance (most of which is financed by the federal government and by state governments' own funds). From 2016 onwards, PNPERD data are based on a new R&D performer survey. In the higher education sector, all fields of SSH are included from 2003 onwards.
Following a survey of federally funded research and development centers (FFRDCs) in 2005, it was concluded that FFRDC R&D belongs in the government sector rather than in the sector of the FFRDC administrator, as had been reported in the past. R&D expenditures by FFRDCs were reclassified from the other three R&D performing sectors to the Government sector; previously published data were revised accordingly. Between 2003 and 2004, the method used to classify data by industry was revised. This particularly affects the ISIC category “wholesale trade” and consequently the BERD for total services.
U.S. R&D data are generally comparable, but there are some areas of underestimation:

i) Up to 2008, Government sector R&D performance covers only federal government activities. That by State and local government establishments is excluded;

ii) Except for the Government and the Business Enterprise sectors, the R&D data exclude most capital expenditures. For the Business Enterprise sector, depreciation is reported in place of gross capital expenditures up to 2014. Higher education (and national total) data were revised back to 1998 due to an improved methodology that corrects for double-counting of R&D funds passed between institutions.
Breakdown by type of R&D (basic research, applied research, etc.) was also revised back to 1998 in the business enterprise and higher education sectors due to improved estimation procedures.
The methodology for estimating researchers was changed as of 1985. In the Government, Higher Education and PNP sectors the data since then refer to employed doctoral scientists and engineers who report their primary work activity as research, development or the management of R&D, plus, for the Higher Education sector, the number of full-time equivalent graduate students with research assistantships averaging an estimated 50 % of their time engaged in R&D activities. As of 1985 researchers in the Government sector exclude military personnel. As of 1987, Higher education R&D personnel also include those who report their primary work activity as design.
Due to lack of official data for the different employment sectors, the total researchers figure is an OECD estimate up to 2021. As of 2022, it is based on official personnel data available for all sectors. For years 2020 and 2021, it is based on official personnel data available for the business, PNP and Higher Education sectors, and OECD estimates for the Government sector (for estimating the missing FFRDC component). For previous years, OECD estimates were readjusted back to 2000.
The government personnel data include state government R&D personnel from 2021 and FFRDC R&D personnel from 2022. However, 8 FFRDC centres are not included because they could not report their R&D personnel data; these 8 centres account for 24% of the total R&D expenditure of all FFRDCs in 2022. Pre-production development is excluded from Defence GBARD (in accordance with the Frascati Manual) as of 2000. The 2009 GBARD data also include the one-time incremental R&D funding legislated in the American Recovery and Reinvestment Act of 2009. Beginning with the 2000 GBARD data, budgets for capital expenditure (“R&D plant” in national terminology) are included. GBARD data for earlier years relate to budgets for current costs only.
Dictionary of Algorithms and Data Structures (DADS)
catalog.data.gov
datadiscoverystudio.org
+3 more
Updated Mar 14, 2025
National Institute of Standards and Technology (2025). Dictionary of Algorithms and Data Structures (DADS) [Dataset]. https://catalog.data.gov/dataset/dictionary-of-algorithms-and-data-structures-dads-910e0
Dataset updated
Mar 14, 2025
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
The Dictionary of Algorithms and Data Structures (DADS) is an online, publicly accessible dictionary of generally useful algorithms, data structures, algorithmic techniques, archetypal problems, and related definitions. In addition to brief definitions, some entries have links to related entries, links to implementations, and additional information. DADS is meant to be a resource for the practicing programmer, although students and researchers may find it a useful starting point. DADS has fundamental entries in areas such as theory, cryptography and compression, graphs, trees, and searching, for instance, Ackermann's function, quick sort, traveling salesman, big O notation, merge sort, AVL tree, hash table, and Byzantine generals. DADS also has index pages that list entries by area and by type. Currently DADS does not include algorithms particular to business data processing, communications, operating systems or distributed algorithms, programming languages, AI, graphics, or numerical analysis.
Forecast: Number of Employees in Medium-High (3-Digit Definition) R&D...
reportlinker.com
Updated Apr 8, 2024
ReportLinker (2024). Forecast: Number of Employees in Medium-High (3-Digit Definition) R&D Intensive Activities in the US 2024 - 2028 [Dataset]. https://www.reportlinker.com/dataset/f146026552454c9d93154f19b153b5d7c4422f2a
Dataset updated
Apr 8, 2024
Dataset authored and provided by
ReportLinker
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
United States
Description
Forecast: Number of Employees in Medium-High (3-Digit Definition) R&D Intensive Activities in the US 2024 - 2028
Third Generation Simulation Data (TGSIM) I-294 L1 Trajectories
data.virginia.gov
data.transportation.gov
+1 more
csv, json, rdf, xsl
Updated Jan 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S Department of Transportation (2025). Third Generation Simulation Data (TGSIM) I-294 L1 Trajectories [Dataset]. https://data.virginia.gov/dataset/third-generation-simulation-data-tgsim-i-294-l1-trajectories
Available download formats: xsl, json, rdf, csv
Dataset updated
Jan 24, 2025
Dataset provided by
Federal Highway Administration
Authors
U.S Department of Transportation
Area covered
Interstate 294
Description
The main dataset is a 70 MB file of trajectory data (I294_L1_final.csv) that contains position, speed, and acceleration data for small and large automated (L1) vehicles and non-automated vehicles on a highway in a suburban environment. Supporting files include aerial reference images for ten distinct data collection “Runs” (I294_L1_RunX_with_lanes.png, where X equals 8, 18, and 20 for southbound runs and 1, 3, 7, 9, 11, 19, and 21 for northbound runs). Associated centerline files are also provided for each “Run” (I-294-L1-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I294 L1.csv” for more details). The dataset defines eight lanes (four lanes in each direction) using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The southbound lanes are shown visually in I294_L1_Lane-2.png through I294_L1_Lane-5.png and the northbound lanes are shown visually in I294_L1_Lane2.png through I294_L1_Lane5.png.
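The ten runs listed above split into three southbound and seven northbound runs. A small lookup keeps that mapping explicit when iterating over the supporting files. This is a minimal sketch. The run numbers and file-name patterns come from the description. Expanding "X" to the bare run number (e.g. `Run8`, `Run_8`) is our assumption, and the helper names `run_direction` and `run_files` are hypothetical.

```python
from typing import Tuple

# Run numbers per direction, as listed in the I-294 L1 description
SOUTHBOUND_RUNS = frozenset({8, 18, 20})
NORTHBOUND_RUNS = frozenset({1, 3, 7, 9, 11, 19, 21})


def run_direction(run: int) -> str:
    """Return the travel direction of a documented run, or raise ValueError."""
    if run in SOUTHBOUND_RUNS:
        return "southbound"
    if run in NORTHBOUND_RUNS:
        return "northbound"
    raise ValueError(f"run {run} is not one of the ten documented runs")


def run_files(run: int) -> Tuple[str, str]:
    """Expand the documented file-name patterns, assuming X is the bare run number."""
    run_direction(run)  # validate the run number first
    image = f"I294_L1_Run{run}_with_lanes.png"
    centerline = f"I-294-L1-Run_{run}-geometry-with-ramps.csv"
    return image, centerline


for run in sorted(SOUTHBOUND_RUNS | NORTHBOUND_RUNS):
    print(run, run_direction(run), *run_files(run))
```

Rejecting unknown run numbers early helps catch a mistyped run before a missing-file error surfaces further down an analysis script.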

This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 1 ADAS-equipped vehicles with adaptive cruise control (ACC) enabled. The three vehicles manually entered the highway, moved to the second lane from the left, then enabled ACC with minimum following distance settings to initiate a string. The helicopter then followed the string of vehicles (which sometimes broke apart due to large following distances) northbound through the 4.8 km section of highway at an altitude of 300 meters. The goal of the data collection effort was to collect data on human drivers' responses to vehicle strings. The road segment has four lanes in each direction and covers a major on-ramp and an off-ramp in the southbound direction and one on-ramp in the northbound direction. The segment of highway is operated by the Illinois Tollway and carries a high percentage of heavy vehicles. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a sunny day.

As part of this dataset, the following files were provided:
I294_L1_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data sampled every 0.1 seconds. Vehicle size (small or large), width, length, and whether the vehicle was one of the test vehicles with ACC engaged ("yes" or "no") are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using a conversion factor of 1 pixel = 0.3 m.

I294_L1_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound and southbound lanes) for each run X.

I-294-L1-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane cent
Dictionary of Titles
dataverse.harvard.edu
search.dataone.org
Updated Apr 6, 2022
Shahad Althobaiti; Ahmad Alabdulkareem; Judy Hanwen Shen; Iyad Rahwan; Esteban Moro; Alex Rutherford (2022). Dictionary of Titles [Dataset]. http://doi.org/10.7910/DVN/DQW8IP
Unique identifier
https://doi.org/10.7910/DVN/DQW8IP
Dataset updated
Apr 6, 2022
Dataset provided by
Harvard Dataverse
Authors
Shahad Althobaiti; Ahmad Alabdulkareem; Judy Hanwen Shen; Iyad Rahwan; Esteban Moro; Alex Rutherford
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Hand-transcribed content from the United States Bureau of Labour Statistics Dictionary of Titles (DoT). The DoT is a record of occupations and a description of the tasks performed. Five editions exist, from 1939, 1949, 1965, 1977 and 1991. The DoT was replaced by O*NET structured data on jobs, workers and their characteristics. However, apart from the 1991 data, the data in the DoT is not easily ingestible, existing only in scanned PDF documents. Attempts at optical character recognition led to low accuracy. For that reason, we present here hand-transcribed textual data from these documents. Various data are available for each occupation, e.g. numerical codes, references to other occupations, and the free-text description. For that reason, the data for each edition are presented in 'long' format with a variable number of lines, with a blank line between occupations. Consult the transcription instructions for more details. Structured metadata on occupations is also available for the 1965, 1977 and 1991 editions. For these editions, this data can be extracted from the numerical codes accompanying the occupational entries; the key for these codes is found in separate tables in the 1965 edition, which were also transcribed. The instructions provided to transcribers for this edition are also added to the repository. The original documents are freely available in PDF format. This data accompanies the paper 'Longitudinal Complex Dynamics of Labour Markets Reveal Increasing Polarisation' by Althobaiti et al.
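Because each edition is stored in 'long' format, with a variable number of lines per occupation and a blank line between occupations, the first step in using the files is to regroup lines into one record per occupation. This is a minimal sketch that assumes only what the description states. The function name and the placeholder sample entries are hypothetical and are not real DoT text. Per-edition quirks should be checked against the transcription instructions.

```python
from typing import List


def split_occupations(text: str) -> List[List[str]]:
    """Group lines into one record per occupation.

    Assumes, following the description, that each occupation spans a variable
    number of non-blank lines and that occupations are separated by blank lines
    (runs of several blank lines are treated as a single separator).
    """
    records: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.rstrip())
        elif current:
            # A blank line closes the record being built
            records.append(current)
            current = []
    if current:
        # The file may not end with a blank line
        records.append(current)
    return records


# Invented placeholder entries in the described 'long' layout (not real DoT text)
sample = (
    "TITLE ONE\n"
    "111.111\n"
    "First line of the task description.\n"
    "Second line of the task description.\n"
    "\n"
    "\n"
    "TITLE TWO\n"
    "222.222\n"
    "Task description.\n"
)
records = split_occupations(sample)
print(len(records), [r[0] for r in records])
```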
Data from: Trusted Research Environments: Analysis of Characteristics and...
researchdata.tuwien.ac.at
bin, csv
Updated Jun 25, 2024
Martin Weise; Martin Weise; Andreas Rauber; Andreas Rauber (2024). Trusted Research Environments: Analysis of Characteristics and Data Availability [Dataset]. http://doi.org/10.48436/cv20m-sg117
Available download formats: bin, csv
Unique identifier
https://doi.org/10.48436/cv20m-sg117
Dataset updated
Jun 25, 2024
Dataset provided by
TU Wien
Authors
Martin Weise; Martin Weise; Andreas Rauber; Andreas Rauber
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data, through technical, organizational and legal measures, from being (accidentally) leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices provide the majority of the sensitive data records included in this study.
Methodology
We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature focusing on retrieving the following source material:
Peer-reviewed articles where available,
TRE websites,
TRE metadata catalogs.
The goal of this literature study is to discover existing TREs and to analyze their characteristics and data availability, in order to give an overview of the available infrastructure for sensitive data research, as many European initiatives have emerged in recent months.
Technical details
This dataset consists of five comma-separated values (.csv) files describing our inventory:
countries.csv: Table of countries with columns id (number), name (text) and code (text, ISO 3166-1 alpha-3 encoding, optional)
tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
inclusion.csv: Table recording whether each TRE was included in the literature study, with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional).
Additionally, a MariaDB (10.5 or higher) schema definition file (.sql) is needed, which models the database schema:
schema.sql: Schema definition file to create the tables and views used in the analysis.
The analysis was done through Jupyter Notebook which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
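The inventory tables link through plain integer keys: tres.countryid refers to countries.id, and datalevel and datatype are small code lists. A lightweight way to explore them without setting up MariaDB is Python's standard csv module. This is a minimal sketch, not the authors' notebook. It assumes comma-separated files with a header row. The helper names are hypothetical, and the sample rows below are invented and use only a subset of the documented columns.

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Code lists as documented for tres.csv
DATA_LEVELS = {1: "de-identified", 2: "pseudonymized", 3: "anonymized"}
DATA_TYPES = {1: "claims", 2: "linked records"}


def read_table(text: str) -> List[Dict[str, str]]:
    """Parse one comma-separated table with a header row into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def tres_per_country(countries: List[Dict[str, str]], tres: List[Dict[str, str]]) -> Counter:
    """Count TREs per country name by joining tres.countryid to countries.id."""
    names = {row["id"]: row["name"] for row in countries}
    return Counter(names.get(row["countryid"], "unknown") for row in tres)


def decode(row: Dict[str, str], column: str, codes: Dict[int, str]) -> str:
    """Decode an optional coded column; an empty cell means 'not reported'."""
    raw = (row.get(column) or "").strip()
    return codes[int(raw)] if raw else "not reported"


# Invented sample rows (not taken from the dataset)
countries = read_table("id,name,code\n1,Austria,AUT\n2,Germany,DEU\n")
tres = read_table(
    "id,name,countryid,datalevel,datatype\n"
    "1,TRE A,1,2,2\n"
    "2,TRE B,1,,1\n"
    "3,TRE C,2,3,\n"
)
print(tres_per_country(countries, tres))
print([decode(r, "datalevel", DATA_LEVELS) for r in tres])
```

For the real files, the text can be read with `open("tres.csv", newline="", encoding="utf-8").read()` (the UTF-8 encoding is an assumption) and passed to `read_table`. The published schema.sql and notebook remain the reference for the actual analysis.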
