Field Name | Data Type | Description |
Statefp | Number | US Census Bureau unique identifier of the state |
Countyfp | Number | US Census Bureau unique identifier of the county |
Countynm | Text | County name |
Tractce | Number | US Census Bureau unique identifier of the census tract |
Geoid | Number | US Census Bureau unique identifier of the state + county + census tract |
Aland | Number | US Census Bureau defined land area of the census tract |
Awater | Number | US Census Bureau defined water area of the census tract |
Asqmi | Number | Area calculated in square miles from the Aland |
MSSAid | Text | ID of the Medical Service Study Area (MSSA) the census tract belongs to |
MSSAnm | Text | Name of the Medical Service Study Area (MSSA) the census tract belongs to |
Definition | Text | Type of MSSA, possible values are urban, rural and frontier. |
TotalPovPop | Number | US Census Bureau total population for whom poverty status is determined of the census tract, taken from the 2020 ACS 5 YR S1701 |
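The relationship between the identifier and area fields above can be sketched in a few lines. This is a minimal illustration, not the publisher's procedure: it assumes the standard Census layout of a 2-digit state code, 3-digit county code, and 6-digit tract code, and it assumes Aland is expressed in square meters (as in Census TIGER/Line files). The function names are hypothetical. Because the fields are typed as Number, leading zeros (for example California's state code 06) may need to be restored when rebuilding the Geoid as text.

```python
# Assumption: Aland is in square meters, as in Census TIGER/Line files.
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2  # one international mile is 1609.344 m


def build_geoid(statefp, countyfp, tractce):
    """Concatenate state (2 digits), county (3) and tract (6) codes, zero-padded."""
    return f"{int(statefp):02d}{int(countyfp):03d}{int(tractce):06d}"


def aland_to_sq_miles(aland_m2):
    """Convert a land area in square meters to square miles (the Asqmi field)."""
    return aland_m2 / SQ_METERS_PER_SQ_MILE


if __name__ == "__main__":
    # Hypothetical tract used only to show the padding behaviour.
    print(build_geoid(6, 37, 980010))
    print(round(aland_to_sq_miles(5_000_000), 3))
```

Keeping the Geoid as a zero-padded string rather than a number avoids mismatches when joining to other Census tables.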
This digital dataset was created as part of a U.S. Geological Survey study, done in cooperation with the Monterey County Water Resource Agency, to conduct a hydrologic resource assessment and develop an integrated numerical hydrologic model of the hydrologic system of Salinas Valley, CA. As part of this larger study, the USGS developed this digital dataset of geologic data and three-dimensional hydrogeologic framework models, referred to here as the Salinas Valley Geological Framework (SVGF), that define the elevation, thickness, extent, and lithology-based texture variations of nine hydrogeologic units in Salinas Valley, CA. The digital dataset includes a geospatial database that contains two main elements as GIS feature datasets: (1) input data to the 3D framework and textural models, within a feature dataset called “ModelInput”; and (2) interpolated elevation, thicknesses, and textural variability of the hydrogeologic units stored as arrays of polygonal cells, within a feature dataset called “ModelGrids”. The model input data in this data release include stratigraphic and lithologic information from water, monitoring, and oil and gas wells, as well as data from selected published cross sections, point data derived from geologic maps and geophysical data, and data sampled from parts of previous framework models. Input surface and subsurface data have been reduced to points that define the elevation of the top of each hydrogeologic unit at x,y locations; these point data, stored in a GIS feature class named “ModelInputData”, serve as digital input to the framework models. The locations of wells used as sources of subsurface stratigraphic and lithologic information are stored within the GIS feature class “ModelInputData”, but are also provided as separate point feature classes in the geospatial database. Faults that offset hydrogeologic units are provided as a separate line feature class.
Borehole data are also released as a set of tables, each of which may be joined or related to well location through a unique well identifier present in each table. Tables are in Excel and ASCII comma-separated value (CSV) format and include separate but related tables for well location, stratigraphic information of the depths to top and base of hydrogeologic units intercepted downhole, downhole lithologic information reported at 10-foot intervals, and information on how lithologic descriptors were classed as sediment texture. Two types of geologic frameworks were constructed and released within a GIS feature dataset called “ModelGrids”: (1) a hydrostratigraphic framework where the elevation, thickness, and spatial extent of the nine hydrogeologic units were defined based on interpolation of the input data, and (2) a textural model for each hydrogeologic unit based on interpolation of classed downhole lithologic data. Each framework is stored as an array of polygonal cells: essentially a “flattened”, two-dimensional representation of a digital 3D geologic framework. The elevation and thickness of the hydrogeologic units are contained within a single polygon feature class, SVGF_3DHFM, which contains a mesh of polygons that represent model cells with multiple attributes, including XY location and the elevation and thickness of each hydrogeologic unit. Textural information for each hydrogeologic unit is stored in a second array of polygonal cells called SVGF_TextureModel. The spatial data are accompanied by non-spatial tables that describe the sources of geologic information, a glossary of terms, and a description of model units that describes the nine hydrogeologic units modeled in this study. A data dictionary defines the structure of the dataset, defines all fields in all spatial data attribute tables and all columns in all nonspatial tables, and duplicates the Entity and Attribute information contained in the metadata file. Spatial data are also presented as shapefiles.
Downhole data from boreholes are released as a set of tables related by a unique well identifier; the tables are in Excel and ASCII comma-separated value (CSV) format.
The main dataset is a 304 MB file of trajectory data (I90_94_stationary_final.csv) that contains position, speed, and acceleration data for small and large automated (L2) vehicles and non-automated vehicles on a highway in an urban environment. Supporting files include aerial reference images for six distinct data collection “Runs” (I90_94_Stationary_Run_X_ref_image.png, where X equals 1, 2, 3, 4, 5, and 6). Associated centerline files are also provided for each “Run” (I-90-stationary-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I90_94Stationary.csv” for more details). The dataset defines six northbound lanes using these centerline files. Twelve different numerical IDs are used to define the six northbound lanes (1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, and 15) depending on the run. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. Lane IDs are provided in the reference images in red text for each data collection run (I90_94_Stationary_Run_X_ref_image_annotated.jpg, where X equals 1, 2, 3, 4, 5, and 6).
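The indicator coding and lane IDs described above can be captured as a small lookup. The sketch below is illustrative only: the class and function names are hypothetical, and it does not assume any particular column header in the centerline files (see the centerline data dictionary for those).

```python
from enum import IntEnum


class RoadSection(IntEnum):
    """Road-section types used by the per-lane indicator variable."""
    NO_RAMP = 0
    ON_RAMP = 1
    OFF_RAMP = 2
    WEAVING = 3


# The twelve numerical IDs used, depending on the run, for the six northbound lanes.
NORTHBOUND_LANE_IDS = {1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15}


def decode_section(value):
    """Map a raw indicator value (e.g. 3, "3" or "3.0" read from a CSV) to a RoadSection."""
    return RoadSection(int(float(value)))


def is_northbound_lane(lane_id):
    """Return True if the lane ID is one of the northbound IDs listed above."""
    return int(lane_id) in NORTHBOUND_LANE_IDS


if __name__ == "__main__":
    print(decode_section("3").name)
    print(is_northbound_lane(10))
```

An unknown indicator value raises `ValueError`, which makes unexpected codes visible instead of silently passing them through.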
This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using the fixed location aerial videography approach with one high-resolution 8K camera mounted on a helicopter hovering over a short segment of I-94 focusing on the merge and diverge points in Chicago, IL. The altitude of the helicopter (approximately 213 meters) enabled the camera to capture 1.3 km of highway driving and a major weaving section in each direction (where I-90 and I-94 diverge in the northbound direction and merge in the southbound direction). The segment has two off-ramps and two on-ramps in the northbound direction. All roads have 88 kph (55 mph) speed limits. The camera captured footage during the evening rush hour (4:00 PM-6:00 PM CT) on a cloudy day. During this period, two SAE Level 2 ADAS-equipped vehicles drove through the segment, entering the northbound direction upstream of the target section, exiting the target section on the right through I-94, and attempting to perform a total of three lane-changing maneuvers (if safe to do so). These vehicles are indicated in the dataset.
As part of this dataset, the following files were provided:
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Numerical phantom data for an MR Fingerprinting reconstruction. Further described in repository and manuscript.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
Global Number of Employees in High- and Medium-High (3-Digit Definition) R&D Intensive Activities by Country, 2023
The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.
The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
As is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Sample survey data [ssd]
The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.
Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information on the industries and regions chosen are included in the attached Excel file (Sampling Report.xls) for the Slovenia 2009 ES. For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in Appendix E of "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports, respectively.
For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.
For the Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail and other services).
Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).
For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
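As an illustration of the size definition above, the following sketch assigns an establishment to a size stratum from its count of permanent full-time workers. The function name is hypothetical, and returning `None` for fewer than 5 workers is an assumption made here because the stated strata start at 5 employees.

```python
def size_stratum(permanent_full_time_workers):
    """Return the ES size stratum for a count of permanent full-time workers.

    Thresholds follow the stated definition: small (5 to 19), medium (20 to 99),
    large (more than 99). Counts below 5 fall outside the listed strata, so this
    sketch returns None for them (an assumption, not a documented rule).
    """
    n = int(permanent_full_time_workers)
    if n < 5:
        return None
    if n <= 19:
        return "small"
    if n <= 99:
        return "medium"
    return "large"


if __name__ == "__main__":
    for workers in (4, 5, 19, 20, 99, 100):
        print(workers, size_stratum(workers))
```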
For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.
For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.
Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).
Computer Assisted Personal Interview [capi]
Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail-specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: (a) for sensitive questions that may generate negative reactions from the respondent, such as those on corruption or tax evasion, enumerators were instructed to record a refusal to respond as (-8); (b) establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates for Slovenia may be selection bias and not frame inaccuracy.
For 2013, the ratio of realized interviews to contacted establishments was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 44%.
Finally, for 2019, the ratio of realized interviews to contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
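The 2009 figure is reported as contacts per interview, whereas the 2013 and 2019 figures are reported as interviews per contact. The two are reciprocals, so they can be put on a common scale. The sketch below does only that arithmetic with the numbers quoted above; the function and variable names are hypothetical.

```python
def interviews_per_contact(contacts_per_interview):
    """Convert 'contacted establishments per realized interview' into its reciprocal."""
    return 1.0 / contacts_per_interview


def contacts_per_interview(interview_share):
    """Convert 'realized interviews per contacted establishment' into its reciprocal."""
    return 1.0 / interview_share


# Interviews per contacted establishment, by survey year, on a common scale.
YIELD_BY_YEAR = {
    2009: interviews_per_contact(6.18),  # reported as 6.18 contacts per interview
    2013: 0.25,                          # reported as 25%
    2019: 0.097,                         # reported as 9.7%
}

if __name__ == "__main__":
    for year, share in YIELD_BY_YEAR.items():
        print(year, f"{share:.1%}", f"{contacts_per_interview(share):.2f} contacts per interview")
```

On this scale the 2009 yield is roughly 16%, which falls between the 2013 and 2019 values.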
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
Zip file containing all data and analysis files for Experiment 1 in:
Weiers, H., Inglis, M., & Gilmore, C. (under review). Learning artificial number symbols with ordinal and magnitude information.

Article abstract
The question of how numerical symbols gain semantic meaning is a key focus of mathematical cognition research. Some have suggested that symbols gain meaning from magnitude information, by being mapped onto the approximate number system, whereas others have suggested symbols gain meaning from their ordinal relations to other symbols. Here we used an artificial symbol learning paradigm to investigate the effects of magnitude and ordinal information on number symbol learning. Across two experiments, we found that after either magnitude or ordinal training, adults successfully learned novel symbols and were able to infer their ordinal and magnitude meanings. Furthermore, adults were able to make relatively accurate judgements about, and map between, the novel symbols and non-symbolic quantities (dot arrays). Although both ordinal and magnitude training was sufficient to attach meaning to the symbols, we found beneficial effects on the ability to learn and make numerical judgements about novel symbols when combining small amounts of magnitude information for a symbol subset with ordinal information about the whole set. These results suggest that a combination of magnitude and ordinal information is a plausible account of the symbol learning process.

© The Authors
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Hand-transcribed content from the United States Bureau of Labour Statistics Dictionary of Titles (DoT). The DoT is a record of occupations and a description of the tasks performed. Five editions exist, from 1939, 1949, 1965, 1977 and 1991. The DoT was replaced by O*NET, which provides structured data on jobs, workers and their characteristics. However, apart from the 1991 data, the data in the DoT are not easily ingestible, existing only in scanned PDF documents. Attempts at Optical Character Recognition led to low accuracy. For that reason we present here hand-transcribed textual data from these documents. Various data are available for each occupation, e.g. numerical codes, references to other occupations, and the free-text description. The data for each edition are therefore presented in 'long' format with a variable number of lines per occupation and a blank line between occupations. Consult the transcription instructions for more details. Structured metadata on occupations is also available for the 1965, 1977 and 1991 editions. This metadata can be extracted from the numerical codes within the occupational entries; the key for these codes is found in separate tables in the 1965 edition, which were also transcribed. The instructions provided to transcribers for this edition are also added to the repository. The original documents are freely available in PDF format. This data accompanies the paper 'Longitudinal Complex Dynamics of Labour Markets Reveal Increasing Polarisation' by Althobaiti et al.
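A 'long' format with a blank line between occupations can be read by grouping consecutive non-blank lines into one record. The sketch below shows that grouping only; the function name and the sample text are hypothetical, and the meaning of each line within a record is defined in the transcription instructions, not here.

```python
def split_occupations(text):
    """Group consecutive non-blank lines into records; blank lines separate records."""
    records, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the record being built.
            records.append(current)
            current = []
    if current:
        # The last record may not be followed by a blank line.
        records.append(current)
    return records


if __name__ == "__main__":
    # Invented placeholder content, used only to show the grouping.
    sample = "TITLE ONE\ncode line\ndescription line\n\nTITLE TWO\ndescription line\n"
    for record in split_occupations(sample):
        print(len(record), record[0])
```

Runs of several blank lines are treated as a single separator, so stray empty lines do not create empty records.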
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
LScD (Leicester Scientific Dictionary)
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

[Version 3]
The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus) - Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; the explanation is not repeated here. After the pre-processing steps, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are also the same as described for LScD Version 2 below.

* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

[Version 2]
Getting Started
This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary is created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from LSC and instructions for usage of the code are available in [2]. The code can also be used for lists of texts from other sources, although amendments to the code may be required.

LSC is a collection of abstracts of articles and proceeding papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.

LScD is an ordered list of words from texts of abstracts in LSC. The dictionary stores 974,238 unique words, sorted by the number of documents containing the word in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
1. Unique words in abstracts
2. Number of documents containing each word
3. Number of appearances of a word in the entire corpus

Processing the LSC
Step 1. Downloading the LSC Online: Use of the LSC is subject to acceptance of request of the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

Step 2. Importing the Corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All following steps can be applied to an arbitrary list of texts from any source with changes of parameters. The structure of the corpus, such as file format and names (also the position) of fields, should be taken into account when applying our code. The organisation of the CSV files of LSC is described in the README file for LSC [1].

Step 3. Extracting Abstracts and Saving Metadata: Metadata, which include all fields in a document excluding abstracts, and the field of abstracts are separated. Metadata are then saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.

Step 4. Text Pre-processing Steps on the Collection of Abstracts: In this section, we present our approaches to pre-processing the abstracts of the LSC.
1. Removing punctuation and special characters: This is the process of substituting all non-alphanumeric characters by space. We did not substitute the character “-” in this step, because we need to keep words like “z-score”, “non-payment” and “pre-processing” in order not to lose the actual meaning of such words. The uniting of prefixes with words is performed in later steps of pre-processing.
2. Lowercasing the text data: Lowercasing is performed to avoid treating the same word, such as “Corpus”, “corpus” and “CORPUS”, differently. The entire collection of texts is converted to lowercase.
3. Uniting prefixes of words: Words containing prefixes joined with the character “-” are united as a word. The prefixes united for this research are listed in the file “list_of_prefixes.csv”. Most of the prefixes are extracted from [4]. We also added commonly used prefixes: ‘e’, ‘extra’, ‘per’, ‘self’ and ‘ultra’.
4. Substitution of words: Some words joined with “-” in the abstracts of the LSC require an additional process of substitution to avoid losing the meaning of the word before removing the character “-”. Some examples of such words are “z-test”, “well-known” and “chi-square”. These words have been substituted by “ztest”, “wellknown” and “chisquare”. Identification of such words is done by sampling abstracts from LSC. The full list of such words and the decisions taken for substitution are presented in the file “list_of_substitution.csv”.
5. Removing the character “-”: All remaining “-” characters are replaced by space.
6. Removing numbers: All digits which are not included in a word are replaced by space. All words that contain digits and letters are kept because alphanumeric strings such as chemical formulae might be important for our analysis. Some examples are “co2”, “h2o” and “21st”.
7. Stemming: Stemming is the process of converting inflected words into their word stem. This step results in uniting several forms of words with similar meaning into one form and also in saving memory space and time [5]. All words in the LScD are stemmed to their word stem.
8. Stop words removal: Stop words are words that are extremely common but provide little value in a language. Some common stop words in English are ‘I’, ‘the’, ‘a’, etc. We used the ‘tm’ package in R to remove stop words [6]. There are 174 English stop words listed in the package.

Step 5. Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written to the file “LScD.csv”.

The Organisation of the LScD
The total number of words in the file “LScD.csv” is 974,238. Each field is described below:
Word: It contains unique words from the corpus. All words are in lowercase and in their stem forms. The field is sorted by the number of documents that contain the word, in descending order.
Number of Documents Containing the Word: In this context, binary calculation is used: if a word exists in an abstract, then there is a count of 1. If the word exists more than once in a document, the count is still 1. The total number of documents containing the word is counted as the sum of 1s in the entire corpus.
Number of Appearance in Corpus: It contains how many times a word occurs in the corpus when the corpus is considered as one large document.

Instructions for R Code
LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. Outputs of the code are:
Metadata File: It includes all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
File of Abstracts: It contains all abstracts after the pre-processing steps defined in Step 4.
DTM: It is the Document Term Matrix constructed from the LSC [6]. Each entry of the matrix is the number of times the word occurs in the corresponding document.
LScD: An ordered list of words from LSC as defined in the previous section.

The code can be used by:
1. Download the folder ‘LSC’, ‘list_of_prefixes.csv’ and ‘list_of_substitution.csv’
2. Open LScD_Creation.R script
3. Change parameters in the script: replace with the full path of the directory with source files and the full path of the directory to write output files
4. Run the full code.

References
[1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
[2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
[5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved porter’s stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
[6] I. Feinerer, "Introduction to the tm Package Text Mining in R," Available online: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
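The pre-processing steps and the two counts stored in the LScD can be illustrated with a short Python sketch. This is not the authors' R script (LScD_Creation.R); it is a simplified stand-in under stated assumptions: the prefix, substitution, and stop-word sets are tiny hypothetical samples rather than the contents of “list_of_prefixes.csv”, “list_of_substitution.csv”, or the ‘tm’ stop-word list, stemming (step 7) is omitted, and ties in document count are broken alphabetically, which the description does not specify.

```python
import re
from collections import Counter

# Hypothetical miniature stand-ins for the real lists shipped with the dataset.
PREFIXES = {"non", "pre", "self"}
SUBSTITUTIONS = {"z-test": "ztest", "well-known": "wellknown", "chi-square": "chisquare"}
STOP_WORDS = {"i", "the", "a"}

PREFIX_PATTERN = re.compile(r"\b(" + "|".join(sorted(PREFIXES)) + r")-(?=\w)")


def preprocess(abstract):
    """Apply steps 1-6 and 8 of the description; stemming (step 7) is left out."""
    text = re.sub(r"[^A-Za-z0-9\-]", " ", abstract)  # 1: drop punctuation, keep "-"
    text = text.lower()                               # 2: lowercase
    text = PREFIX_PATTERN.sub(r"\1", text)            # 3: unite prefixes ("pre-x" -> "prex")
    for old, new in SUBSTITUTIONS.items():            # 4: substitute listed words
        text = text.replace(old, new)
    text = text.replace("-", " ")                     # 5: remove remaining "-"
    text = re.sub(r"\b\d+\b", " ", text)              # 6: drop standalone numbers, keep "co2"
    return [word for word in text.split() if word not in STOP_WORDS]  # 8: stop words


def build_dictionary(abstracts):
    """Return (word, documents containing it, total occurrences) rows, LScD-style."""
    doc_freq, corpus_freq = Counter(), Counter()
    for abstract in abstracts:
        tokens = preprocess(abstract)
        corpus_freq.update(tokens)    # every occurrence counts
        doc_freq.update(set(tokens))  # binary count: at most 1 per document
    rows = [(word, doc_freq[word], corpus_freq[word]) for word in doc_freq]
    # Sort by document count, descending; alphabetical tie-break is an assumption.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    docs = ["Corpus corpus CORPUS of z-test", "The corpus, pre-processing."]
    for row in build_dictionary(docs):
        print(row)
```

Using a set of tokens per document for the document count mirrors the binary calculation described for the "Number of Documents Containing the Word" field.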
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This table contains data at regional level on the number of persons employed on agricultural holdings, the corresponding annual work units (AWUs) and the number of holdings with workers.
The figures in this table are derived from the agricultural census. Data collection for the agricultural census is part of a combined data collection used for, among other things, agricultural policy and enforcement of the manure law.
Regional breakdown is based on the main location of the holding. As a result, the region to which activities (crops, animals) are allocated may differ from the location where these activities actually occur.
The agricultural census is also used as the basis for the European Farm Structure Survey (FSS). Data from the agricultural census do not fully coincide with the FSS. In the FSS years (2000, 2003, 2005, 2007 and 2010) additional information was collected to meet the requirements of the FSS.
Data on labour force refer to the period April to March of the year preceding the agricultural census.
In 2022, equidae are not part of the Agricultural Census. This affects the farm type and the total number of farms in the Agricultural Census. Farms with horses, ponies and donkeys that were previously classified as ‘specialist grazing livestock’ could be classified, according to their dominant activity, as another farm type in 2022.
From 2018 onwards, the numbers of calves for fattening, pigs for fattening, chickens and turkeys are adjusted in the case of temporary breaks in the production cycle (e.g. sanitary cleaning). The agricultural census is a structural survey, in which adjustment for temporary breaks in the production cycle is relevant for, among other things, the calculation of the economic size of the holding and its farm type. In the livestock surveys, the number of animals on the reference day is relevant; therefore no adjustment for temporary breaks in the production cycle is made. This means that the number of animals in the tables of the agricultural census may differ from that in the livestock tables (see ‘links to relevant tables and relevant articles’).
From 2017 onwards, animal numbers are increasingly derived from I&R registers (Identification and Registration of animals) instead of from the combined data collection. The I&R registers are the responsibility of RVO (Netherlands Enterprise Agency). Since 2017, cattle numbers have been derived from I&R cattle, and from 2018 sheep, goats and poultry are also derived from the relevant I&R registers. The registration of cattle, sheep and goats takes place directly at RVO. Poultry data are collected via the designated database Poultry Information System (KIP) from Avined. Avined is a trade organization for the egg and poultry meat sectors. Avined passes the data on to the central database of RVO. Due to the transition to the use of I&R registers, a change in classification occurs for sheep and goats from 2018 onwards.
Since 2016, information from the Dutch Business Register has been used to define the agricultural census. Registration in the Business Register with an agricultural standard industrial classification code, related to NACE/ISIC (in Dutch SBI: ‘Standaard BedrijfsIndeling’), is decisive in determining whether there is an agricultural holding. This aligns the agricultural census as closely as possible with the statistical regulations of Eurostat and the (Dutch) implementation of the definition of 'active farmer' as described in the common agricultural policy.
The definition of the agricultural census based on information from the Dutch Business Register mainly affects the number of holdings, where a clear deviation from the trend occurs. The impact on areas (except for other land and rough grazing) and the number of animals (except for sheep, and horses and ponies) is limited. This is mainly due to the holdings that are excluded as a result of the new delimitation of agricultural holdings (such as equestrian centres, city farms and organisations in nature management).
In 2011 there were changes in geographic assignment of holdings with a foreign main seat. This may influence regional figures, mainly in border regions.
Until 2010 the economic size of agricultural holdings was expressed in Dutch size units (in Dutch NGE: 'Nederlandse Grootte Eenheid'). From 2010 onwards this has become Standard Output (SO). This means that the threshold for holdings in the agricultural census has changed from 3 NGE to 3000 euro SO. For comparable time series the figures for 2000 up to and including 2009 have been recalculated, based on SO coefficients and SO typology. The latest update was in 2016.
Data available from: 2000
Status of the figures: The figures for 2024 are provisional, all other figures are final.
Changes as of November 28, 2024: the provisional figures for 2024 have been added.
When will new figures be published? According to regular planning, provisional figures for the current year are published in November and the final figures follow in March of the following year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Poland PL: Total Business Enterprise R&D Personnel: Per Thousand Employment In Industry data was reported at 8.068 Per 1000 in 2021. This records an increase from the previous number of 7.452 Per 1000 for 2020. Poland PL: Total Business Enterprise R&D Personnel: Per Thousand Employment In Industry data is updated yearly, averaging 1.897 Per 1000 from Dec 1994 (Median) to 2021, with 28 observations. The data reached an all-time high of 8.068 Per 1000 in 2021 and a record low of 0.786 Per 1000 in 2002. Poland PL: Total Business Enterprise R&D Personnel: Per Thousand Employment In Industry data remains in active status in CEIC and is reported by Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s Poland – Table PL.OECD.MSTI: Number of Researchers and Personnel on Research and Development: OECD Member: Annual. In Poland, in 2016, some units previously classified in the Government sector were reallocated to the Business sector. From 2013, improvements in R&D surveys enable the distribution of all expenditure by type of R&D, leading to a break in basic research series. From reference year 2019 onwards, GBARD for GUF is derived from the Higher Education R&D survey, in which units now report on subsidies received from the ministry responsible for science and higher education, whereas these estimates came directly from this ministry before 2019. GBARD data exclude European Commission funds since 2012.
Definition of MSTI variables 'Value Added of Industry' and 'Industrial Employment':
R&D data are typically expressed as a percentage of GDP to allow cross-country comparisons. When compiling such indicators for the business enterprise sector, one may wish to exclude, from GDP measures, economic activities for which the Business R&D (BERD) is null or negligible by definition. By doing so, the adjusted denominator (GDP, or Value Added, excluding non-relevant industries) corresponds better to the numerator (BERD) with which it is compared.
The MSTI variable 'Value added in industry' is used to this end:
It is calculated as the total Gross Value Added (GVA) excluding 'real estate activities' (ISIC rev.4 68) where the 'imputed rent of owner-occupied dwellings', specific to the framework of the System of National Accounts, represents a significant share of total GVA and has no R&D counterpart. Moreover, the R&D performed by the community, social and personal services is mainly driven by R&D performers other than businesses.
Consequently, the following service industries are also excluded: ISIC rev.4 84 to 88 and 97 to 98. GVA data are presented at basic prices except for the People's Republic of China, Japan and New Zealand (expressed at producers' prices). In the same way, some indicators on R&D personnel in the business sector are expressed as a percentage of industrial employment. The latter corresponds to total employment excluding ISIC rev.4 68, 84 to 88 and 97 to 98.
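The exclusion rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout (ISIC rev.4 division number mapped to a value) and all figures are hypothetical and are not taken from the MSTI database.

```python
# ISIC rev.4 divisions excluded from the "industry" aggregates,
# as listed in the text: 68, 84 to 88, and 97 to 98.
EXCLUDED_DIVISIONS = {68, *range(84, 89), *range(97, 99)}


def industry_total(values_by_division):
    """Sum values over all divisions except the excluded ones.

    `values_by_division` is a hypothetical mapping of ISIC rev.4
    division number to employment (or gross value added).
    """
    return sum(
        value
        for division, value in values_by_division.items()
        if division not in EXCLUDED_DIVISIONS
    )


def per_thousand(numerator, denominator):
    """Express the numerator per 1,000 units of the denominator."""
    return 1000.0 * numerator / denominator


# Made-up employment figures, for illustration only.
employment = {10: 500.0, 62: 300.0, 68: 50.0, 85: 150.0, 97: 10.0}
industrial_employment = industry_total(employment)
rd_personnel = 4.0  # made-up business R&D personnel
ratio = per_thousand(rd_personnel, industrial_employment)
```

With these made-up figures, divisions 68, 85 and 97 drop out of the denominator, so the ratio is computed against 800 rather than 1,010.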
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).
As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.
This dataset is being made public to act as supplementary data both for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems", Telematics and Informatics, and for the Systematic Literature Review component that informs the study.
Description of the data in this data set
PublicDataEcosystem_SLR provides the structure of the protocol
Spreadsheet #1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies.
Spreadsheet #2 provides the protocol structure.
Spreadsheet #3 provides the filled protocol for relevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) public data ecosystem-related information.
Descriptive Information
Article number
A study number, corresponding to the study number assigned in an Excel worksheet
Complete reference
The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.
Year of publication
The year in which the study was published.
Journal article / conference paper / book chapter
The type of the paper, i.e., journal article, conference paper, or book chapter.
Journal / conference / book
The journal, conference, or book in which the paper is published.
DOI / Website
A link to the website where the study can be found.
Number of words
The number of words in the study.
Number of citations in Scopus and WoS
The number of citations of the paper in Scopus and WoS digital libraries.
Availability in Open Access
Availability of a study in the Open Access or Free / Full Access.
Keywords
Keywords of the paper as indicated by the authors (in the paper).
Relevance for our study (high / medium / low)
What is the relevance level of the paper for our study?
Approach- and research design-related information
Objective / Aim / Goal / Purpose & Research Questions
The research objective and established RQs.
Research method (including unit of analysis)
The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.
Study’s contributions
The study’s contribution as defined by the authors
Qualitative / quantitative / mixed method
Whether the study uses a qualitative, quantitative, or mixed methods approach.
Availability of the underlying research data
Whether the paper has a reference to the public availability of the underlying research data (e.g., transcriptions of interviews, collected data etc.) or explains why these data are not openly shared.
Period under investigation
Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)
Use of theory / theoretical concepts / approaches? If yes, specify them
Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).
Quality-related information
Quality concerns
Whether there are any quality concerns (e.g., limited information about the research methods used).
Public Data Ecosystem-related information
Public data ecosystem definition
How is the public data ecosystem (or any equivalent term, most often "infrastructure") defined in the paper? If an alternative term is used, what is the public data ecosystem called in the paper?
Public data ecosystem evolution / development
Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?
What constitutes a public data ecosystem?
What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).
Components and relationships
What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).
Stakeholders
What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?
Actors and their roles
What actors does the public data ecosystem involve? What are their roles?
Data (data types, data dynamism, data categories etc.)
What data does the public data ecosystem cover (or what data is it intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.
Processes / activities / dimensions, data lifecycle phases
What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?
Level (if relevant)
What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).
Other elements or relationships (if any)
What other elements or relationships does the public data ecosystem consist of?
Additional comments
Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).
New papers
Does the study refer to any other potentially relevant papers?
Additional references to potentially relevant papers that were found in the analysed paper (snowballing).
Format of the file: .xls, .csv (for the first spreadsheet only), .docx
Licenses or restrictions: CC-BY
For more info, see README.txt
This survey was conducted in Tunisia between March 2013 and July 2014, as part of the joint World Bank, European Bank for Reconstruction and Development (EBRD) and European Investment Bank (EIB) Enterprise Survey. The objective of the survey is to obtain feedback from enterprises on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country's business environment. The remaining questions assess the survey respondents' opinions on the obstacles to firm growth and performance.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or universe of the study, is the non-agricultural economy. It comprises all manufacturing sectors according to the group classification of ISIC Revision 3.1 (group D), the construction sector (group F), the services sector (groups G and H), and the transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
Sample survey data [ssd]
The sample was selected using stratified random sampling. Three levels of stratification were used in this country: industry, establishment size, and region.
Industry was stratified into three manufacturing (food, garments, and other manufacturing) and two service (retail and other services) sectors.
Size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not common practice, apart from the construction and agriculture sectors which are not included in the survey.
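As a minimal sketch, the size strata above can be written as a small classification function. The function name and the choice to return `None` for establishments below the five-employee threshold are assumptions made for illustration; only the cut-offs come from the text.

```python
def size_stratum(permanent_full_time_workers):
    """Return the size stratum for an establishment.

    Cut-offs follow the text: small (5 to 19), medium (20 to 99),
    large (more than 99). Establishments with fewer than five
    employees are outside the strata; returning None for them is
    an illustrative choice, not part of the survey documentation.
    """
    n = permanent_full_time_workers
    if n < 5:
        return None
    if n <= 19:
        return "small"
    if n <= 99:
        return "medium"
    return "large"


# Boundary values of each stratum.
examples = {n: size_stratum(n) for n in (4, 5, 19, 20, 99, 100)}
```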
Regional stratification was defined in five regions: Tunis, Sfax, Northeast (consisting of Ariana, Ben Arous, Bizerte, Manouba, and Nabeul), South Coast/West (Sousse, Monastir, Mahdia, Gabes, Medenine) and the Interior (Beja, Gafsa, Jendouba, Kairouan, Kasserine, Kebili, Kef, Sidi Bouzid, Siliana, Tataouine, and Tozeur).
For Tunisia ES, two sample frames were used: the Guide Economique de la Tunisie, 2013 and the Orbis database from Bureau van Dijk. The former did not include firm size information, while the latter was considered to have a full representation of large firms. The Guide Economique source was used for small and medium strata, while the Orbis source was used for large firms. Duplicate entries were removed, with preference given to the frame in which size information was present.
The enumerated establishments with five employees or more were then used as the sample frame with the aim of obtaining interviews at 600 establishments. Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 8.5% (576 out of 6,806 establishments).
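The reported share of non-eligible units can be checked directly from the two counts given above. The helper name below is hypothetical; the counts are the ones stated in the text.

```python
def share_percent(part, total, digits=1):
    """Return `part` as a percentage of `total`, rounded to `digits` places."""
    return round(100.0 * part / total, digits)


# Counts from the text: 576 confirmed non-eligible units
# out of 6,806 sampled establishments contacted.
ineligible_share = share_percent(576, 6806)
```

The result rounds to the 8.5% quoted in the text.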
Face-to-face [f2f]
The following survey instruments are available: - Manufacturing Questionnaire; - Services Questionnaire.
All variables are named using, first, the letter of each section and, second, the number of the variable within the section, i.e. a1 denotes section A, question 1. Variable names preceded by the prefix "MNA" indicate questions specific to the Middle East and North Africa region; they may therefore not be found in the implementation of the rollout in other countries. All other suffixed variables are global and are present in all economy surveys over the world. All variables are numeric with the exception of those variables with an "x" at the end of their names. The suffix "x" denotes that the variable is alpha-numeric.
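The naming convention can be illustrated with a small parser. This is a sketch under assumptions: the regular expression, the function name and the returned keys are hypothetical, and the example names other than a1 are invented to exercise the rules rather than taken from the survey's data dictionary.

```python
import re

# Optional "MNA" prefix, one section letter, the question number,
# then any remaining characters (which may end in the "x" suffix).
_NAME_PATTERN = re.compile(r"^(MNA)?([A-Za-z])(\d+)(.*)$")


def describe_variable(name):
    """Split a variable name into the parts described in the text."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised variable name: {name!r}")
    prefix, section, number, rest = match.groups()
    return {
        "region_specific": prefix is not None,  # "MNA" prefix present
        "section": section.upper(),             # e.g. "a" -> section A
        "question": int(number),                # e.g. 1 -> question 1
        "alphanumeric": rest.endswith("x"),     # trailing "x" suffix
    }


a1 = describe_variable("a1")          # example from the text
mna = describe_variable("MNAb2x")     # invented name, for illustration
```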
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
The number of realized interviews per contacted establishment was 0.25. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 0.73.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as a different option from don’t know. b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days (monthly means are available around the 6th of each month). 
If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; users are notified when this occurs. So far this has only been the case for September 2021, while it will also be the case for October, November and December 2021. For months prior to September 2021 the final release has always been equal to ERA5T, and the goal is to align the two again after December 2021. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 monthly mean data on pressure levels from 1940 to present".
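To give a feel for what the stated resolutions imply, the sketch below computes the number of grid points on a regular global lat-lon grid. It assumes that latitude runs from pole to pole with both end points included and that longitude does not repeat the 0/360 meridian; that layout is an assumption of this example and is not stated in the description above. The function name is hypothetical.

```python
def regular_grid_shape(resolution_deg):
    """Return (n_lat, n_lon) for a regular global lat-lon grid.

    Assumes latitudes from -90 to 90 inclusive and longitudes
    covering 360 degrees without repeating the first meridian.
    """
    n_lat = round(180 / resolution_deg) + 1
    n_lon = round(360 / resolution_deg)
    return n_lat, n_lon


reanalysis_shape = regular_grid_shape(0.25)   # atmospheric reanalysis
uncertainty_shape = regular_grid_shape(0.5)   # uncertainty estimate
```

Under these assumptions the 0.25 degree grid has roughly four times as many points per field as the 0.5 degree grid.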
China Total Researchers: Full-Time Equivalent data was reported at 2,637,193.100 FTE in 2022. This records an increase from the previous number of 2,405,509.400 FTE for 2021. China Total Researchers: Full-Time Equivalent data is updated yearly, averaging 1,181,575.900 FTE from Dec 1991 (Median) to 2022, with 32 observations. The data reached an all-time high of 2,637,193.100 FTE in 2022 and a record low of 471,400.000 FTE in 1991. China Total Researchers: Full-Time Equivalent data remains active status in CEIC and is reported by Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s China – Table CN.OECD.MSTI: Number of Researchers and Personnel on Research and Development: Non OECD Member: Annual.
The national breakdown by source of funds does not fully match with the classification defined in the Frascati Manual. The R&D financed by the government, business enterprises, and by the rest of the world can be retrieved but part of the expenditure has no specific source of financing, i.e. self-raised funding (in particular for independent research institutions), the funds from the higher education sector and left-over government grants from previous years.
The government and higher education sectors cover all fields of NSE and SSH while the business enterprise sector only covers the fields of NSE. There are only a few organisations in the private non-profit sector, hence no R&D survey has been carried out in this sector and the data are not available.
From 2009, researcher data are collected according to the Frascati Manual definition of researcher. Beforehand, this was only the case for independent research institutions, while for the other sectors data were collected according to the UNESCO concept of “scientist and engineer”.
In 2009, the survey coverage in the business and the government sectors was expanded.
Before 2000, all of the personnel data and 95% of the expenditure data in the business enterprise sector are for large and medium-sized enterprises only. Since 2000 however, the survey covers almost all industries and all enterprises above a certain threshold. In 2000 and 2004, a census of all enterprises was held, while in the intermediate years data for small enterprises are estimated.
Due to the reform of the S&T system some government institutions have become enterprises, and their R&D data have been reflected in the Business Enterprise sector since 2000.
This table contains data on the number of licensed day care center slots (facility capacity) per 1,000 children aged 0-5 years in California, its regions, counties, cities, towns, and census tracts. The table contains 2015 data, and includes type of facility (day care center or infant center). Access to child care has become a critical support for working families. Many working families find high-quality child care unaffordable, and the increasing cost of child care can be crippling for low-income families and single parents. These barriers can impact parental choices of child care. Increased availability of child care facilities can positively impact families by providing more choices of child care in terms of price and quality. Estimates for this indicator are provided for the total population, and are not available by race/ethnicity. More information on the data table and a data dictionary can be found in the Data and Resources section. The licensed day care centers table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity. The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health. The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture and others. Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. 
More information on HCI can be found here: https://www.cdph.ca.gov/Programs/OHE/CDPH%20Document%20Library/Accessible%202%20CDPH_Healthy_Community_Indicators1pager5-16-12.pdf The format of the licensed day care centers table is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID, and indicator definition). Some of these variables may contain the same value for all observations.
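The indicator itself is a simple ratio. A minimal sketch, assuming a hypothetical function name and made-up input figures (the handling of a zero denominator is also an assumption, not documented HCI behaviour):

```python
def slots_per_1000_children(licensed_slots, children_aged_0_5):
    """Licensed day care center slots per 1,000 children aged 0-5.

    Returns None when the child population is zero or negative,
    since the rate is undefined in that case (illustrative choice).
    """
    if children_aged_0_5 <= 0:
        return None
    return 1000.0 * licensed_slots / children_aged_0_5


# Made-up figures for a single geography.
rate = slots_per_1000_children(250, 5000)
```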
The role of Data Science and AI for predicting the decline of professionals in the recruitment process: augmenting decision-making in human resources management
Features Description:
Declined: Variable to be predicted, where value 0 means that the candidate continued in the recruitment process until hiring, and value 1 means that the candidate declined the recruitment process.
ValueClient: The total amount the customer plans to pay for the hired candidate. The value 0 means that the client has not yet defined a value to pay the candidate. Values must be greater than or equal to 0.
ExtraCost: Extra cost the customer has to pay to hire the candidate. Values must be greater than or equal to 0.
ValueResources: Value requested by the candidate to work. The value 0 means that the candidate has not yet requested a salary amount and this value will be negotiated later. Values must be greater than or equal to 0.
Net: The difference between the “ValueClient”, yearly taxes and “ValueResources”. Negative values mean that the amount the client plans to pay the candidate has not yet been defined and is still open for negotiation.
DaysOnContact: Number of days that the candidate is in the “Contact” step of the recruitment process. Values must be greater than or equal to 0.
DaysOnInterview: Number of days that the candidate is in the “Interview” step of the recruitment process. Values must be greater than or equal to 0.
DaysOnSendCV: Number of days that the candidate is in the “Send CV” step of the recruitment process. Values must be greater than or equal to 0.
DaysOnReturn: Number of days that the candidate is in the “Return” step of the recruitment process. Values must be greater than or equal to 0.
DaysOnCSchedule: Number of days that the candidate is in the “C. Schedule” step of the recruitment process. Values must be greater than or equal to 0.
DaysOnCRealized: Number of days that the candidate is in the “C. Realized” step of the recruitment process. Values must be greater than or equal to 0.
ProcessDuration: Duration of the entire recruitment process in days. Values must be greater than or equal to 0.
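The value constraints listed for these features lend themselves to a simple record check. The sketch below assumes each record is a plain dictionary keyed by the feature names; the function name, the error messages and the sample records are hypothetical. "Net" is deliberately left unchecked because the description allows negative values.

```python
# Features that, according to the description, must be >= 0.
NON_NEGATIVE_FIELDS = (
    "ValueClient", "ExtraCost", "ValueResources",
    "DaysOnContact", "DaysOnInterview", "DaysOnSendCV",
    "DaysOnReturn", "DaysOnCSchedule", "DaysOnCRealized",
    "ProcessDuration",
)


def validation_errors(record):
    """Return a list of constraint violations for one record."""
    errors = []
    if record.get("Declined") not in (0, 1):
        errors.append("Declined must be 0 or 1")
    for field in NON_NEGATIVE_FIELDS:
        value = record.get(field)
        if value is None or value < 0:
            errors.append(f"{field} must be greater than or equal to 0")
    return errors


# Made-up records: one valid, one violating two constraints.
valid_record = {field: 0 for field in NON_NEGATIVE_FIELDS}
valid_record["Declined"] = 1
invalid_record = dict(valid_record, Declined=2, DaysOnContact=-1)

valid_errors = validation_errors(valid_record)
invalid_errors = validation_errors(invalid_record)
```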
Reporting of Aggregate Case and Death Count data was discontinued on May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.
The surveillance case definition for COVID-19, a nationally notifiable disease, was first described in a position statement from the Council of State and Territorial Epidemiologists, which was later revised. However, there is some variation in how jurisdictions implemented these case definitions. More information on how CDC collects COVID-19 case surveillance data can be found at FAQ: COVID-19 Data and Surveillance.
Aggregate Data Collection Process
Since the beginning of the COVID-19 pandemic, data were reported from state and local health departments through a robust process with the following steps:
This process was collaborative, with CDC and jurisdictions working together to ensure the accuracy of COVID-19 case and death numbers. County counts provided the most up-to-date numbers on cases and deaths by report date. Throughout data collection, CDC retrospectively updated counts to correct known data quality issues.
Description
This archived public use dataset focuses on the cumulative and weekly case and death rates per 100,000 persons within various sociodemographic factors across all states and their counties. All resulting data are expressed as rates calculated as the number of cases or deaths per 100,000 persons in counties meeting various classification criteria using the US Census Bureau Population Estimates Program (2019 Vintage).
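As an illustration of the rate definition, the sketch below computes a rate per 100,000 persons for one category of counties. It assumes that counts and populations are pooled across all counties in the category before dividing, which the description implies but does not spell out; the function name and input layout are hypothetical.

```python
def pooled_rate_per_100k(counties):
    """Rate per 100,000 persons for one category of counties.

    counties: iterable of (event_count, population) pairs, one pair per
    county in the category. Pooling before dividing is an assumption.
    """
    total_events = 0
    total_population = 0
    for events, population in counties:
        total_events += events
        total_population += population
    if total_population <= 0:
        raise ValueError("total population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return total_events * 100_000 / total_population
```

For example, two counties with made-up counts of 30 and 20 cases and 100,000 residents each pool to 50 cases among 200,000 persons.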
Each county within jurisdictions is classified into multiple categories for each factor. All rates in this dataset are based on classification of counties by the characteristics of their population, not individual-level factors. This applies to each of the available factors observed in this dataset. Specific factors and their corresponding categories are detailed below.
Population-level factors
Each unique population factor is detailed below. Please note that the “Classification” column describes each of the 12 factors in the dataset, including a data dictionary describing what each numeric digit means within each classification. The “Category” column uses numeric digits (2-6, depending on the factor) defined in the “Classification” column.
Metro vs. Non-Metro – “Metro_Rural”
Metro vs. Non-Metro classification type is an aggregation of the 6 National Center for Health Statistics (NCHS) Urban-Rural classifications, where “Metro” counties include Large Central Metro, Large Fringe Metro, Medium Metro, and Small Metro areas and “Non-Metro” counties include Micropolitan and Non-Core (Rural) areas.
1 – Metro, including “Large Central Metro, Large Fringe Metro, Medium Metro, and Small Metro” areas
2 – Non-Metro, including “Micropolitan, and Non-Core” areas
Urban/rural - “NCHS_Class”
Urban/rural classification type is based on the 2013 National Center for Health Statistics Urban-Rural Classification Scheme for Counties. Levels consist of:
1 Large Central Metro
2 Large Fringe Metro
3 Medium Metro
4 Small Metro
5 Micropolitan
6 Non-Core (Rural)
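The relationship between the six “NCHS_Class” levels and the two “Metro_Rural” categories can be written as a small lookup. This is a sketch of the aggregation as described above; the names `NCHS_LABELS` and `metro_rural` are illustrative and do not come from the dataset.

```python
# NCHS_Class codes and their labels, as listed in the description.
NCHS_LABELS = {
    1: "Large Central Metro",
    2: "Large Fringe Metro",
    3: "Medium Metro",
    4: "Small Metro",
    5: "Micropolitan",
    6: "Non-Core (Rural)",
}


def metro_rural(nchs_class):
    """Collapse an NCHS_Class code (1-6) to a Metro_Rural code (1 = Metro, 2 = Non-Metro)."""
    if nchs_class not in NCHS_LABELS:
        raise ValueError(f"unknown NCHS_Class code: {nchs_class}")
    # Codes 1-4 are the metro levels; 5 and 6 are Micropolitan and Non-Core.
    return 1 if nchs_class <= 4 else 2
```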
American Community Survey (ACS) data were used to classify counties based on their age, race/ethnicity, household size, poverty level, and health insurance status distributions. Cut points were generated by using tertiles and categorized as High, Moderate, and Low percentages. The classification “Percent non-Hispanic, Native Hawaiian/Pacific Islander” is only available for “Hawaii” due to low numbers in this category for other available locations. This limitation also applies to other race/ethnicity categories within certain jurisdictions, where no counties fall into a given category. The cut points for each ACS category are further detailed below:
Age 65 - “Age65”
1 Low (0-24.4%) 2 Moderate (>24.4%-28.6%) 3 High (>28.6%)
Non-Hispanic, Asian - “NHAA”
1 Low (<=5.7%) 2 Moderate (>5.7%-17.4%) 3 High (>17.4%)
Non-Hispanic, American Indian/Alaskan Native - “NHIA”
1 Low (<=0.7%) 2 Moderate (>0.7%-30.1%) 3 High (>30.1%)
Non-Hispanic, Black - “NHBA”
1 Low (<=2.5%) 2 Moderate (>2.5%-37%) 3 High (>37%)
Hispanic - “HISP”
1 Low (<=18.3%) 2 Moderate (>18.3%-45.5%) 3 High (>45.5%)
Population in Poverty - “Pov”
1 Low (0-12.3%) 2 Moderate (>12.3%-17.3%) 3 High (>17.3%)
Population Uninsured - “Unins”
1 Low (0-7.1%) 2 Moderate (>7.1%-11.4%) 3 High (>11.4%)
Average Household Size - “HH”
1 Low (1-2.4) 2 Moderate (>2.4-2.6) 3 High (>2.6)
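The ACS cut points listed above lend themselves to a simple lookup. The sketch below assigns a category code (1 Low, 2 Moderate, 3 High) to a county-level value, treating each upper bound as inclusive, as the “>” signs in the ranges suggest. The dictionary and function names are illustrative assumptions, and values are taken to be in the same units as the listed ranges (percent, or persons per household for “HH”).

```python
from bisect import bisect_left

# (Low/Moderate boundary, Moderate/High boundary) per factor, copied from the list above.
CUT_POINTS = {
    "Age65": (24.4, 28.6),
    "NHAA": (5.7, 17.4),
    "NHIA": (0.7, 30.1),
    "NHBA": (2.5, 37.0),
    "HISP": (18.3, 45.5),
    "Pov": (12.3, 17.3),
    "Unins": (7.1, 11.4),
    "HH": (2.4, 2.6),
}


def classify(factor, value):
    """Return 1 (Low), 2 (Moderate), or 3 (High) for a county-level value."""
    cuts = CUT_POINTS[factor]
    # bisect_left counts the boundaries strictly below value, so a value
    # equal to a boundary stays in the lower category.
    return bisect_left(cuts, value) + 1
```

For instance, `classify("Pov", 12.3)` falls in Low because 12.3% is the inclusive upper bound of that range, while any value above 12.3% and up to 17.3% falls in Moderate.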
Community Vulnerability Index Value - “CCVI”
COVID-19 Community Vulnerability Index (CCVI) scores, which range from 0 to 1, are from Surgo Ventures. Cut points were generated based on tertiles and categorized as:
1 Low Vulnerability (0.0-0.4) 2 Moderate Vulnerability (0.4-0.6) 3 High Vulnerability (0.6-1.0)
Social Vulnerability Index Value – “SVI”
Social Vulnerability Index (SVI) scores (vintage 2020), which also range from 0 to 1, are from CDC/ATSDR’s Geospatial Research, Analysis & Service Program. Cut points for CCVI and SVI scores were generated based on tertiles and categorized as:
1 Low Vulnerability (0-0.333) 2 Moderate Vulnerability (0.334-0.666) 3 High Vulnerability (0.667-1)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of Cases, Means, Standard Deviations, and Reliability by Climate Scale.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chile CL: Business Enterprise Researchers: Per Thousand Employment in Industry data was reported at 0.468 Per 1000 in 2020. This records an increase from the previous number of 0.416 Per 1000 for 2019. The data is updated yearly, with a median of 0.361 Per 1000 from Dec 2007 to 2020 across 14 observations. The data reached an all-time high of 0.468 Per 1000 in 2020 and a record low of 0.206 Per 1000 in 2009. The series remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s Chile – Table CL.OECD.MSTI: Number of Researchers and Personnel on Research and Development: OECD Member: Annual. For Chile, the method for reporting international observatories' R&D expenditure was revised in 2016, leading to a break in series in the PNP and HE sectors. Prior to 2014, higher education data was obtained from the research departments of each institution (in a centralised way). Thereafter, it is obtained from the units directly (research centres of universities, scientific centres, etc.). In 2013, some institutions previously classified in the PNP sector were included in the government sector. BERD funded by the business and the rest of the world sectors has also significantly increased as a result of better reporting in the R&D surveys starting with reference year 2013. From reference year 2009, innovation and R&D surveys in the business sector were separated and the survey sampling was modified. Astronomical observatories are surveyed and included in the PNP sector from 2009; this may include some observatories operated by international organisations.
Definition of MSTI variables 'Value Added of Industry' and 'Industrial Employment':
R&D data are typically expressed as a percentage of GDP to allow cross-country comparisons. When compiling such indicators for the business enterprise sector, one may wish to exclude from GDP measures those economic activities for which Business R&D (BERD) is null or negligible by definition. By doing so, the adjusted denominator (GDP, or Value Added, excluding non-relevant industries) better corresponds to the numerator (BERD) with which it is compared.
The MSTI variable 'Value added in industry' is used to this end:
It is calculated as the total Gross Value Added (GVA) excluding 'real estate activities' (ISIC rev.4 68) where the 'imputed rent of owner-occupied dwellings', specific to the framework of the System of National Accounts, represents a significant share of total GVA and has no R&D counterpart. Moreover, the R&D performed by the community, social and personal services is mainly driven by R&D performers other than businesses.
Consequently, the following service industries are also excluded: ISIC rev.4 84 to 88 and 97 to 98. GVA data are presented at basic prices except for the People's Republic of China, Japan and New Zealand (expressed at producers' prices). In the same way, some indicators on R&D personnel in the business sector are expressed as a percentage of industrial employment. The latter corresponds to total employment excluding ISIC rev.4 68, 84 to 88 and 97 to 98.
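The exclusion rule behind 'Value added in industry' and 'Industrial employment' can be sketched as a filter over ISIC rev.4 divisions. The example below assumes figures are available as a mapping from two-digit division code to value; that layout, the function names, and the numbers in the usage note are illustrative assumptions and not OECD data or an OECD procedure.

```python
# ISIC rev.4 divisions excluded from the MSTI industry aggregates:
# 68 (real estate activities), 84 to 88, and 97 to 98.
EXCLUDED_DIVISIONS = {68} | set(range(84, 89)) | {97, 98}


def industry_total(values_by_division):
    """Sum GVA or employment over all divisions except the excluded ones."""
    return sum(
        value
        for division, value in values_by_division.items()
        if division not in EXCLUDED_DIVISIONS
    )


def per_thousand(count, industrial_employment):
    """Express a head count per thousand persons employed in industry."""
    if industrial_employment <= 0:
        raise ValueError("industrial employment must be positive")
    return count * 1000 / industrial_employment
```

With made-up employment of 100 in division 10, 20 in division 62, 50 in division 68 and 30 in division 85, `industry_total` keeps only divisions 10 and 62, and the result can then serve as the denominator for `per_thousand`.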
Field Name | Data Type | Description |
Statefp | Number | US Census Bureau unique identifier of the state |
Countyfp | Number | US Census Bureau unique identifier of the county |
Countynm | Text | County name |
Tractce | Number | US Census Bureau unique identifier of the census tract |
Geoid | Number | US Census Bureau unique identifier of the state + county + census tract |
Aland | Number | US Census Bureau defined land area of the census tract |
Awater | Number | US Census Bureau defined water area of the census tract |
Asqmi | Number | Area calculated in square miles from the Aland |
MSSAid | Text | ID of the Medical Service Study Area (MSSA) the census tract belongs to |
MSSAnm | Text | Name of the Medical Service Study Area (MSSA) the census tract belongs to |
Definition | Text | Type of MSSA, possible values are urban, rural and frontier. |
TotalPovPop | Number | US Census Bureau total population for whom poverty status is determined of the census tract, taken from the 2020 ACS 5 YR S1701 |