License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The dataset contains information about web requests to a single website. It is a time series, tracking request activity over time, which makes it well suited to machine-learning analysis.
License: U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Data Description: This data set provides all public datasets, links, documents and community created filters hosted on the City of Cincinnati's Open Data Portal.
Data Creation: This data set is maintained by the City of Cincinnati's Open Data host, Socrata.
Data Created By: Socrata
Refresh Frequency: Daily
Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this data set.
Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to: address verification, geocoding, decoding attributes, and the addition of administrative areas (e.g., Census geographies, neighborhoods, police districts).
Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset represents a collection of records for users' visits to a website, where certain variables related to these users are studied to determine whether they clicked on a particular ad or not. Here’s a detailed description of the data:
Daily Time Spent on Site: The number of minutes the user spends on the website daily.
Age: The age of the user in years.
Area Income: The average annual income of the area where the user resides, measured in U.S. dollars.
Daily Internet Usage: The number of minutes the user spends on the internet daily.
Ad Topic Line: The headline or main topic of the ad that was shown to the user.
City: The city where the user resides.
Male: An indicator of the user's gender, where 1 represents male and 0 represents female.
Country: The country where the user resides.
Timestamp: The date and time when this record was logged.
Clicked on Ad: An indicator of whether the user clicked on the ad, where 1 means the user clicked on the ad, and 0 means they did not.
In summary, this data is used to analyze users' behavior on the website based on a set of demographic and usage factors, with a focus on whether they clicked on a particular ad or not.
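As a quick illustration of how records with these fields might be analyzed, here is a minimal Python sketch using made-up rows; the values, and therefore the aggregates, are invented for the example, and the real dataset's column spellings may differ.

```python
from statistics import mean

# Hypothetical records following the field descriptions above (invented values,
# not drawn from the actual dataset).
records = [
    {"Daily Time Spent on Site": 68.9, "Age": 35, "Area Income": 61833.9,
     "Daily Internet Usage": 256.1, "Male": 0, "Clicked on Ad": 0},
    {"Daily Time Spent on Site": 41.7, "Age": 51, "Area Income": 42415.7,
     "Daily Internet Usage": 120.5, "Male": 1, "Clicked on Ad": 1},
    {"Daily Time Spent on Site": 74.2, "Age": 29, "Area Income": 59785.9,
     "Daily Internet Usage": 243.0, "Male": 1, "Clicked on Ad": 0},
    {"Daily Time Spent on Site": 39.0, "Age": 58, "Area Income": 38067.0,
     "Daily Internet Usage": 118.9, "Male": 0, "Clicked on Ad": 1},
]

def click_rate(rows):
    """Fraction of visits that ended in an ad click (Clicked on Ad == 1)."""
    return mean(r["Clicked on Ad"] for r in rows)

# Compare average daily internet usage between clickers and non-clickers.
clickers = [r for r in records if r["Clicked on Ad"] == 1]
non_clickers = [r for r in records if r["Clicked on Ad"] == 0]
avg_usage_clickers = mean(r["Daily Internet Usage"] for r in clickers)
avg_usage_non = mean(r["Daily Internet Usage"] for r in non_clickers)
```

In these toy rows, lighter internet users are the ones who clicked, the kind of demographic/usage contrast the dataset is meant to surface.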
This dataset comes from the Annual Community Survey question related to satisfaction with the city website. Respondents are asked to provide their level of satisfaction with the "Usefulness of the City's website" on a scale of 5 to 1, where 5 means "Very Satisfied" and 1 means "Very Dissatisfied" (without "don't know" as an option). The survey is mailed to a random sample of households in the City of Tempe and has a 95% confidence level. This page provides data for the City Website Quality Satisfaction performance measure. The performance measure dashboard is available at 2.04 City Website Satisfaction.
Additional Information
Source: Community Attitude Survey (Vendor: ETC Institute)
Contact: Wydale Holmes
Contact E-Mail: Wydale_Holmes@tempe.gov
Data Source Type: Excel and PDF
Preparation Method: Extracted from Annual Community Survey results
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
License: Licence Ouverte / Open Licence (https://www.etalab.gouv.fr/licence-ouverte-open-licence)
List of web and marketing agencies in Martinique (972). The accompanying data descriptor is as follows:

{
  "description": "Data descriptor for companies in Martinique and Guadeloupe",
  "source": "Data provided by the user",
  "Fields": [
    { "name": "NAME", "type": "string", "description": "The name of the company" },
    { "name": "ADDRESS", "type": "string", "description": "The main address of the company" },
    { "name": "ADRESS2", "type": "string", "description": "The secondary address of the company, if applicable" },
    { "name": "CPO", "type": "integer", "description": "The company's postal code" },
    { "name": "CITY", "type": "string", "description": "The city where the company is located" },
    { "name": "PHONE1", "type": "string", "description": "The company's main telephone number" },
    { "name": "PHONE2", "type": "string", "description": "The company's secondary telephone number" },
    { "name": "EMAIL", "type": "string", "description": "The company's contact email address" },
    { "name": "URL", "type": "string", "description": "The URL of the company's website" },
    { "name": "TYPE", "type": "string", "description": "The type of business (Marketing, Web, etc.)" }
  ],
  "format": "JSON",
  "Encoding": "UTF-8"
}
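Assuming a descriptor of this shape, records from the dataset could be checked against it in Python. The schema string below reproduces only two of the fields for brevity, and the `validate` helper is illustrative rather than part of the dataset.

```python
import json

# A cut-down copy of the descriptor (two fields only, for brevity).
DESCRIPTOR = """
{
  "description": "Data descriptor for companies in Martinique and Guadeloupe",
  "source": "Data provided by the user",
  "Fields": [
    {"name": "NAME", "type": "string", "description": "The name of the company"},
    {"name": "CPO", "type": "integer", "description": "The company's postal code"}
  ],
  "format": "JSON",
  "Encoding": "UTF-8"
}
"""

TYPE_MAP = {"string": str, "integer": int}

def validate(record: dict, descriptor: dict) -> list:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field in descriptor["Fields"]:
        name, expected = field["name"], TYPE_MAP[field["type"]]
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {field['type']}")
    return problems

schema = json.loads(DESCRIPTOR)
ok = validate({"NAME": "Agence Web 972", "CPO": 97200}, schema)       # hypothetical row
bad = validate({"NAME": "Agence Web 972", "CPO": "97200"}, schema)    # CPO given as text
```

A check like this catches the most common ingestion problem with CSV exports of such listings: numeric codes (here `CPO`) arriving as strings.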
License: Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
This dataset captures Kaggle machine learning competitions over time by project type, host-organization classification, and the states in which host organizations are headquartered. Data extraction and analysis were done by the Internet Association.
The following variables are included in the dataset:
start_date: Start date of the competition
end_date: End date of the competition
comp_org_conf: Host organization, company, or conference
primary_us_host: Primary host organization or company if the competition is sponsored by a conference or multiple hosts.
host_type: Private, nonprofit, or government
NAICS_code: Six-digit NAICS classification code
NAICS: Definition of the six-digit NAICS classification
hq_in_us: 1 - Yes, primary host is headquartered in US. 0 - No, host is not headquartered in US.
hq: Headquartered state of primary host
two_digit_definition: Definition of the first two digits of the NAICS code
three_digit_definition: Definition of the first three digits of the NAICS code
project_type: A classification of project based on project description
subtopic: Subtopic of the project type
project_title: Title of the competition
description: A brief description of the competition
prize: Prizes in US dollars
NAICS.link: Link to the NAICS code
Source: Internet Association. 2016. Machine Learning Awards. District of Columbia: Internet Association [producer]. Washington, DC: Internet Association. San Francisco, CA: Kaggle [distributor]. Web. 4 November 2016.
Column definitions for the flat file data set - Austin Finance Online eCheckbook - found on the data portal. The data contained in this dataset is for informational purposes only. Please visit the Austin Finance Online website for a searchable front end to this data set.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
LScD (Leicester Scientific Dictionary)
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

[Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus), Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we do not repeat the explanation. After pre-processing, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are the same as those described for LScD Version 2 below.

* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

[Version 2] Getting Started
This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). The dictionary is created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from the LSC, and instructions for using the code, are available in [2]. The code can also be used for lists of texts from other sources; amendments to the code may be required.

LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains a title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.

LScD is an ordered list of words from the texts of abstracts in LSC. The dictionary stores 974,238 unique words and is sorted by the number of documents containing the word, in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
1. Unique words in abstracts
2. Number of documents containing each word
3. Number of appearances of each word in the entire corpus

Processing the LSC
Step 1. Downloading the LSC Online: Use of the LSC is subject to acceptance of a request for the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.
Step 2. Importing the Corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All of the following steps can be applied to an arbitrary list of texts from any source, with changes of parameters. The structure of the corpus, such as file format and the names (and positions) of fields, should be taken into account when applying our code. The organisation of the CSV files of LSC is described in the README file for LSC [1].
Step 3. Extracting Abstracts and Saving Metadata: Metadata, which include all fields in a document excluding abstracts, and the field of abstracts are separated. Metadata are then saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
Step 4. Text Pre-processing Steps on the Collection of Abstracts: In this section, we present our approaches to pre-processing the abstracts of the LSC.
1. Removing punctuation and special characters: All non-alphanumeric characters are substituted by a space. We did not substitute the character "-" in this step, because we need to keep words like "z-score", "non-payment" and "pre-processing" in order not to lose their actual meaning. A process of uniting prefixes with words is performed in later steps of pre-processing.
2. Lowercasing the text data: Lowercasing is performed to avoid treating words like "Corpus", "corpus" and "CORPUS" differently. The entire collection of texts is converted to lowercase.
3. Uniting prefixes of words: Words containing prefixes joined with the character "-" are united as a single word. The prefixes united for this research are listed in the file "list_of_prefixes.csv". Most of the prefixes are extracted from [4]. We also added commonly used prefixes: 'e', 'extra', 'per', 'self' and 'ultra'.
4. Substitution of words: Some words joined with "-" in the abstracts of the LSC require an additional process of substitution, to avoid losing the meaning of the word before removing the character "-". Some examples of such words are "z-test", "well-known" and "chi-square". These words have been substituted by "ztest", "wellknown" and "chisquare". Identification of such words was done by sampling abstracts from the LSC. The full list of such words and the decisions taken for substitution are presented in the file "list_of_substitution.csv".
5. Removing the character "-": All remaining "-" characters are replaced by a space.
6. Removing numbers: All digits that are not included in a word are replaced by a space. All words that contain both digits and letters are kept, because alphanumeric tokens such as chemical formulae might be important for our analysis. Some examples are "co2", "h2o" and "21st".
7. Stemming: Stemming is the process of converting inflected words into their word stem. This step unites several forms of words with similar meaning into one form and also saves memory space and time [5]. All words in the LScD are stemmed to their word stem.
8. Stop word removal: Stop words are words that are extremely common but provide little value in a language. Some common stop words in English are 'I', 'the', 'a', etc. We used the 'tm' package in R to remove stop words [6]. There are 174 English stop words listed in the package.
Step 5. Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written to the file "LScD.csv".

The Organisation of the LScD
The total number of words in the file "LScD.csv" is 974,238. Each field is described below:
Word: Unique words from the corpus. All words are lowercase and in their stem forms. The field is sorted by the number of documents that contain each word, in descending order.
Number of Documents Containing the Word: A binary calculation is used: if a word exists in an abstract, it counts as 1; if the word exists more than once in a document, the count is still 1. The total number of documents containing the word is the sum of these 1s over the entire corpus.
Number of Appearances in Corpus: How many times a word occurs in the corpus when the corpus is considered as one large document.

Instructions for R Code
LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. Outputs of the code are:
Metadata File: Includes all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
File of Abstracts: Contains all abstracts after the pre-processing steps defined in Step 4.
DTM: The Document Term Matrix constructed from the LSC [6]. Each entry of the matrix is the number of times the word occurs in the corresponding document.
LScD: An ordered list of words from LSC as defined in the previous section.

The code can be used as follows:
1. Download the folder 'LSC', 'list_of_prefixes.csv' and 'list_of_substitution.csv'
2. Open the LScD_Creation.R script
3. Change the parameters in the script: replace with the full path of the directory with source files and the full path of the directory to write output files
4. Run the full code.

References
[1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
[2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
[5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved Porter's stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
[6] I. Feinerer, "Introduction to the tm Package: Text Mining in R," available online: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
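The published pipeline is implemented in R [2]; the following Python sketch is only a rough illustration of the pre-processing steps, with tiny stand-in lists in place of "list_of_prefixes.csv", "list_of_substitution.csv", the tm stop-word list, and a proper stemmer.

```python
import re

# Tiny stand-ins for the files and packages the authors actually used.
PREFIXES = {"non", "pre", "e", "extra", "per", "self", "ultra"}
SUBSTITUTIONS = {"z-test": "ztest", "well-known": "wellknown", "chi-square": "chisquare"}
STOP_WORDS = {"i", "the", "a", "an", "of", "and", "in", "to", "is"}  # tm lists 174

def stem(word):
    # Crude suffix stripping; the real pipeline uses a proper stemmer.
    for suf in ("ing", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def preprocess(abstract):
    text = abstract.lower()                                    # lowercase
    for src, dst in SUBSTITUTIONS.items():                     # substitute listed words
        text = text.replace(src, dst)
    text = re.sub(r"[^a-z0-9\- ]", " ", text)                  # drop punctuation, keep "-"
    prefix_alt = "|".join(sorted(PREFIXES, key=len, reverse=True))
    text = re.sub(r"\b(" + prefix_alt + r")-(\w)", r"\1\2", text)  # unite prefixes
    text = text.replace("-", " ")                              # remaining hyphens
    text = re.sub(r"\b\d+\b", " ", text)                       # standalone numbers
    return [stem(t) for t in text.split() if t not in STOP_WORDS]  # stop words + stem

tokens = preprocess("The pre-processing of 100 z-test scores and CO2 data is well-known.")
```

Note how "co2" survives the number-removal step (digits inside a word are kept) while the standalone "100" is dropped, matching the behaviour described in step 6.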
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset represents an inventory of research data services at 120 US colleges and universities. The data was collected using a systematic web content analysis process in late 2019. This dataset underlies the following report: Jane Radecki and Rebecca Springer, "Research Data Services in US Higher Education: A Web-Based Inventory," Ithaka S+R, Nov. 2020, https://doi.org/10.18665/sr.314397.
We defined research data services as any concrete, programmatic offering intended to support researchers (including faculty, postdoctoral researchers, and graduate students) in working with data, and identified services within the following campus units: library, IT department/research computing, independent research centers and facilities, academic departments, medical school, business school, and other professional schools. We also recorded whether the institution offered local high performance computing facilities. For detailed definitions, exclusions, and data collection procedures, please see the report referenced above.
License/Terms of Use: https://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-license
On October 15, 2013, Louisville Mayor Greg Fischer announced the signing of an open data policy executive order in conjunction with his compelling talk at the 2013 Code for America Summit. In nonchalant cadence, the mayor announced his support for complete information disclosure by declaring, "It's data, man." (Sunlight Foundation - New Louisville Open Data Policy Insists Open By Default is the Future)

Open Data Annual Reports
Section 5.A. Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously (2014-16) it was led by the Director of IT.

Full Executive Order
EXECUTIVE ORDER NO. 1, SERIES 2013
AN EXECUTIVE ORDER CREATING AN OPEN DATA PLAN.

WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation, and a high quality of life; and
WHEREAS, it should be easy to do business with Metro Government. Online government interactions mean more convenient services for citizens and businesses, and online government interactions improve the cost effectiveness and accuracy of government operations; and
WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices; and
WHEREAS, every citizen has the right to prompt, efficient service from Metro Government; and
WHEREAS, the adoption of open standards improves transparency, access to public information, and coordination and efficiency among Departments and partner organizations across the public, nonprofit and private sectors; and
WHEREAS, by publishing structured standardized data in machine-readable formats, the Louisville Metro Government seeks to encourage the local software community to develop software applications and tools to collect, organize, and share public record data in new and innovative ways; and
WHEREAS, in commitment to the spirit of Open Government, Louisville Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act;
NOW, THEREFORE, BE IT PROMULGATED BY EXECUTIVE ORDER OF THE HONORABLE GREG FISCHER, MAYOR OF LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT, AS FOLLOWS:

Section 1. Definitions. As used in this Executive Order, the terms below shall have the following definitions:
(A) "Open Data" means any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible. Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.
(B) "Open Data Report" is the annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality. The Open Data Management Team shall present an initial Open Data Report to the Mayor within 180 days of this Executive Order.
(C) "Open Format" is any widely accepted, nonproprietary, platform-independent, machine-readable method for formatting data, which permits automated processing of such data and is accessible to external search capabilities.
(D) "Open Data Portal" means the Internet site established and maintained by or on behalf of Metro Government, located at portal.louisvilleky.gov/service/data or its successor website.
(E) "Open Data Management Team" means a group consisting of representatives from each Department within Metro Government and chaired by the Chief Information Officer (CIO) that is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.
(F) "Department" means any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government within the official jurisdiction of the executive branch.

Section 2. Open Data Portal.
(A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.
(B) Any Open Data made accessible on Metro Government's Open Data Portal shall use an Open Format.

Section 3. Open Data Management Team.
(A) The Chief Information Officer (CIO) of Louisville Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. Data Coordinators will serve as members of an Open Data Management Team facilitated by the CIO and Metro Technology Services. The Open Data Management Team will work to establish a robust, nationally recognized platform that addresses digital infrastructure and Open Data.
(B) The Open Data Management Team will develop an Open Data management policy that will adopt prevailing Open Format standards for Open Data, and develop agreements with regional partners to publish and maintain Open Data that is open and freely available, while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

Section 4. Department Open Data Catalogue.
(A) Each Department shall be responsible for creating an Open Data catalogue, which will include comprehensive inventories of information possessed and/or managed by the Department.
(B) Each Department's Open Data catalogue will classify information holdings as currently "public" or "not yet public"; Departments will work with Metro Technology Services to develop strategies and timelines for publishing open data containing information in a way that is complete, reliable, and has a high level of detail.

Section 5. Open Data Report and Policy Review.
(A) Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.
(B) In acknowledgment that technology changes rapidly, the Open Data Policy should be reviewed in the future and considered for revisions or additions that will continue to position Metro Government as a leader on issues of openness, efficiency, and technical best practices.

Section 6. This Executive Order shall take effect as of October 11, 2013.

Signed this 11th day of October, 2013, by Greg Fischer, Mayor of Louisville/Jefferson County Metro Government.
GREG FISCHER, MAYOR
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
LScDC Word-Category RIG MatrixApril 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)Supervised by Prof Alexander Gorban and Dr Evgeny MirkesGetting StartedThis file describes the Word-Category RIG Matrix for theLeicester Scientific Corpus (LSC) [1], the procedure to build the matrix and introduces the Leicester Scientific Thesaurus (LScT) with the construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix, where rows correspond to words of Leicester Scientific Dictionary-Core (LScDC) [2] and columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a pair (category,word). Its value for the pair shows the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in this text. The CSV file of Word-Category RIG Matrix in the published archive is presented with two additional columns of the sum of RIGs in categories and the maximum of RIGs over categories (last two columns of the matrix). So, the file ‘Word-Category RIG Matrix.csv’ contains a total of 254 columns.This matrix is created to be used in future research on quantifying of meaning in scientific texts under the assumption that words have scientifically specific meanings in subject categories and the meaning can be estimated by information gains from word to categories. LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English. The thesaurus includes a list of 5,000 words from the LScDC. We consider ordering the words of LScDC by the sum of their RIGs in categories. That is, words are arranged in their informativeness in the scientific corpus LSC. Therefore, meaningfulness of words evaluated by words’ average informativeness in the categories. We have decided to include the most informative 5,000 words in the scientific thesaurus. 
Words as a Vector of Frequencies in WoS CategoriesEach word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of the LSC texts, each entry of the vector consists of the number of texts containing the word in the corresponding category.It is noteworthy that texts in a corpus do not necessarily belong to a single category, as they are likely to correspond to multidisciplinary studies, specifically in a corpus of scientific texts. In other words, categories may not be exclusive. There are 252 WoS categories and a text can be assigned to at least 1 and at most 6 categories in the LSC. Using the binary calculation of frequencies, we introduce the presence of a word in a category. We create a vector of frequencies for each word, where dimensions are categories in the corpus.The collection of vectors, with all words and categories in the entire corpus, can be shown in a table, where each entry corresponds to a pair (word,category). This table is build for the LScDC with 252 WoS categories and presented in published archive with this file. The value of each entry in the table shows how many times a word of LScDC appears in a WoS category. The occurrence of a word in a category is determined by counting the number of the LSC texts containing the word in a category. Words as a Vector of Relative Information Gains Extracted for CategoriesIn this section, we introduce our approach to representation of a word as a vector of relative information gains for categories under the assumption that meaning of a word can be quantified by their information gained for categories.For each category, a function is defined on texts that takes the value 1, if the text belongs to the category, and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word belongs to the text, and 0 otherwise. Consider LSC as a probabilistic sample space (the space of equally probable elementary outcomes). 
For the Boolean random variables, the joint probability distribution, the entropy and information gains are defined.The information gain about the category from the word is the amount of information on the belonging of a text from the LSC to the category from observing the word in the text [6]. We used the Relative Information Gain (RIG) providing a normalised measure of the Information Gain. This provides the ability of comparing information gains for different categories. The calculations of entropy, Information Gains and Relative Information Gains can be found in the README file in the archive published. Given a word, we created a vector where each component of the vector corresponds to a category. Therefore, each word is represented as a vector of relative information gains. It is obvious that the dimension of vector for each word is the number of categories. The set of vectors is used to form the Word-Category RIG Matrix, in which each column corresponds to a category, each row corresponds to a word and each component is the relative information gain from the word to the category. In Word-Category RIG Matrix, a row vector represents the corresponding word as a vector of RIGs in categories. We note that in the matrix, a column vector represents RIGs of all words in an individual category. If we choose an arbitrary category, words can be ordered by their RIGs from the most informative to the least informative for the category. As well as ordering words in each category, words can be ordered by two criteria: sum and maximum of RIGs in categories. The top n words in this list can be considered as the most informative words in the scientific texts. For a given word, the sum and maximum of RIGs are calculated from the Word-Category RIG Matrix.RIGs for each word of LScDC in 252 categories are calculated and vectors of words are formed. We then form the Word-Category RIG Matrix for the LSC. 
For each word, the sum (S) and maximum (M) of RIGs over categories are calculated and appended as the last two columns of the matrix. The Word-Category RIG Matrix for the LScDC with 252 categories, together with the sum and maximum of RIGs, can be found in the database.

Leicester Scientific Thesaurus (LScT)

The Leicester Scientific Thesaurus (LScT) is a list of 5,000 words from the LScDC [2]. Words of the LScDC are sorted in descending order by the sum (S) of RIGs over categories, and the top 5,000 words are selected for inclusion in the LScT. We consider these 5,000 words to be the most meaningful words in the scientific corpus: the meaningfulness of a word is evaluated by its average informativeness over the categories, and the resulting list is treated as a 'thesaurus' for science. The LScT, with the value of the sum for each word, is provided as a CSV file in the published archive.

The published archive contains the following files:
1) Word_Category_RIG_Matrix.csv: A 103,998 by 254 matrix whose columns are the 252 WoS categories plus the sum (S) and the maximum (M) of RIGs over categories (the last two columns), and whose rows are the words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
2) Word_Category_Frequency_Matrix.csv: A 103,998 by 252 matrix whose columns are the 252 WoS categories and whose rows are the words of the LScDC. Each entry is the number of texts in the corresponding category containing the word. Words are ordered as in the LScDC.
3) LScT.csv: The list of words of the LScT with their sum (S) values.
4) Text_No_in_Cat.csv: The number of texts in each category.
5) Categories_in_Documents.csv: The list of WoS categories for each document of the LSC.
6) README.txt: Description of the Word-Category RIG Matrix, the Word-Category Frequency Matrix and the LScT, and the procedures used to form them.
7) README.pdf: Same as 6, in PDF format.

References
[1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
[2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
[6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
On October 15, 2013, Louisville Mayor Greg Fischer announced the signing of an open data policy executive order in conjunction with his compelling talk at the 2013 Code for America Summit. In nonchalant cadence, the mayor announced his support for complete information disclosure by declaring, "It's data, man." (Sunlight Foundation - New Louisville Open Data Policy Insists Open By Default is the Future)

Open Data Annual Reports

Section 5.A. Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously (2014-16) it was led by the Director of IT.

Full Executive Order

EXECUTIVE ORDER NO. 1, SERIES 2013
AN EXECUTIVE ORDER CREATING AN OPEN DATA PLAN.

WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation, and a high quality of life; and

WHEREAS, it should be easy to do business with Metro Government.
Online government interactions mean more convenient services for citizens and businesses, and online government interactions improve the cost effectiveness and accuracy of government operations; and

WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices; and

WHEREAS, every citizen has the right to prompt, efficient service from Metro Government; and

WHEREAS, the adoption of open standards improves transparency, access to public information, and coordination and efficiencies among Departments and partner organizations across the public, nonprofit and private sectors; and

WHEREAS, by publishing structured standardized data in machine readable formats the Louisville Metro Government seeks to encourage the local software community to develop software applications and tools to collect, organize, and share public record data in new and innovative ways; and

WHEREAS, in commitment to the spirit of Open Government, Louisville Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act;

NOW, THEREFORE, BE IT PROMULGATED BY EXECUTIVE ORDER OF THE HONORABLE GREG FISCHER, MAYOR OF LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

Section 1. Definitions. As used in this Executive Order, the terms below shall have the following definitions:

(A) "Open Data" means any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible. Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.

(B) "Open Data Report" is the annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality. The Open Data Management Team shall present an initial Open Data Report to the Mayor within 180 days of this Executive Order.

(C) "Open Format" is any widely accepted, nonproprietary, platform-independent, machine-readable method for formatting data, which permits automated processing of such data and is accessible to external search capabilities.

(D) "Open Data Portal" means the Internet site established and maintained by or on behalf of Metro Government, located at portal.louisvilleky.gov/service/data or its successor website.

(E) "Open Data Management Team" means a group consisting of representatives from each Department within Metro Government and chaired by the Chief Information Officer (CIO) that is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

(F) "Department" means any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government within the official jurisdiction of the executive branch.

Section 2. Open Data Portal.

(A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.

(B) Any Open Data made accessible on Metro Government's Open Data Portal shall use an Open Format.

Section 3. Open Data Management Team.

(A) The Chief Information Officer (CIO) of Louisville Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. Data Coordinators will serve as members of an Open Data Management Team facilitated by the CIO and Metro Technology Services. The Open Data Management Team will work to establish a robust, nationally recognized platform that addresses digital infrastructure and Open Data.

(B) The Open Data Management Team will develop an Open Data management policy that will adopt prevailing Open Format standards for Open Data, and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

Section 4. Department Open Data Catalogue.

(A) Each Department shall be responsible for creating an Open Data catalogue, which will include comprehensive inventories of information possessed and/or managed by the Department.

(B) Each Department's Open Data catalogue will classify information holdings as currently "public" or "not yet public"; Departments will work with Metro Technology Services to develop strategies and timelines for publishing open data containing information in a way that is complete, reliable, and has a high level of detail.

Section 5. Open Data Report and Policy Review.

(A) Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

(B) In acknowledgment that technology changes rapidly, the Open Data Policy should be reviewed in the future and considered for revisions or additions that will continue to position Metro Government as a leader on issues of openness, efficiency, and technical best practices.

Section 6. This Executive Order shall take effect as of October 11, 2013.

Signed this 11th day of October, 2013, by Greg Fischer, Mayor of Louisville/Jefferson County Metro Government.

GREG FISCHER, MAYOR
On August 25th, 2022, Metro Council passed the Open Data Ordinance; previously, open data reports were published under Mayor Fischer's Executive Order. You can find here both the Open Data Ordinance, 2022 (PDF) and the Mayor's Open Data Executive Order, 2013.

Open Data Annual Reports

Page 6 of the Open Data Ordinance: Within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report.

The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously it was led by the former Data Officer, Michael Schnuerle, and prior to that by the Director of IT.

Open Data Ordinance O-243-22 Text

Louisville Metro Government
Legislation Text
File #: O-243-22, Version: 3
ORDINANCE NO._, SERIES 2022
AN ORDINANCE CREATING A NEW CHAPTER OF THE LOUISVILLE/JEFFERSON COUNTY METRO CODE OF ORDINANCES CREATING AN OPEN DATA POLICY AND REVIEW. (AMENDMENT BY SUBSTITUTION) (AS AMENDED).
SPONSORED BY: COUNCIL MEMBERS ARTHUR, WINKLER, CHAMBERS ARMSTRONG, PIAGENTINI, DORSEY, AND PRESIDENT JAMES

WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation and a high quality of life;

WHEREAS, it should be easy to do business with Metro Government.
Online government interactions mean more convenient services for citizens and businesses and online government interactions improve the cost effectiveness and accuracy of government operations;

WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices;

WHEREAS, every citizen has the right to prompt, efficient service from Metro Government;

WHEREAS, the adoption of open standards improves transparency, access to public information and improved coordination and efficiencies among Departments and partner organizations across the public, non-profit and private sectors;

WHEREAS, by publishing structured standardized data in machine readable formats, Metro Government seeks to encourage the local technology community to develop software applications and tools to display, organize, analyze, and share public record data in new and innovative ways;

WHEREAS, Metro Government's ability to review data and datasets will facilitate a better understanding of the obstacles the city faces with regard to equity;

WHEREAS, Metro Government's understanding of inequities, through data and datasets, will assist in creating better policies to tackle inequities in the city;

WHEREAS, through this Ordinance, Metro Government desires to maintain its continuous improvement in open data and transparency that it initiated via Mayoral Executive Order No. 1, Series 2013;

WHEREAS, Metro Government's open data work has repeatedly been recognized as evidenced by its achieving What Works Cities Silver (2018), Gold (2019), and Platinum (2020) certifications. What Works Cities recognizes and celebrates local governments for their exceptional use of data to inform policy and funding decisions, improve services, create operational efficiencies, and engage residents. The Certification program assesses cities on their data-driven decision-making practices, such as whether they are using data to set goals and track progress, allocate funding, evaluate the effectiveness of programs, and achieve desired outcomes. These data-informed strategies enable Certified Cities to be more resilient, respond in crisis situations, increase economic mobility, protect public health, and increase resident satisfaction; and

WHEREAS, in commitment to the spirit of Open Government, Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act.

NOW, THEREFORE, BE IT ORDAINED BY THE COUNCIL OF THE LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

SECTION I: A new chapter of the Louisville Metro Code of Ordinances ("LMCO") mandating an Open Data Policy and review process is hereby created as follows:

§ XXX.01 DEFINITIONS. For the purpose of this Chapter, the following definitions shall apply unless the context clearly indicates or requires a different meaning.

OPEN DATA. Any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible, that is not Protected Information or Sensitive Information, with no legal restrictions on use or reuse. Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.

OPEN DATA REPORT.
The annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year, including, but not limited to, the progress toward achieving the goals of Metro Government's Open Data portal, an assessment of the current scope of compliance, a list of datasets currently available on the Open Data portal and a description and publication timeline for datasets envisioned to be published on the portal in the following year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality.

OPEN DATA MANAGEMENT TEAM. A group consisting of representatives from each Department within Metro Government and chaired by the Data Officer who is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

DATA COORDINATORS. The members of an Open Data Management Team facilitated by the Data Officer and the Office of Civic Innovation and Technology.

DEPARTMENT. Any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government.

DATA OFFICER. The staff person designated by the city to coordinate and implement the city's open data program and policy.

DATA. The statistical, factual, quantitative or qualitative information that is maintained or created by or on behalf of Metro Government.

DATASET. A named collection of related records, with the collection containing data organized or formatted in a specific or prescribed way.

METADATA. Contextual information that makes the Open Data easier to understand and use.

OPEN DATA PORTAL. The internet site established and maintained by or on behalf of Metro Government located at https://data.louisvilleky.gov/ or its successor website.

OPEN FORMAT. Any widely accepted, nonproprietary, searchable, platform-independent, machine-readable method for formatting data which permits automated processes.

PROTECTED INFORMATION. Any Dataset or portion thereof to which the Department may deny access pursuant to any law, rule or regulation.

SENSITIVE INFORMATION. Any Data which, if published on the Open Data Portal, could raise privacy, confidentiality or security concerns or have the potential to jeopardize public health, safety or welfare to an extent that is greater than the potential public benefit of publishing that data.

§ XXX.02 OPEN DATA PORTAL

(A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.

(B) Any Open Data made accessible on Metro Government's Open Data Portal shall use an Open Format.

(C) In the event a successor website is used, the Data Officer shall notify the Metro Council and shall provide notice to the public on the main city website.

§ XXX.03 OPEN DATA MANAGEMENT TEAM

(A) The Data Officer of Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. The Open Data Management Team will work to establish a robust, nationally recognized platform that addresses digital infrastructure and Open Data.

(B) The Open Data Management Team will develop an Open Data Policy that will adopt prevailing Open Format standards for Open Data and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

§ XXX.04 DEPARTMENT OPEN DATA CATALOGUE

(A) Each Department shall retain ownership over the Datasets they submit to the Open Data Portal. The Departments shall also be responsible for all aspects of the quality, integrity and security of the Dataset contents, including updating its Data and associated Metadata.

(B) Each Department shall be responsible for creating an Open Data catalogue which shall include comprehensive inventories of information possessed and/or managed by the Department.

(C) Each Department's Open Data catalogue will classify information holdings as currently "public" or "not yet public;" Departments will work with the Office of Civic Innovation and Technology to develop strategies and timelines for publishing Open Data containing information in a way that is complete, reliable and has a high level of detail.

§ XXX.05 OPEN DATA REPORT AND POLICY REVIEW

(A) Within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report.

(B) Metro Council may request a specific Department to report on any data or dataset that may be beneficial or pertinent in implementing policy and legislation.

(C) In acknowledgment that technology changes rapidly, the Open Data Policy shall be reviewed annually and considered for revisions or additions that will continue to position Metro Government as a leader on issues of openness, efficiency, and technical best practices.
https://creativecommons.org/publicdomain/zero/1.0/
The "Wikipedia SQLite Portable DB" is a compact and efficient database derived from the Kensho Derived Wikimedia Dataset (KDWD). This dataset provides a condensed subset of raw Wikimedia data in a format optimized for natural language processing (NLP) research and applications.
I am not affiliated or partnered with Kensho in any way; I just really like this dataset because it is easy for my agents to query.
Key Features:
- Contains over 5 million rows of data from English Wikipedia and Wikidata
- Stored in a portable SQLite database format for easy integration and querying
- Includes a link-annotated corpus of English Wikipedia pages and a compact sample of the Wikidata knowledge base
- Ideal for NLP tasks, machine learning, data analysis, and research projects
The database consists of four main tables: pages, items, link_annotated_text, and properties (the tables referenced by the example queries below).
This dataset is derived from the Kensho Derived Wikimedia Dataset (KDWD), which is built from the English Wikipedia snapshot from December 1, 2019, and the Wikidata snapshot from December 2, 2019. The KDWD is a condensed subset of the raw Wikimedia data in a form that is helpful for NLP work, and it is released under the CC BY-SA 3.0 license.

Credits: The "Wikipedia SQLite Portable DB" is derived from the Kensho Derived Wikimedia Dataset (KDWD), created by the Kensho R&D group. The KDWD is based on data from Wikipedia and Wikidata, which are crowd-sourced projects supported by the Wikimedia Foundation. We would like to acknowledge and thank the Kensho R&D group for their efforts in creating the KDWD and making it available for research and development purposes.

By providing this portable SQLite database, we aim to make Wikipedia data more accessible and easier to use for researchers, data scientists, and developers working on NLP tasks, machine learning projects, and other data-driven applications. We hope that this dataset will contribute to the advancement of NLP research and the development of innovative applications utilizing Wikipedia data.
https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data/data
Tags: encyclopedia, wikipedia, sqlite, database, reference, knowledge-base, articles, information-retrieval, natural-language-processing, nlp, text-data, large-dataset, multi-table, data-science, machine-learning, research, data-analysis, data-mining, content-analysis, information-extraction, text-mining, text-classification, topic-modeling, language-modeling, question-answering, fact-checking, entity-recognition, named-entity-recognition, link-prediction, graph-analysis, network-analysis, knowledge-graph, ontology, semantic-web, structured-data, unstructured-data, data-integration, data-processing, data-cleaning, data-wrangling, data-visualization, exploratory-data-analysis, eda, corpus, document-collection, open-source, crowdsourced, collaborative, online-encyclopedia, web-data, hyperlinks, categories, page-views, page-links, embeddings
Usage with LIKE queries:

```python
import asyncio
import aiosqlite


class KenshoDatasetQuery:
    def __init__(self, db_file):
        self.db_file = db_file

    async def __aenter__(self):
        self.conn = await aiosqlite.connect(self.db_file)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.conn.close()

    async def search_pages_by_title(self, title):
        query = """
            SELECT pages.page_id, pages.item_id, pages.title, pages.views,
                   items.labels AS item_labels, items.description AS item_description,
                   link_annotated_text.sections
            FROM pages
            JOIN items ON pages.item_id = items.id
            JOIN link_annotated_text ON pages.page_id = link_annotated_text.page_id
            WHERE pages.title LIKE ?
        """
        async with self.conn.execute(query, (f"%{title}%",)) as cursor:
            return await cursor.fetchall()

    async def search_items_by_label_or_description(self, keyword):
        query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ? OR description LIKE ?
        """
        async with self.conn.execute(query, (f"%{keyword}%", f"%{keyword}%")) as cursor:
            return await cursor.fetchall()

    async def search_items_by_label(self, label):
        query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ?
        """
        async with self.conn.execute(query, (f"%{label}%",)) as cursor:
            return await cursor.fetchall()

    # async def search_properties_by_label_or_desc... (truncated in the original)
```
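For quick one-off exploration, the same LIKE-query pattern also works with the standard-library sqlite3 module. The sketch below runs against a tiny in-memory stand-in for the pages table (the column layout here is assumed from the queries above, not taken from the real file):

```python
import sqlite3

# Toy in-memory stand-in for the `pages` table: page_id, item_id, title, views.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (page_id INTEGER, item_id INTEGER, title TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        (1, 10, "Python (programming language)", 5000),
        (2, 11, "Monty Python", 3000),
        (3, 12, "SQLite", 1200),
    ],
)

def search_pages_by_title(conn, title):
    """Fuzzy title search with a parameterized LIKE pattern."""
    cur = conn.execute(
        "SELECT page_id, title, views FROM pages WHERE title LIKE ?",
        (f"%{title}%",),
    )
    return cur.fetchall()

print(search_pages_by_title(conn, "Python"))
# [(1, 'Python (programming language)', 5000), (2, 'Monty Python', 3000)]
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, so "sqlite" would also match the "SQLite" row.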
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during the study "Towards High-Value Datasets determination for data-driven development: a systematic literature review", conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the paper (a pre-print is available in Open Access here: https://arxiv.org/abs/2305.10234) and to allow other researchers to use these data in their own work.
The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets, with the aim of gathering information on how the topic of high-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.
Methodology
To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).
These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), applied to the article title, keywords, and abstract in order to limit the results to papers in which these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and further checked for relevance. As a result, a total of 9 articles were examined in depth. Each study was independently examined by at least two authors.
To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.
Test procedure: Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the protocol is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.
Description of the data in this data set
Protocol_HVD_SLR provides the structure of the protocol. Spreadsheet #1 provides the filled protocol for the relevant studies. Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e., before filtering out irrelevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter}
5) DOI / Website- a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of an article in the Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information

10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR, etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed-methods approach
14) Availability of the underlying research data - whether there is a reference to publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation why these data are not shared
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?
Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused on HVD determination; secondary - mentioned but not studied (e.g., as part of discussion, future work, etc.))
HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
Format of the files: .xls, .csv (for the first spreadsheet only), .odt, .docx
Licenses or restrictions: CC-BY
For more info, see README.txt
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Data Description: This data set provides all usage information for all web pages on the City of Cincinnati's interactive dashboard portal, CincyInsights.
Data Creation: This data set is maintained by the City of Cincinnati's Open Data host, Socrata.
Data Created By: Socrata
Refresh Frequency: Daily
Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this data set.
Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to: address verification, geocoding, decoding attributes, and the addition of administrative areas (e.g., Census geographies, neighborhoods, police districts).
Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
Eurostat data contains many indicators (short-term, structural, theme-specific and others) on the EU-28 and the Eurozone, the Member States and their partners. The Eurostat database always contains the latest version of each dataset, meaning that there is no versioning of the data. Datasets are updated twice a day, at 11:00 and at 23:00, whenever new data are available or a structural change occurs. It is possible to access the datasets through SDMX web services, as well as through JSON and Unicode web services.

The SDMX web services are a programmatic access to Eurostat data, with the possibility to: get a complete list of publicly available datasets; detail the complete structure definition of a given dataset; download a subset of a given dataset or a full dataset. The SDMX web services: provide access to datasets listed under "database by themes" and predefined tables listed under "tables by themes"; provide data in SDMX 2.0 and 2.1 formats; support both Representational State Transfer (REST) and Simple Object Access Protocol (SOAP) protocols; return responses in English only; and are free of charge.

The JSON & Unicode web services are a programmatic access to Eurostat data, with the possibility to download a subset of a given dataset. This operation allows customizing requests for data: you can filter on dimensions to retrieve specific data subsets. The JSON & Unicode web services: provide data in JSON-stat and Unicode formats; support only the REST protocol; deliver responses in English, French and German; and are free of charge.
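As an illustration of this REST style of access, the helper below builds a query URL for a filtered subset of one dataset. The base URL and the dataset code nama_10_gdp are assumptions for illustration only; consult Eurostat's current API documentation for the exact endpoint and dataset codes before use:

```python
from urllib.parse import urlencode

# Assumed base URL for Eurostat's JSON dissemination API (illustrative;
# verify against the official Eurostat documentation).
BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def build_query_url(dataset, lang="EN", **filters):
    """Build a REST query URL for one dataset, filtering on dimensions
    (e.g. geo='EU28', time='2020') to retrieve a specific data subset."""
    params = {"format": "JSON", "lang": lang, **filters}
    return f"{BASE}/{dataset}?{urlencode(params)}"

url = build_query_url("nama_10_gdp", geo="EU28", time="2020")
print(url)
```

The same pattern, with different dimension names as query parameters, covers any dataset exposed by the JSON web service.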
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper contains 17 tables and two figures. Table 1 is a subset of explicit entries identified in NHANES demographics data. Table 2 is a subset of implicit entries identified in NHANES demographics data. Table 3 is a subset of NHANES demographic Codebook entries. Table 4 presents a subset of explicit entries identified in SEER. Table 5 is a subset of the Dictionary Mapping for the MIMIC-III Admission table. Table 6 shows a high-level comparison of semantic data dictionaries, traditional data dictionaries, approaches involving mapping languages, and general data integration tools. Table A1 shows namespace prefixes and IRIs for relevant ontologies. Table B1 shows the infosheet specification. Table B2 shows the infosheet metadata supplement. Table B3 shows the dictionary mapping specification. Table B4 is the codebook specification. Table B5 is the timeline specification. Table B6 is the properties specification. Table C1 shows the NHANES demographics infosheet. Table C2 shows NHANES demographic implicit entries. Table C3 shows NHANES demographic explicit entries. Table C4 presents expanded NHANES demographic Codebook entries. Figure 1 is a conceptual diagram of the Dictionary Mapping, which allows for a representation model that aligns with existing scientific ontologies. The Dictionary Mapping is used to create a semantic representation of data columns. Each box, along with the "Relation" label, corresponds to a column in the Dictionary Mapping table. Blue rounded boxes correspond to columns that contain resource URIs, while white boxes refer to entities that are generated on a per-row/column basis. If there is no Codebook for a column, the actual cell value is mapped to the "has value" object of the column object, which is generally either an attribute or an entity.
Unlike other mapping approaches, the use of the Codebook allows for the annotation of cell values, rather than just columns. (b) A conceptual diagram of the Timeline, which can be used to represent complex time associated concepts, such as time intervals.
On October 15, 2013, Louisville Mayor Greg Fischer announced the signing of an open data policy executive order in conjunction with his compelling talk at the 2013 Code for America Summit. In nonchalant cadence, the mayor announced his support for complete information disclosure by declaring, "It's data, man." (Sunlight Foundation, "New Louisville Open Data Policy Insists Open By Default is the Future")

Open Data Annual Reports

Section 5.A. Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously (2014-16) it was led by the Director of IT.

Full Executive Order

EXECUTIVE ORDER NO. 1, SERIES 2013
AN EXECUTIVE ORDER CREATING AN OPEN DATA PLAN.

WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation, and a high quality of life; and

WHEREAS, it should be easy to do business with Metro Government.
Online government interactions mean more convenient services for citizens and businesses, and online government interactions improve the cost effectiveness and accuracy of government operations; and

WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices; and

WHEREAS, every citizen has the right to prompt, efficient service from Metro Government; and

WHEREAS, the adoption of open standards improves transparency, access to public information and improved coordination and efficiencies among Departments and partner organizations across the public, nonprofit and private sectors; and

WHEREAS, by publishing structured standardized data in machine readable formats the Louisville Metro Government seeks to encourage the local software community to develop software applications and tools to collect, organize, and share public record data in new and innovative ways; and

WHEREAS, in commitment to the spirit of Open Government, Louisville Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act; and

NOW, THEREFORE, BE IT PROMULGATED BY EXECUTIVE ORDER OF THE HONORABLE GREG FISCHER, MAYOR OF LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

Section 1. Definitions. As used in this Executive Order, the terms below shall have the following definitions:

(A) “Open Data” means any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible.
Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.

(B) “Open Data Report” is the annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality. The Open Data Management Team shall present an initial Open Data Report to the Mayor within 180 days of this Executive Order.

(C) “Open Format” is any widely accepted, nonproprietary, platform-independent, machine-readable method for formatting data, which permits automated processing of such data and is accessible to external search capabilities.

(D) “Open Data Portal” means the Internet site established and maintained by or on behalf of Metro Government, located at portal.louisvilleky.gov/service/data or its successor website.

(E) “Open Data Management Team” means a group consisting of representatives from each Department within Metro Government and chaired by the Chief Information Officer (CIO) that is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

(F) “Department” means any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government within the official jurisdiction of the executive branch.

Section 2. Open Data Portal.

(A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.

(B) Any Open Data made accessible on Metro Government’s Open Data Portal shall use an Open Format.

Section 3. Open Data Management Team.

(A) The Chief Information Officer (CIO) of Louisville Metro Government will work with the head of each Department to identify a Data Coordinator in each Department.
Data Coordinators will serve as members of an Open Data Management Team facilitated by the CIO and Metro Technology Services. The Open Data Management Team will work to establish a robust, nationally recognized platform that addresses digital infrastructure and Open Data.

(B) The Open Data Management Team will develop an Open Data management policy that will adopt prevailing Open Format standards for Open Data, and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

Section 4. Department Open Data Catalogue.

(A) Each Department shall be responsible for creating an Open Data catalogue, which will include comprehensive inventories of information possessed and/or managed by the Department.

(B) Each Department’s Open Data catalogue will classify information holdings as currently “public” or “not yet public”; Departments will work with Metro Technology Services to develop strategies and timelines for publishing open data containing information in a way that is complete, reliable, and has a high level of detail.

Section 5. Open Data Report and Policy Review.

(A) Within one year of the effective date of this Executive Order, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor an annual Open Data Report.

(B) In acknowledgment that technology changes rapidly, the Open Data Policy should be reviewed in the future and considered for revisions or additions that will continue to position Metro Government as a leader on issues of openness, efficiency, and technical best practices.

Section 6. This Executive Order shall take effect as of October 11, 2013.

Signed this 11th day of October, 2013, by Greg Fischer, Mayor of Louisville/Jefferson County Metro Government.

GREG FISCHER, MAYOR
This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., the next 7 days) for unique visitors.
The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
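The 6-hour uniqueness rule above can be sketched as a small sessionizer. The list-of-(ip, timestamp) input format is an assumption for illustration; the actual StatCounter logs are not part of this dataset.

```python
# Sketch of the "unique visit" rule described above: a hit starts a new
# unique visit if no hit from the same IP arrived within the previous
# 6 hours. Input format (ip, datetime) pairs is an illustrative assumption.
from datetime import datetime, timedelta

def count_unique_visits(hits, gap=timedelta(hours=6)):
    """hits: iterable of (ip, datetime) pairs, in any order."""
    last_seen = {}
    unique = 0
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        prev = last_seen.get(ip)
        if prev is None or ts - prev >= gap:
            unique += 1
        last_seen[ip] = ts  # the 6-hour window restarts at every hit
    return unique

hits = [
    ("1.2.3.4", datetime(2020, 8, 19, 9, 0)),
    ("1.2.3.4", datetime(2020, 8, 19, 10, 0)),   # within 6h: same visit
    ("1.2.3.4", datetime(2020, 8, 19, 17, 30)),  # >6h since last hit: new visit
    ("5.6.7.8", datetime(2020, 8, 19, 9, 30)),   # different IP: new visit
]
print(count_unique_visits(hits))  # → 3
```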
This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
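For the 1-day-ahead exercise suggested above, a common starting yardstick for daily data with day-of-week seasonality is the seasonal-naive forecast: predict tomorrow's unique visitors as the value observed on the same weekday last week. The sketch below uses a synthetic toy series, not the actual dataset.

```python
# Sketch: a seasonal-naive baseline for a 1-day-ahead forecast.
# The history values below are synthetic, for illustration only.
def seasonal_naive(series, season=7):
    """One-step-ahead forecast: repeat the value from `season` steps back."""
    if len(series) < season:
        raise ValueError("need at least one full season of history")
    return series[-season]

# Two synthetic weeks of daily unique-visitor counts (Mon..Sun, Mon..Sun).
history = [120, 130, 125, 140, 150, 60, 55,
           118, 128, 131, 138, 149, 62, 58]
print(seasonal_naive(history))  # forecast for next Monday → 118
```

Any fitted model (e.g., a regression with day-of-week and academic-calendar dummies) should be judged against this baseline before being taken seriously.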
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains information about web requests to a single website. It is a time series dataset, recording requests over time, which makes it well suited to forecasting and other machine learning analyses.