73 datasets found

Data Reference Standard on Person(s): gender
open.canada.ca
gimi9.com
csv
Updated Mar 3, 2025
Cite
Treasury Board of Canada Secretariat (2025). Data Reference Standard on Person(s): gender [Dataset]. https://open.canada.ca/data/dataset/21ffae40-8e4b-4082-a4f6-3c67f400e126
Available download formats: csv
Dataset updated
Mar 3, 2025
Dataset provided by
Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
This data reference standard provides a standard list of values to categorize data on person(s). The list reflects the classifications of gender and is designed to provide common data variables of the reported gender of person(s) or an individual’s personal and social gender identity. This data reference standard is to be read in conjunction with the Policy Direction to Modernize the Government of Canada’s Sex and Gender Information Practices and the Disaggregated Data Action Plan. This list of values is intended to standardize the way gender classifications are described in datasets to enable data interoperability and improve data quality. The appendix lists a data reference table that includes one-digit codes for designating gender. Not included in this data reference standard is an additional two-digit code for further classification. This data reference standard will be reviewed as required by the data reference steward in consultation with the data reference standard custodian. For support or advice on the measurement of “gender of person” or related data variables, contact statcan.csds-cnsd.statcan@statcan.gc.ca
Employment income statistics by industry subsectors, class of worker...
ouvert.canada.ca
www150.statcan.gc.ca
csv, html, xml
Updated Nov 15, 2023
Cite
Statistics Canada (2023). Employment income statistics by industry subsectors, class of worker including job permanency, work activity during the reference year, age and gender: Canada, provinces and territories, census metropolitan areas and census agglomerations with parts [Dataset]. https://ouvert.canada.ca/data/dataset/e332402e-e07a-4d69-94df-ea765d4acbc7
Available download formats: csv, xml, html
Dataset updated
Nov 15, 2023
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
Data on employment income statistics by industry subsectors (3-digit code) from the North American Industry Classification System (NAICS) 2017, class of worker including job permanency, work activity during the reference year, age and gender, for the population aged 15 years and over who reported weeks worked and employment income in 2020 in private households in Canada, provinces and territories, census metropolitan areas and census agglomerations with parts.
2021 Census - Reference maps
ouvert.canada.ca
catalogue.arctic-sdi.org
pdf
Updated Apr 13, 2022
Cite
Statistics Canada (2022). 2021 Census - Reference maps [Dataset]. https://ouvert.canada.ca/data/dataset/d8b89e72-dd02-40b2-a74d-f0235635314e
Available download formats: pdf
Dataset updated
Apr 13, 2022
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
Reference maps illustrate the location of census standard geographic areas for which census statistical data are tabulated and disseminated. The maps display the boundaries, names and unique identifiers of standard geographic areas, as well as physical features such as streets, railroads, coastlines, rivers and lakes. Reference maps include: Standard Geographical Classification (SGC); Census tracts; Federal electoral districts.
Employment income statistics by occupation minor group, Indigenous identity,...
www150.statcan.gc.ca
open.canada.ca
Updated Jun 21, 2023
Cite
Government of Canada, Statistics Canada (2023). Employment income statistics by occupation minor group, Indigenous identity, highest level of education, work activity during the reference year, age and gender: Canada, provinces and territories and census metropolitan areas with parts [Dataset]. http://doi.org/10.25318/9810058701-eng
Unique identifier
https://doi.org/10.25318/9810058701-eng
Dataset updated
Jun 21, 2023
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Government of Canada (http://www.gg.ca/)
Area covered
Canada
Description
Data on employment income statistics by occupation minor group (4-digit code) from the National Occupational Classification (NOC) 2021, Indigenous identity, highest level of education, work activity during the reference year, age and gender, for the population aged 15 years and over who reported weeks worked and employment income in 2020, in private households in Canada, provinces and territories and census metropolitan areas with parts.
Market Basket Measure (MBM) thresholds for the reference family by Market...
www150.statcan.gc.ca
open.canada.ca
Updated May 1, 2025
Cite
Government of Canada, Statistics Canada (2025). Market Basket Measure (MBM) thresholds for the reference family by Market Basket Measure region, component and base year [Dataset]. http://doi.org/10.25318/1110006601-eng
Unique identifier
https://doi.org/10.25318/1110006601-eng
Dataset updated
May 1, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Market Basket Measure (MBM) thresholds for the reference family by MBM region and base year. Total thresholds as well as thresholds for the food, clothing, transportation, shelter and other expenses components are presented, in current and constant dollars, annual.
Percentage of total energy intake from carbohydrates, by dietary reference...
www150.statcan.gc.ca
beta.data.urbandatacentre.ca
Updated Jun 20, 2017
Cite
Government of Canada, Statistics Canada (2017). Percentage of total energy intake from carbohydrates, by dietary reference intake age-sex group, household population aged 1 and over, Canadian Community Health Survey (CCHS) - Nutrition, Canada and provinces [Dataset]. http://doi.org/10.25318/1310077001-eng
Unique identifier
https://doi.org/10.25318/1310077001-eng
Dataset updated
Jun 20, 2017
Dataset provided by
Government of Canada (http://www.gg.ca/)
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Mean percentage of total energy intake from carbohydrates, by dietary reference intake age-sex group, for 2004 and 2015.
Historic US Census - 1940
redivis.com
application/jsonl, plus 7 other formats
Updated Jan 10, 2020
Cite
Stanford Center for Population Health Sciences (2020). Historic US Census - 1940 [Dataset]. http://doi.org/10.57761/660g-eq95
Available download formats: avro, arrow, sas, application/jsonl, spss, parquet, stata, csv
Unique identifier
https://doi.org/10.57761/660g-eq95
Dataset updated
Jan 10, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Time period covered
Jan 1, 1940 - Dec 31, 1940
Area covered
United States
Description
Abstract

The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The IPUMS microdata are the result of a collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.

Before Manuscript Submission

All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.

We will check your cell sizes and citations.

For more information about how to cite PHS and PHS datasets, please visit: https://phsdocs.developerhub.io/need-help/citing-phs-data-core

Documentation

Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic, economic, migration and family variables. Within households, it is possible to create relational data, as all relations between household members are known. For example, having data on a mother and her children in the same household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility of following individuals over time using a historical identifier.

In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
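The mother's age at birth mentioned above can be approximated from within-household relationships. A minimal sketch, assuming person records are dicts with the IPUMS-style variables SERIALP, PERNUM, AGE and MOMLOC, where MOMLOC is taken to hold the PERNUM of the person's mother in the same household (0 if none). Those variable names and codings are assumptions not stated in this description; check them against the extract's codebook.

```python
from collections import defaultdict


def mothers_age_at_birth(persons):
    """Return {(SERIALP, PERNUM): mother's approximate age at the child's birth}.

    Assumes hypothetical IPUMS-style keys SERIALP, PERNUM, AGE and MOMLOC,
    with MOMLOC pointing to the mother's PERNUM in the same household.
    """
    # Index every person by household serial and person number.
    by_household = defaultdict(dict)
    for p in persons:
        by_household[p["SERIALP"]][p["PERNUM"]] = p

    result = {}
    for serial, members in by_household.items():
        for pernum, child in members.items():
            mom = members.get(child["MOMLOC"])  # None when MOMLOC is 0 or unmatched
            if mom is not None:
                # Age gap at enumeration; may be off by one year because
                # birthdays do not line up with the census date.
                result[(serial, pernum)] = mom["AGE"] - child["AGE"]
    return result


persons = [
    {"SERIALP": 1, "PERNUM": 1, "AGE": 40, "MOMLOC": 0},
    {"SERIALP": 1, "PERNUM": 2, "AGE": 38, "MOMLOC": 0},
    {"SERIALP": 1, "PERNUM": 3, "AGE": 12, "MOMLOC": 2},
    {"SERIALP": 1, "PERNUM": 4, "AGE": 9, "MOMLOC": 2},
]
print(mothers_age_at_birth(persons))  # {(1, 3): 26, (1, 4): 29}
```

Computing the gap within a household keeps the lookup local, which is why the records are grouped by SERIALP first.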

The historic US 1940 census data were collected in April 1940. Enumerators traveled to households and counted the residents who regularly slept there. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.

Notes

We provide IPUMS household and person data separately so that it is convenient to explore descriptive statistics at each level. To obtain a full dataset, merge the household and person files on the variables SERIAL and SERIALP. To create a longitudinal dataset, merge datasets on the variable HISTID.
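The two merges described in the note above can be sketched in plain Python. This assumes rows are dicts (for example from `csv.DictReader`), that household rows carry SERIAL and person rows carry SERIALP (the SERIAL of their household) and HISTID; the other column names in the example data are hypothetical. How HISTID values correspond across census years depends on the linked files you obtain, so this sketch simply joins on equal HISTID, as the note describes.

```python
def merge_households_persons(households, persons):
    """Attach household fields to each person where person SERIALP == household SERIAL."""
    hh_by_serial = {h["SERIAL"]: h for h in households}
    merged = []
    for person in persons:
        household = hh_by_serial.get(person["SERIALP"])
        if household is None:
            continue  # no matching household; worth inspecting separately
        merged.append({**household, **person})  # person values win on name clashes
    return merged


def link_on_histid(records_a, records_b):
    """Inner join of two person-level datasets on HISTID, returning (a, b) pairs."""
    b_by_histid = {r["HISTID"]: r for r in records_b}
    return [(r, b_by_histid[r["HISTID"]]) for r in records_a if r["HISTID"] in b_by_histid]


# Toy rows; OWNERSHP, PERNUM and AGE values are made up for illustration.
households = [{"SERIAL": 1, "OWNERSHP": 1}]
persons = [
    {"SERIALP": 1, "PERNUM": 1, "HISTID": "A1", "AGE": 40},
    {"SERIALP": 2, "PERNUM": 1, "HISTID": "B2", "AGE": 7},  # no household row
]
merged = merge_households_persons(households, persons)
print(len(merged))  # 1
```

Building a dictionary keyed on the join column keeps each merge linear in the number of rows, which matters at the scale of the Complete Count files.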

Households with more than 60 people in the original data were broken up for processing purposes, and each person in such a household is treated as their own household. The original large households can be identified using the variable SPLIT40 and reconstructed using the variable SERIAL40; the original person count is found in the variable NUMPREC40.
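Reconstructing the split households amounts to grouping on SERIAL40 and comparing each group's size with NUMPREC40. A minimal sketch, assuming person rows are dicts and that SPLIT40 == 1 flags a person from a split household; that flag value is an assumption, so confirm the coding in the codebook.

```python
from collections import defaultdict


def reassemble_split_households(persons):
    """Group split-household persons by SERIAL40 and compare with NUMPREC40."""
    groups = defaultdict(list)
    for p in persons:
        if p["SPLIT40"] == 1:  # assumed flag value for "came from a split household"
            groups[p["SERIAL40"]].append(p)

    report = {}
    for serial40, members in groups.items():
        expected = members[0]["NUMPREC40"]  # original household size
        report[serial40] = {
            "found": len(members),
            "expected": expected,
            "complete": len(members) == expected,
        }
    return report


# Toy data: one complete 61-person household and one ordinary household.
rows = [{"SPLIT40": 1, "SERIAL40": 900, "NUMPREC40": 61} for _ in range(61)]
rows.append({"SPLIT40": 0, "SERIAL40": 5, "NUMPREC40": 4})
print(reassemble_split_households(rows))
```

A "complete": False entry would suggest that some members of the original household are missing from the extract.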

Some variables are missing from this data set for specific enumeration districts. The enumeration districts with missing data can be identified using the variable EDMISS. These variables will be added in a future release.

Coded variables derived from string variables are still in progress. These variables include: occupation, industry and migration status.

Missing observations have been allocated and some inconsistencies have been edited for the following variables: SURSIM, SEX, SCHOOL, RELATE, RACE, OCC1950, MTONGUE, MBPL, FBPL, BPL, MARST, EMPSTAT, CITIZEN, OWNERSHP. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.

Most inconsistent information was not edited for this release, so there are observations outside the universe for many variables. In particular, the variables GQ and GQTYPE have known inconsistencies and will be improved with the next r
Data reference standard on Canadian Provinces and Territories
ouvert.canada.ca
open.canada.ca
csv, json
Updated Mar 3, 2025
Cite
Treasury Board of Canada Secretariat (2025). Data reference standard on Canadian Provinces and Territories [Dataset]. https://ouvert.canada.ca/data/dataset/cd8fad92-b276-4250-972f-2d6c40ca04fa
Available download formats: json, csv
Dataset updated
Mar 3, 2025
Dataset provided by
Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
Introduction

This reference data provides a standard list of values for all Canadian provinces and territories. The list reflects Canada’s 13 major political units. There are many coding systems for Canadian provinces and territories. The data standard shows the relationships among the recommended code and other common codes.

Purpose

This list is intended to standardize the way Canadian provinces and territories are described in datasets to enable data interoperability and improve data quality. Not included in this standard are previous names, abbreviations and codes for provinces and territories. When changes occur in the future, version history will be maintained.

Applicability

Use of the codes within the “Alpha Code” column is recommended when sharing data within the federal government or publishing data to the Open Government Portal. This alpha code was chosen for three reasons:
1. it is comprehensible for users;
2. it is closely aligned with the ISO 3166-2 code for subdivisions and is identical to the Canada Post abbreviation;
3. it has already been adopted by a number of federal departments.

The Alpha Code exactly matches the set of codes created and managed by Canada Post. If Canada Post changes its codes, the Government of Canada will review and separately approve any changes to this reference standard. If it is necessary to use a numerical code in a data system, the numerical code created by Statistics Canada is included in the table.

Roles and responsibilities

Data Standard Stewards: Statistics Canada, Statistical Geomatics Centre, Analytical Studies, Methodology and Statistical Infrastructure Field; Natural Resources Canada, Geographical Names Board of Canada Secretariat.
Data Standard Custodian: Treasury Board of Canada Secretariat, Office of the Chief Information Officer, Data and Digital Policy Sector.

Recommended Review Period

The reference data standard will be reviewed as required. The expected frequency of change is low.
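The Applicability section relates the recommended Alpha Code to ISO 3166-2 and to a Statistics Canada numeric code. A minimal crosswalk sketch follows. The code values come from general knowledge of the Canada Post abbreviations and the Statistics Canada Standard Geographical Classification, not from this dataset's CSV, so check them against the published table. The function and dictionary names are hypothetical.

```python
# Assumed values: Canada Post alpha codes mapped to Statistics Canada SGC codes.
# Verify against the reference standard's CSV before relying on them.
ALPHA_TO_SGC = {
    "NL": "10", "PE": "11", "NS": "12", "NB": "13", "QC": "24", "ON": "35",
    "MB": "46", "SK": "47", "AB": "48", "BC": "59", "YT": "60", "NT": "61",
    "NU": "62",
}
SGC_TO_ALPHA = {v: k for k, v in ALPHA_TO_SGC.items()}


def normalize(value):
    """Map an alpha code or a numeric SGC code to the recommended alpha code."""
    v = str(value).strip().upper()
    if v in ALPHA_TO_SGC:
        return v
    if v in SGC_TO_ALPHA:
        return SGC_TO_ALPHA[v]
    raise ValueError(f"unrecognized province/territory code: {value!r}")


def to_iso_3166_2(value):
    """ISO 3166-2 subdivision codes for Canada are 'CA-' plus the alpha code."""
    return "CA-" + normalize(value)


print(normalize(35), to_iso_3166_2(" bc "))  # ON CA-BC
```

Normalizing everything to the alpha code first mirrors the standard's recommendation to share data using the “Alpha Code” column.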
Lakes and Rivers (polygons), Boundary files - 2016 Census
open.canada.ca
gml, html, shp
Updated Feb 23, 2022
Cite
Statistics Canada (2022). Lakes and Rivers (polygons), Boundary files - 2016 Census [Dataset]. https://open.canada.ca/data/en/dataset/d0cdef71-9343-46c3-b2e7-c1ded5907686
Available download formats: shp, gml, html
Dataset updated
Feb 23, 2022
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
There are two types of boundary files: cartographic and digital. Cartographic boundary files portray the geographic areas using only the major land mass of Canada and its coastal islands. Digital boundary files portray the full extent of the geographic areas, including the coastal water area.
Data Reference Standard on Person(s): gender - Catalogue - Canadian Urban...
data.urbandatacentre.ca
beta.data.urbandatacentre.ca
Updated Oct 1, 2024
Cite
(2024). Data Reference Standard on Person(s): gender - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-21ffae40-8e4b-4082-a4f6-3c67f400e126
Dataset updated
Oct 1, 2024
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
This data reference standard provides a standard list of values to categorize data on person(s). The list reflects the classifications of gender and is designed to provide common data variables of the reported gender of person(s) or an individual’s personal and social gender identity. This data reference standard is to be read in conjunction with the Policy Direction to Modernize the Government of Canada’s Sex and Gender Information Practices and the Disaggregated Data Action Plan. This list of values is intended to standardize the way gender classifications are described in datasets to enable data interoperability and improve data quality. The appendix lists a data reference table that includes one-digit codes for designating gender. Not included in this data reference standard is an additional two-digit code for further classification. This data reference standard will be reviewed as required by the data reference steward in consultation with the data reference standard custodian. For support or advice on the measurement of “gender of person” or related data variables, contact statcan.csds-cnsd.statcan@statcan.gc.ca
Data Reference Standard on Person(s): sex assigned at birth - Catalogue -...
data.urbandatacentre.ca
beta.data.urbandatacentre.ca
Updated Oct 1, 2024
Cite
(2024). Data Reference Standard on Person(s): sex assigned at birth - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-9c36431e-d916-498d-9779-0c52ce840293
Dataset updated
Oct 1, 2024
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
This data reference standard is part of a suite that provides a standard list of values to categorize data on person(s). The list reflects the classifications of sex assigned at birth and is designed to provide common data variables of the reported sex at birth, typically assigned based on a person’s reproductive system and other physical characteristics. This data reference standard is to be used by exception and read in conjunction with Policy Direction to Modernize the Government of Canada’s Sex and Gender Information Practices and the Disaggregated Data Action Plan. This list of values is intended to standardize the way sex assigned at birth classifications are coded in datasets to enable data interoperability and improve data quality. The numbering system is a one-digit code. The appendix lists a data reference table that includes the one-digit codes for designating sex assigned at birth. Not included in this data reference standard is the lived or current sex of person(s). This data reference standard will be reviewed as required by the data reference steward in consultation with the data reference standard custodian. For support or advice on the measurement of “sex at birth of person” or related data variables, contact statcan.csds-cnsd.statcan@statcan.gc.ca.
Employment income statistics by occupation unit group, visible minority,...
www150.statcan.gc.ca
open.canada.ca
Updated May 10, 2023
Cite
Government of Canada, Statistics Canada (2023). Employment income statistics by occupation unit group, visible minority, highest level of education, work activity during the reference year, age and gender: Canada, provinces and territories [Dataset]. http://doi.org/10.25318/9810058601-eng
Unique identifier
https://doi.org/10.25318/9810058601-eng
Dataset updated
May 10, 2023
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Data on employment income statistics, by occupation unit group (5-digit code) from the National Occupational Classification (NOC) 2021, visible minority, highest level of education, work activity during the reference year, age and gender for the population aged 15 years and over who reported weeks worked and employment income in 2020 in private households in Canada, provinces and territories.
Youtube video statistics for 1 million videos
kaggle.com
Updated Jun 29, 2020
Cite
Mattia Zeni (2020). Youtube video statistics for 1 million videos [Dataset]. https://www.kaggle.com/datasets/mattiazeni/youtube-video-statistics-1million-videos/data
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jun 29, 2020
Dataset provided by
Kaggle
Authors
Mattia Zeni
License
Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
Area covered
YouTube
Description
Motivation

Studying how YouTube videos go viral or, more generally, how they evolve in terms of views, likes and subscriptions is a topic of interest in many disciplines. This dataset supports the study of such phenomena, with statistics about 1 million YouTube videos. The information was collected in 2013, when YouTube exposed these data publicly; YouTube has since removed this functionality, and such statistics are now available only to the video's owner. This makes the dataset unique.

Context

This dataset was generated with YOUStatAnalyzer, a tool I (Mattia Zeni) developed while working for CREATE-NET (www.create-net.org) within the framework of the CONGAS FP7 project (http://www.congas-project.eu). For the project, we needed to collect and analyse the dynamics of YouTube video popularity. The dataset contains statistics on more than 1 million YouTube videos, chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).

We developed the YOUStatAnalyzer data collection tool and created this dataset because an active research community works on the interplay among individual user preferences, social dynamics and advertising mechanisms, and a common problem in that work is the lack of open large-scale datasets. At the time, no tool for collecting such data existed. YouTube has since removed the possibility of viewing these data on each video's page, making this dataset unique.

When using our dataset for research purposes, please cite it as:

@INPROCEEDINGS{YOUStatAnalyzer,
  author = {Mattia Zeni and Daniele Miorandi and Francesco {De Pellegrini}},
  title = {{YOUStatAnalyzer}: a Tool for Analysing the Dynamics of {YouTube} Content Popularity},
  booktitle = {Proc. 7th International Conference on Performance Evaluation Methodologies and Tools (Valuetools, Torino, Italy, December 2013)},
  address = {Torino, Italy},
  year = {2013}
}

Content

The dataset contains statistics and metadata for 1 million YouTube videos, collected in 2013. The videos were chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).

Dataset structure

Each entry in the dataset has the following structure:

{
    u'_id': u'9eToPjUnwmU',
    u'title': u'Traitor Compilation # 1 (Trouble ...',
    u'description': u'A traitor compilation by one are ...',
    u'category': u'Games',
    u'commentsNumber': u'6',
    u'publishedDate': u'2012-10-09T23:42:12.000Z',
    u'author': u'ServilityGaming',
    u'duration': u'208',
    u'type': u'video/3gpp',
    u'relatedVideos': [u'acjHy7oPmls', u'EhW2LbCjm7c', u'UUKigFAQLMA', ...],
    u'accessControl': {
        u'comment': {u'permission': u'allowed'},
        u'list': {u'permission': u'allowed'},
        u'videoRespond': {u'permission': u'moderated'},
        u'rate': {u'permission': u'allowed'},
        u'syndicate': {u'permission': u'allowed'},
        u'embed': {u'permission': u'allowed'},
        u'commentVote': {u'permission': u'allowed'},
        u'autoPlay': {u'permission': u'allowed'}
    },
    u'views': {
        u'cumulative': {u'data': [15.0, 25.0, 26.0, 26.0, ...]},
        u'daily': {u'data': [15.0, 10.0, 1.0, 0.0, ...]}
    },
    u'shares': {
        u'cumulative': {u'data': [0.0, 0.0, 0.0, 0.0, ...]},
        u'daily': {u'data': [0.0, 0.0, 0.0, 0.0, ...]}
    },
    u'watchtime': {
        u'cumulative': {u'data': [22.5666666667, 36.5166666667, 36.7, 36.7, ...]},
        u'daily': {u'data': [22.5666666667, 13.95, 0.166666666667, 0.0, ...]}
    },
    u'subscribers': {
        u'cumulative': {u'data': [0.0, 0.0, 0.0, 0.0, ...]},
        u'daily': {u'data': [-1.0, 0.0, 0.0, 0.0, ...]}
    },
    u'day': {
        u'data': [1349740800000.0, 1349827200000.0, 1349913600000.0, 1350000000000.0, ...]
    }
}

The structure above shows which fields an entry in the dataset has. They can be divided into two sections:

1) Video Information.

_id -> the video ID, which is also the unique identifier of an entry in the database.
title -> the video's title.
description -> the video's description.
category -> the YouTube category the video belongs to.
commentsNumber -> the number of comments posted by users.
publishedDate -> the date the video was published.
author -> the author of the video.
duration -> the video duration in seconds.
type -> the encoding type of the video.
relatedVideos -> a list of related videos.
accessControl -> a list of access policies for different aspects of the video.

2) Video Statistics.

Each video can have up to 4 statistics variables: views, shares, subscribers and watchtime. Recent videos have all of them, while older videos may have only the 'views' variable. Each variable has 2 dimensions: daily and cumulative.
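To work with these time series, each value can be paired with the matching entry of the 'day' field. A minimal sketch, assuming an entry has already been loaded into a Python dict shaped like the structure above (the file format and loading step are not described here, so they are left out). The sample values are the first four points of the excerpt; the 'day' values are assumed to be epoch milliseconds in UTC, which is consistent with the excerpt's first day matching the publishedDate 2012-10-09. In the views excerpt, the daily series equals the first differences of the cumulative series, but the subscribers excerpt (daily -1.0, cumulative 0.0) shows this does not hold for every variable, so the sketch checks the relation rather than assuming it. The function names are hypothetical.

```python
from datetime import datetime, timezone


def daily_series(entry, variable="views"):
    """Pair each 'day' timestamp (assumed epoch milliseconds, UTC) with the daily value."""
    days = entry["day"]["data"]
    values = entry[variable]["daily"]["data"]
    return [
        (datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date(), value)
        for ms, value in zip(days, values)
    ]


def first_differences(cumulative):
    """Recover per-day increments from a cumulative series."""
    return [b - a for a, b in zip([0.0] + cumulative[:-1], cumulative)]


entry = {
    "_id": "9eToPjUnwmU",
    "duration": "208",  # numeric fields are stored as strings in the excerpt
    "views": {
        "cumulative": {"data": [15.0, 25.0, 26.0, 26.0]},
        "daily": {"data": [15.0, 10.0, 1.0, 0.0]},
    },
    "day": {"data": [1349740800000.0, 1349827200000.0,
                     1349913600000.0, 1350000000000.0]},
}

for day, views in daily_series(entry):
    print(day.isoformat(), views)
print(first_differences(entry["views"]["cumulative"]["data"]) == entry["views"]["daily"]["data"])
print(int(entry["duration"]))  # 208
```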

`views -> number of views collected by the vi...
Water File - Coastal Waters (polygons) - 2011 Census
open.canada.ca
data.amerigeoss.org
gml, html, shp
Updated Feb 24, 2022
Cite
Statistics Canada (2022). Water File - Coastal Waters (polygons) - 2011 Census [Dataset]. https://open.canada.ca/data/en/dataset/92e3ad59-c7d3-4b79-ba90-5540a67a89a7
Available download formats: shp, html, gml
Dataset updated
Feb 24, 2022
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Time period covered
Jan 1, 2011
Description
Water files are provided for the mapping of inland and coastal waters, Great Lakes and the St. Lawrence River. These files were created to be used in conjunction with the boundary files.
Data from: Citation network of the knowledge co-production literature....
data.niaid.nih.gov
Updated Dec 8, 2021
Cite
Rhodri Ivor Leng (2021). Citation network of the knowledge co-production literature. Supplementary data. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5762450
Dataset updated
Dec 8, 2021
Dataset provided by
Megan Arthur
Justyna Bandola-Gill
Rhodri Ivor Leng
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data description

This data note describes the final citation network dataset analysed in the manuscript ‘What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives’ [1].

The data collection strategy used to construct this dataset is described in the associated manuscript [1]. These data were originally downloaded from the Web of Science (WoS) Core Collection through the University of Edinburgh's library subscription, using a systematic search methodology that sought to capture literature relevant to ‘knowledge co-production’. The dataset consists of 1,893 unique document reference strings (nodes) linked by 9,759 citation links (edges). The network dataset describes a directed citation network composed of papers relevant to ‘knowledge co-production’ and is split into two files: (i) ‘KnowCo_node_attribute_list.csv’ contains attributes of the 1,893 documents (nodes); and (ii) ‘KnowCo_edge_list.csv’ records the citation links (edges) between pairs of documents.

‘KnowCo_node_attribute_list.csv’ consists of attributes of the 1,893 nodes (documents) of the citation network. Due to the approach used to collect data, there are two types of node: (i) 525 nodes represent documents retrieved from WoS via the systematic search strategy, and these have full attribute data including their reference lists; and (ii) 1,368 documents that were cited >2 times by our 525 fully retrieved papers (see manuscript for full description [1]). The columns refer to:

Id, the unique identifier. Fully retrieved documents are identified via a unique identifier that begins with ‘f’ followed by an integer (e.g. f1, f2, etc.). Non-retrieved documents are identified via a unique identifier beginning with ‘n’ followed by an integer (e.g. n1, n2, etc.).

Label, contains the unique reference string of the document for which the attribute data in that row corresponds. Reference strings contain the last name of the first author, publication year, journal, volume, start page, and DOI (if available).

authors, all author names. These are in the order that these names appear in the authorship list of the corresponding document. These data are only available for fully retrieved documents.

title, document title. These data are only available for fully retrieved documents.

journal, journal of publication. These data are only available for fully retrieved documents. For those interested in journal data for the remaining papers, this can be extracted from the reference string in the ‘Label’ column.

year, year of publication. These data are available for all nodes.

type, document type (e.g. article, review). Available only for fully retrieved documents.

wos_total_citations, total citation count as recorded by Web of Science Core Collection as of May 2020. Available only for fully retrieved documents.

wos_id, Web of Science accession number. Available only for fully retrieved documents; for non-retrieved documents, the cell contains ‘CitedReference’.

cluster, the cluster membership number as discussed in the manuscript, established by modularity maximisation using the Leiden algorithm (resolution 0.8; Q = 0.53; 5 clusters). Available for all nodes.

indegree, total count of within-network citations to a given document. Due to the composition of the network, this figure gives the total number of citations from the 525 fully retrieved documents to each of the 1,893 documents in the network. Available for all nodes.

outdegree, total count of within-network references from a given document. Due to the composition of the network, only fully retrieved documents can have a value >0, because only these documents have associated reference list data. Available for all nodes.

‘KnowCo_edge _list.csv’ is an edge list containing 9,759 citation links between the 1,893 documents. The columns refer to:

Source, the citing document’s unique identifier.

Target, the cited document’s unique identifier.
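As a rough illustration of how the indegree and outdegree columns relate to the edge list, the following Python sketch counts both from Source/Target rows. It assumes the CSV has a header row with the column names Source and Target, as listed above, and one row per citation link; the sample rows are invented and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def degrees_from_edge_list(rows):
    """Count within-network indegree and outdegree from (Source, Target) rows.

    Each row is one citation link: Source cites Target.
    """
    indegree = Counter()
    outdegree = Counter()
    for row in rows:
        outdegree[row["Source"]] += 1  # a reference made by the citing document
        indegree[row["Target"]] += 1   # a citation received by the cited document
    return indegree, outdegree


# Tiny invented sample in the Source/Target layout described above.
sample = io.StringIO("Source,Target\nf1,f2\nf1,n1\nf2,n1\n")
indeg, outdeg = degrees_from_edge_list(csv.DictReader(sample))

# Counter returns 0 for absent keys, so 'n' documents (never a Source) get outdegree 0.
print(indeg["n1"], outdeg["f1"], outdeg["n1"])  # 2 2 0
```

To apply it to the actual edge list, pass `csv.DictReader` over the file opened with `newline=""`. The results should agree with the published indegree and outdegree columns only if both files describe exactly the same network, which is an assumption here.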

Notes

[1] Bandola-Gill, J., Arthur, M., & Leng, R. I. (Under review). What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives. Evidence & Policy
All Computer Science Papers @ arXiv.org -- A High-Quality Gold Standard for...
data.niaid.nih.gov
Updated Jan 24, 2020
Jatowt, Adam (2020). All Computer Science Papers @ arXiv.org -- A High-Quality Gold Standard for Citation-based Tasks [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3535001
Dataset updated
Jan 24, 2020
Dataset provided by
Färber, Michael
Jatowt, Adam
Thiemann, Alexander
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We propose a newly created gold standard data set for citation-based tasks. This gold standard is based on all computer science papers in arXiv.org.

Abstract. Analyzing and recommending citations with their specific citation contexts have recently received much attention due to the growing number of available publications. Although data sets such as CiteSeerX have been created for evaluating approaches for such tasks, those data sets exhibit striking defects. This is understandable given that information extraction, entity linking, and entity resolution all need to be performed. In this paper, we propose a new evaluation data set for citation-dependent tasks based on arXiv.org publications. Our data set exhibits almost zero noise in the extracted content, and all citations are linked to their correct publications. Besides the pure content, available on a per-sentence basis, cited publications are annotated directly in the text via global identifiers. As far as possible, referenced publications are further linked to DBLP. Our data set consists of over 15M sentences and is freely available for research purposes. It can be used for training and testing citation-based tasks, such as recommending citations, determining the functions or importance of citations, and summarizing documents based on their citations.

More information can be found in our publication "A High-Quality Gold Standard for Citation-based Tasks" (LREC'18).

You can cite the data set as follows:

@inproceedings{DBLP:conf/lrec/0001TJ18,
  author    = {Michael F{\"{a}}rber and Alexander Thiemann and Adam Jatowt},
  title     = "{A High-Quality Gold Standard for Citation-based Tasks}",
  booktitle = "{Proceedings of the Eleventh International Conference on Language Resources and Evaluation}",
  series    = "{LREC'18}",
  location  = "{Miyazaki, Japan}",
  year      = {2018},
  url       = {http://www.lrec-conf.org/proceedings/lrec2018/summaries/283.html}
}
Percentage of total energy intake from protein, by dietary reference intake...
open.canada.ca
datasets.ai
csv, html, xml
Updated Jan 17, 2023
Statistics Canada (2023). Percentage of total energy intake from protein, by dietary reference intake age-sex group, household population aged 1 and over, Canadian Community Health Survey (CCHS) - Nutrition, Canada and provinces [Dataset]. https://open.canada.ca/data/en/dataset/13a2c639-83b6-4a04-8282-bd35470ac2ae
Available download formats: html, xml, csv
Dataset updated
Jan 17, 2023
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Area covered
Canada
Description
Mean percentage of total energy intake from protein, by dietary reference intake age-sex group, for 2004 and 2015.
BIP! DB: A Dataset of Impact Measures for Research Products
data.europa.eu
unknown
Updated May 17, 2024
Zenodo (2024). BIP! DB: A Dataset of Impact Measures for Research Products [Dataset]. http://data.europa.eu/88u/dataset/oai-zenodo-org-11203340
Available download formats: unknown (1147065796)
Dataset updated
May 17, 2024
Dataset authored and provided by
Zenodo: http://zenodo.org/
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains citation-based impact indicators (a.k.a. "measures") for ~191M distinct PIDs (persistent identifiers) that correspond to research products (scientific publications, datasets, etc.). For each PID, we have calculated the following indicators, organized into categories according to the aspect of impact they best capture:

Influence indicators (the "total" impact of each research product, i.e., how established it is in general):
* Citation Count: the total number of citations of the product; the best-known influence indicator.
* PageRank score: an influence indicator based on PageRank [1], a popular network analysis method. PageRank estimates the influence of each product based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator (e.g., two products with the same number of citations can have significantly different PageRank scores if the aggregated influence of the products citing them differs greatly; the product receiving citations from more influential products gets a larger score).

Popularity indicators (the "current" impact of each research product, i.e., how popular it is currently):
* RAM score: a popularity indicator based on the RAM [2] method. It is essentially a Citation Count in which recent citations are weighted as more important. This "time awareness" alleviates problems of methods like PageRank, which are biased against recently published products (new products need time to receive enough citations to be indicative of their impact).
* AttRank score: a popularity indicator based on the AttRank [3] method. AttRank alleviates PageRank's bias against recently published products by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to examine products that received a lot of attention recently.

Impulse indicators (the initial momentum a research product received right after its publication):
* Incubation Citation Count (3-year CC): a time-restricted version of the Citation Count in which the time window has the same fixed length for all products but starts at each product's publication date, i.e., only citations made within 3 years of each product's publication are counted.

More details about these impact indicators, how they are calculated, and their interpretation can be found here and in the respective references (e.g., in [5]).

From version 5.1 onward, the impact indicators are calculated at two levels:
* The PID level (assuming that each PID corresponds to a distinct research product).
* The OpenAIRE-id level (leveraging PID synonyms based on OpenAIRE's deduplication algorithm [4]; each distinct article has its own OpenAIRE id).
Previous versions of the dataset provided scores only at the PID level.

From version 12 onward, two types of PIDs are included in the dataset: DOIs and PMIDs (before that version, only DOIs were included). Also, from version 7 onward, each product in our files is assigned an impact class, which tells the user the percentile into which the product's score falls compared with the impact scores of the remaining products in the database. The impact classes are: C1 (top 0.01%), C2 (top 0.1%), C3 (top 1%), C4 (top 10%), and C5 (bottom 90%).

Finally, before version 10, the impact scores (and classes) were calculated on a citation network with one node for each product with a distinct PID that we could find in our input data sources. However, from version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm. This enabled a correction of the scores (more specifically, we avoid counting citation links multiple times when they are made by multiple versions of the same product). As a result, each node in the citation network we build is a deduplicated product with a distinct OpenAIRE id. We still report the scores at the PID level (i.e., we assign a score to each version/instance of the product); however, these PID-level scores are simply the scores of the respective deduplicated nodes propagated accordingly (i.e., all versions of the same deduplicated product receive the same scores). We have removed a small number of instances (having a PID) that were erroneously assigned to multiple deduplicated records in the OpenAIRE Graph. For each calculation level (PID / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score provided) where each line follows the format "identifier
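To make the differences between some of these indicators concrete, here is a minimal Python sketch of the Citation Count, the 3-year CC, a RAM-style time-decayed count, and the C1-C5 impact classes. It is not BIP! DB's implementation: the exponential decay factor `gamma`, the exact boundary of the 3-year window, and the tie handling for impact classes are assumptions chosen for illustration, and PageRank and AttRank are not covered. The identifiers and years are made up.

```python
from bisect import bisect_right
from collections import defaultdict


def citation_counts(citations):
    """Citation Count: total citations received by each cited product."""
    counts = defaultdict(int)
    for cited, _citing_year in citations:
        counts[cited] += 1
    return dict(counts)


def incubation_counts(citations, pub_year, window=3):
    """3-year CC: citations made within `window` years of the cited product's publication.

    Assumption: the window runs from pub_year up to, but not including, pub_year + window.
    """
    counts = defaultdict(int)
    for cited, citing_year in citations:
        if 0 <= citing_year - pub_year[cited] < window:
            counts[cited] += 1
    return dict(counts)


def ram_like_scores(citations, current_year, gamma=0.5):
    """Time-decayed count: a citation made in year y adds gamma ** (current_year - y).

    Assumption: exponential decay with an illustrative gamma, not BIP! DB's parameters.
    """
    scores = defaultdict(float)
    for cited, citing_year in citations:
        scores[cited] += gamma ** (current_year - citing_year)
    return dict(scores)


def impact_class(score, all_scores):
    """Map a score to C1..C5 by its position among all scores.

    Assumption: tied scores share the best position of their group.
    """
    ranked = sorted(all_scores)
    n = len(ranked)
    position = n - bisect_right(ranked, score) + 1  # 1 = highest score
    # Integer comparisons avoid floating-point error at the percentile cut-offs.
    if position * 10_000 <= n:
        return "C1"  # top 0.01%
    if position * 1_000 <= n:
        return "C2"  # top 0.1%
    if position * 100 <= n:
        return "C3"  # top 1%
    if position * 10 <= n:
        return "C4"  # top 10%
    return "C5"  # bottom 90%


# Made-up example: (cited product, year of the citing product).
pub_year = {"p1": 2015, "p2": 2018}
citations = [("p1", 2015), ("p1", 2016), ("p1", 2020), ("p2", 2019)]

print(citation_counts(citations))              # {'p1': 3, 'p2': 1}
print(incubation_counts(citations, pub_year))  # {'p1': 2, 'p2': 1}
print(ram_like_scores(citations, 2020))        # {'p1': 1.09375, 'p2': 0.5}
print(impact_class(100, range(1, 101)))        # C3 (1st of 100 is in the top 1%)
```

The example shows why the popularity view differs from the influence view: p1 has three citations, but its two older ones contribute little to the decayed score.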
ckanext-datacitation - Extensions - CKAN Ecosystem Catalog
catalog.civicdataecosystem.org
Updated Jun 4, 2025
(2025). ckanext-datacitation - Extensions - CKAN Ecosystem Catalog [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-datacitation
Dataset updated
Jun 4, 2025
Description
The datacitation extension for CKAN aims to facilitate proper data citation practices within the CKAN data catalog ecosystem. By providing tools and features to create and manage citations for datasets, the extension promotes discoverability and acknowledgment of data sources, enhancing the reproducibility and transparency of research and analysis based on these datasets. The available information is limited, but based on the name, the extension likely focuses on generating, displaying, and potentially exporting citation information.

Key Features (Assumed based on Extension Name):
* Dataset Citation Generation: Likely provides functionality to automatically generate citation strings for datasets based on metadata fields, adhering to common citation formats (e.g., APA, MLA, Chicago).
* Citation Metadata Management: Potentially offers tools to manage citation-related metadata within datasets, such as author names, publication dates, and version numbers, which are essential for creating accurate citations.
* Citation Display on Dataset Pages: Likely displays the generated citation prominently on each dataset's page, giving users easy access to it.
* Citation Export Options: May provide options to export citations in various formats (e.g., BibTeX, RIS) for use with reference management software popular among researchers.
* Citation Style Customization: Possibly provides configuration options to customize the citation style used for generation, accommodating different disciplinary requirements.

Use Cases (Inferred):
1. Research Data Repositories: Data repositories can use datacitation to ensure that researchers cite datasets correctly, which is crucial for tracking the impact of data and recognizing the contributions of data creators.
2. Government Data Portals: Government agencies can implement the extension to promote the proper use and attribution of open government datasets, fostering transparency and accountability.

Technical Integration: Due to limited information, the integration details are speculative. The extension is assumed to integrate with CKAN by:
* Adding a new plugin or module to CKAN that handles citation generation and display.
* Extending the CKAN dataset schema to include citation-related metadata fields.
* Potentially providing API endpoints for programmatic access to citation information.

Benefits & Impact: The anticipated benefits of the datacitation extension include:
* Improved data discoverability and reusability through proper citation practices.
* Enhanced research reproducibility and transparency, by ensuring that data sources are properly acknowledged.
* Increased recognition of data creators and contributors.
* Simplified citation management for users of CKAN-based data catalogs.

Disclaimer: The above information is largely based on assumptions derived from the extension's name and common data citation practices. Because no README file was available, the actual features and capabilities of the datacitation extension may differ.
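For a sense of what automatic citation generation from metadata fields could look like, here is a hypothetical Python sketch that builds a citation string following the "Creator (Year). Title [Dataset]. URL" pattern of the Cite lines on this page. The `DatasetMetadata` class and `format_citation` function are invented for illustration; they are not part of CKAN or of ckanext-datacitation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetMetadata:
    # Hypothetical metadata fields; not the CKAN or ckanext-datacitation schema.
    title: str
    year: int
    url: str
    creator: Optional[str] = None


def format_citation(meta: DatasetMetadata) -> str:
    """Build 'Creator (Year). Title [Dataset]. URL'; the creator is omitted if unknown."""
    prefix = f"{meta.creator} " if meta.creator else ""
    return f"{prefix}({meta.year}). {meta.title} [Dataset]. {meta.url}"


meta = DatasetMetadata(
    title="Reliance on Science in Patenting",
    year=2024,
    url="http://doi.org/10.5281/zenodo.3382981",
    creator="Matt Marx; Aaron Fuegi",
)
print(format_citation(meta))
# Matt Marx; Aaron Fuegi (2024). Reliance on Science in Patenting [Dataset]. http://doi.org/10.5281/zenodo.3382981
```

When no creator is recorded, the sketch produces a string beginning with "(Year).", matching how the ckanext-datacitation entry itself is cited on this page.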
Data from: Reliance on Science in Patenting
zenodo.org
explore.openaire.eu
pdf, tsv, zip
Updated Jul 22, 2024
Matt Marx; Aaron Fuegi (2024). Reliance on Science in Patenting [Dataset]. http://doi.org/10.5281/zenodo.3382981
Available download formats: pdf, zip, tsv
Unique identifier
https://doi.org/10.5281/zenodo.3382981
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Matt Marx; Aaron Fuegi
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
This dataset contains citations from USPTO patents granted 1947-2018 to articles captured by the Microsoft Academic Graph (MAG) from 1800-2018. If you use the data, please cite these two papers:

for the dataset of citations: Marx, Matt and Aaron Fuegi, "Reliance on Science in Patenting: USPTO Front-Page Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686)

for the underlying dataset of papers: Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246.

The main file, pcs.tsv, contains the resolved citations. Fields are tab-separated. Each match has the patent number, the MAG ID, the original citation from the patent, an indicator of whether the citation was supplied by the applicant, the examiner, or an unknown source, and a confidence score (1-10) indicating how likely the match is to be correct. Note that this distribution does not contain matches with confidence 1 or 2.
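A minimal Python sketch for reading pcs.tsv and keeping only higher-confidence matches follows. It assumes five tab-separated columns in the order described above (patent number, MAG ID, original citation, applicant/examiner/unknown indicator, confidence) and no header row; both are assumptions, so check _relianceonscience.pdf for the actual layout before relying on it. The sample rows are invented.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class PatentCitation(NamedTuple):
    patent: str
    mag_id: str
    raw_citation: str
    source: str      # who supplied the citation: applicant, examiner, or unknown
    confidence: int  # 1-10; this distribution omits 1 and 2


def read_pcs(lines: Iterable[str], min_confidence: int = 3) -> Iterator[PatentCitation]:
    """Yield matches with confidence >= min_confidence.

    Assumption: five tab-separated columns in the order described, no header row.
    """
    # QUOTE_NONE keeps quote characters inside the citation text as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        patent, mag_id, raw, source, conf = row
        if int(conf) >= min_confidence:
            yield PatentCitation(patent, mag_id, raw, source, int(conf))


# Invented sample rows in the assumed column order.
sample = "\n".join([
    "\t".join(["1234567", "111", 'Smith J. "A study" 1999', "examiner", "9"]),
    "\t".join(["7654321", "222", "Doe 2001", "applicant", "4"]),
]) + "\n"

matches = list(read_pcs(io.StringIO(sample), min_confidence=5))
print(len(matches), matches[0].patent)  # 1 1234567
```

To read the real file, pass an open file object such as `open("pcs.tsv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption. The confidence threshold is a judgment call that trades recall for precision.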

A PubMed-specific set of matches is also provided in pcs-pubmed.tsv.

The remaining files are a redistribution of the 1 January 2019 release of the Microsoft Academic Graph. All of these files are compressed using ZIP compression under CentOS5. Original files, documented at https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema, can be downloaded from https://aka.ms/msracad; this redistribution carves up the original files into smaller, variable-specific files that can be loaded individually (see _relianceonscience.pdf for full details).

Source code for generating the patent citations to science in pcs.tsv is available at https://github.com/mattmarx/reliance_on_science. Source code for generating jif.zip and jcif.zip (Journal Impact Factor and Journal Commercial Impact Factor) is at https://github.com/mattmarx/jcif.

Although MAG contains authors and affiliations for each paper, it does not contain the location of affiliations. We have created a dataset of locations for affiliations appearing at least 100 times using Bing Maps and Google Maps; however, it is unclear to us whether the API licensing terms allow us to repost their data. In any case, you can download our source code for doing so here: https://github.com/ksjiaxian/api-requester-locations.

MAG extracts field keywords for each paper (paperfieldid.zip and fieldidname.zip): more than 200,000 fields in all. If you are studying industries or technical areas, you might find this a bit overwhelming. We mapped the MAG subjects to six OECD fields and 39 subfields, defined here: http://www.oecd.org/science/inno/38235147.pdf. Clarivate provides a crosswalk between the OECD classifications and Web of Science fields, so we include WoS fields as well. This file is magfield_oecd_wos_crosswalk.zip.
