Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Canada Trademarks Dataset
18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303
Dataset Selection and Arrangement (c) 2021 Jeremy Sheff
Python and Stata Scripts (c) 2021 Jeremy Sheff
Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.
This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.
Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.
Terms of Use:
As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.
The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:
The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.
Details of Repository Contents:
This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:
If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.
The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
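The streaming approach suggested by the "iterparse" filenames can be sketched as follows. This is a minimal illustration and not the author's scripts: the element names (Archive, Application, Number, FilingDate) and the helper xml_records_to_csv are hypothetical and do not reflect CIPO's actual XML schema. The sketch reads an XML source one element at a time with the standard library's xml.etree.ElementTree.iterparse and writes selected fields to CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_source, csv_target, record_tag, fields):
    """Stream record_tag elements from xml_source and write the named
    child fields of each one as a CSV row. Returns the number of rows."""
    writer = csv.writer(csv_target)
    writer.writerow(fields)  # header row
    count = 0
    # The "end" event fires once an element and its children are fully parsed.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            writer.writerow([(elem.findtext(f) or "").strip() for f in fields])
            elem.clear()  # drop the record's children to limit memory use
            count += 1
    return count


# Hypothetical two-record archive; real CIPO files use a different schema.
sample = io.BytesIO(
    b"<Archive>"
    b"<Application><Number>0000001</Number>"
    b"<FilingDate>1980-01-02</FilingDate></Application>"
    b"<Application><Number>0000002</Number>"
    b"<FilingDate>1981-03-04</FilingDate></Application>"
    b"</Archive>"
)
out = io.StringIO()
n_written = xml_records_to_csv(sample, out, "Application", ["Number", "FilingDate"])
print(out.getvalue())
```

Processing one record at a time is what makes this style of parsing practical for archives of the size described above, since the whole XML tree never has to be held in memory.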
With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.
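The idea behind using labels to shrink files can be illustrated with a short Python sketch. It is an analogy only, not a translation of CA_TM_csv_cleanup.do: the function encode_with_labels and the status values are hypothetical. Each repeated string is stored once in a lookup table and replaced in the data by a small integer code, so no information is lost.

```python
def encode_with_labels(values):
    """Replace repeated strings with integer codes plus a code-to-label table."""
    label_to_code = {}
    codes = []
    for value in values:
        if value not in label_to_code:
            # Assign codes 1, 2, 3, ... in order of first appearance.
            label_to_code[value] = len(label_to_code) + 1
        codes.append(label_to_code[value])
    code_to_label = {code: label for label, code in label_to_code.items()}
    return codes, code_to_label


# Hypothetical status column; the real data files use their own values.
statuses = ["Registered", "Abandoned", "Registered", "Registered"]
codes, lookup = encode_with_labels(statuses)
decoded = [lookup[c] for c in codes]  # round-trips back to the original strings
print(codes, lookup)
```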
The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.
This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This replication package contains the raw data and code to replicate the findings reported in the paper. The data are licensed under a Creative Commons Attribution 4.0 International Public License. The code is licensed under a Modified BSD License. See LICENSE.txt for details.
Software requirements
All analyses were done in Stata version 16.
Add-on packages are included in scripts/libraries/stata and do not need to be installed by the user. The names, installation sources, and installation dates of these packages are available in scripts/libraries/stata/stata.trk.
Instructions
Save the folder ‘replication_PLOS’ to your local drive.
Open the master script ‘run.do’ and change the global pointing to the working directory (line 20) to the location where you saved the folder on your local drive.
Run the master script ‘run.do’ to replicate the analysis and generate all tables and figures reported in the paper and supplementary online materials.
Datasets
Wave 1 – Survey experiment: ‘wave1_survey_experiment_raw.dta’
Wave 2 – Follow-up Survey: ‘wave2_follow_up_raw.dta’
Map: shapefiles ‘plz2stellig.shp’ and ‘OSM_PLZ.shp’, area codes ‘Postleitzahlengebiete-_OSM.csv’ (all links to the sources can be found in the script ‘04_figure2_germany_map.do’)
Pretest: ‘pre-test_corona_raw.dta’
For Appendix S7: ‘alter_geschlecht_zensus_det.xlsx’, ‘vaccination_landkreis_raw.dta’, ‘census2020_age_gender.csv’ (all links to the sources can be found in the script ‘06_AppendixS7.do’)
For Appendix S10: ‘vaccination_landkreis_raw.dta’ (all links to the sources can be found in the script ‘07_AppendixS10.do’)
Descriptions of scripts
1_1_clean_wave1.do: This script processes the raw data from wave 1, the survey experiment.
1_2_clean_wave2.do: This script processes the raw data from wave 2, the follow-up survey.
1_3_merge_generate.do: This script creates the datasets used in the main analysis and for robustness checks by merging the cleaned data from wave 1 and 2, tests the exclusion criteria and creates additional variables.
02_analysis.do: This script estimates regression models in Stata, creates figures and tables, saving them to results/figures and results/tables.
03_robustness_checks_no_exclusion.do: This script runs the main analysis using the dataset without applying the exclusion criteria. Results are saved in results/tables.
04_figure2_germany_map.do: This script creates Figure 2 in the main manuscript using publicly available data on vaccination numbers in Germany.
05_figureS1_dogmatism_scale.do: This script creates Figure S1 using data from a pretest to adjust the dogmatism scale.
06_AppendixS7.do: This script creates the figures and tables provided in Appendix S7 on the representativity of our sample compared to the German average using publicly available data about the age distribution in Germany.
07_AppendixS10.do: This script creates the figures and tables provided in Appendix S10 on the external validity of vaccination rates in our sample using publicly available data on vaccination numbers in Germany.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for the paper published in PLOS ONE on 14.07.2023. These files were used for the statistical analysis of the hepmarc feasibility trial using Stata software version 17, and are listed below, in both Stata format and .csv format as appropriate. The .do file is a simple text file.
hepmarc_data minimum dataset (.csv, .dta): see doi:10.1136/bmjopen-2019-035596 for the study protocol describing all data collected
hepmarc Data dictionary (.xls, .dta): description of each data field in the minimum dataset
hepmarc AE listing (.csv, .dta): adverse events listing
hepmarc SAP v1.0 240322_.xls, .dta: description of each data field in the minimum dataset
hepmarc data.do: Stata .do file used to perform the analysis
Notes: Each participant's age has been altered by a random amount to preserve anonymity. There are two rows for each of the two participants who reported two adverse reactions.
Abstract
Objectives: Maraviroc may reduce hepatic inflammation in people with HIV and non-alcoholic fatty liver disease (HIV-NAFLD) through CCR5-receptor antagonism, which warrants further exploration.
Methods: We performed an open-label 96-week randomised-controlled feasibility trial of maraviroc plus optimised background therapy (OBT) versus OBT alone, in a 1:1 ratio, for people with virologically-suppressed HIV-1 and NAFLD without cirrhosis. Dosing followed recommendations for HIV therapy in the Summary of Product Characteristics for maraviroc. The primary outcomes were safety, recruitment and retention rates, adherence and data completeness. Secondary outcomes included the change in Fibroscan-assessed liver stiffness measurements (LSM), controlled attenuation parameter (CAP) and Enhanced Liver Fibrosis (ELF) scores.
Results: Fifty-three participants (53/60, 88% of target) were recruited; 23 received maraviroc plus OBT; 89% were male; 19% had type 2 diabetes mellitus. The median baseline LSM, CAP and ELF scores were 6.2 (IQR 4.6-7.8) kPa, 325 (IQR 279-351) dB/m and 9.1 (IQR 8.6-9.6) respectively.
Primary outcomes: all individuals eligible after screening were randomised; there was 92% (SD 6.6%) adherence to maraviroc [target >90%]; 83% (95%CI 70%-92%) participant retention [target >65%]; 5.5% of data were missing [target
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the area resource file (arf) with r
the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.
this new github repository contains two scripts:
2011-2012 arf - download.R
- download the zipped area resource file directly onto your local computer
- load the entire table into a temporary sql database
- save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta)
2011-2012 arf - analysis examples.R
- limit the arf to the variables necessary for your analysis
- sum up a few county-level statistics
- merge the arf onto other data sets, using both fips and ssa county codes
- create a sweet county-level map
click here to view these two scripts
for more detail about the area resource file (arf), visit:
- the arf home page
- the hrsa data warehouse
notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data.
confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
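The merge step described above can be sketched in Python. This is an illustration under stated assumptions rather than a port of the R scripts: the column names (state, county, fips, n_hospitals), the helper fips_key, and all values are hypothetical. The one fixed fact relied on is that a county FIPS code is a two-digit state code followed by a three-digit county code, so keys need zero-padding before they will match.

```python
def fips_key(state_code, county_code):
    """Build a five-character county FIPS key: 2-digit state + 3-digit county."""
    return f"{int(state_code):02d}{int(county_code):03d}"


# Hypothetical county-level table (stand-in for a few ARF columns).
county_table = [
    {"state": 1, "county": 1, "n_hospitals": 2},
    {"state": 36, "county": 61, "n_hospitals": 40},
]

# Hypothetical person-level survey rows that already carry a FIPS string.
survey_rows = [
    {"respondent": "a", "fips": "36061"},
    {"respondent": "b", "fips": "01001"},
    {"respondent": "c", "fips": "99999"},  # no matching county
]

# Index the county table by FIPS key, then attach county data to each row.
by_fips = {fips_key(r["state"], r["county"]): r for r in county_table}
merged = [
    {**row, "n_hospitals": by_fips.get(row["fips"], {}).get("n_hospitals")}
    for row in survey_rows
]
print(merged)
```

Rows without a matching county keep a value of None, which mirrors a left join and makes unmatched records easy to spot.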
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE CENTRAL AGENCY FOR PUBLIC MOBILIZATION AND STATISTICS (CAPMAS)
In any society, the human element is the basis of the work force that carries out all service and production activities. It is therefore essential to produce labor force statistics and studies on the growth of manpower and on the distribution of the labor force by type and characteristics.
In this context, the Central Agency for Public Mobilization and Statistics conducts the "Quarterly Labor Force Survey", which includes data on the size of manpower and the labor force (employed and unemployed) and their geographical distribution by their characteristics.
By the end of each year, CAPMAS issues the annual aggregated labor force bulletin publication that includes the results of the quarterly survey rounds that represent the manpower and labor force characteristics during the year.
----> Historical Review of the Labor Force Survey:
1- The first Labor Force Survey was undertaken in 1957. The first round was conducted in November of that year, and the survey has continued in successive rounds (quarterly, bi-annually, or annually) to the present.
2- Starting with the October 2006 round, the fieldwork of the labor force survey was developed to focus on the following two points: a. Using the panel sample that is part of the survey sample to monitor dynamic changes in the labor market. b. Improving the questionnaire to include more questions that help better define each household member's relationship to the labor force (employed, unemployed, out of labor force, etc.), and reordering some of the existing questions in a more logical way.
3- Starting with the January 2008 round, the methodology was developed to collect a more representative sample during the survey year. This is done by dividing the sample of each governorate into five groups; the questionnaires are collected from each group separately every 15 days for 3 months (in the middle and at the end of the month).
----> The survey aims at covering the following topics:
1- Measuring the size of the Egyptian labor force among civilians (for all governorates of the republic) by their different characteristics.
2- Measuring the employment rate at the national level and in different geographical areas.
3- Measuring the distribution of employed people by the following characteristics: gender, age, educational status, occupation, economic activity, and sector.
4- Measuring the unemployment rate in different geographic areas.
5- Measuring the distribution of unemployed people by the following characteristics: gender, age, educational status, unemployment type "ever employed/never employed", occupation, economic activity, and sector for people who have ever worked.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
Covering a sample of urban and rural areas in all the governorates.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
----> Sample Design and Selection
The sample of the LFS 2006 survey is a simple systematic random sample.
----> Sample Size
The sample size varied by quarter (Q1 = 19,429; Q2 = 19,419; Q3 = 19,119; Q4 = 18,835 households), for a total of 76,802 households annually. These households are distributed at the governorate level (urban/rural).
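As a quick consistency check, the quarterly figures quoted above can be summed to confirm the stated annual total. The variable names are illustrative only.

```python
# Quarterly household counts as reported for the LFS 2006 sample.
quarterly_households = {"Q1": 19429, "Q2": 19419, "Q3": 19119, "Q4": 18835}

# The annual total is the sum of the four quarterly samples.
annual_total = sum(quarterly_households.values())
print(annual_total)  # 76802
```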
A more detailed description of the different sampling stages and allocation of sample across governorates is provided in the Methodology document available among external resources in Arabic.
Face-to-face [f2f]
The questionnaire design follows the latest International Labor Organization (ILO) concepts and definitions of labor force, employment, and unemployment.
The questionnaire comprises 3 tables in addition to the identification and geographic data of household on the cover page.
----> Table 1- Demographic and employment characteristics and basic data for all household individuals
Including: gender, age, educational status, marital status, residence mobility and current work status
----> Table 2- Employment characteristics table
This table is filled in for individuals who were employed at the time of the survey or who were engaged in work during the reference week, and provides information on:
- Relationship to employer: employer, self-employed, waged worker, and unpaid family worker
- Economic activity
- Sector
- Occupation
- Effective working hours
- Work place
- Average monthly wage
----> Table 3- Unemployment characteristics table
This table is filled in for all unemployed individuals who satisfied the unemployment criteria, and provides information on:
- Type of unemployment (unemployed, unemployed ever worked)
- Economic activity and occupation in the last job held before becoming unemployed
- Last unemployment duration in months
- Main reason for unemployment
----> Raw Data
Office editing is one of the main stages of the survey. It started once the questionnaires were received from the field and was carried out by the selected work groups. It includes: a- Editing of coverage and completeness b- Editing of consistency
----> Harmonized Data
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de471016
Abstract (en): The Survey of Consumer Attitudes and Behavior series (also known as the Surveys of Consumers) was undertaken to measure changes in consumer attitudes and expectations, to understand why such changes occur, and to evaluate how they relate to consumer decisions to save, borrow, or make discretionary purchases. The data regularly include the Index of Consumer Sentiment, the Index of Current Economic Conditions, and the Index of Consumer Expectations. Since the 1940s, these surveys have been produced quarterly through 1977 and monthly thereafter. The surveys conducted in 2005 focused on topics such as evaluations and expectations about personal finances, employment, price changes, and the national business situation. Opinions were collected regarding respondents' appraisals of present market conditions for purchasing houses, automobiles, computers, and other durables. Also explored in this survey were respondents' types of savings and financial investments, loan use, family income, and retirement planning. Other topics in this series typically include ownership, lease, and use of automobiles, respondents' general feelings, and respondents' familiarity with and use of the Internet. Demographic information includes ethnic origin, sex, age, marital status, and education. The purpose of this survey series is to forecast changes in aggregate consumer behavior. The data are not weighted; however, this collection contains four weight variables, WT (HOUSEHOLD HEAD WEIGHT), WT_HH (HOUSEHOLD WEIGHT), WT_ADHD (ADULT HEAD WEIGHT), and WT_AD (ADULT WEIGHT), that must be used in any analysis. For more information on weights and sampling please refer to the documentation and/or visit the Surveys of Consumers Web site. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: created variable labels and/or value labels; checked for undocumented or out-of-range codes.
Persons aged 18 years or older living in households with telephones within the United States.
Smallest Geographic Unit: Census Division
National sample of dwelling units selected by area probability sampling.
2016-04-12 This collection has been fully curated, and was updated to include SPSS, SAS, and Stata data and setup files, a tab-delimited data file, an R data file, and a PDF codebook.
computer-assisted telephone interview (CATI)
Information on the Index of Consumer Sentiment, the Index of Current Economic Conditions, and the Index of Consumer Expectations and how they were created can be found in the ICPSR Codebook. Additional information on the Survey of Consumers can be found by visiting the Surveys of Consumers Web site.
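The abstract above states that the weight variables must be used in any analysis. A minimal sketch of what that means for a simple estimate follows. It assumes a weighted mean is the quantity of interest; the function weighted_mean, the response values, and the weights are all made up for illustration, and only the variable name WT_HH comes from the description. The codebook remains the authority on which weight applies to which analysis.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values, using one weight per observation."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical yes/no responses (1 = yes) and hypothetical WT_HH weights.
responses = [1, 0, 1, 1]
wt_hh = [0.5, 2.0, 1.0, 0.5]

unweighted = sum(responses) / len(responses)
weighted = weighted_mean(responses, wt_hh)
print(unweighted, weighted)
```

In this toy example the unweighted share of "yes" answers differs from the weighted share, which is the kind of gap that ignoring the weights would hide.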