Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Canada Trademarks Dataset
18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303
Dataset Selection and Arrangement (c) 2021 Jeremy Sheff
Python and Stata Scripts (c) 2021 Jeremy Sheff
Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.
This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.
Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.
Terms of Use:
As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.
The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:
The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.
Details of Repository Contents:
This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:
If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.
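Before starting the download, the storage advice above can be checked programmatically. The following is a minimal sketch rather than part of the repository's scripts; the function name `has_enough_space` is hypothetical, and reading "70GB" as 70 decimal gigabytes is an assumption.

```python
import shutil
from pathlib import Path

# The 70 GB figure comes from the text above; treating it as decimal
# gigabytes (10**9 bytes) is an assumption.
REQUIRED_BYTES = 70 * 10**9


def has_enough_space(target_dir, required_bytes=REQUIRED_BYTES):
    """Return True if the volume holding target_dir has at least required_bytes free."""
    usage = shutil.disk_usage(Path(target_dir))
    return usage.free >= required_bytes


if __name__ == "__main__":
    target = Path.cwd()  # placeholder for the user's chosen target directory
    print(f"{target}: enough space = {has_enough_space(target)}")
```

The directory must already exist when the check runs, which fits the advice to create the target directory in advance.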
The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
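For readers who want to adapt the iterparse scripts, the general pattern they are named after can be sketched as follows. This is an illustration of streaming XML parsing with Python's standard library, not the repository's actual code: the function `xml_to_csv`, the tag names `Application`, `ApplicationNumber`, and `FilingDate`, and the sample records are all hypothetical, and the real CIPO XML may use namespaces and a different layout.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_source, csv_target, record_tag, fields):
    """Stream records out of xml_source and write one CSV row per record."""
    writer = csv.writer(csv_target)
    writer.writerow(fields)
    count = 0
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            writer.writerow([elem.findtext(f, default="") for f in fields])
            elem.clear()  # drop the record's children so memory use stays low
            count += 1
    return count


# Hypothetical sample; real archive files are far larger.
SAMPLE = b"""<Applications>
  <Application><ApplicationNumber>0000001</ApplicationNumber><FilingDate>1985-03-01</FilingDate></Application>
  <Application><ApplicationNumber>0000002</ApplicationNumber></Application>
</Applications>"""

out = io.StringIO()
n = xml_to_csv(io.BytesIO(SAMPLE), out, "Application", ["ApplicationNumber", "FilingDate"])
```

Processing each record as soon as its closing tag is seen, and clearing it afterwards, avoids loading a whole archive file into memory at once.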
With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.
The Python and Stata scripts included in this repository are separately maintained and updated on GitHub at https://github.com/jnsheff/CanadaTM.
This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
The supply and use tables replace the archived input-output tables (Tables 36-10-0417 and 36-10-0418). This new presentation is more in line with the international standard and practices found in most national statistical organizations. Supply and use tables are now available on the Statistics Canada web site in Comma Separated Value (.csv) files and Excel spreadsheet (.xlsx) files. This data is available at the following link: http://www.statcan.gc.ca/pub/15-602-x/15-602-x2017001-eng.htm.
The supply and use tables are built around three classification systems, namely the Input-Output Industry Classification (IOIC), the Input-Output Final Demand Classification (IOFDC), and the Supply and Use Product Classification (SUPC). Each classification has four levels of hierarchy, consisting of the Detail level, Link-1997 level, Link-1961 level and Summary level. These classifications are available upon request.
The estimates for reference years 2010 to 2013 are based on the 2015 comprehensive revision of the Canadian System of Macroeconomic Accounts. More information about the 2015 comprehensive revision is available in: A preview of the 2015 comprehensive revision of the Canadian System of Macroeconomic Accounts, http://www.statcan.gc.ca/pub/13-605-x/2015003/article/14153-eng.htm.
With the release of the estimates for reference year 2013, a new product classification has been introduced mainly as the result of the incorporation of the 2012 version of the North American Product Classification System (NAPCS) into the Supply and Use Product Classification (SUPC). There are also some non-NAPCS-related changes to some products - the new SUPC 2013 classification includes enhanced detail for wholesale and retail margin services.
Prior to reference year 2013, the health and education industries in the supply and use tables (SUT) were roughly aligned with either the provincial or municipal level. Beginning with the 2013 SUT, the education and health industries are redefined to allow a simple aggregation of each health and education industry to only one level of government.
Prior to reference year 2013, the Canadian supply and use tables (SUT) showed imports by product valued at c.i.f. (cost, insurance and freight), inclusive of duties. In order to align with the international standard and reduce confusion for users, starting in 2013 the Canadian SUT show imports by product valued at c.i.f., with duties on imports shown explicitly in a separate tax margin table.
Prior to reference year 2013, land transfer taxes were presented in the purchaser price supply and use tables (SUT) as direct payments of taxes on products in the final demand construction categories. As of 2013, the SUT will show these payments as included in the purchaser price values and as tax margins.
Starting with the 2013 supply and use tables, the fictive commodities and industries have been eliminated from the tables.
Beginning with reference year 2014, the classifications of the supply and use tables have been modified to include cannabis related industries, products and final demand categories. Additional changes have also been made to the industry classification codes for oil and gas extraction and to the final demand classification to disaggregate disposal of used assets by sector.
Beginning with reference year 2014, the estimates are based on the 2019 comprehensive revision of the Canadian System of Macroeconomic Accounts which incorporated revisions to both international travel expenditures and cannabis-related activities. More information about the 2019 comprehensive revision is available in: A preview of the 2019 revision of the Canadian System of Macroeconomic Accounts.
With the release of the estimates for reference year 2017, the Supply and Use Product Classification (SUPC) now reflects version 2.0 of the 2017 North American Product Classification System (NAPCS). As a result, one SUPC code has become redundant and contains no data: MPS519001 - Subscriptions for online content. Two other codes also no longer contain data: MPS532A01 - Computer equipment rental and leasing services, and MPS532A02 - Office machinery and equipment (except computer equipment) rental and leasing services. These products are no longer economically significant and their data was combined with MPS532A03 - Rental and operating leasing services of commercial and industrial machinery and equipment.
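Users comparing years before and after this change may want to fold the two retired rental and leasing codes into MPS532A03 so that the series line up. The sketch below shows one way to do that; it assumes rows of (year, product code, value), the function name `harmonize` is hypothetical, and the values are made up purely for illustration. MPS519001 is left out of the mapping because the note does not say where its data went.

```python
from collections import defaultdict

# Mapping taken from the note above: two retired codes roll into MPS532A03.
MERGED_INTO = {
    "MPS532A01": "MPS532A03",
    "MPS532A02": "MPS532A03",
}


def harmonize(rows):
    """Sum values by (year, product code) after recoding the merged products."""
    totals = defaultdict(float)
    for year, code, value in rows:
        totals[(year, MERGED_INTO.get(code, code))] += value
    return dict(totals)


# Made-up values, for illustration only.
rows = [
    (2016, "MPS532A01", 10.0),
    (2016, "MPS532A02", 5.0),
    (2016, "MPS532A03", 100.0),
    (2017, "MPS532A03", 120.0),
]
harmonized = harmonize(rows)
```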
It's been two years since the news hit that Canada had legalized weed, so I thought, why not grab a dataset from Kaggle to practice a bit of data analysis? To my surprise, I couldn't find a weed dataset that reflects the economics behind legalized weed and how it has changed over time. So I went to the Canadian government's data site, and voilà: they had CSV files on exactly what I wanted. All I did was download them, and here I am sharing them with the community.
We have a series of CSV files, each holding data about things like supply, use, production, etc. Before we go into the individual files, note that a few data columns are common to all of the CSV files.
Understanding metadata files:
Cube Title: The title of the table. The output files are unilingual and thus will contain either the English or French title.
Product Id (PID): The unique 8 digit product identifier for the table.
CANSIM Id: The ID number which formerly identified the table in CANSIM (where applicable).
URL: The URL for the representative (default) view of a given data table.
Cube Notes: Each note is assigned a unique number. This field indicates which notes, if any, are applied to the entire table.
Archive Status: Describes the status of a table as either 'Current' or 'Archived'. Archived tables are those that are no longer updated.
Frequency: Frequency of the table (e.g., annual).
Start Reference Period: The starting reference period for the table.
End Reference Period: The end reference period for the table.
Total Number of Dimensions: The total number of dimensions contained in the table.
Dimension Name: The name of a dimension in a table (e.g., Geography). There can be up to 10 dimensions in a table.
Dimension ID: The reference code assigned to a dimension in a table. A unique reference Dimension ID code is assigned to each dimension in a table.
Dimension Notes: Each note is assigned a unique number. This field indicates which notes are applied to a particular dimension.
Dimension Definitions: Reserved for future development.
Member Name: The textual description of the members in a dimension (e.g., Nova Scotia and Ontario are members of the Geography dimension).
Member ID: The code assigned to a member of a dimension. There is a unique ID for each member within a dimension. These IDs are used to create the coordinate field in the data file. (see the 'coordinate' field in the data record layout).
Classification (where applicable): Classification code for a member. Definitions, data sources and methods
Parent Member ID: The code used to display the hierarchical relationship between members in a dimension (e.g., the member Ontario (5) is a child of the member Canada (1) in the dimension 'Geography').
Terminated: Indicates whether a member has been terminated or not. Terminated members are those that are no longer updated.
Member Notes: Each note is assigned a unique number. This field indicates which notes are applied to each member.
Member definitions: Reserved for future development.
Symbol Legend: The symbol legend provides descriptions of the various symbols which can appear in a table. This field describes a comprehensive list of all possible symbols, regardless of whether a selected symbol appears in a particular table.
Survey Code: The unique code associated with a survey or program from which the data in the table is derived. Data displayed in one table may be derived ...
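To make the Member ID, Parent Member ID, and coordinate fields more concrete, here is a small sketch of how they might be used. It is not taken from the data provider's documentation: the function names are hypothetical, and the coordinate format (member IDs joined with periods, one per dimension, in dimension order) is an assumption that should be checked against the 'coordinate' field in the data files.

```python
def ancestry(member_id, parents, names):
    """Return member names from the top of the hierarchy down to member_id."""
    path = []
    seen = set()  # guards against a malformed, circular hierarchy
    current = member_id
    while current is not None and current not in seen:
        seen.add(current)
        path.append(names[current])
        current = parents.get(current)
    return list(reversed(path))


def make_coordinate(member_ids):
    """Join one member ID per dimension with periods (assumed format)."""
    return ".".join(str(m) for m in member_ids)


# Geography dimension, using the Canada (1) / Ontario (5) example from the
# Parent Member ID description above.
names = {1: "Canada", 5: "Ontario"}
parents = {1: None, 5: 1}

ontario_path = ancestry(5, parents, names)
coordinate = make_coordinate([5, 2])  # the second member ID is arbitrary
```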
PSE policies Canada (July 2024)
Project Team:
Ray Huang - [Github] [https://orcid.org/0009-0008-1699-6267]
Tim Ribaric - [Github] [https://orcid.org/0000-0001-9229-8569]
Rahul Kumar - [Github] [https://orcid.org/0000-0002-4247-6045]
Policies play a pivotal role in defining the boundaries of what is permissible and what is not. Among these, academic integrity policies are crucial in outlining acceptable and unacceptable behaviours in academic settings. These policies were available in various formats on institutional websites. This repository contains text versions of academic integrity policies from English-language, publicly supported postsecondary education (PSE) institutions in Canada. We recognize that policies often lag behind innovation (e.g., Barzotto et al., 2019; Marcus, 1981; Rodríguez‐Pose & Wilkie, 2018); consequently, this repository also includes guidelines that are sometimes issued to address the disruption caused by GenAI. We aimed to examine their similarities and differences concerning responsibilities and freedoms outlined in the policies and guidelines. In the research for which these policies were collected, we were particularly interested in investigating how they have evolved (or not) in response to the proliferation of generative artificial intelligence (GenAI).
Using a computer script, these policies were collected on July 29, 2024 after their location was populated in the attached CSV file. Examining these policies and guidelines using computerized techniques required tokenization to perform Latent Dirichlet Allocation (LDA) and Term Frequency-Inverse Document Frequency (TFIDF) analyses. These processed files are also included in the repository. Details of the various files and their website locations are provided in the CSV file within the repository, and the README.md contains additional pertinent information.
Our preliminary results indicate that the policies are indeed trailing the innovation and disruption brought about by GenAI.
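As a rough illustration of what tokenization followed by a TFIDF analysis involves, the sketch below computes term weights with only the Python standard library. It is not the project's pipeline: the tokenizer is a simple regular expression, the stopword list is a toy, the weighting is plain term frequency times ln(N / document frequency), and the three sentences are invented for the example.

```python
import math
import re
from collections import Counter

# Toy stopword list, for illustration only.
STOPWORDS = {"the", "of", "a", "an", "and", "is", "in", "to", "must", "be"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords (no lemmatization)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Invented sentences standing in for policy documents.
docs = [
    "Students must cite the use of generative AI tools.",
    "Plagiarism is a breach of academic integrity.",
    "Academic integrity applies to students and instructors.",
]
scores = tf_idf(docs)
```

Terms that occur in only one document (such as "plagiarism" here) receive a higher weight than terms shared across documents (such as "academic"), which is the property that makes the measure useful for comparing policies.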
For more details, please visit our project website where the published results will also be posted.
Document Description & Summary:
This dataset comprises the academic integrity policies of English-speaking, publicly funded Canadian institutions, current to July 29, 2024. Harvested information is categorized into the following:
policies - documents that are binding
guidelines - documents that are not binding but represent best practices, guidelines, etc.
followed up policies - documents that are secondary responses
Description of files:
PSE_Policies_Collection.csv
A listing of all of the Canadian colleges and universities with a posted Academic Integrity policy investigated in this study. Columns in data:
Name of the PSE
U15 or not
College/University
Province
URL of the PSE
URL2 (filled if institution has a policy)
URL3 (filled if institution has a guideline)
URL4 (filled if institution has a followed up policy)
Name of downloaded policy document (if applicable)
Name of the downloaded guideline policy (if applicable)
Name of the downloaded followed up policy (if applicable)
Texts of Documents:
Documents were either HTML or PDF files. They were harvested, and the full text was extracted and put into a text file, with the name of the institution and the time stamp of the original collection from the web concatenated into the name of the file.
Tokenization of Documents:
In order to run LDA analysis, the full-text documents were parsed and tokenized using spaCy and NLTK. This process lemmatized the text, created bigrams, and removed stopwords. Each token file follows the same naming structure as the extracted full text, with the addition of _tokens to the end of the filename.
References:
Barzotto, M., Corradini, C., Fai, F., Labory, S., & Tomlinson, P. R. (2019). Enhancing innovative capabilities in lagging regions: An extra-regional collaborative approach to RIS3. Cambridge Journal of Regions, Economy and Society, 12(2), 213-232. https://doi.org/10.1093/cjres/rsz003
Marcus, A. A. (1981). Policy uncertainty and technological innovation. Academy of Management Review, 6(3), 443-448. https://doi.org/10.5465/amr.1981.4285783
Rodríguez‐Pose, A., & Wilkie, C. (2018). Innovating in less developed regions: what drives patenting in the lagging regions of Europe and North America. Growth and Change, 50(1), 4-37. https://doi.org/10.1111/grow.12280
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Unfortunately, the text-based data extract is no longer accessible through the Natural and Non-prescription Health Products (NNHPD) website. Support for the CSV (text) extract formats has been dropped by our program area, and the links to CSV files on the page will be deleted soon. However, the alternative XML and JSON formats are both available, are updated daily, and will continue to be updated going forward. For the most recent LNHPD extract, you can retrieve either XML or JSON extracts from our API. We apologize for any confusion this may have caused. The Licensed Natural Health Products Database contains information about natural health products that have been issued a product licence by Health Canada. This data extract contains information on NHP Products. Products with a licence have been assessed by Health Canada and found to be safe, effective and of high quality under their recommended conditions of use. You can identify licensed natural health products by looking for the eight-digit Natural Product Number (NPN) or Homeopathic Medicine Number (DIN-HM) on the label.