CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
There are several Microsoft Word documents here detailing data creation methods, along with various dictionaries describing the included and derived variables. The Database Creation Description is meant to walk a user through some of the steps detailed in the SAS code with this project. The alphabetical list of variables is intended to make some coding steps easier, since users can copy and paste variable names from it instead of retyping them. The NIS Data Dictionary contains some general dataset description as well as each variable's responses.
This data includes the location of cooling towers registered with New York State. The data is self-reported by owners/property managers of cooling towers in service in New York State. In August 2015 the New York State Department of Health released emergency regulations requiring the owners of cooling towers to register them with New York State. In addition, the regulation requires regular inspection; annual certification; obtaining and implementing a maintenance plan; record keeping; reporting of certain information; and sample collection and culture testing. All cooling towers in New York State, including New York City, need to be registered in the NYS system. Registration is done through an electronic database found at: www.ny.gov/services/register-cooling-tower-and-submit-reports. For more information, check http://www.health.ny.gov/diseases/communicable/legionellosis/, or go to the “About” tab.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The Pesticide Data Program (PDP) is a national pesticide residue database program. Through cooperation with State agriculture departments and other Federal agencies, PDP manages the collection, analysis, data entry, and reporting of pesticide residues on agricultural commodities in the U.S. food supply, with an emphasis on those commodities highly consumed by infants and children. This dataset provides information on where each tested sample was collected, where the product originated from, what type of product it was, and what residues were found on the product, for calendar years 1992 through 2020. The data can measure residues of individual compounds and classes of compounds, as well as provide information about the geographic distribution of the origin of samples, from growers, packers and distributors. The dataset also includes information on where the samples were taken, what laboratory was used to test them, and all testing procedures (by sample, so can be linked to the compound that is identified). The dataset also contains a reference variable for each compound that denotes the limit of detection for a pesticide/commodity pair (LOD variable). The metadata also includes EPA tolerance levels or action levels for each pesticide/commodity pair. The dataset will be updated on a continual basis, with a new resource data file added annually after the PDP calendar-year survey data is released.

Resources in this dataset:

Resource Title: CSV Data Dictionary for PDP. File Name: PDP_DataDictionary.csv. Resource Description: Machine-readable Comma Separated Values (CSV) format data dictionary for PDP Database Zip files. Defines variables for the sample identity and analytical results data tables/files. The ## characters in the Table and Text Data File name refer to the 2-digit year for the PDP survey, like 97 for 1997 or 01 for 2001. For details on table linking, see PDF. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Data dictionary for Pesticide Data Program. File Name: PDP DataDictionary.pdf. Resource Description: Data dictionary for PDP Database Zip files. Resource Software Recommended: Adobe Acrobat, url: https://www.adobe.com

Resource Title: 2019 PDP Database Zip File. File Name: 2019PDPDatabase.zip
Resource Title: 2018 PDP Database Zip File. File Name: 2018PDPDatabase.zip
Resource Title: 2017 PDP Database Zip File. File Name: 2017PDPDatabase.zip
Resource Title: 2016 PDP Database Zip File. File Name: 2016PDPDatabase.zip
Resource Title: 2015 PDP Database Zip File. File Name: 2015PDPDatabase.zip
Resource Title: 2014 PDP Database Zip File. File Name: 2014PDPDatabase.zip
Resource Title: 2013 PDP Database Zip File. File Name: 2013PDPDatabase.zip
Resource Title: 2012 PDP Database Zip File. File Name: 2012PDPDatabase.zip
Resource Title: 2011 PDP Database Zip File. File Name: 2011PDPDatabase.zip
Resource Title: 2010 PDP Database Zip File. File Name: 2010PDPDatabase.zip
Resource Title: 2009 PDP Database Zip File. File Name: 2009PDPDatabase.zip
Resource Title: 2008 PDP Database Zip File. File Name: 2008PDPDatabase.zip
Resource Title: 2007 PDP Database Zip File. File Name: 2007PDPDatabase.zip
Resource Title: 2005 PDP Database Zip File. File Name: 2005PDPDatabase.zip
Resource Title: 2004 PDP Database Zip File. File Name: 2004PDPDatabase.zip
Resource Title: 2003 PDP Database Zip File. File Name: 2003PDPDatabase.zip
Resource Title: 2002 PDP Database Zip File. File Name: 2002PDPDatabase.zip
Resource Title: 2001 PDP Database Zip File. File Name: 2001PDPDatabase.zip
Resource Title: 2000 PDP Database Zip File. File Name: 2000PDPDatabase.zip
Resource Title: 1999 PDP Database Zip File. File Name: 1999PDPDatabase.zip
Resource Title: 1998 PDP Database Zip File. File Name: 1998PDPDatabase.zip
Resource Title: 1997 PDP Database Zip File. File Name: 1997PDPDatabase.zip
Resource Title: 1996 PDP Database Zip File. File Name: 1996PDPDatabase.zip
Resource Title: 1995 PDP Database Zip File. File Name: 1995PDPDatabase.zip
Resource Title: 1994 PDP Database Zip File. File Name: 1994PDPDatabase.zip
Resource Title: 1993 PDP Database Zip File. File Name: 1993PDPDatabase.zip
Resource Title: 1992 PDP Database Zip File. File Name: 1992PDPDatabase.zip
Resource Title: 2006 PDP Database Zip File. File Name: 2006PDPDatabase.zip
Resource Title: 2020 PDP Database Zip File. File Name: 2020PDPDatabase.zip. Resource Description: Data and supporting files for PDP 2020 survey. Resource Software Recommended: Microsoft Access, url: https://products.office.com/en-us/access
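The two naming conventions in the PDP resource list (the 2-digit year that stands in for ## in table and text file names, and the per-year zip file names) can be sketched in Python. This is only an illustration, not part of the PDP distribution: the helper names pdp_two_digit_year and pdp_zip_name are hypothetical, and the 1992 to 2020 range simply mirrors the zip files listed here.

```python
FIRST_YEAR = 1992  # earliest zip file listed in this dataset
LAST_YEAR = 2020   # latest zip file listed in this dataset


def pdp_two_digit_year(year: int) -> str:
    """Return the 2-digit code that replaces ## in file names (1997 -> '97')."""
    return f"{year % 100:02d}"


def pdp_zip_name(year: int) -> str:
    """Return the zip file name for one survey year, as listed in this dataset."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"no PDP zip file is listed for {year}")
    return f"{year}PDPDatabase.zip"


print(pdp_two_digit_year(1997))  # 97
print(pdp_two_digit_year(2001))  # 01
print(pdp_zip_name(2019))        # 2019PDPDatabase.zip
```

Keeping the year range in two constants makes it easy to extend the check when a new annual file is added.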
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications. It is available in XML or SGML.
 - Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material.
 - Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English.
 - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc.
 - Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval.
 - Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference.
Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the survey of income and program participation (sipp) with r if the census bureau's budget was gutted and only one complex sample survey survived, pray it's the survey of income and program participation (sipp). it's giant. it's rich with variables. it's monthly. it follows households over three, four, now five year panels. the congressional budget office uses it for their health insurance simulation. analysts read that sipp has person-month files, get scurred, and retreat to inferior options. the american community survey may be the mount everest of survey data, but sipp is most certainly the amazon. questions swing wild and free through the jungle canopy i mean core data dictionary. legend has it that there are still species of topical module variables that scientists like you have yet to analyze. ponce de león would've loved it here. ponce. what a name. what a guy. the sipp 2008 panel data started from a sample of 105,663 individuals in 42,030 households. once the sample gets drawn, the census bureau surveys one-fourth of the respondents every four months, over four or five years (panel durations vary). you absolutely must read and understand pdf pages 3, 4, and 5 of this document before starting any analysis (start at the header 'waves and rotation groups'). if you don't comprehend what's going on, try their survey design tutorial. since sipp collects information from respondents regarding every month over the duration of the panel, you'll need to be hyper-aware of whether you want your results to be point-in-time, annualized, or specific to some other period. the analysis scripts below provide examples of each. at every four-month interview point, every respondent answers every core question for the previous four months. after that, wave-specific addenda (called topical modules) get asked, but generally only regarding a single prior month. to repeat: core wave files contain four records per person, topical modules contain one.
if you stacked every core wave, you would have one record per person per month for the duration of the panel. mmmassive. ~100,000 respondents x 12 months x ~4 years. have an analysis plan before you start writing code so you extract exactly what you need, nothing more. better yet, modify something of mine. cool? this new github repository contains eight, you read me, eight scripts:

1996 panel - download and create database.R
2001 panel - download and create database.R
2004 panel - download and create database.R
2008 panel - download and create database.R
 - since some variables are character strings in one file and integers in another, initiate an r function to harmonize variable class inconsistencies in the sas importation scripts
 - properly handle the parentheses seen in a few of the sas importation scripts, because the SAScii package currently does not
 - create an rsqlite database, initiate a variant of the read.SAScii function that imports ascii data directly into a sql database (.db)
 - download each microdata file - weights, topical modules, everything - then read 'em into sql

2008 panel - full year analysis examples.R
 - define which waves and specific variables to pull into ram, based on the year chosen
 - loop through each of twelve months, constructing a single-year temporary table inside the database
 - read that twelve-month file into working memory, then save it for faster loading later if you like
 - read the main and replicate weights columns into working memory too, merge everything
 - construct a few annualized and demographic columns using all twelve months' worth of information
 - construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half, again save it for faster loading later, only if you're so inclined
 - reproduce census-published statistics, not precisely (due to topcoding described here on pdf page 19)

2008 panel - point-in-time analysis examples.R
 - define which wave(s) and specific variables to pull into ram, based on the calendar month chosen
 - read that interview point (srefmon)- or calendar month (rhcalmn)-based file into working memory
 - read the topical module and replicate weights files into working memory too, merge it like you mean it
 - construct a few new, exciting variables using both core and topical module questions
 - construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half
 - reproduce census-published statistics, not exactly cuz the authors of this brief used the generalized variance formula (gvf) to calculate the margin of error - see pdf page 4 for more detail - the friendly statisticians at census recommend using the replicate weights whenever possible. oh hayy, now it is.

2008 panel - median value of household assets.R
 - define which wave(s) and specific variables to pull into ram, based on the topical module chosen
 - read the topical module and replicate weights files into working memory too, merge once again
 - construct a replicate-weighted complex sample design with a...
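
to get a feel for the person-month arithmetic in the sipp description (~100,000 respondents x 12 months x ~4 years), here is a rough python sketch. the scripts themselves are r; this is just back-of-the-envelope counting. the function name is made up for illustration, the respondent and year counts are the approximate figures quoted in that description, and the three-waves-per-year figure follows from one interview every four months.

```python
RECORDS_PER_CORE_WAVE = 4  # each core wave holds four monthly records per person
WAVES_PER_YEAR = 3         # one interview every four months


def stacked_core_records(respondents: int, panel_years: int) -> int:
    """Approximate row count if every core wave of a panel were stacked."""
    waves = panel_years * WAVES_PER_YEAR
    return respondents * waves * RECORDS_PER_CORE_WAVE


# roughly 100,000 respondents followed for roughly four years
print(stacked_core_records(100_000, 4))  # 4800000
```

that many rows is the reason to pick your waves and variables before pulling anything into ram.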
Data here contain and describe an open-source structured query language (SQLite) portable database containing high resolution mass spectrometry data (MS1 and MS2) for per- and polyfluorinated alkyl substances (PFAS) and associated metadata regarding their measurement techniques, quality assurance metrics, and the samples from which they were produced. These data are stored in a format adhering to the Database Infrastructure for Mass Spectrometry (DIMSpec) project. That project produces and uses databases like this one, providing a complete toolkit for non-targeted analysis. See more information about the full DIMSpec code base - as well as these data for demonstration purposes - at GitHub (https://github.com/usnistgov/dimspec) or view the full User Guide for DIMSpec (https://pages.nist.gov/dimspec/docs). Files of most interest contained here include the database file itself (dimspec_nist_pfas.sqlite) as well as an entity relationship diagram (ERD.png) and data dictionary (DIMSpec for PFAS_1.0.1.20230615_data_dictionary.json) to elucidate the database structure and assist in interpretation and use.
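Because the database is an ordinary SQLite file, a first look at its structure needs nothing beyond Python's standard library. The sketch below is a minimal illustration, not part of the DIMSpec toolkit: the helper name list_tables is hypothetical, and the two tables created here are stand-ins so the example runs without the real file. The actual table names should be taken from the ERD and data dictionary.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in database with hypothetical tables, so the sketch is runnable as-is.
# For the real data, connect to the downloaded file instead, for example:
#   conn = sqlite3.connect("dimspec_nist_pfas.sqlite")
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, name TEXT)")
demo.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, description TEXT)")

print(list_tables(demo))  # ['compounds', 'samples']
```

Listing the tables first makes it easier to match what is in the file against the entity relationship diagram before writing any queries.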
Each feature within this dataset is the authoritative representation of the location of a sample within the U.S. Department of Energy (DOE) Office of Legacy Management (LM) Environmental Database. The dataset includes sample locations from Puerto Rico to Alaska, with point features representing different types of sample locations such as boreholes, wells, geoprobes, etc. All sample locations are maintained within the LM Environmental Database, with feature attributes defined within the associated data dictionary.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We study the behaviour and cognition of wild apes and other species (elephants, corvids, dogs). Our video archive is called the Great Ape Dictionary; you can find out more at www.greatapedictionary.com or about our lab group at www.wildminds.ac.uk. We consider these videos to be a data ark that we would like to make as accessible as possible. While we are unable to make the original video files open access at the present time, you can search this database to explore what is available, and then request access for collaborations of different kinds by contacting us directly or through our website.
We label all videos in the Great Ape Dictionary video archive with basic meta-data on the location, date, duration, individuals present, and behaviour present. Version 1.0.0 contains current data from the Budongo East African chimpanzee population (n=13806 videos). These datasets are being updated regularly and new data will be incorporated here with versioning. As well as the database there is a second read.me file which contains the ethograms used for each variable coded, and a short summary of other datasets that are in preparation for subsequent version(s). If you are interested in these data please contact us. Please note that not all variables are labelled for all videos; the detailed Ethogram categories are only available for a subset of data. All videos are labelled with up to 5 Contexts (at least one, rarely 5). If you are interested in finding a good example video for a particular behaviour, search for 'Library' = Y, which indicates that the clip contains a very clear example of the behaviour.
The Business Structure Database (BSD) contains a small number of variables for almost all business organisations in the UK. The BSD is derived primarily from the Inter-Departmental Business Register (IDBR), which is a live register of data collected by HM Revenue and Customs via VAT and Pay As You Earn (PAYE) records. The IDBR data are complemented with data from ONS business surveys. If a business is liable for VAT (turnover exceeds the VAT threshold) and/or has at least one member of staff registered for the PAYE tax collection system, then the business will appear on the IDBR (and hence in the BSD). In 2004 it was estimated that the businesses listed on the IDBR accounted for almost 99 per cent of economic activity in the UK. Only very small businesses, such as the self-employed, were not found on the IDBR.
The IDBR is frequently updated, and contains confidential information that cannot be accessed by non-civil servants without special permission. However, the ONS Virtual Micro-data Laboratory (VML) created and developed the BSD, which is a 'snapshot' in time of the IDBR, in order to provide a version of the IDBR for research use, taking full account of changes in ownership and restructuring of businesses. The 'snapshot' is taken around April, and the captured point-in-time data are supplied to the VML by the following September. The reporting period is generally the financial year. For example, the 2000 BSD file is produced in September 2000, using data captured from the IDBR in April 2000. The data will reflect the financial year of April 1999 to March 2000. However, the ONS may, during this time, update the IDBR with data on companies from its own business surveys, such as the Annual Business Survey (SN 7451).
The data are divided into 'enterprises' and 'local units'. An enterprise is the overall business organisation. A local unit is a 'plant', such as a factory, shop, branch, etc. In some cases, an enterprise will only have one local unit, and in other cases (such as a bank or supermarket), an enterprise will own many local units.
For each company, data are available on employment, turnover, foreign ownership, and industrial activity based on Standard Industrial Classification (SIC) 92, SIC 2003 or SIC 2007. Year of 'birth' (company start-up date) and 'death' (termination date) are also included, as well as postcodes for both enterprises and their local units. Previously only pseudo-anonymised postcodes were available but now all postcodes are real.
The ONS is continually developing the BSD, and so researchers are strongly recommended to read all documentation pertaining to this dataset before using the data.
Linking to Other Business Studies
These data contain IDBR reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.
Latest Edition Information
For the sixteenth edition (March 2024), data files and a variable catalogue document for 2023 have been added.
During hydrocarbon production, water is typically co-produced from the geologic formations producing oil and gas. Understanding the composition of these produced waters is important to help investigate the regional hydrogeology, the source of the water, the efficacy of water treatment and disposal plans, potential economic benefits of mineral commodities in the fluids, and the safety of potential sources of drinking or agricultural water. In addition to waters co-produced with hydrocarbons, geothermal development or exploration brings deep formation waters to the surface for possible sampling. This U.S. Geological Survey (USGS) Produced Waters Geochemical Database, which contains geochemical and other information for 114,943 produced water and other deep formation water samples of the United States, is a provisional, updated version of the 2002 USGS Produced Waters Database (Breit and others, 2002). In addition to the major element data presented in the original, the new database contains trace elements, isotopes, and time-series data, as well as nearly 100,000 additional samples that provide greater spatial coverage from both conventional and unconventional reservoir types, including geothermal. The database is a compilation of 40 individual databases, publications, or reports. The database was created in a manner to facilitate addition of new data and correct any compilation errors, and is expected to be updated over time with new data as provided and needed. Table 1, USGSPWDBv2.3 Data Sources.csv, shows the abbreviated ID of each input database (IDDB), the number of samples from each, and its reference. Table 2, USGSPWDBv2.3 Data Dictionary.csv, defines the 190 variables contained in the database and their descriptions. The database variables are organized first with identification and location information, followed by well descriptions, dates, rock properties, physical properties of the water, and then chemistry. 
The chemistry is organized alphabetically by elemental symbol. Each element is followed by any associated compounds (e.g. H2S is found after S). After Zr, molecules containing carbon, organic compounds and dissolved gases follow. Isotopic data are found at the end of the dataset, just before the culling parameters.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool. Resources 2 - 8 are generated using the Flatterer utility.
Description of resources:
1. Dataset is a JSON Lines file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and recommended for users familiar with working with nested JSON.
2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
3. datasets metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
4. Resources Metadata contains the metadata for the resources contained within each dataset.
5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore enabled CSVs.
7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains.
8. data package entity relation diagram displays the title and format for each column, in each table in the Data Package, in the form of an ERD. The Data Package resource offers a text based version.
9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
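Resource 1 stores one JSON record per line inside a GZip file, so it can be streamed record by record rather than loaded whole. The sketch below shows one way to do that with Python's standard library. It is an illustration under assumptions: the function name iter_jsonl_gz is hypothetical, and the two sample records with their id and title fields are invented stand-ins, not the real catalogue schema.

```python
import gzip
import io
import json


def iter_jsonl_gz(source):
    """Yield one parsed record per non-empty line of a gzipped JSON Lines file.

    `source` may be a file path or an open binary file object.
    """
    with gzip.open(source, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Invented sample records, compressed in memory so the sketch runs as-is.
sample = [{"id": "a1", "title": "Example one"}, {"id": "b2", "title": "Example two"}]
payload = gzip.compress("\n".join(json.dumps(r) for r in sample).encode("utf-8"))

for record in iter_jsonl_gz(io.BytesIO(payload)):
    print(record["id"], record["title"])
```

For the downloaded file, pass its path in place of the in-memory buffer. Streaming keeps memory use low, which matters because each record is heavily nested.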
The dataset, Survey-SR, provides the nutrient data for assessing dietary intakes from the national survey What We Eat In America, National Health and Nutrition Examination Survey (WWEIA, NHANES). Historically, USDA databases have been used for national nutrition monitoring (1). Currently, the Food and Nutrient Database for Dietary Studies (FNDDS) (2), is used by Food Surveys Research Group, ARS, to process dietary intake data from WWEIA, NHANES. Nutrient values for FNDDS are based on Survey-SR. Survey-SR was referred to as the "Primary Data Set" in older publications. Early versions of the dataset were composed mainly of commodity-type items such as wheat flour, sugar, milk, etc. However, with increased consumption of commercial processed and restaurant foods and changes in how national nutrition monitoring data are used (1), many commercial processed and restaurant items have been added to Survey-SR. The current version, Survey-SR 2013-2014, is mainly based on the USDA National Nutrient Database for Standard Reference (SR) 28 (2) and contains sixty-six nutrients each for 3,404 foods. These nutrient data will be used for assessing intake data from WWEIA, NHANES 2013-2014. Nutrient profiles were added for 265 new foods and updated for about 500 foods from the version used for the previous survey (WWEIA, NHANES 2011-12). New foods added include mainly commercially processed foods such as several gluten-free products, milk substitutes, sauces and condiments such as sriracha, pesto and wasabi, Greek yogurt, breakfast cereals, low-sodium meat products, whole grain pastas and baked products, and several beverages including bottled tea and coffee, coconut water, malt beverages, hard cider, fruit-flavored drinks, fortified fruit juices and fruit and/or vegetable smoothies. Several school lunch pizzas and chicken products, fast-food sandwiches, and new beef cuts were also added, as they are now reported more frequently by survey respondents.
Nutrient profiles were updated for several commonly consumed foods such as cheddar, mozzarella and American cheese, ground beef, butter, and catsup. The changes in nutrient values may be due to reformulations in products, changes in the market shares of brands, or more accurate data. Examples of more accurate data include analytical data, market share data, and data from a nationally representative sample.

Resources in this dataset:

Resource Title: USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES 2013-14 (Survey SR 2013-14). File Name: SurveySR_2013_14 (1).zip. Resource Description: Access database downloaded on November 16, 2017. US Department of Agriculture, Agricultural Research Service, Nutrient Data Laboratory. USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR), October 2015.

Resource Title: Data Dictionary. File Name: SurveySR_DD.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PETROG, AGSO's Petrography Database, is a relational computer database of petrographic data obtained from microscopic examination of thin sections of rock samples. The database is designed for petrographic descriptions of crystalline igneous and metamorphic rocks, and also for sedimentary petrography. A variety of attributes pertaining to thin sections can be recorded, as can the volume proportions of component minerals, clasts and matrix.
PETROG is one of a family of field and laboratory databases that include mineral deposits, regolith, rock chemistry, geochronology, stream-sediment geochemistry, geophysical rock properties and ground spectral properties for remote sensing. All these databases rely on a central Field Database for information on geographic location, outcrops and rock samples. PETROG depends, in particular, on the Field Database's SITES and ROCKS tables, as well as a number of lookup tables of standard terms. ROCKMINSITES, a flat view of PETROG's tables combined with the SITES and ROCKS tables, allows thin-section and mineral data to be accessed from geographic information systems and plotted on maps.
This guide presents an overview of PETROG's infrastructure and describes in detail the menus and screen forms used to input and view the data. In particular, the definitions of most fields in the database are given in some depth under descriptions of the screen forms - providing, in effect, a comprehensive data dictionary of the database. The database schema, with all definitions of tables, views and indexes is contained in an appendix to the guide.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of the sample of 6567 words from the website database.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
If you want to give feedback on this dataset, or wish to request it in another form (e.g. CSV), please fill out this survey here. We are a not-for-profit research organisation keen to see how others use our open models and tools, so all feedback is appreciated! It's a short form that takes 5 minutes to complete.
Important Note: Before downloading this dataset, please read the License and Software Attribution section at the bottom.
This dataset aligns with the work published in Centre for Net Zero's report "Hitting the Target". In this work, we simulate a range of interventions to model the situations in which we believe the UK will meet its 600,000 heat pump installation per year target by 2028. For full modelling assumptions and findings, read our report on our website.
The code for running our simulation is open source here.
This dataset contains over 9 million households that have been address matched between Energy Performance Certificates (EPC) data and Price Paid Data (PPD). The code for our address matching is here. Since these datasets are Open Government License (OGL), this dataset is too. We basically model specific columns from various datasets, as set out in our methodology section in our report, to simplify and clean up this dataset for academic use. License information is also available in the appendix of our report above.
The EPC data loaders can be found here (the data is here) and the rest of the schemas and data download locations can be found here.
Note that this dataset is not regularly maintained or updated. It is correct as of January 2022. The data was curated and tested using dbt via this Github repository and would be simple to rerun on the latest data.
The schema / data dictionary for this data can be found here.
Our recommended way of loading this data is in Python. After downloading all "parts" of the dataset to a folder, you can run:
import pandas as pd
data = pd.read_parquet("path/to/data/folder/")
Licenses and software attribution:
For EPC, PPD and UK House Price Index data:
For the EPC data, we are permitted to republish this provided we state that all researchers who download this dataset must follow these copyright restrictions. We do not explicitly release any Royal Mail address data; instead, we use these fields to generate a pseudonymised "address_cluster_id" which reflects a unique combination of the address lines and postcodes, as well as other metadata. Under ICO and GDPR guidelines, this still counts as personal data, but we have taken measures to pseudonymise as much as possible to fulfil our obligations as a data processor. You must read this carefully before downloading the data, and ensure that you are using it for the research purposes as determined by this copyright notice.
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Contains OS data © Crown copyright and database right 2022.
Contains Office for National Statistics data licensed under the Open Government Licence v.3.0.
The OGL v3.0 license states that we are free to:
copy, publish, distribute and transmit the Information;
adapt the Information;
exploit the Information commercially and non-commercially for example, by combining it with other Information, or by including it in your own product or application.
However, we must (where we do any of the above):
acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence;
You can see more information here.
For XOServe Off Gas Postcodes:
This dataset has been released openly for all uses here.
For the address matching:
GNU Parallel: O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014
Of interest to pharmaceutical, nutritional, and biomedical researchers, as well as individuals and companies involved with alternative therapies and herbal products, this database is one of the world's leading repositories of ethnobotanical data, evolving out of the extensive compilations by the former Chief of USDA's Economic Botany Laboratory in the Agricultural Research Service in Beltsville, Maryland, in particular his popular Handbook of phytochemical constituents of GRAS herbs and other economic plants (CRC Press, Boca Raton, FL, 1992). In addition to Duke's own publications, the database documents phytochemical information and quantitative data collected over many years through research results presented at meetings and symposia, and findings from the published scientific literature. The current Phytochemical and Ethnobotanical databases facilitate plant, chemical, bioactivity, and ethnobotany searches. A large number of plants and their chemical profiles are covered, and data are structured to support browsing and searching in several user-focused ways. For example, users can:
get a list of chemicals and activities for a specific plant of interest, using either its scientific or common name
download a list of chemicals and their known activities in PDF or spreadsheet form
find plants with chemicals known for a specific biological activity
display a list of chemicals with their LD toxicity data
find plants with potential cancer-preventing activity
display a list of plants for a given ethnobotanical use
find out which plants have the highest levels of a specific chemical
References to the supporting scientific publications are provided for each specific result.
Resources in this dataset:
Resource Title: Duke-Source-CSV.zip. File Name: Duke-Source-CSV.zip
Resource Description: Dr. Duke's Phytochemistry and Ethnobotany - raw database tables for archival purposes.
Visit https://phytochem.nal.usda.gov/phytochem/search for the interactive web version of the database.
Resource Title: Data Dictionary (preliminary). File Name: DrDukesDatabaseDataDictionary-prelim.csv
Resource Description: This Data Dictionary describes the columns for each table. [Note that this is in progress and some variables are yet to be defined or are unused in the current implementation. Please send comments/suggestions to nal-adc-curator@ars.usda.gov ]
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset contains records of sting events and specimen samples of jellyfish (Irukandji, Halo Irukandji, Box jellyfish and Morbakka) from the Venomous Jellyfish Database. This dataset contains an extract of 1081 sting events (in CSV format) from along the north Queensland coast between December 1883 and March 2017. The full database contains approximately 3000 sting events from around Australia and includes records from sources that have not yet been cleared for release.
This extract from the Venomous Jellyfish Database was made for eAtlas as part of the 2.2.3 NESP Irukandji forecasting system project. It contains jellyfish sting and specimen information. Data were compiled from numerous sources (noted in each record), including Lisa-ann Gershwin and media reports. These data will be used as part of the Irukandji forecasting model. The extract data file contains data that is publicly available.
The sting data includes primary information such as date, time of day and locality of stings, as well as secondary details such as age and gender of the sting victim, where on the body they were stung, their activity at the time of the sting and their general medical condition.
Limitations: This data shows the occurrence of reported jellyfish stings and specimens along the north Queensland coast. It does NOT provide a prediction of where jellyfish or jellyfish sting events may occur.
These records represent a fraction of known sting events and specimen collections, with more being added to the list of publicly available data as permissions are granted.
Dates in historical records may be coarse, showing only the month and year in which the sting occurred. Some events have a date only.
Methods: This data set contains information on sting events and specimen collections that have occurred around Australia, which involved venomous jellyfish (Irukandji syndrome-producing species in the genera Carukia, Malo, Morbakka).
These data were collected over numerous years by Lisa-ann Gershwin and others from various sources (primarily media). These data were entered into an Excel spreadsheet, which formed the basis of the Venomous Jellyfish Database. This database was developed as part of the 2.2.3 NESP Irukandji forecasting system project.
Some data have been standardised, e.g., location information and sting site on the body. Data available to the public have been approved by the data owners, or came from a public source (e.g. newspaper reports, media alerts).
Format:
This dataset consists of one Comma Separated Value (CSV) table containing information on jellyfish events along the north Queensland coast. eAtlas Note: The original database extract was provided as an Excel spreadsheet table. This was converted to a CSV file.
Data Dictionary:
CSIRO_ID: unique id
EVENT_TYPE: type of event – sting or specimen
STATE: state in which event occurred
REGION: broader region of State the event occurred in
LOCAL_GOV_AREA: local government area that the event occurred in – if known
MAIN_LOCALITY: main locality that the event occurred in
SITE_INFO: site details/comments
YEAR: year event occurred
MONTH: month event occurred
DAY: day of the month the event occurred
DATE_RANGE: date range event may have occurred in
EVENT_TIME: time the event occurred HH24:MI. If time is unknown then NULL
EXACT_DATE: if exact date unknown then N. Use with DATE_RANGE
EXACT_TIME: if exact time unknown then N.
TIME_REPORTED: time event reported e.g. early afternoon, morning
EVENT_RECORDED: date event reported e.g. on weekend, in February, Jan-March
EVENT_COMMENTS: comments about the event
LAT: latitude in decimal degrees
LON: longitude in decimal degrees
LOCATION_ACCURACY: how accurate the location is, 0=within a few hundred metres, 1=within a few kilometres, 2=more than a few kilometres
EVENT_OFFSHORE_ONSHORE: where the event occurred (if known) – beach, island, reef
LOCATION_COMMENTS: comments relating to the location of the event
WATER_DEPTH_M: depth of water, in metres, that the event occurred in (if known)
AGE: number: age of patient if known
SEX: gender of patient if known
HOME: home state/county of patient
HOSPITAL: hospital the patient was treated at (if known)
RETRIEVAL: method by which the patient was transported to hospital (if known)
STING_SITE_REPORTED: reported sting site on the body
STING_SITE_BODY: standardised area on body that sting was reported – upper limb, lower limb etc.
NUMBER_STINGS: number of stings recorded, if known
VISIBLE_STING: the nature of visible sting marks, if reported
PPE_WORN: was Personal Protective Equipment (PPE) worn?
PATIENT_COMMENTS: comments about the patient
TIME_TO_ONSET: delay between sting and onset of symptoms, if reported
PATIENT_CONDITION: state the patient was in, e.g. distressed, calm, stable
BLOOD_PRESSUREL: comments relating to blood pressure of the patient
NAUSEA: did the patient experience nausea and/or vomiting?
PAIN: location and/or intensity of pain experienced by the patient
SWEATING: did the patient experience sweating?
TREATMENT: what treatment the patient was given
DISCHARGED: when the patient was discharged from hospital
ONGOING_SYMPTOMS: what ongoing symptoms the patient is experiencing
NEMATO_SAMPLES: were nematocyst samples taken?
SPECIES_NAME: species name, if determined
PATROL: was the sting on a patrolled beach
CURATOR: where the data came from e.g. Gershwin = Lisa-ann Gershwin
DATA_CODE: access constraint on data
PUBLIC_REFERENCE: source of the information for event
ENTERED_BY: who entered the data
ENTERED_DATE: when the data was entered
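As an illustration of how the fields in the data dictionary might be used together, the following is a minimal sketch (not part of the original dataset documentation) that keeps only sting events with a known exact date and a reasonably precise location. The sample rows are invented for illustration and do not come from the real extract; only the column names are taken from the data dictionary. The sketch also assumes that any EXACT_DATE value other than N means the date is exact, which the dictionary does not state explicitly.

```python
import csv
import io

# Invented sample rows for illustration only. Column names follow the data
# dictionary; the values are NOT taken from the real extract.
SAMPLE_CSV = """CSIRO_ID,EVENT_TYPE,YEAR,MONTH,EXACT_DATE,LAT,LON,LOCATION_ACCURACY
1,sting,2001,1,Y,-16.92,145.78,0
2,sting,1950,12,N,-19.26,146.82,2
3,specimen,2010,3,Y,-16.48,145.46,1
"""


def precise_stings(rows, max_accuracy_code=1):
    """Keep sting events with an exact date and a location accuracy code
    no larger than max_accuracy_code (0 = few hundred metres, 1 = few km)."""
    kept = []
    for row in rows:
        if row["EVENT_TYPE"].strip().lower() != "sting":
            continue  # skip specimen records
        if row["EXACT_DATE"].strip().upper() == "N":
            continue  # exact date unknown (assumption: anything else is exact)
        code = row["LOCATION_ACCURACY"].strip()
        if not code.isdigit() or int(code) > max_accuracy_code:
            continue  # missing or too coarse a location
        kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = precise_stings(rows)
print([r["CSIRO_ID"] for r in selected])
```

With the real extract, the in-memory sample would be replaced by the downloaded CSV file opened with `open(...)`; its file name is not given here, so none is assumed.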
References:
Gershwin, L. (2013). Stung! On Jellyfish Blooms and the Future of the Ocean. Chicago, University of Chicago Press.
Gershwin, L., De Nardi, M., Winkel, K.D., and Fenner, P.J. (2010) Marine Stingers: Review of an Under-Recognized Global Coastal Management Issue. Journal of Coastal Management, 38:1, 22-41, DOI: 10.1080/08920750903345031
Gershwin L, Condie SA, Mansbridge JV, Richardson AJ. 2014 Dangerous jellyfish blooms are predictable. Journal of the Royal Society Interface 11: 20131168. http://dx.doi.org/10.1098/rsif.2013.1168
Gershwin, L., Richardson, A.J., Winkel, K.D., Fenner, P.J., Lippmann, J., Hore, R., Avila-Soria, G., Brewer, D., Kloser, R.J., Steven, A. and Condie, S. (2013). Biology and ecology of Irukandji jellyfish (Cnidaria: Cubozoa). Advances in Marine Biology 66: 1-85.
Data Location:
This dataset is filed in the eAtlas enduring data repository at: data\custodian\2016-18-NESP-TWQ-2\2.2.3_Jellyfish-early-warning\AU_NESP-TWQ-2-2-3_CSIRO_Venomous-Jellyfish-DB
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SSURGO database contains information about soil as collected by the National Cooperative Soil Survey over the course of a century. The information can be displayed in tables or as maps and is available for most areas in the United States and the Territories, Commonwealths, and Island Nations served by the USDA-NRCS (Natural Resources Conservation Service). The information was gathered by walking over the land and observing the soil. Many soil samples were analyzed in laboratories. The maps outline areas called map units. The map units describe soils and other components that have unique properties, interpretations, and productivity. The information was collected at scales ranging from 1:12,000 to 1:63,360. More details were gathered at a scale of 1:12,000 than at a scale of 1:63,360. The mapping is intended for natural resource planning and management by landowners, townships, and counties. Some knowledge of soils data and map scale is necessary to avoid misunderstandings. The maps are linked in the database to information about the component soils and their properties for each map unit. Each map unit may contain one to three major components and some minor components. The map units are typically named for the major components. Examples of information available from the database include available water capacity, soil reaction, electrical conductivity, and frequency of flooding; yields for cropland, woodland, rangeland, and pastureland; and limitations affecting recreational development, building site development, and other engineering uses. SSURGO datasets consist of map data, tabular data, and information about how the maps and tables were created. The extent of a SSURGO dataset is a soil survey area, which may consist of a single county, multiple counties, or parts of multiple counties. SSURGO map data can be viewed in the Web Soil Survey or downloaded in ESRI® Shapefile format. The coordinate systems are Geographic. 
Attribute data can be downloaded in text format that can be imported into a Microsoft® Access® database. A complete SSURGO dataset consists of:
GIS data (as ESRI® Shapefiles)
attribute data (dbf files - a multitude of separate tables)
database template (MS Access format - this helps with understanding the structure and linkages of the various tables)
metadata
Resources in this dataset:
Resource Title: SSURGO Metadata - Tables and Columns Report. File Name: SSURGO_Metadata_-_Tables_and_Columns.pdf
Resource Description: This report contains a complete listing of all columns in each database table. Please see SSURGO Metadata - Table Column Descriptions Report for more detailed descriptions of each column.
Find the Soil Survey Geographic (SSURGO) web site at https://www.nrcs.usda.gov/wps/portal/nrcs/detail/vt/soils/?cid=nrcs142p2_010596#Datamart
Resource Title: SSURGO Metadata - Table Column Descriptions Report. File Name: SSURGO_Metadata_-_Table_Column_Descriptions.pdf
Resource Description: This report contains the descriptions of all columns in each database table. Please see SSURGO Metadata - Tables and Columns Report for a complete listing of all columns in each database table.
Find the Soil Survey Geographic (SSURGO) web site at https://www.nrcs.usda.gov/wps/portal/nrcs/detail/vt/soils/?cid=nrcs142p2_010596#Datamart
Resource Title: SSURGO Data Dictionary. File Name: SSURGO 2.3.2 Data Dictionary.csv
Resource Description: CSV version of the data dictionary
Dataset III and dictionary III. Excel spreadsheet and Data Dictionary that contain information on tissue samples from suspected melanoma cases, including specimen details such as presence of tumor, tissue source and other tissue information relevant to genomic analysis.
This data set holds the publicly-available version of the database of water-dependent assets that was compiled for the bioregional assessment (BA) of the Galilee subregion as part of the Bioregional Assessment Technical Programme. Though all life is dependent on water, for the purposes of a bioregional assessment, a water-dependent asset is an asset potentially impacted by changes in the groundwater and/or surface water regime due to coal resource development. The water must be other than local rainfall. Examples include wetlands, rivers, bores and groundwater dependent ecosystems.
The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets including Natural Resource Management regions, and Australian and state and territory government databases. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived. A single asset is represented spatially in the asset database by single or multiple spatial features (point, line or polygon). Individual points, lines or polygons are termed elements.
This dataset contains the unrestricted publicly-available components of spatial and non-spatial (attribute) data of the (restricted) Asset database for the Galilee subregion on 04 January 2016 (12ff5782-a3d9-40e8-987c-520d5fa366dd). The database is provided primarily as an ESRI File geodatabase (.gdb), which can be opened in readily available open source software such as QGIS. Other formats include the Microsoft Access database (.mdb in ESRI Personal Geodatabase format), industry-standard ESRI Shapefiles and tab-delimited text files of all the attribute tables.
The restricted version of the Galilee Asset database has a total count of 403 918 Elements and 4 426 Assets. In the public version of the Galilee Asset database, 13 759 spatial Element features (~3%) have been removed from the Element List and Element Layer(s), and 352 spatial Assets (~8%) have been removed from the spatial Asset Layer(s).
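The approximate percentages quoted above can be checked with a short calculation. This is only a worked arithmetic sketch using the counts stated in the text; the variable names are illustrative.

```python
# Counts as stated in the description of the restricted Galilee Asset database.
restricted_elements = 403_918
restricted_assets = 4_426

# Features removed to produce the public version.
removed_elements = 13_759
removed_assets = 352

element_share = 100 * removed_elements / restricted_elements
asset_share = 100 * removed_assets / restricted_assets

print(f"{element_share:.1f}% of elements removed")
print(f"{asset_share:.1f}% of assets removed")
```

The results round to the ~3% and ~8% figures given in the description.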
The elements/assets removed from the restricted Asset Database are from the following data sources:
1) Environmental Asset Database - Commonwealth Environmental Water Office - RESTRICTED (Metadata only) (29fd1654-8aa1-4cb3-b65e-0b37698ac9a6)
2) Key Environmental Assets - KEA - of the Murray Darling Basin RESTRICTED (Metadata only) (9948195e-3d3b-49dc-96d2-ea7765297308)
3) Species Profile and Threats Database (SPRAT) - RESTRICTED - Metadata only (7276dd93-cc8c-4c01-8df0-cef743c72112)
4) Australia, Register of the National Estate (RNE) - Spatial Database (RNESDB) (Internal 878f6780-be97-469b-8517-54bd12a407d0)
5) Communities of National Environmental Significance Database - RESTRICTED - Metadata only (c01c4693-0a51-4dbc-bbbd-7a07952aa5f6)
These important assets are included in the bioregional assessment, but are unable to be publicly distributed by the Bioregional Assessment Programme due to restrictions in their licensing conditions. Please note that many of these data sets are available directly from their custodian. For more precise details please see the associated explanatory Data Dictionary document enclosed with this dataset.
The public version of the asset database retains all of the unrestricted components of the Asset database for the Galilee subregion on 04 January 2016 - any material that is unable to be published or redistributed to a third party by the BA Programme has been removed from the database. The data presented corresponds to the assets published in Cooper subregion product 1.3: Description of the water-dependent asset register and asset list for the Galilee subregion on 04 January 2016, and the associated Water-dependent asset register and asset list for the Galilee subregion on 04 January 2016.
Individual spatial features or elements are initially included in the database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). In accordance with BA submethodology M02: Compiling water-dependent assets, individual spatial elements are then grouped into assets, which are evaluated by project teams to determine whether they meet materiality test 2 (M2), that is, whether they are considered to be water dependent.
Following delivery of the first pass asset list, project teams make a determination as to whether an asset (comprised of one or more elements) is water dependent, as assessed against the materiality tests detailed in the BA Methodology. These decisions are provided to ERIN by the assessment team and incorporated into the AssetList table in the Asset database.
Development of the Asset Register from the Asset database:
Decisions for M0 (fit for BA purpose), M1 (PAE) and M2 (water dependent) determine which assets are included in the "asset list" and "water-dependent asset register" which are published as Product 1.3.
The rule sets are applied as follows:
M0      M1    M2    Result
No      n/a   n/a   Asset is not included in the asset list or the water-dependent asset register
(≠ No)  No    n/a   Asset is not included in the asset list or the water-dependent asset register
(≠ No)  Yes   No    Asset included in published asset list but not in water-dependent asset register
(≠ No)  Yes   Yes   Asset included in both asset list and water-dependent asset register
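The rule set above can be expressed as a small decision function. The following is a minimal sketch rather than the Programme's own implementation: the function name, the returned keys and the string encoding of the decisions ("Yes", "No", "n/a") are assumptions made for illustration.

```python
def classify_asset(m0: str, m1: str, m2: str) -> dict:
    """Apply the M0/M1/M2 rule set to one asset.

    Each argument is a decision string such as "Yes", "No" or "n/a"
    (hypothetical encoding). Returns which published products include
    the asset.
    """
    m0, m1, m2 = (value.strip().lower() for value in (m0, m1, m2))

    # The asset list needs M0 to be anything other than "No" and M1 to be "Yes".
    in_asset_list = m0 != "no" and m1 == "yes"
    # The water-dependent asset register additionally needs M2 to be "Yes".
    in_register = in_asset_list and m2 == "yes"

    return {
        "asset_list": in_asset_list,
        "water_dependent_asset_register": in_register,
    }


# One example per row of the rule table.
for decisions in [("No", "n/a", "n/a"), ("Yes", "No", "n/a"),
                  ("Yes", "Yes", "No"), ("Yes", "Yes", "Yes")]:
    print(decisions, classify_asset(*decisions))
```

Treating M0 as "anything other than No" mirrors the (≠ No) notation in the table, so a blank or otherwise undecided M0 value does not by itself exclude an asset.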
Assessment teams are then able to use the database to assign receptors and impact variables to water-dependent assets and to develop a receptor register, as detailed in BA submethodology M03: Assigning receptors to water-dependent assets. The receptor register is then incorporated into the asset database.
At this stage of its development, the Asset database for the Galilee subregion on 04 January 2016, which this document describes, does not contain any receptor information.
Bioregional Assessment Programme (2013) Asset database for the Galilee subregion on 04 January 2016 Public. Bioregional Assessment Derived Dataset. Viewed 10 December 2018, http://data.bioregionalassessments.gov.au/dataset/eb4cf797-9b8f-4dff-9d7a-a5dfbc8d2bed.
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204
Derived From Queensland QLD - Regional - NRM - Water Asset Information Tool - WAIT - databases
Derived From Matters of State environmental significance (version 4.1), Queensland
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From South Australia SA - Regional - NRM Board - Water Asset Information Tool - WAIT - databases
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From National Groundwater Information System (NGIS) v1.1
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Queensland QLD Regional CMA Water Asset Information WAIT tool databases RESTRICTED Includes ALL Reports
Derived From Asset database for the Galilee subregion on 04 January 2016
Derived From Environmental Asset Database - Commonwealth Environmental Water Office
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores v3 03122014
Derived From QLD Dept of Natural Resources and Mines, Surface Water Entitlements 131204
Derived From Ramsar Wetlands of Australia
Derived From Permanent and Semi-Permanent Waterbodies of the Lake Eyre Basin (Queensland and South Australia) (DRAFT)
Derived From Asset database for the Galilee subregion on 2 December 2014
Derived From Key Environmental Assets - KEA - of the Murray Darling Basin
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores and NGIS v4 28072014
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From Great Artesian Basin and Laura Basin groundwater recharge areas
Derived From QLD DNRM Licence Locations