The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
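As a rough illustration of working with records of this shape, the sketch below parses comma-separated lines and tallies the most common names for one sex. The `name,sex,count` line layout, the sample values, and the function names are assumptions made for illustration; they are not taken from the description above.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in an assumed "name,sex,count" layout for one birth year.
SAMPLE = "Mary,F,120\nAnna,F,45\nJohn,M,150\nMary,M,5\n"

def parse_names(text, year):
    """Return a list of (name, year, sex, count) tuples from CSV text."""
    rows = []
    for name, sex, count in csv.reader(io.StringIO(text)):
        rows.append((name, year, sex, int(count)))
    return rows

def top_names(rows, sex, n=1):
    """Return the n most frequent names for one sex as (name, total) pairs."""
    totals = Counter()
    for name, _year, row_sex, count in rows:
        if row_sex == sex:
            totals[name] += count
    return totals.most_common(n)

rows = parse_names(SAMPLE, 1880)
print(top_names(rows, "F"))
```

The year is passed in separately here on the assumption that each file covers a single birth year.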
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Excel spreadsheet of the 100 male and female first names for each year from 1954 to the most recent year, based on births registered in New Zealand during each year.
The first names file contains data on the first names given to children born in France since 1900. These data are available for France as a whole and by department. The files available for download list births, not people living in a given year. They are available in two formats (DBASE and CSV). Because the files are large, a database manager or statistical software is recommended. The national file can be opened in some spreadsheet programs. The departmental file, however, is too large (3.8 million lines) to be opened in a spreadsheet, so a lighter version with births since 2000 only is also offered. The data can be accessed in:
- a national data file containing the first names given to children born in France between 1900 and 2022 (data before 2012 relate only to France outside Mayotte) and the numbers by sex associated with each first name;
- a departmental data file containing the same information at the level of the department of birth;
- a lighter data file that contains information at the level of the department of birth since the year 2000.
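Since the departmental file is too large for a spreadsheet, one option is to stream it row by row and keep only running totals. The sketch below shows that approach on a made-up sample; the semicolon delimiter and the column names (`sex`, `first_name`, `year`, `dept`, `count`) are hypothetical and would need to be matched to the real file.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in an assumed semicolon-delimited layout; the column names
# are hypothetical stand-ins for whatever the real file uses.
SAMPLE = (
    "sex;first_name;year;dept;count\n"
    "1;LOUIS;1999;75;40\n"
    "1;LOUIS;2000;75;50\n"
    "2;EMMA;2001;69;30\n"
    "2;EMMA;2001;75;20\n"
)

def births_since(lines, first_year):
    """Stream rows and sum counts per (first_name, year) from first_year on."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines, delimiter=";"):
        # Skip rows whose year is not numeric (an assumption about how
        # unknown years might be encoded).
        if not row["year"].isdigit():
            continue
        year = int(row["year"])
        if year >= first_year:
            totals[(row["first_name"], year)] += int(row["count"])
    return dict(totals)

totals = births_since(io.StringIO(SAMPLE), 2000)
print(totals)
```

Because `csv.DictReader` consumes one line at a time, the same function can be given an open file handle instead of the in-memory sample, so the whole file never has to be loaded at once.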
Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in order of frequency. The data can be used to represent the popularity of a name. Use caution when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary from year to year.
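The caution about counts close to 10 can be applied mechanically by flagging such records before comparing ranks. The sketch below is one way to do that; the `margin` of 5 and the record layout `(name, count, rank)` are assumptions chosen for illustration, not part of the dataset description.

```python
def flag_unstable_ranks(records, threshold=10, margin=5):
    """Mark records whose count lies within `margin` of `threshold`.

    `records` is a list of (name, count, rank) tuples. The result is a list
    of (name, rank, unstable) tuples in the same order.
    """
    flagged = []
    for name, count, rank in records:
        unstable = abs(count - threshold) <= margin
        flagged.append((name, rank, unstable))
    return flagged

# Made-up sample records.
records = [("Ava", 120, 1), ("Zed", 12, 40), ("Kai", 10, 41)]
flags = flag_unstable_ranks(records)
print(flags)
```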
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Rank and count of the top names for baby boys, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of male and female baby names in South Australia from 1944 to 2024. The annual data for baby names is published in January/February each year.
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
This database is part of the ArabLEX set of data, which consists of the Database of Arabic General Vocabulary (DAG), Database of Arabic Place Names (DAP), Database of Foreign Names in Arabic (DAF) and Database of Arab Names (DAN), available from ELRA under references ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107, respectively.
With over 218 million forms based on 100,000 lemmas, this full-form database covers Arab personal names (both given names and surnames) in both Arabic and English and contains a rich set of romanized name variants for each name, with a variety of supplementary information such as gender, name type and frequency statistics. This comprehensive lexicon (over 6.4 million variants) contains precise phonemic transcriptions and vocalized Arabic for all inflected and cliticized forms of each name.
This database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of the three proposed options may be provided upon demand. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions (IPA and/or SAMPA) can be provided, fine-tuned to a customer's specifications.
Quantity and size: 218,215,875 lines / 32,659 MB (31.9 GB)
File format: flat TSV text files
Samples and a specifications document are available upon request.
Our Price Paid Data includes information on all property sales in England and Wales that are sold for value and are lodged with us for registration.
Get up to date with the permitted use of our Price Paid Data:
check what to consider when using or publishing our Price Paid Data
If you use or publish our Price Paid Data, you must add the following attribution statement:
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Price Paid Data is released under the Open Government Licence (OGL) (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/). You need to make sure you understand the terms of the OGL before using the data.
Under the OGL, HM Land Registry permits you to use the Price Paid Data for commercial or non-commercial purposes. However, OGL does not cover the use of third party rights, which we are not authorised to license.
Price Paid Data contains address data processed against Ordnance Survey’s AddressBase Premium product, which incorporates Royal Mail’s PAF® database (Address Data). Royal Mail and Ordnance Survey permit your use of Address Data in the Price Paid Data:
If you want to use the Address Data in any other way, you must contact Royal Mail. Email address.management@royalmail.com.
The following fields comprise the address data included in Price Paid Data:
The January 2025 release includes:
As we will be adding to the January data in future releases, we would not recommend using it in isolation as an indication of market or HM Land Registry activity. When the full dataset is viewed alongside the data we’ve previously published, it adds to the overall picture of market activity.
Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.
Google Chrome (Chrome 88 onwards) is blocking downloads of our Price Paid Data. Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.
We update the data on the 20th working day of each month. You can download the:
These include standard and additional price paid data transactions received at HM Land Registry from 1 January 1995 to the most current monthly data.
The data is updated monthly and the average size of this file is 3.7 GB. You can download:
Description: This data deposit contains the Numerical Identification Death Files (National Archives Identifier 23845618), the NUMIDENT SS-5 Application Files (National Archives Identifier 23845613), the NUMIDENT Claims Files (National Archives Identifier 23852747), and the associated technical documentation.
Data Acquisition: These files were e-delivered to Anthony Wray via secure link by the Electronic Records Division of the National Archives and Records Administration (NARA) on 17 October 2019, as per a digitized reproduction order (Quote QO1-525370500 and Quote QO1-528389077). The packing slip is included in the data deposit (docs/Packing Slip.PDF).
Rights to Publish: The data are in the public domain, as confirmed by emails received from NARA on 28 December 2023 and 3 January 2024 (see docs/permission_to_publish_email.pdf).
How to Cite: Please adhere to the citation and data usage guidelines when using this dataset. See the included LICENSE.txt and README.md files for details.
Details: The Numerical Identification Files (NUMIDENT), 1936–2007, series contains records for every Social Security number (SSN) assigned to individuals with a verified death or who would have been over 110 years old by December 31, 2007. There are three types of entries in NUMIDENT: application (SS-5), claim, and death records. A NUMIDENT record may contain more than one entry. Information contained in NUMIDENT records includes each applicant's full name, SSN, date of birth, place of birth, citizenship, sex, father's name, mother's maiden name, and race/ethnic description (optional). NUMIDENT includes information regarding any subsequent changes made to the applicant's record, including name changes and life or death claims. The death records in NUMIDENT do not include any State-reported deaths, in accordance with the Social Security Act section 205(r). There are 72,182,729 SS-5 record entries; 25,230,486 claim record entries; and 49,459,293 death record entries. See https://catalog.archives.gov/id/12004494 for more information.
Related Data: Visit the CenSoc Project for public micro datasets linked to NUMIDENT: https://censoc.berkeley.edu/.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another.
It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.
International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
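The BIN name selection described above can be sketched as a loop over ranks that stops at the first rank where one name reaches the consensus threshold. This is only an illustrative reading of the description, not GBIF's implementation: the rank list, the input shape (one dict per name applied to the BIN), and the choice to compute the share over all applied names are assumptions.

```python
from collections import Counter

# Major Linnaean ranks from most specific to most general (assumed list).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]

def consensus_name(classifications, threshold=0.8):
    """Return (rank, name) for the lowest rank where one name reaches threshold.

    `classifications` is a list of dicts mapping rank -> name, one per name
    applied to the BIN. Returns None if no rank reaches consensus.
    """
    total = len(classifications)
    if total == 0:
        return None
    for rank in RANKS:
        counts = Counter(c[rank] for c in classifications if c.get(rank))
        if not counts:
            continue
        name, votes = counts.most_common(1)[0]
        # Share is computed over all applied names (an assumption).
        if votes / total >= threshold:
            return rank, name
    return None

# Made-up example: species names disagree (60% vs 40%), the genus is unanimous.
applied = (
    [{"species": "Aus bus", "genus": "Aus"}] * 3
    + [{"species": "Aus cus", "genus": "Aus"}] * 2
)
print(consensus_name(applied))
```

In the example no species name reaches 80%, so the function moves up one rank and returns the genus.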
UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.
The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.
The following 105 sources have been used to assemble the GBIF backbone with number of names given in brackets:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description of the INSPIRE Download Service (predefined Atom): Saarland (and adjacent areas) (2011). Hierarchically structured natural spatial structure (natural spaces of the first to fourth order). Shown is the TK grid rectangle 6304-6810 with the Saarland at the center.
Attributes:
IDENTIFICATION: ID of the surface element in the original database
NATNR1_SLL: Number of the first-order natural space
NATNAM1: Name of the first-order natural space
NATNR2_SLL: Number of the second-order natural space
NATNAM2: Name of the second-order natural space
NATNR3_SLL: Number of the third-order natural space
NATNR3_MS: Number of the large unit
NATNAM3: Name of the third-order natural space
NATNR4_SLL: Number of the fourth-order natural space
NATNR4_MS: Number of the subunit
NATNAM4: Name of the fourth-order natural space
NRBEM: Remark
NRZINFO: Natural space, more information
Viewing object in the GDZ: the MultiFeature class (composed of the area feature class GDZ2010.A_ngnraum and the business table with the property data (GDZ2010.ngnraum)) has been exported to the file geodatabase.
The following user-relevant attributes are available:
ID
NATNR1: Natural space number, first order
NATNAM1: Natural space name, first order
NATNR2: Natural space number, second order
NATNAM2: Natural space name, second order
NATNR3: third order
NATNAM3: Natural space name, third order
NATNR4: fourth order
NATNAM4: Natural space name, fourth order
NRZINFO: Natural area, more information
NRBEM: Note
The link(s) for downloading the records is/are generated dynamically from GetFeature requests to a WFS 1.1.0.
Data from this project focuses on the evaluation of breeding lines. Significant progress was made in advancing breeding populations directed towards release of improved varieties in Tanzania. Thirty promising F4:7, 1st generation 2014 PIC (Phaseolus Improvement Cooperative) and ~100 F4:6, 2nd generation 2015 PIC breeding lines were selected. In addition, ~300 F4:5, 3rd generation 2016 PIC single plant selections were completed in Arusha and Mbeya. These breeding lines, derived from 109 PIC populations specifically developed to combine abiotic and biotic stress tolerance, showed superior agronomic potential compared with checks and local landraces. The diversity, scale, and potential of the material in the PIC breeding pipeline is invaluable and requires continued support to ensure the release of varieties that promise to increase the productivity of common bean in the E. African region. Data available includes databases, spreadsheets, and images related to the project.
Resources in this dataset:
Resource Title: Data Dictionary. File Name: ADP-1_DD.pdf
Resource Title: ADP-1 Database. File Name: ADP1-DB.zip. Resource Description: This file is a link to a draft version of the development and characterization of the common bean diversity panel (ADP) database in Microsoft Access. Preliminary information is provided in this database, while the full version is being prepared. To use the database, download the complete file, extract it, and open the MS Access file. You must allow active content when opening the database for it to work properly. Downloaded on November 17, 2017.
Resource Title: Anthracnose Screening of Andean Diversity Panel (ADP). File Name: Anthracnose-screening-of-ADP.pdf. Resource Description: Approximately 230 lines of the ADP were screened with 8 races of anthracnose under controlled conditions at Michigan State University. Dr. James Kelly has provided this valuable dataset for sharing in light of the Open Data policy of the US government. This dataset represents the first comprehensive screening of the ADP with a broad set of races of a specific pathogen.
Resource Title: ARS - Feed the Future Shared Data. File Name: ARS-FtF-Data-Sharing.zip. Resource Description: The data provided herein is an early draft version of the data that has been generated by the ARS Feed-the-Future Grain Legumes Project, which is focused on common bean research.
Resource Title: PIC (Phaseolus Improvement Cooperative) Populations. File Name: PIC-breeding-populations.xlsx. Resource Description: The complete list of PIC breeding populations (Excel format). PIC populations are bulked populations for improvement of common bean in Feed the Future countries, with a principal focus on sub-Saharan Africa. These populations are for distribution to collaborators, are segregating for key biotic and abiotic stress constraints, and can be used for selection and release of improved cultivars/germplasm. Many of these populations are derived from crosses between ADP landraces and cultivars from sub-Saharan Africa and other improved genotypes with key biotic or abiotic stress tolerance. Phenotypic and genotypic information related to the parents of the crosses can be found in the ADP Database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The first public release of the GRID database. Please note that the CSV download only includes IDs, names and locations. See the JSON download for all metadata, including types and relationships. For a description of the database format, see https://www.grid.ac/format
Release notes:
Database seeded from research institutes in grant data from over 65 global funders.
GeoNames IDs added to all institutes.
NUTS codes added to all European institutes.
Metadata added for the top 3000 universities, the majority of Germany and Australia, and many more.
Parent/child relationships added for 65 super institute members (e.g. Max Planck, Chinese Academy of Sciences, etc.).
External identification systems:
- HESA institution codes (Higher Education Statistics Agency, UK)
- UCAS institution codes (Universities and Colleges Admissions Service, UK)
- UKPRN institution codes (UK Provider Reference Number, UK)
- 4373 Fundref codes
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prepared by Vetle Torvik, 2018-04-15. The dataset comes as a single tab-delimited, ASCII-encoded file, and should be about 717 MB uncompressed.
• How was the dataset created? First and last names of authors in the Author-ity 2009 dataset were processed through several tools to predict ethnicities and gender, including Ethnea+Genni as described in:
Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927
Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720
EthnicSeer: http://singularity.ist.psu.edu/ethnicity Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada.
SexMachine 0.1.1: https://pypi.org/project/SexMachine
First names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.
• The code and back-end data are periodically updated and made available for query at Torvik Research Group.
• What is the format of the dataset? The dataset contains 9,300,182 rows and 10 columns:
1. auid: unique ID for authors in Author-ity 2009 (PMID_authorposition)
2. name: full name (used as input to EthnicSeer)
3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX
4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction
5. lastname: used as input for Ethnea+Genni
6. firstname: used as input for Ethnea+Genni
7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if there are no one or two dominant predictions), or TOOSHORT (if both first and last name are too short)
8. Genni: predicted gender; 'F', 'M', or '-'
9. SexMac: predicted gender based on a third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male
10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
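A tab-delimited file with these ten columns can be read one row at a time, which matters at roughly 9.3 million rows. The sketch below uses the column names listed above; the sample row is invented, and the assumption that the file has no header row (so the names are supplied explicitly) would need to be checked against the real file.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ["auid", "name", "EthnicSeer", "prop", "lastname",
           "firstname", "Ethnea", "Genni", "SexMac", "SSNgender"]

# Invented sample row for illustration only.
SAMPLE = "12345_1\tJane Doe\tENG\t0.91\tDoe\tJane\tENGLISH\tF\tfemale\tF\n"

def read_rows(lines):
    """Yield one dict per row, converting `prop` to a float."""
    reader = csv.DictReader(
        lines,
        fieldnames=COLUMNS,      # assumes the file has no header row
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in names literally
    )
    for row in reader:
        row["prop"] = float(row["prop"])
        yield row

rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[0]["Ethnea"], rows[0]["prop"])
```

For the full file, `read_rows` can be passed an open file handle and consumed lazily instead of being wrapped in `list`.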
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Training.gov.au (TGA) is the National Register of Vocational Education and Training in Australia and contains authoritative information about Registered Training Organisations (RTOs), Nationally Recognised Training (NRT) and the approved scope of each RTO to deliver NRT as required in national and jurisdictional legislation.
TGA has a web service available to allow external systems to access and utilise information stored in TGA through an external system. The TGA web service is exposed through a single interface and web service users are assigned a data reader role which will apply to all data stored in the TGA.
The web service can be broadly split into three categories:
RTOs and other organisation types;
Training components including Accredited courses, Accredited course Modules, Training Packages, Qualifications, Skill Sets and Units of Competency;
System metadata including static data and statistical classifications.
Users will gain access to the TGA web service by first passing a user name and password through to the web server. The web server will then authenticate the user against the TGA security provider before passing the request to the application that supplies the web services.
There are two web services environments:
1. Production - ws.training.gov.au – National Register production web services
2. Sandbox - ws.sandbox.training.gov.au – National Register sandbox web services.
The National Register sandbox web service is used to test against the current version of the web services where the functionality will be identical to the current production release. The web service definition and schema of the National Register sandbox database will also be identical to that of production release at any given point in time. The National Register sandbox database will be cleared down at regular intervals and realigned with the National Register production environment.
Each environment has three configured services:
Organisation Service;
Training Component Service; and
Classification Service.
To access the download area for web services, navigate to http://tga.hsd.com.au and use the below name and password:
Username: WebService.Read (case sensitive)
Password: Asdf098 (case sensitive)
This download area contains various versions of the following artefacts that you may find useful:
• Training.gov.au web service specification document;
• Training.gov.au logical data model and definitions document;
• .NET web service SDK sample app (with source code);
• Java sample client (with source code);
• How to set up a web service client in VS 2010 video; and
• Web services WSDLs and XSDs.
For the business areas, the specification/definition documents and the sample application are a good place to start, while the IT areas will find the sample source code and the video useful for starting development against the TGA web services.
The web services Sandbox end point is: https://ws.sandbox.training.gov.au/Deewr.Tga.Webservices
Once you are ready to access the production web service, please email the TGA team at tgaproject@education.gov.au to obtain a unique user name and password.
https://creativecommons.org/publicdomain/zero/1.0/
RxNorm is the name of a US-specific terminology in medicine that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.
This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.
The following tables are included in the RxNorm dataset:
RXNCONSO contains concept and source information
RXNREL contains information regarding relationships between entities
RXNSAT contains attribute information
RXNSTY contains semantic information
RXNSAB contains source info
RXNCUI contains retired rxcui codes
RXNATOMARCHIVE contains archived data
RXNCUICHANGES contains concept changes
Update Frequency: Monthly
https://www.nlm.nih.gov/research/umls/rxnorm/
https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm
https://cloud.google.com/bigquery/public-data/rxnorm
Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.
What are the RXCUI codes for the ingredients of a list of drugs?
Which ingredients have the most variety of dose forms?
In what dose forms is the drug phenylephrine found?
What are the ingredients of the drug labeled with the generic code number 072718?
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
ISO 3166-1-alpha-2 English country names and code elements. This list states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For many countries lat/lng are determined with an algorithm that searches the place names in the main geonames database, using administrative divisions and numerical vicinity of the postal codes as factors in the disambiguation of place names. For postal codes and place names for which no corresponding toponym in the main geonames database could be found, an average lat/lng of 'neighbouring' postal codes is calculated. Please let us know if you find any errors in the data set. Thanks.
For Canada we have only the first letters of the full postal codes (for copyright reasons).
For Ireland we have only the first letters of the full postal codes (for copyright reasons).
For Malta we have only the first letters of the full postal codes (for copyright reasons).
The Argentina data file contains 4-digit postal codes, which were replaced with a new system in 1999.
For Brazil only major postal codes are available (only the codes ending with -000 and the major code per municipality).
For India the lat/lng accuracy is not yet comparable to other countries.
Update frequency: 1 month
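The fallback of averaging the lat/lng of 'neighbouring' postal codes can be illustrated with a small sketch. How geonames actually defines a neighbour is not stated above, so this example assumes purely numeric codes and treats the `k` numerically closest known codes as neighbours; the function name, `k`, and the sample coordinates are all hypothetical.

```python
def estimate_location(code, known, k=2):
    """Average the lat/lng of the k known codes numerically closest to `code`.

    `known` maps numeric postal codes (as strings) to (lat, lng) tuples.
    Returns None when no known codes are available.
    """
    target = int(code)
    # Assumed neighbour rule: smallest absolute numeric difference.
    nearest = sorted(known, key=lambda c: abs(int(c) - target))[:k]
    if not nearest:
        return None
    lat = sum(known[c][0] for c in nearest) / len(nearest)
    lng = sum(known[c][1] for c in nearest) / len(nearest)
    return lat, lng

# Made-up coordinates for illustration.
known = {"1000": (50.0, 4.0), "1002": (52.0, 6.0), "9000": (10.0, 10.0)}
print(estimate_location("1001", known))
```

A plain arithmetic mean is adequate for nearby points but would give wrong longitudes for neighbours on either side of the 180° meridian.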
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.
It contains prices up to 1 April 2020. If you need more up-to-date data, fork and re-run the data collection script, which is also available from Kaggle.
The data for every symbol is saved in CSV format with common fields:
All ticker data is then stored in either the ETFs or the stocks folder, depending on its type. Each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains some additional metadata for each ticker, such as its full name.
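The folder layout described above suggests a simple way to locate the CSV for a given ticker from the metadata file. The sketch below works on a made-up metadata sample; the column names (`Symbol`, `Security Name`, `ETF`), the `Y`/`N` flag values, the lowercase folder names, and the `data` root are assumptions that should be checked against the actual download.

```python
import csv
import io
from pathlib import PurePosixPath

# Made-up stand-in for symbols_valid_meta.csv; column names are assumed.
SAMPLE_META = (
    "Symbol,Security Name,ETF\n"
    "AAPL,Apple Inc.,N\n"
    "QQQ,Invesco QQQ Trust,Y\n"
)

def ticker_paths(meta_lines, root="data"):
    """Map each ticker symbol to the expected path of its price CSV."""
    paths = {}
    for row in csv.DictReader(meta_lines):
        # Assumed convention: an "ETF" flag of "Y" selects the ETF folder.
        folder = "etfs" if row["ETF"] == "Y" else "stocks"
        paths[row["Symbol"]] = PurePosixPath(root) / folder / f"{row['Symbol']}.csv"
    return paths

paths = ticker_paths(io.StringIO(SAMPLE_META))
print(paths["AAPL"], paths["QQQ"])
```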