Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptions of the parcel tables and fields.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset results from 13 data description sessions conducted at U. Porto. In each session, researchers created metadata in Dendro, a research data management platform. A project for each session was created beforehand in Dendro, and all the sessions were kept under the same account. All projects were kept private. This was explained to the researchers, who could change any information if they wanted to. When scheduling the sessions, researchers were asked to choose a dataset to describe. The sessions started by introducing researchers to Dendro with a brief demonstration of its features. The researchers were then asked to create a folder and upload their datasets. During the session, the selection of descriptors was mostly up to them. Exceptionally, they were asked whether a given descriptor was suitable to contextualize their data. Session audio was recorded with the researchers’ consent and deleted after the relevant events and comments in each session had been transcribed to complement the analysis of the metadata produced. The audio was also used to mark the moments at which the researchers started and finished the description, in order to determine the session duration.
The data set description provides a detailed account of the type of data used in the peer-reviewed literature. The data involve special instrumentation, such as hyperspectral imaging cameras that capture thousands of pixels, which form images, as on a television screen. Other data are used to develop absorbance spectra from infrared spectrometers, which are compared to reference data to confirm the presence of a desired, tested chemical. This dataset is associated with the following publication: Baseley, D., L. Wunderlich, G. Phillips, K. Gross, G. Perram, S. Willison, M. Magnuson, S. Lee, R. Phillips, and W. Harper Jr.. Hyperspectral Analysis for Standoff Detection of Dimethyl Methylphosphonate on Building Materials [HS7.52.01]. JOURNAL OF ENVIRONMENTAL MANAGEMENT. Elsevier Science Ltd, New York, NY, USA, 135-142, (2016).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset consists of data from the follow-up questionnaires of 13 multi-domain data description sessions. Researchers from the University of Porto participated in a data description session and filled in a follow-up questionnaire to assess, among other things, their interest in research data management and the usefulness of data description. The questionnaire was conducted on Google Forms and the data were copied to a spreadsheet, because the questionnaires were prepared individually to account for the specificity of one of the questions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to existing data-specific metadata standards from two existing multidisciplinary data repositories, DataCite and Dryad, and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data type, description of project, link to the DMP, and, where possible, external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets: the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30000 records and in Month Day Year format for the rest.
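Because the two chunks use different date layouts, anyone loading publishDate or firstPublishDate has to normalize them first. The following is a minimal sketch of one way to do that with the Python standard library. The function name `parse_publish_date` is hypothetical, and the exact spelling of the "Month Day Year" values (full English month name, optional ordinal suffix such as "14th", optional comma) is an assumption, not something the description specifies.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumption: the second chunk may write days with ordinal suffixes ("14th").
_ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")

# First chunk: mm/dd/yyyy. Second chunk: Month Day Year (assumed full month name).
_FORMATS = ("%m/%d/%Y", "%B %d %Y")


def parse_publish_date(raw: str) -> Optional[date]:
    """Return a date for either layout, or None if the value is empty or unparseable."""
    text = _ORDINAL_SUFFIX.sub("", raw.strip()).replace(",", "")
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known layout
    return None
```

Returning `None` instead of raising keeps incomplete records (firstPublishDate is only 59% complete) from stopping a bulk load; values that match neither layout can then be inspected separately.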
Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.
The 25 fields of the dataset are:
| Attribute | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (e.g. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Publisher | 93 |
| publishDate | Publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
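The likedPercent field is described in the table as the percent of ratings over 2 stars, so it can in principle be recomputed from ratingsByStars. The sketch below shows that arithmetic. It assumes ratingsByStars has already been parsed into five integer counts ordered from 5 stars down to 1 star; that ordering, the function name `liked_percent`, and the absence of rounding are assumptions for illustration, not details confirmed by the dataset description.

```python
from typing import Optional, Sequence


def liked_percent(ratings_by_stars: Sequence[int]) -> Optional[float]:
    """Percent of ratings with 3, 4, or 5 stars.

    Assumes five counts ordered [5 stars, 4 stars, 3 stars, 2 stars, 1 star].
    Returns None when there are no ratings at all.
    """
    if len(ratings_by_stars) != 5:
        raise ValueError("expected exactly five star counts")
    total = sum(ratings_by_stars)
    if total == 0:
        return None
    liked = sum(ratings_by_stars[:3])  # 5-, 4-, and 3-star ratings
    return 100.0 * liked / total
```

Comparing the recomputed value with the stored likedPercent is a quick consistency check, bearing in mind that GoodReads may round the figure it displays.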
This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
Table explaining the naming convention and parameters of FLEA experiments performed in the laboratory.
Please take a moment to carefully read through this description and metadata to better understand the dataset and its nuances before proceeding to the Suggestions and Discussions section.
This dataset provides a comprehensive collection of setlists from Taylor Swift’s official era tours, curated expertly by Spotify. The playlist, available on Spotify under the title "Taylor Swift The Eras Tour Official Setlist," encompasses a diverse range of songs that have been performed live during the tour events of this global artist. Each dataset entry corresponds to a song featured in the playlist.
Taylor Swift, a pivotal figure in both country and pop music scenes, has had a transformative impact on the music industry. Her tours are celebrated not just for their musical variety but also for their theatrical elements, narrative style, and the deep emotional connection they foster with fans worldwide. This dataset aims to provide fans and researchers an insight into the evolution of Swift's musical and performance style through her tours, capturing the essence of what makes her tour unique.
Obtaining the Data: The data was obtained directly from the Spotify Web API, specifically focusing on the setlist tracks by the artist. The Spotify API provides detailed information about tracks, artists, and albums through various endpoints.
Data Processing: To process and structure the data, Python scripts were developed using data science libraries such as pandas for data manipulation and spotipy for API interactions, specifically for Spotify data retrieval.
Workflow:
1. Authentication
2. API Requests
3. Data Cleaning and Transformation
4. Saving the Data
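The last two workflow steps can be sketched without credentials by working on a response that has already been fetched. The example below is an illustrative sketch, not the author's script: it uses only the standard library instead of pandas, it assumes the payload has the shape returned by the Spotify Web API playlist-items endpoint (an `items` list whose entries hold a `track` object), and the output column names are hypothetical. Authentication and the API requests themselves (handled with spotipy in the original workflow) are left out.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical output columns; the real dataset's column names may differ.
FIELDS = ["track_name", "artists", "album", "duration_ms", "popularity"]


def flatten_playlist_items(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a playlist-items response into one flat row per track."""
    rows = []
    for item in payload.get("items", []):
        track = item.get("track")
        if not track:  # skip entries whose track object is missing or null
            continue
        rows.append({
            "track_name": track["name"],
            "artists": ", ".join(artist["name"] for artist in track["artists"]),
            "album": track["album"]["name"],
            "duration_ms": track["duration_ms"],
            "popularity": track["popularity"],
        })
    return rows


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the transformation separate from the network calls makes it easy to re-run the cleaning step on a saved response, which matters here because the popularity score changes from day to day.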
Note: The popularity score is the value recorded on the day the dataset was retrieved. The popularity score can fluctuate daily.
This dataset, derived from Spotify and focusing on Taylor Swift's The Eras Tour setlist data, is intended for educational, research, and analysis purposes only. Users are urged to use this data responsibly, ethically, and within the bounds of legal stipulations.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the SQL tables of the training and test datasets used in our experimentation. These tables contain the preprocessed textual data (in the form of tokens) extracted from each training and test project. Besides the preprocessed textual data, this dataset also contains metadata about the projects, GitHub topics, and GitHub collections. The GitHub projects are identified by the tuple “Owner” and “Name”. The descriptions of the table fields are attached to their respective data descriptions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.
Data content areas include:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the SANDBOX research project, we investigated the natural dynamics of the North Sea bed. As part of this research, we conducted multiple research cruises on the North Sea. The documents in this dataset explain which data was collected, when it was collected and the structure of the data repository (svn.citg.tudelft.nl/sandbox).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These core photos and descriptions were taken from the five boreholes that were drilled as part of the kISMET SubTER project conducted at the Sanford Underground Research Facility (SURF) in Lead, SD. The boreholes are subvertical in orientation, and were drilled on the 4850 level of SURF on the West Drift, about 450 feet from Governor's Corner. The well heads for the five wells are in a line, but the outer two wells (k001 and k005) were deviated to form a five-spot configuration at 50 m depth. Four of the five boreholes have a nominal depth of 50 m and have HQ core; the fifth, located in the center (k003), was drilled to a depth of 100 m and has NQ core. The central borehole was used for stress and hydraulic fracturing; the other four boreholes were used for monitoring purposes. Core logging was conducted by Paul Cook (LBNL), Bill Roggenthen (SDSMT), and Drew Siler (LBNL). All core consists of rocks from the Poorman Formation. Some of the core photos are missing. These have been documented in the included spreadsheets labeled with the well name and the word missing. The locations of the boreholes are documented on the included map and spreadsheet.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the results of developing alternative text for images using chatbots based on large language models. The study was carried out in April-June 2024. Microsoft Copilot, Google Gemini, and YandexGPT chatbots were used to generate 108 text descriptions for 12 images. Descriptions were generated by chatbots using keywords specified by a person. Experts then rated the resulting descriptions on a Likert scale (from 1 to 5). The data set is presented in a Microsoft Excel table on the “Data” sheet with the following fields: record number; image number; chatbot; image type (photo, logo); request date; list of keywords; number of keywords; length of keywords; time of compilation of keywords; generated descriptions; required length of descriptions; actual length of descriptions; description generation time; usefulness; reliability; completeness; accuracy; literacy. The “Images” sheet contains links to the original images. The data set is in Russian.
The Archival Descriptions from the National Archives Catalog data set provides archival descriptions of the permanent holdings of the federal government in the custody of the National Archives and Records Administration. The archival descriptions include information on traditional paper holdings, logical data records (electronic records), and artifacts.
The USDA-Agricultural Research Service carried out a water productivity field trial for irrigated maize (Zea mays L.) at the Limited Irrigation Research Farm (LIRF) facility in northeastern Colorado in 2008 through 2011. The dataset includes daily measurements of irrigation, precipitation, soil water storage, and plant growth; daily estimates of crop evapotranspiration; and seasonal measurement of crop water use and crop yield. Soil parameters and hourly and daily weather data are also provided. The dataset can be useful to validate and refine maize crop models. The data are presented in spreadsheet format. The primary data files are the four annual LIRF Maize 20xx.xlsx files that include the daily water balance and phenology, final yield and biomass data, and crop management logs. Annual LIRF Weather 20xx.xlsx files provide hourly and daily weather parameters including reference evapotranspiration. The LIRF Soils.xlsx file gives soil parameters. Each spreadsheet contains a Data Descriptions worksheet that provides worksheet or column specific information. Comments are embedded in cells with specific information. A LIRF photos.pdf file provides images of the experimental area, measurement processes and crop conditions. Photo credit Peggy Greb, ARS; copyright-free, public domain copyright policy.
Resources in this dataset:
| Resource Title | File Name | Resource Description |
| ------------- | ------------- | ------------- |
| LIRF Weather 2008 | LIRF Weather 2008.xlsx | LIRF hourly and daily weather data for 2008 |
| LIRF Weather 2009 | LIRF Weather 2009.xlsx | LIRF hourly and daily weather data for 2009 |
| LIRF Weather 2010 | LIRF Weather 2010.xlsx | LIRF hourly and daily weather data for 2010 |
| LIRF Weather 2011 | LIRF Weather 2011.xlsx | LIRF hourly and daily weather data for 2011 |
| LIRF Soils | LIRF Soils.xlsx | LIRF soil maps, soil texture, moisture retention, and chemical constituents |
| LIRF Photo Log | LIRF Photo Log.pdf | Photos of the LIRF Water Productivity field trials and instrumentation. |
| Data Dictionaries | DataDictionary r1.xlsx | Data descriptions of all the data resources (also included in their respective data files). |
| LIRF Methodology | LIRF Methodology r1.pdf | Description of data files, data, and data collection methodology for the LIRF 2008-2011 Water Productivity field trials. |
| LIRF Maize 2008 | LIRF Maize 2008 r1.xlsx | Water balance and yield data for 2008 LIRF field trial |
| LIRF Maize 2009 | LIRF Maize 2009 r1.xlsx | Water balance and yield data for 2009 LIRF field trial |
| LIRF Maize 2010 | LIRF Maize 2010 r1.xlsx | Water balance and yield data for 2010 LIRF field trial |
| LIRF Maize 2011 | LIRF Maize 2011 r1.xlsx | Water balance and yield data for 2011 LIRF field trial |
Descriptions for each city department and program, taken from the budget Blue Books from the CAO site.
https://creativecommons.org/publicdomain/zero/1.0/
The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions in the original HIPAA act. The primary purpose of NPPES was to develop a unique identifier for each physician that billed Medicare and Medicaid. This identifier is now known as the National Provider Identifier Standard (NPI), a required 10-digit number that is unique to an individual provider at the national level.
Once an NPI record is assigned to a healthcare provider, the parts of the NPI record that have public relevance, including the provider’s name, specialty, and practice address, are published on a searchable website as well as in a downloadable file of zipped data containing all of the FOIA-disclosable health care provider data in NPPES, together with a separate PDF file of code values that documents and lists the descriptions for all of the codes found in the data file.
The dataset contains the latest NPI downloadable file in an easy-to-query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized, which harnesses the power of BigQuery’s next-generation columnar storage format to provide an analytical view of the NPI data containing description fields for the codes, based on the mappings in the Data Dissemination Public File - Code Values documentation as well as external lookups to the healthcare provider taxonomy codes. While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and have a convenient single lookup table for all provider information.
Fork this kernel to get started.
https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research
Dataset Source: Centers for Medicare & Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @rawpixel from Unsplash.
What are the top ten most common types of physicians in Mountain View?
What are the names and phone numbers of dentists in California who studied public health?
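The first question above is a filter, group, count, and rank query. The sketch below shows that query shape against a tiny in-memory SQLite table so that it runs anywhere. It is not a query against the real BigQuery tables: the table name `providers`, the columns `city` and `taxonomy_classification`, and the rows are all made up for illustration, and the actual npi_optimized column names should be taken from the table schema.

```python
import sqlite3
from typing import List, Tuple

# Made-up rows standing in for provider records (hypothetical columns).
TOY_ROWS = [
    ("Mountain View", "Internal Medicine"),
    ("Mountain View", "Internal Medicine"),
    ("Mountain View", "Dentist"),
    ("Palo Alto", "Dentist"),
]


def top_provider_types(conn: sqlite3.Connection, city: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Return the most common provider types in a city, most frequent first."""
    query = """
        SELECT taxonomy_classification, COUNT(*) AS n
        FROM providers
        WHERE city = ?
        GROUP BY taxonomy_classification
        ORDER BY n DESC, taxonomy_classification
        LIMIT ?
    """
    return conn.execute(query, (city, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE providers (city TEXT, taxonomy_classification TEXT)")
conn.executemany("INSERT INTO providers VALUES (?, ?)", TOY_ROWS)
result = top_provider_types(conn, "Mountain View")
```

The same SELECT / WHERE / GROUP BY / ORDER BY / LIMIT structure carries over to BigQuery once the real table and column names are substituted.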