13 datasets found
  1. Pokemon database

    • kaggle.com
    zip
    Updated Nov 18, 2024
    Cite
    Silvio Sopic (2024). Pokemon database [Dataset]. https://www.kaggle.com/datasets/silviosopic/pokemon-database
    Explore at:
    zip (16,051 bytes)
    Dataset updated
    Nov 18, 2024
    Authors
    Silvio Sopic
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains detailed information about Pokémon evolution chains in both a wide and a long format. It is derived from the PokéAPI (https://pokeapi.co/) using web scraping techniques. The data includes details on the evolving Pokémon, evolution triggers, conditions, and other relevant information.

    Column description:

    Features long:

    • Evolving From (str): The name of the Pokémon that evolves.
    • Evolving To (str): The name of the Pokémon that the previous Pokémon evolves into.
    • trigger (str): The specific trigger for evolution (e.g., "level_up").
    • Condition (str): The specific condition for evolution (e.g., "Item").
    • value (str): The value for the evolution condition (e.g., "fire-stone").

    Features wide:

    • Evolving From (str): The name of the Pokémon that evolves.
    • Evolving To (str): The name of the Pokémon that the previous Pokémon evolves into.
    • turn_upside_down (bool): Whether the Pokémon needs to be turned upside down to evolve.
    • trade_species (str): The required Pokémon to trade for evolution (if applicable).
    • time_of_day (str): Specific time of day required for evolution (if applicable).
    • relative_physical_stats (str/None): Information about the relative physical stats required for evolution (if applicable).
    • party_type (str): The required party type for evolution (if applicable).
    • party_species (str): The required Pokémon in the party for evolution (if applicable).
    • Item (str): The required item held for evolution (if applicable).
    • needs_overworld_rain (bool): Whether overworld rain is needed for evolution.
    • min_level (int/None): The minimum level required for evolution.
    • min_happiness (int/None): The minimum happiness required for evolution.
    • trigger (str): The specific trigger for evolution (e.g., "level_up").
    • min_beauty (int/None): The minimum beauty required for evolution (if applicable).
    • min_affection (int/None): The minimum affection required for evolution (if applicable).
    • gender (str): The required gender for evolution (if applicable).
    • held_item (str): The required held item for evolution (if applicable, name extracted from dictionary).
    • known_move (str): The required move known for evolution (if applicable, name extracted from dictionary).
    • known_move_type (str): The required move type known for evolution (if applicable, name extracted from dictionary).
    • location (str): The required location for evolution (if applicable, name extracted from dictionary).

    Format:

    Wide format CSV file (evolution_wide.csv)
    Long format CSV file (evolution_long.csv)
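    As a sketch of how the two layouts relate, the long table can be pivoted into the wide one with pandas. The rows and values below are hypothetical examples, not taken from the dataset:

```python
import pandas as pd

# Toy rows mimicking evolution_long.csv (values are hypothetical)
long_df = pd.DataFrame({
    "Evolving From": ["eevee", "eevee", "charmander"],
    "Evolving To": ["flareon", "flareon", "charmeleon"],
    "trigger": ["use-item", "use-item", "level_up"],
    "Condition": ["Item", "gender", "min_level"],
    "value": ["fire-stone", "any", "16"],
})

# Several rows per evolution in the long layout become one row per
# evolution, with one column per condition, in the wide layout.
wide_df = (
    long_df.pivot_table(index=["Evolving From", "Evolving To", "trigger"],
                        columns="Condition", values="value", aggfunc="first")
           .reset_index()
)
print(wide_df.columns.tolist())
```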

    Source:

    PokéAPI (https://pokeapi.co/)

    License:

    Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    Note:

    This dataset is a derivative work of PokéAPI data and adheres to their licensing terms. "Pokémon" and character names are trademarks of Nintendo. Please feel free to use and modify this dataset for non-commercial purposes, with proper attribution.


  2. Data_Sheet_1_In Praise of Artifice Reloaded: Caution With Natural Image...

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Marina Martinez-Garcia; Marcelo Bertalmío; Jesús Malo (2023). Data_Sheet_1_In Praise of Artifice Reloaded: Caution With Natural Image Databases in Modeling Vision.pdf [Dataset]. http://doi.org/10.3389/fnins.2019.00008.s001
    Explore at:
    pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Marina Martinez-Garcia; Marcelo Bertalmío; Jesús Malo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Subjective image quality databases are a major source of raw data on how the visual system works in naturalistic environments. These databases describe the sensitivity of many observers to a wide range of distortions of different nature and intensity seen on top of a variety of natural images. Data of this kind seems to open a number of possibilities for the vision scientist to check the models in realistic scenarios. However, while these natural databases are great benchmarks for models developed in some other way (e.g., by using the well-controlled artificial stimuli of traditional psychophysics), they should be carefully used when trying to fit vision models. Given the high dimensionality of the image space, it is very likely that some basic phenomena are under-represented in the database. Therefore, a model fitted on these large-scale natural databases will not reproduce these under-represented basic phenomena that could otherwise be easily illustrated with well selected artificial stimuli. In this work we study a specific example of the above statement. A standard cortical model using wavelets and divisive normalization tuned to reproduce subjective opinion on a large image quality dataset fails to reproduce basic cross-masking. Here we outline a solution for this problem by using artificial stimuli and by proposing a modification that makes the model easier to tune. Then, we show that the modified model is still competitive in the large-scale database. Our simulations with these artificial stimuli show that when using steerable wavelets, the conventional unit norm Gaussian kernels in divisive normalization should be multiplied by high-pass filters to reproduce basic trends in masking. Basic visual phenomena may be misrepresented in large natural image datasets but this can be solved with model-interpretable stimuli. This is an additional argument in praise of artifice in line with Rust and Movshon (2005).
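    For readers unfamiliar with the model family discussed in this abstract, here is a toy numpy sketch of divisive normalization: each wavelet-like response is divided by a pooled measure of its neighbors' activity. All sizes, kernels, and constants below are illustrative assumptions, not the paper's fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=16)                 # hypothetical wavelet responses
b = 0.1                                 # semisaturation constant (assumed)

# Pooling kernel H: each unit pools the absolute responses of itself
# and its two neighbors on each side, rows normalized to sum to 1.
idx = np.arange(16)
H = (np.abs(np.subtract.outer(idx, idx)) <= 2).astype(float)
H /= H.sum(axis=1, keepdims=True)

# Divisively normalized responses: large pooled activity suppresses
# (masks) the unit's output, which is the masking behavior at issue.
y = z / (b + H @ np.abs(z))
print(y.shape)
```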

  3. PICKLE 2.0: A human protein-protein interaction meta-database employing data...

    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Aris Gioutlakis; Maria I. Klapa; Nicholas K. Moschonas (2023). PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology [Dataset]. http://doi.org/10.1371/journal.pone.0186039
    Explore at:
    docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Aris Gioutlakis; Maria I. Klapa; Nicholas K. Moschonas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after being converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with a low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries), and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking.
The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes.
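    To illustrate the idea of reversible, on-demand normalization via a genetic-information ontology, here is a minimal sketch. The IDs and mappings below are hypothetical, not PICKLE's actual schema:

```python
# Toy ontology: each gene linked to its protein entries (hypothetical IDs)
ontology = {
    "GENE_A": ["P_A1", "P_A2"],   # a gene with two protein isoform entries
    "GENE_B": ["P_B"],
}

# A PPI recorded at the gene level is kept as-is...
gene_ppi = ("GENE_A", "GENE_B")

# ...and projected to the protein level only when queried, so the
# original record is never lost and other reference levels stay reachable.
protein_ppis = [(p, q)
                for p in ontology[gene_ppi[0]]
                for q in ontology[gene_ppi[1]]]
print(protein_ppis)
```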

  4. Data Analysis for the Systematic Literature Review of DL4SE

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4768586
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Washington and Lee University
    College of William and Mary
    Authors
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An EDA comprises a set of statistical and data mining procedures to describe data. We ran EDA to provide statistical facts and inform conclusions. The mined facts support arguments that inform the Systematic Literature Review of DL4SE.

    The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

    Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD, process (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:

    Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organize the data into 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

    Preprocessing. The preprocessing applied was transforming the features into the correct type (nominal), removing outliers (papers that do not belong to the DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalize the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the paper by the data mining tasks or methods.

    Transformation. In this stage, we omitted to use any data transformation method except for the clustering analysis. We performed a Principal Component Analysis to reduce 35 features into 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibit the maximum reduction in variance. In other words, it helped us to identify the number of clusters to be used when tuning the explainable models.

    Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncover hidden relationships on the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

    Interpretation/Evaluation. We used the Knowledge Discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).

    We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

    Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

    Support = the number of occurrences in which the statement is true, divided by the total number of statements.
    Confidence = the support of the statement divided by the number of occurrences of the premise.
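    These two definitions can be checked with a few lines of Python over a toy feature table. The rows and feature names below are made up for illustration; they are not the paper's extracted data:

```python
# Hypothetical boolean feature rows extracted from papers
papers = [
    {"supervised": True, "irreproducible": True},
    {"supervised": True, "irreproducible": False},
    {"supervised": True, "irreproducible": True},
    {"supervised": False, "irreproducible": False},
]

def support(rows, *features):
    """Fraction of all rows in which every listed feature holds."""
    return sum(all(r[f] for f in features) for r in rows) / len(rows)

def confidence(rows, premise, conclusion):
    """Support of (premise and conclusion) over support of the premise."""
    return support(rows, premise, conclusion) / support(rows, premise)

print(support(papers, "supervised", "irreproducible"))    # 0.5
print(confidence(papers, "supervised", "irreproducible"))
```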

  5. Beach Litter - Median number of total abundance items normalized per 100m &...

    • nodc.ogs.it
    Updated 2021
    + more versions
    Cite
    EMODnet Chemistry (2021). Beach Litter - Median number of total abundance items normalized per 100m & to 1 survey - Other sources 2001/2020 v2021 [Dataset]. http://doi.org/10.13120/5615830e-8b8e-42e1-8050-69a6d5e3d0b5
    Explore at:
    Dataset updated
    2021
    Dataset provided by
    EMODnet Chemistry
    datacite
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dataset funded by
    European Commission
    Description

    This visualization product displays the total abundance of marine macro-litter (> 2.5cm) per beach per year from non-MSFD monitoring surveys, research & cleaning operations.


    EMODnet Chemistry included the collection of marine litter in its 3rd phase. Since the beginning of 2018, data of beach litter have been gathered and processed in the EMODnet Chemistry Marine Litter Database (MLDB).
    The harmonization of all the data has been the most challenging task considering the heterogeneity of the data sources, sampling protocols and reference lists used on a European scale.


    Preliminary processing was necessary to harmonize all the data:
    - Exclusion of OSPAR 1000 protocol: to follow the approach of OSPAR, which no longer includes these data in its monitoring;
    - Selection of surveys from non-MSFD monitoring, cleaning and research operations;
    - Exclusion of beaches without coordinates;
    - Some categories & some litter types like organic litter, small fragments (paraffin and wax; items > 2.5cm) and pollutants have been removed. The list of selected items is attached to this metadata. This list was created using EU Marine Beach Litter Baselines and EU Threshold Value for Macro Litter on Coastlines from JRC (these two documents are attached to this metadata).
    - Exclusion of surveys without associated length;
    - Normalization of survey lengths to 100m & 1 survey / year: in some cases, the survey length was not 100m, so in order to compare the abundance of litter across beaches, a normalization is applied using this formula:
    Number of items (normalized per 100 m) = Number of items per litter type x (100 / survey length)
    This normalized number of items is then summed to obtain the total normalized number of litter items for each survey. Finally, the median abundance for each beach and year is calculated from these normalized abundances per survey.
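    The normalization and median steps above can be sketched in a few lines. The survey lengths and item counts below are made-up examples:

```python
from statistics import median

# Hypothetical surveys for one beach and year: (survey length in m, item count)
surveys = [(50, 120), (100, 80), (200, 300)]

# Normalize each survey's total to a 100 m stretch...
normalized = [items * (100 / length) for length, items in surveys]
print(normalized)          # [240.0, 80.0, 150.0]

# ...then take the median across surveys for that beach and year.
print(median(normalized))  # 150.0
```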

    Percentiles 50, 75, 95 & 99 have been calculated taking into account other sources data for all years.


    More information is available in the attached documents.


    Warning: the absence of data on the map doesn't necessarily mean that they don't exist, but that no information has been entered in the Marine Litter Database for this area.

  6. Residential Real Estate Data | Tax Assessor & Recorder of Deeds Data | Bulk...

    • datarade.ai
    .json, .csv, .xls
    Cite
    CompCurve, Residential Real Estate Data | Tax Assessor & Recorder of Deeds Data | Bulk + API | 158M Properties and Parcels [Dataset]. https://datarade.ai/data-products/compcurve-residential-real-estate-assessor-recorder-of-compcurve
    Explore at:
    .json, .csv, .xls
    Dataset authored and provided by
    CompCurve
    Area covered
    United States of America
    Description

    Like other Assessor and Recorder data sets from First American, BlackKnight, ATTOM or HouseCanary, we provide both residential and commercial real estate data on homes, properties and parcels nationally.

    Over 250M parcels, updated daily.

    Access detailed property and tax assessment records with our extensive nationwide database. This robust dataset provides comprehensive information about residential and commercial properties, including detailed ownership, valuation, and transaction history. Core Data Elements:

    • Complete property identification (APNs, Tax IDs)
    • Full property addresses with geocoding
    • Precise latitude/longitude coordinates
    • FIPS codes and Census tract information
    • School district assignments

    Property Characteristics:

    • Detailed lot dimensions and size
    • Building square footage breakdowns
    • Living area measurements
    • Basement and attic specifications
    • Garage and parking information
    • Year built and effective year
    • Number of bedrooms and bathrooms
    • Room counts and configurations
    • Building class and condition codes
    • Construction details and materials
    • Property amenities and features

    Valuation Information:

    • Current AVM (Automated Valuation Model) values
    • Confidence scores and value ranges
    • Market valuations with dates
    • Assessed values (land and improvements)
    • Tax amounts and years
    • Tax rate codes and districts
    • Various tax exemption statuses

    Transaction History:

    • Current and previous sale details
    • Recording dates and document numbers
    • Sale prices and price codes
    • Buyer and seller information
    • Multiple mortgage records, including:
      • Loan amounts and terms
      • Lender information
      • Recording dates
      • Interest rates
      • Due dates
      • Loan types and positions

    Ownership Details:

    • Current owner information
    • Corporate ownership indicators
    • Owner-occupied status
    • Mailing addresses
    • Care of names
    • Foreign address indicators

    Legal Information:

    • Complete legal descriptions
    • Subdivision details
    • Lot and block numbers
    • Zoning information
    • Land use codes
    • HOA information and fees

    Property Status Indicators:

    • Vacancy flags
    • Pre-foreclosure status
    • Current listing status
    • Price ranges
    • Market position

    Perfect For:

    Real Estate Professionals

    • Property researchers
    • Title companies
    • Real estate attorneys
    • Appraisers
    • Market analysts

    Financial Services

    • Mortgage lenders
    • Insurance companies
    • Investment firms
    • Risk assessment teams
    • Portfolio managers

    Government & Planning

    • Urban planners
    • Tax assessors
    • Economic developers
    • Policy researchers
    • Municipal agencies

    Data Analytics

    • Market researchers
    • Data scientists
    • Economic analysts
    • GIS specialists
    • Demographics experts

    Data Delivery Features:

    • Multiple format options
    • Regular updates
    • Bulk download capability
    • Custom field selection
    • Geographic filtering
    • API access available
    • Standardized formatting
    • Quality assured data

    Quality Assurance:

    • Verified against public records
    • Regular updates
    • Standardized formatting
    • Address verification
    • Geocoding validation
    • Duplicate removal
    • Data normalization
    • Quality control processes

    This comprehensive property database provides unprecedented access to detailed property information, perfect for industry professionals requiring in-depth property data for analysis, research, or business development. Our data undergoes rigorous quality control processes to ensure accuracy and completeness, making it an invaluable resource for real estate professionals, financial institutions, and government agencies. Updated continuously from authoritative sources, this dataset offers the most current and accurate property information available in the market. Custom data extracts and specific geographic coverage options are available to meet your exact needs.

    Weekly/Quarterly/Annual and One-time options are available for sale.

    See our sample

  7. Additional file 8 of Public transcriptome database-based selection and...

    • figshare.com
    xlsx
    Updated Feb 20, 2024
    Cite
    Qiang Song; Lu Dou; Wenjin Zhang; Yang Peng; Man Huang; Mengyuan Wang (2024). Additional file 8 of Public transcriptome database-based selection and validation of reliable reference genes for breast cancer research [Dataset]. http://doi.org/10.6084/m9.figshare.17162991.v1
    Explore at:
    xlsx
    Dataset updated
    Feb 20, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Qiang Song; Lu Dou; Wenjin Zhang; Yang Peng; Man Huang; Mengyuan Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 8: Table S6. Relative expression levels of FAAH and HIF1A genes normalized by 13 types of single or multiple gene combinations of RGs in 21 BC cell strain samples.

  8. Data articles in journals

    • data.niaid.nih.gov
    Updated Sep 22, 2023
    + more versions
    Cite
    Balsa-Sanchez, Carlota; Loureiro, Vanesa (2023). Data articles in journals [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3753373
    Explore at:
    Dataset updated
    Sep 22, 2023
    Dataset provided by
    University of A Coruña (http://udc.es/)
    Universidade da Coruña
    Authors
    Balsa-Sanchez, Carlota; Loureiro, Vanesa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Version: 5

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2023/09/05

    General description: Publishing datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:

    • data_articles_journal_list_v5.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
    • data_articles_journal_list_v5.csv: full list of 140 academic journals in which data papers or/and software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 5th version - Information updated: number of journals, URL, document types associated to a specific journal.

    Version: 4

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/12/15

    General description: Publishing datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:

    • data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
    • data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers or/and software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 4th version - Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types - Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.

    Version: 3

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/10/28

    General description: Publishing datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:

    • data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers or/and software papers could be published
    • data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers or/and software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 3rd version - Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types - Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).

    Erratum - Data articles in journals Version 3:

    Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
    Data -- ISSN 2306-5729 -- JCR (JIF) n/a
    Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a

    Version: 2

    Author: Francisco Rubio, Universitat Politècnica de València.

    Date of data collection: 2020/06/23

    General description: Publishing datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:

    • data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers or/and software papers could be published
    • data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers or/and software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 2nd version - Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types - Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)

    Total size: 32 KB

    Version 1: Description

    This dataset contains a list of journals that publish data articles, code, software articles and database articles.

    The search strategy in DOAJ and Ulrichsweb was to search for the word “data” in journal titles.

    Acknowledgements: Xaquín Lores Torres for his invaluable help in preparing this dataset.

  9. UCI Automobile Dataset

    • kaggle.com
    Updated Feb 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Otrivedi (2023). UCI Automobile Dataset [Dataset]. https://www.kaggle.com/datasets/otrivedi/automobile-data/suggestions
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Feb 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Otrivedi
    Description

    In this project, I have done exploratory data analysis on the UCI Automobile dataset available at https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

    This dataset consists of data from the 1985 Ward's Automotive Yearbook. Here are the sources:

    1) 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
    2) Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
    3) Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037

    Number of Instances: 398
    Number of Attributes: 9 including the class attribute

    Attribute Information:

    • mpg: continuous
    • cylinders: multi-valued discrete
    • displacement: continuous
    • horsepower: continuous
    • weight: continuous
    • acceleration: continuous
    • model year: multi-valued discrete
    • origin: multi-valued discrete
    • car name: string (unique for each instance)

    This data set consists of three types of entities:

    I - The specification of an auto in terms of various characteristics

    II - Its assigned insurance risk rating. This corresponds to the degree to which the auto is riskier than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if the car is riskier (or less risky) than that, the symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling".

    III - Its normalized losses in use as compared to other cars. This is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc...), and represents the average loss per car per year.

    The analysis is divided into two parts:

    Data Wrangling

    1. Pre-processing data in Python
    2. Dealing with missing values
    3. Data formatting
    4. Data normalization
    5. Binning

    Exploratory Data Analysis

    1. Descriptive statistics
    2. Groupby
    3. Analysis of variance
    4. Correlation
    5. Correlation stats
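    The data-wrangling steps (missing values, formatting, normalization, binning) can be sketched with pandas. The tiny inline sample below is invented for illustration; the real file uses "?" for missing values and has no header row.

    ```python
    import pandas as pd
    import numpy as np

    # Hypothetical mini-sample standing in for the UCI file
    raw = pd.DataFrame({
        "horsepower": ["111", "?", "154"],
        "price": ["13495", "16500", "?"],
    })

    # Dealing with missing values: "?" -> NaN, then impute with the column mean
    df = raw.replace("?", np.nan)

    # Data formatting: cast numeric columns stored as strings
    df = df.astype(float)
    df = df.fillna(df.mean())

    # Data normalization: simple max-scaling to [0, 1]
    df["price_norm"] = df["price"] / df["price"].max()

    # Binning: group horsepower into three labeled ranges
    df["hp_bin"] = pd.cut(df["horsepower"], bins=3, labels=["low", "medium", "high"])
    ```

    The same replace/cast/fill pattern applies column by column to the full dataset before the groupby and correlation steps.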

    Acknowledgment Dataset: UCI Machine Learning Repository Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

  10. Additional file 7 of Public transcriptome database-based selection and...

    • springernature.figshare.com
    xlsx
    Updated Feb 20, 2024
    Cite
    Qiang Song; Lu Dou; Wenjin Zhang; Yang Peng; Man Huang; Mengyuan Wang (2024). Additional file 7 of Public transcriptome database-based selection and validation of reliable reference genes for breast cancer research [Dataset]. http://doi.org/10.6084/m9.figshare.17162988.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 20, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Qiang Song; Lu Dou; Wenjin Zhang; Yang Peng; Man Huang; Mengyuan Wang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 7: Table S5. Relative expression levels of MAPK3 and MAPK9 genes normalized by 13 types of single or multiple gene combinations of RGs in 66 BC tissue samples.

  11. Beach litter - Composition of litter according to material categories in...

    • pigma.org
    • catalogue.arctic-sdi.org
    doi, ogc:wms +2
    Updated Feb 21, 2025
    Cite
    EMODnet Chemistry (2025). Beach litter - Composition of litter according to material categories in percent normalized per beach per year - EU-MSFD monitoring 2001/2023 v2025 [Dataset]. https://www.pigma.org/geonetwork/BORDEAUX_METROPOLE_DIR_INFO_GEO/api/records/5569270d-1ffc-4e14-8fa8-6760b048fc81
    Explore at:
    Available download formats: www:link, ogc:wms, www:download, doi
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    IFREMER, SISMER, Scientific Information Systems for the SEA
    National Institute of Oceanography and Applied Geophysics - OGS, Division of Oceanography
    Ifremer, VIGIES (Information Valuation Service for Integrated Management and Monitoring)
    EMODnet Chemistry
    Time period covered
    Jan 1, 2001 - Dec 24, 2023
    Area covered
    Description

    This visualization product displays marine macro-litter (> 2.5cm) material categories percentages per beach per year from the Marine Strategy Framework Directive (MSFD) monitoring surveys.

    EMODnet Chemistry included the collection of marine litter in its 3rd phase. Since the beginning of 2018, data of beach litter have been gathered and processed in the EMODnet Chemistry Marine Litter Database (MLDB). The harmonization of all the data has been the most challenging task considering the heterogeneity of the data sources, sampling protocols and reference lists used on a European scale.

    Preliminary processing steps were necessary to harmonize all the data:
    - Exclusion of the OSPAR 1000 protocol, in order to follow the approach of OSPAR, which no longer includes these data in the monitoring;
    - Selection of MSFD surveys only (exclusion of other monitoring, cleaning and research operations);
    - Exclusion of beaches without coordinates;
    - Removal of some litter types, such as organic litter, small fragments (paraffin and wax; items > 2.5 cm) and pollutants. The list of selected items is attached to this metadata. This list was created using the EU Marine Beach Litter Baselines, the European Threshold Value for Macro Litter on Coastlines and the Joint list of litter categories for marine macro-litter monitoring from JRC (these three documents are attached to this metadata);
    - Exclusion of the "faeces" category: more precisely, the dog-excrement-in-bag items of the OSPAR (item code: 121) and ITA (item code: IT59) reference lists;
    - Normalization of survey lengths to 100 m and 1 survey/year: in some cases the survey length was not exactly 100 m, so in order to compare the abundance of litter from different beaches a normalization is applied using this formula: Number of items (normalized to 100 m) = Number of items per litter category x (100 / survey length). These normalized numbers of items are then summed to obtain the total normalized number of litter items for each survey. Sometimes the survey length was null or equal to 0; assuming that the MSFD protocol had been applied, the length was set to 100 m in these cases.

    To calculate the percentage for each material category, the formula applied is: Material (%) = (∑ number of items (normalized to 100 m) of each material category) x 100 / (∑ number of items (normalized to 100 m) of all categories)
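    The length normalization and percentage formulas can be sketched in a few lines. The survey tuples below are invented examples, not MLDB records.

    ```python
    # Hypothetical survey records: (material category, item count, survey length in m)
    surveys = [
        ("plastic", 80, 50.0),
        ("metal", 10, 50.0),
        ("plastic", 30, 100.0),
    ]

    # Normalize each count to a 100 m survey length and sum per material
    normalized = {}
    for material, count, length in surveys:
        length = length or 100.0  # null/zero length assumed to be 100 m per the MSFD protocol
        normalized[material] = normalized.get(material, 0.0) + count * (100.0 / length)

    # Material (%) = normalized count per category * 100 / total normalized count
    total = sum(normalized.values())
    percentages = {m: 100.0 * n / total for m, n in normalized.items()}
    ```

    The percentages by construction sum to 100 across all material categories of a beach/year.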

    The material categories differ between reference lists (OSPAR, ITA, TSG-ML, UNEP, UNEP-MARLIN, JLIST). In order to apply a common procedure for all the surveys, the material categories have been harmonized.

    More information is available in the attached documents.

    Warning: the absence of data on the map does not necessarily mean that they do not exist, but that no information has been entered in the Marine Litter Database for this area.

  12. Cartoonists of Color Datasets

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Thomas Padilla; Mari Naomi (2023). Cartoonists of Color Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1557870.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Thomas Padilla; Mari Naomi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cartoonists of Color Dataset and the LGBTQ Cartoonists of Color Dataset are derived from the Cartoonists of Color Database (CoC). Dataset release is the result of a collaboration with CoC Database creator, artist, author, and illustrator MariNaomi. These datasets are provided in order to support folks interested in exploring the data computationally.

    :: Backstory :: MariNaomi developed the CoC database, “For visibility. For Academia. For inspiration. For community building.” CoC contains biographical information (ethnicity, gender, creative roles, location, comic titles, genre, etc.) for 861 cartoonists who identify as non-Caucasian (non-white). The CoC database is a unique resource that holds potential to support a growing conversation on diversity (race, gender, sex) in comics that spans academic, popular, creator, and fan discourse. CoC data was gathered by hand and via completion of an online form. The database continues to grow.

    :: Data Preparation :: Data derived from the database have been minimally normalized. In some cases, data have not been normalized at all. I leave this to the discretion of the data user, but I would say to tread ethically. While the ethnicity data field contains more than 160 different types (e.g. "mixed, black", "African American", "African-American + Afro-Bermudian / Irish-American"), some of which you could interpret as acceptable candidates for normalization, keep in mind what the act of normalization does, especially in light of the connection between data point and identity. Take the diversity of the data as a challenge. Strain those data models. Where they don't work, cast them aside. Perhaps make something new.

    :: Data Summary :: Cartoonists of Color Dataset - contains biographical information for all cartoonists of color: 20150912_coc.csv, 20150912_coc.json. LGBTQ Cartoonists of Color Dataset - contains biographical information for the LGBTQ subset of the Cartoonists of Color Dataset: 20150912_coc_lgbtq.csv, 20150912_coc_lgbtq.json.

    :: Acknowledgements :: Source data compiled and maintained by MariNaomi. Devin Higgins advised on JSON data structure.

  13. Key words used for electronic data base search.

    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Marieke Saan; Floryt van Wesel; Sonja Leferink; Joop Hox; Hennie Boeije; Peter van der Velden (2023). Key words used for electronic data base search. [Dataset]. http://doi.org/10.1371/journal.pone.0276476.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Marieke Saan; Floryt van Wesel; Sonja Leferink; Joop Hox; Hennie Boeije; Peter van der Velden
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Key words used for electronic data base search.


Cite
Silvio Sopic (2024). Pokemon database [Dataset]. https://www.kaggle.com/datasets/silviosopic/pokemon-database

Pokemon database

A pokemon database with various normalization levels. ETA: January 2025

Explore at:
5 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (16051 bytes)
Dataset updated
Nov 18, 2024
Authors
Silvio Sopic
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Description:

This dataset contains detailed information about Pokémon evolution chains in both a wide and a long format. It's derived from the PokéAPI (https://pokeapi.co/) using web scraping techniques. The data includes details on the evolving Pokémon, evolution triggers, conditions, and other relevant information.

Column description:

Features long:

  • Evolving From (str): The name of the Pokémon that evolves.
  • Evolving To (str): The name of the Pokémon that the previous Pokémon evolves into.
  • trigger (str): The specific trigger for evolution (e.g., "level_up").
  • Condition (str): The specific condition for evolution (e.g., "Item").
  • value (str): The value for the evolution condition (e.g., "fire-stone").

Features wide:

  • Evolving From (str): The name of the Pokémon that evolves.
  • Evolving To (str): The name of the Pokémon that the previous Pokémon evolves into.
  • turn_upside_down (bool): Whether the Pokémon needs to be turned upside down to evolve.
  • trade_species (str): The required Pokémon to trade for evolution (if applicable).
  • time_of_day (str): Specific time of day required for evolution (if applicable).
  • relative_physical_stats (str/None): Information about the relative physical stats required for evolution (if applicable).
  • party_type (str): The required party type for evolution (if applicable).
  • party_species (str): The required Pokémon in the party for evolution (if applicable).
  • Item (str): The required item held for evolution (if applicable).
  • needs_overworld_rain (bool): Whether overworld rain is needed for evolution.
  • min_level (int/None): The minimum level required for evolution.
  • min_happiness (int/None): The minimum happiness required for evolution.
  • trigger (str): The specific trigger for evolution (e.g., "level_up").
  • min_beauty (int/None): The minimum beauty required for evolution (if applicable).
  • min_affection (int/None): The minimum affection required for evolution (if applicable).
  • gender (str): The required gender for evolution (if applicable).
  • held_item (str): The required held item for evolution (if applicable, name extracted from dictionary).
  • known_move (str): The required move known for evolution (if applicable, name extracted from dictionary).
  • known_move_type (str): The required move type known for evolution (if applicable, name extracted from dictionary).
  • location (str): The required location for evolution (if applicable, name extracted from dictionary).

Format:

  • Wide format CSV file (evolution_wide.csv)
  • Long format CSV file (evolution_long.csv)
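The relationship between the two files can be sketched with a pandas pivot: each (Evolving From, Evolving To, trigger) group in the long file becomes one wide row, with every Condition value turned into its own column. The rows and condition names below are illustrative assumptions, not guaranteed to match the actual CSVs.

```python
import pandas as pd

# Hypothetical rows mimicking the long format described above
long_df = pd.DataFrame({
    "Evolving From": ["pikachu", "eevee", "eevee"],
    "Evolving To": ["raichu", "espeon", "espeon"],
    "trigger": ["use-item", "level-up", "level-up"],
    "Condition": ["Item", "min_happiness", "time_of_day"],
    "value": ["thunder-stone", "160", "day"],
})

# Long -> wide: one row per evolution, one column per condition
wide_df = (
    long_df.pivot_table(index=["Evolving From", "Evolving To", "trigger"],
                        columns="Condition", values="value", aggfunc="first")
    .reset_index()
)
```

Conditions that do not apply to a given evolution simply come out as NaN in the wide row, matching the "(if applicable)" columns listed above.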

Source:

PokéAPI (https://pokeapi.co/)

License:

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
Note:

This dataset is a derivative work of PokéAPI data and adheres to their licensing terms. "Pokémon" and character names are trademarks of Nintendo. Please feel free to use and modify this dataset for non-commercial purposes, with proper attribution.
