License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This comprehensive football dataset, derived primarily from Transfermarkt, serves as a valuable resource for football enthusiasts, offering structured information on competitions, clubs, and players. With over 60,000 games across major global competitions, the dataset delves into the performance metrics of 400+ clubs and detailed statistics for more than 30,000 players.
The data is structured in CSV files, each with unique IDs, so users can seamlessly join datasets to perform in-depth analyses. The dataset encompasses market values, historical valuations, and detailed player statistics, including physical attributes, contract statuses, and individual performances. A specialized Python-based web scraper ensures consistent updates, with data meticulously processed through Python scripts and SQL databases.
To use the dataset effectively, users are encouraged to understand the relevant files, join datasets using unique IDs, and leverage compatible software tools like Python's pandas or R's ggplot2 for analysis. The guide emphasizes the potential for fantasy football predictions, tracking player value over time, assessing market value versus performance, and exploring the impact of cards on match outcomes.
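As a hedged illustration only (the file and column names below, such as games.csv, clubs.csv, appearances.csv, game_id, and player_club_id, are assumptions based on this description rather than a confirmed file listing), a typical pandas join could look like this:

```python
import pandas as pd

# Assumed file and column names; adjust to the CSVs actually shipped with the dataset.
games = pd.read_csv("games.csv")              # one row per game
clubs = pd.read_csv("clubs.csv")              # club metadata keyed by club_id
appearances = pd.read_csv("appearances.csv")  # per-player, per-game statistics

# Join player appearances to game metadata via the shared game_id,
# then attach club metadata via the club ID carried by each appearance.
merged = (appearances
          .merge(games, on="game_id", how="left")
          .merge(clubs.rename(columns={"name": "club_name"}),
                 left_on="player_club_id", right_on="club_id", how="left"))

# Example analysis: total goals recorded per club.
print(merged.groupby("club_name")["goals"].sum()
      .sort_values(ascending=False)
      .head(10))
```

The same joined table can then feed the use cases above, such as tracking player value over time or relating cards to match outcomes.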
Research ideas include player performance analysis for fantasy football or recruitment purposes, studying market value trends for economic insights, evaluating club performance for strategic decision-making, developing predictive models for match outcomes, and conducting social network analysis to understand interactions among clubs and players.
Because the dataset's license is unknown, users are encouraged to credit the original authors, particularly David Cereijo, if it is used in research. The dataset's dedication to accessibility is evident through active discussions on GitHub for improvements and bug fixes.
In conclusion, this football dataset offers a wealth of information, empowering users to explore diverse analyses and research ideas, bridging the gap between structured data and the dynamic world of football.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The dataset is prepared and intended as a data source for the development of a stress analysis method based on machine learning. It consists of finite element stress analyses of randomly generated mechanical structures. The dataset contains more than 270,794 pairs of stress analysis images (von Mises stress) of randomly generated 2D structures with predefined thickness and material properties. All the structures are fixed at their bottom edges and loaded with gravity force only. See the PREVIEW directory for some examples. The zip file contains all the files in the dataset.
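The exact file layout inside the zip is not specified here, so the following is only a hedged sketch (the naming convention with _input/_stress suffixes is an assumption) of how geometry/stress image pairs might be loaded for model training:

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical layout: extracted zip with <id>_input.png / <id>_stress.png pairs.
root = Path("stress_dataset")
pairs = []
for inp in sorted(root.glob("*_input.png")):
    stress = inp.with_name(inp.name.replace("_input", "_stress"))
    if not stress.exists():
        continue
    x = np.asarray(Image.open(inp).convert("L"), dtype=np.float32) / 255.0
    y = np.asarray(Image.open(stress).convert("L"), dtype=np.float32) / 255.0
    pairs.append((x, y))  # (structure image, von Mises stress image)

print(f"Loaded {len(pairs)} image pairs for training a stress-prediction model.")
```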
Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period 1967–2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967–2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i) the temporal evolution of spatial structures and ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets.
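Not part of the original abstract: a minimal, hedged numpy sketch of the three-way datacube described above (dimensions follow the study; values are placeholders), contrasting it with the two-way matrices that classical PCA operates on.

```python
import numpy as np

# Dimensions follow the study: 11 ring descriptors x 149 trees x 41 years (1967-2007).
n_vars, n_trees, n_years = 11, 149, 41
cube = np.random.default_rng(0).normal(size=(n_vars, n_trees, n_years))  # placeholder values

# Classical two-way analyses flatten the cube before applying e.g. PCA:
vars_by_trees = cube.mean(axis=2)  # ring variables x trees (averaged over years)
vars_by_time = cube.mean(axis=1)   # ring variables x time (averaged over trees)

# Partial triadic analysis instead keeps the full sequence of yearly
# (ring variables x trees) slices and looks for structure common to them.
yearly_slices = [cube[:, :, t] for t in range(n_years)]
print(vars_by_trees.shape, vars_by_time.shape, len(yearly_slices), yearly_slices[0].shape)
```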
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This archive is associated with the article "Biases of STRUCTURE software when exploring introduction routes of invasive species". Authors: Eric Lombaert, Thomas Guillemaud & Emeline Deleury.
The file contains the 22,500 simulated datasets, the corresponding 900,000 STRUCTURE outputs, and the summary statistics files. It also contains SIM_STRUCT, a home-made pipeline developed for carrying out the analyses described in the manuscript. It can be used to simulate and summarize datasets, and to perform STRUCTURE analyses in batch on those simulated datasets. It is currently based on several software packages, such as DIYABC, ARLSUMSTAT, and STRUCTURE, as well as on some home-made Perl scripts. A tutorial is included. See the Readme file for details.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
This synthetic dataset is designed specifically for Power BI and DAX (Data Analysis Expressions) learners and professionals. It provides a complete star schema for practicing DAX measures, relationships, filters, and time intelligence, just like in real-world business analytics projects.
The dataset simulates a multi-year sales environment with customers, employees, products, geographies, and dates, allowing you to perform calculations across multiple business dimensions.
This dataset contains 6 CSV files, forming a clean star schema:
| Table Name | Type | Description |
|---|---|---|
| FactSales | Fact | Contains transactional sales data with quantities, amounts, profits, discounts, and references to all dimension keys. |
| DimDate | Dimension | A complete date table (2018–2024) including Year, Quarter, Month, DayOfWeek, Weekend/Holiday flags, etc. |
| DimProduct | Dimension | Product catalog with Category, SubCategory, Color, Size, StandardCost, and ListPrice. |
| DimCustomer | Dimension | Customer information including name, gender, signup date, loyalty tier, and geographic key. |
| DimEmployee | Dimension | Sales employee data including name, role, hire date, and region. |
| DimGeography | Dimension | Geographic data covering countries, regions, and cities. |
FactSales columns:
| Column | Description |
|---|---|
| SalesKey | Unique identifier for each transaction |
| OrderDateKey, ShipDateKey | Foreign keys to DimDate |
| ProductKey, CustomerKey, EmployeeKey, GeographyKey | Foreign keys to the respective dimensions |
| Quantity | Number of units sold |
| UnitPrice | Price per unit |
| Discount | Discount applied to the sale |
| SalesAmount | Total sales value after discount |
| TotalCost | Total cost of goods sold |
| Profit | SalesAmount - TotalCost |
| Channel | Online, Retail, or Distributor |
| PaymentMethod | Credit, Cash, or Transfer |
| OrderPriority | Low, Medium, or High priority |
Includes:
Perfect for DAX time intelligence functions like:
TOTALYTD, SAMEPERIODLASTYEAR, DATESINPERIOD, and PARALLELPERIOD.
Imagine a mid-sized electronics retailer operating across multiple regions and sales channels. The dataset captures 7 years of simulated performance, including seasonal patterns, regional sales variations, and customer loyalty effects.
This dataset is designed for:
You can use this dataset to practice almost every DAX concept:
Total Sales = SUM(FactSales[SalesAmount])
Total Profit = SUM(FactSales[Profit])
Online Sales = CALCULATE([Total Sales], FactSales[Channel] = "Online")
YTD Sales = TOTALYTD([Total Sales], DimDate[Date])
Sales YoY % = DIVIDE([Total Sales] - [Previous Year Sales], [Previous Year Sales])
Shipped Sales = CALCULATE([Total Sales], USERELATIONSHIP(FactSales[ShipDateKey], DimDate[DateKey]))
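For readers who want to prototype the same logic outside Power BI, here is a hedged pandas sketch of a few of the measures above; the CSV file names and the presence of a Date column in DimDate are assumptions based on the tables described earlier, not a documented schema:

```python
import pandas as pd

# Assumed CSV names matching the star schema described above.
fact = pd.read_csv("FactSales.csv")
dim_date = pd.read_csv("DimDate.csv", parse_dates=["Date"])  # "Date" column assumed

# Total Sales / Total Profit: plain aggregations, like SUM in DAX.
total_sales = fact["SalesAmount"].sum()
total_profit = fact["Profit"].sum()

# Online Sales: CALCULATE with a channel filter becomes a boolean mask.
online_sales = fact.loc[fact["Channel"] == "Online", "SalesAmount"].sum()

# YTD Sales: join the fact table to DimDate via the order date key,
# then accumulate SalesAmount within each calendar year.
orders = fact.merge(dim_date, left_on="OrderDateKey", right_on="DateKey", how="left")
orders = orders.sort_values("Date")
orders["YTD_Sales"] = orders.groupby(orders["Date"].dt.year)["SalesAmount"].cumsum()

# Shipped Sales: the USERELATIONSHIP switch corresponds to joining on
# ShipDateKey instead of OrderDateKey.
shipped = fact.merge(dim_date, left_on="ShipDateKey", right_on="DateKey", how="left")

print(total_sales, total_profit, online_sales)
```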
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Here you can find raw data and information about each of the 34 datasets generated by the mulset algorithm and used for further analysis in SIMON. Each dataset is stored in a separate folder which contains 4 files:
- json_info: the number of features (with their names) and the number of subjects available for that dataset
- data_testing: data frame with the data used to test the trained model
- data_training: data frame with the data used to train models
- results: direct, unfiltered data from the database
Files are written in feather format. An example of the data structure for each file is provided in the repository. Files were compressed using 7-Zip, available at https://www.7-zip.org/.
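As a hedged sketch (the exact file extensions and the json_info structure are assumptions), reading one of the 34 folders with pandas could look like this:

```python
import json
from pathlib import Path

import pandas as pd

# Hypothetical folder and file names for one of the 34 mulset datasets.
folder = Path("dataset_01")

# json_info describes the available features and the number of subjects.
info = json.loads((folder / "json_info.json").read_text())

# Training/testing splits and raw results are stored in feather format.
train = pd.read_feather(folder / "data_training.feather")
test = pd.read_feather(folder / "data_testing.feather")
results = pd.read_feather(folder / "results.feather")

print(info, train.shape, test.shape, results.shape)
```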
This dataset includes a table of the VOC concentrations detected in firefighter breath samples. QQ-plots for benzene, toluene, and ethylbenzene levels in breath samples as well as box-and-whisker plots of pre-, post-, and 1 h post-exposure breath levels of VOCs for firefighters participating in attack, search, and outside ventilation positions are provided. Graphs detailing the responses of individuals to pre-, post-, and 1 h post-exposure concentrations of benzene, toluene, and ethylbenzene are shown. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The original dataset contains identification information for the firefighters who participated in the controlled structure burns. The analyzed tables and graphs can be made publicly available. Format: The original dataset contains identification information for the firefighters who participated in the controlled structure burns. The analyzed tables and graphs can be made publicly available. This dataset is associated with the following publication: Wallace, A., J. Pleil, K. Oliver, D. Whitaker, S. Mentese, K. Fent, and G. Horn. Targeted GC-MS analysis of firefighters' exhaled breath: Exploring biomarker response at the individual level. JOURNAL OF OCCUPATIONAL AND ENVIRONMENTAL HYGIENE. Taylor & Francis, Inc., Philadelphia, PA, USA, 16(5): 355-366, (2019).
License: Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
About
We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.
Distribution
Detailing the format, size, and structure of the dataset:
Data Volume:
-Total Size: 2.7TB
-Total Videos: 47,547
-Identities Covered: 20,841
-Resolution: 60% 4K (1980), 33% Full HD (1080)
-Formats: MP4
-Full-length videos with visible mouth movements in every frame.
-Minimum face size of 400 pixels.
-Video durations range from 20 seconds to 5 minutes.
-Faces have not been cut out; videos are full screen and include backgrounds.
Usage
This dataset is ideal for a variety of applications:
Face Recognition & Verification: Training and benchmarking facial recognition models.
Action Recognition: Identifying human activities and behaviors.
Re-Identification (Re-ID): Tracking identities across different videos and environments.
Deepfake Detection: Developing methods to detect manipulated videos.
Generative AI: Training high-resolution video generation models.
Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.
Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.
Coverage
Explaining the scope and coverage of the dataset:
Geographic Coverage: Worldwide
Time Range: Time range and size of the videos have been noted in the CSV file.
Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.
Languages Covered (Videos):
English: 23,038 videos
Portuguese: 1,346 videos
Spanish: 677 videos
Norwegian: 1,266 videos
Swedish: 1,056 videos
Korean: 848 videos
Polish: 1,807 videos
Indonesian: 1,163 videos
French: 1,102 videos
German: 1,276 videos
Japanese: 1,433 videos
Dutch: 1,666 videos
Indian: 1,163 videos
Czech: 590 videos
Chinese: 685 videos
Italian: 975 videos
Filipino: 920 videos
Bulgarian: 340 videos
Romanian: 1,144 videos
Arabic: 1,691 videos
Who Can Use It
List examples of intended users and their use cases:
Data Scientists: Training machine learning models for video-based AI applications.
Researchers: Studying human behavior, facial analysis, or video AI advancements.
Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.
Additional Notes
Ensure ethical usage and compliance with privacy regulations. The dataset's quality and scale make it valuable for high-performance AI training. Potential preprocessing (cropping, downsampling) may be needed for different use cases. The dataset is not yet complete and expands daily; please contact us for the most up-to-date CSV file. The dataset has been divided into 100GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
"etch_pit_density_analysis.zip" contains the analysis code. See the README.txt for more information.
"secco_etched_mc_Si_wafer_image.png" - Optical microscope image, depicting a 2.5cm*1.2cm Secco etched multicrystalline Silicon wafer. Dark spots are etch pits, typically associated with dislocation lines that intersect with the wafer surface. Dark lines are grain boundaries. The leftmost 20% of the wafer have been in contact with the sample carrier during defect etching, explaining the uneven etch result.
A small excerpt from a real dataset, with a few changes made to protect students' private information. Permission has been given.
You are asked to help the teachers using only the data: 1. Prediction: identify what makes a brilliant student who can get into a graduate school, whether abroad or not. 2. Application: give job-searching advice to students who fail in their graduate-school applications.
Some of the original structure has been deleted or censored. The remaining fields include basic data such as:
- ID
- class: categorical; students were initially divided into 2 classes, and teachers suspect that students in different classes may perform significantly differently
- gender
- race: categorical and censored
- GPA: real numbers, float
Some teachers assume that scores in mathematics courses represent a student's chances perfectly:
- Algebra: real numbers
- Advanced Algebra
- ......
Others assume that students' backgrounds significantly affect their choices and chances; these fields are all censored:
- from1: students' home locations
- from2: a probably weak indicator of preference for mathematics
- from3: how students applied to this university (undergraduate)
- from4: a probably weak indicator of family background (0 = more wealth, 4 = more poverty)
The final indicator y:
- 0: the student does not get into graduate school, and may apply again or search for a job in the future
- 1: success, domestic
- 2: success, abroad
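As a hedged sketch of the prediction task (the file name, column names, and model choice are illustrative assumptions, not part of the dataset description), a baseline multi-class classifier could be set up like this:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical file name; columns follow the description above.
df = pd.read_csv("students.csv")
X = df.drop(columns=["ID", "y"])
y = df["y"]  # 0 = not admitted, 1 = admitted (domestic), 2 = admitted (abroad)

categorical = ["class", "gender", "race", "from1", "from2", "from3", "from4"]
numeric = [c for c in X.columns if c not in categorical]  # GPA, math scores, ...

model = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",  # numeric columns pass through unchanged
    )),
    ("clf", RandomForestClassifier(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```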
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Description
This repository contains all data used in "Training data composition affects performance of protein structure analysis algorithms", published in the Pacific Symposium on Biocomputing 2022 by A. Derry, K. A. Carpenter, & R. B. Altman.
The data consists of the following files:
Details on dataset construction can be found in our paper, and dataloaders can be found in our GitHub repo.
Reference
A. Derry*, K. A. Carpenter*, & R. B. Altman, "Training data composition affects performance of protein structure analysis algorithms", 2021.
Dataset References
Datasets used were derived from the following works:
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K., & Moult, J. (2019). Critical assessment of methods of protein structure prediction (CASP) - Round XIII. In Proteins: Structure, Function and Bioinformatics (Vol. 87, Issue 12, pp. 1011–1020). https://doi.org/10.1002/prot.25823
Ingraham, J., Garg, V. K., Barzilay, R., & Jaakkola, T. (2019). Generative Models for Graph-Based Protein Design. https://openreview.net/pdf?id=SJgxrLLKOE
Furnham, N., Holliday, G. L., de Beer, T. A. P., Jacobsen, J. O. B., Pearson, W. R., & Thornton, J. M. (2014). The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Research, 42 (Database issue), D485–D489.
Water buffalo (Bubalus bubalis L.) is an important livestock species worldwide. Like many other livestock species, water buffalo lacks a high-quality, continuous reference genome assembly, which is required for fine-scale comparative genomics studies. In this work, we present a dataset that characterizes genomic differences between the water buffalo genome and the extensively studied cattle (Bos taurus taurus) reference genome. The dataset was obtained by aligning 14 river buffalo whole-genome sequencing datasets to the cattle reference. It consists of 13,444 deletion CNV regions and 11,050 merged mobile element insertion (MEI) events within the upstream regions of annotated cattle genes. Gene expression data from cattle and buffalo are also presented for genes impacted by these regions.
This study sought to characterize differences in gene content, regulation, and structure between taurine cattle and river buffalo (2n=50, one extant type of water buffalo), using the extensively annotated UMD3.1 cattle reference genome as the basis for comparisons. Using 14 WGS datasets from river buffalo, we identified 13,444 deletion CNV regions (Supplemental Table 1) in river buffalo that were not identified in cattle. We also present 11,050 merged mobile element insertion (MEI) events (Supplemental Table 2) in river buffalo, 568 of which lie within the upstream regions of annotated cattle genes. Furthermore, our tissue transcriptomics analysis provides expression profiles of genes impacted by the MEI (Supplemental Tables 3–6) and CNV (Supplemental Table 7) events identified in this study. The data provide the genomic coordinates of the identified CNV deletions and MEI events, along with normalized read counts of the impacted genes and the adjusted p-values of the statistical analysis (Supplemental Tables 3–6).
The dataset comprises:
- Genomic coordinates of identified CNV-deletion and MEI events, and Ensembl gene names of impacted genes (Supplemental Tables 1 and 2)
- Gene expression profiles and statistical significance (adjusted p-values) of genes impacted by MEI in liver (Supplemental Tables 3 and 4)
- Gene expression profiles and statistical significance (adjusted p-values) of genes impacted by MEI in muscle (Supplemental Tables 5 and 6)
- Gene expression profiles and statistical significance (adjusted p-values) of genes impacted by CNV deletions in river buffalo (Supplemental Table 7)
Public assessment of this dataset will allow further analyses and functional annotation of genes that are potentially associated with phenotypic differences between cattle and water buffalo.
Resources in this dataset: Resource Title: Genomic structural differences between cattle and River Buffalo identified through comparative genomic and transcriptomic analysis. File Name: Web Page, url: https://www.sciencedirect.com/science/article/pii/S2352340918305183 (Data in Brief article presenting this dataset).
Tables are provided with this article. Raw read data of whole-genome and transcriptome sequencing were deposited to NCBI BioProjects under the following accessions:
- PRJNA350833 (https://www.ncbi.nlm.nih.gov/bioproject/?term=350833)
- PRJNA277147 (https://www.ncbi.nlm.nih.gov/bioproject/?term=277147)
- PRJEB4351 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB4351)
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
In this paper a fluid-structure interaction (FSI) experiment is presented. The aim of this experiment is to provide a challenging yet easy-to-set-up FSI test case that addresses the need for rigorous testing of FSI algorithms and modeling frameworks. Steady-state and periodic steady-state test cases with constant and periodic inflow were established. The focus of the experiment is on biomedical engineering applications, with flow in the laminar regime at Reynolds numbers of 1283 and 651. Flow and solid domains were defined using CAD tools. The experimental design aimed at providing a straightforward boundary condition definition. Material parameters and the mechanical response of a moderately viscous Newtonian fluid and a nonlinear incompressible solid were experimentally determined. A comprehensive data set was acquired by employing magnetic resonance imaging to record the interaction between the fluid and the solid, quantifying flow and solid motion.
License: Apache License, v2.0, http://www.apache.org/licenses/LICENSE-2.0
causRCA is a collection of time series datasets recorded from the CNC control of an industrial vertical lathe.
The datasets comprise real-world recordings from normal factory operation and labeled fault data from a hardware-in-the-loop simulation. The fault datasets come with labels for the underlying (simulated) cause of the failure, a labeled diagnosis, and a causal model of all variables in the datasets.
The extensive metadata and provided ground truth causal structure enable benchmarking of methods in causal discovery, root cause analysis, anomaly detection, and fault diagnosis in general.
data/
├── real_op/
├── dig_twin/
│   ├── exp_coolant/
│   ├── exp_hydraulics/
│   └── exp_probe/
├── expert_graph/
└── README_DATASET.md
The data folder contains:
| (Sub-)graph | #Nodes | #Edges | #Datasets Normal | #Datasets Fault | #Fault Scenarios | #Different Diagnoses | #Causing Variables |
|---|---|---|---|---|---|---|---|
| Lathe (Full graph) | 92 | 104 | 170 | 100 | 19 | 10 | 14 |
| --Probe | 11 | 15 | 170 | 34 | 6 | 3 | 2 |
| --Hydraulics | 17 | 18 | 170 | 41 | 9 | 5 | 6 |
| --Coolant | 15 | 10 | 170 | 25 | 4 | 2 | 6 |
| --(Other Vars) | 49 | 61 | 170 | - | - | - | - |
*Datasets from normal operation contain all machine variables and therefore cover all subgraphs and their respective variables.
Normal operation data (real_op): Data were recorded through an OPC UA interface during normal production cycles on a vertical lathe. These files capture baseline machine behavior under standard operating conditions, without induced or known faults.
Digital twin fault data (dig_twin): A hardware-in-the-loop digital twin was developed by connecting the original machine controller to a real-time simulation. Faults (e.g., valve leaks, filter clogs) were injected by manipulating specific twin variables, providing known ground-truth causes. Data were recorded via the same OPC UA interface to ensure a consistent structure.
Data was sampled via an OPC UA interface. The timestamps only reflect the published time of value change by the CNC and do not necessarily reflect the exact time of value changes.
Consequently, the chronological order of changes across different variables is not strictly guaranteed. This may impact time-series analyses that are highly sensitive to precise temporal ordering.
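Because values are only published on change, analyses usually need to expand the event stream onto a regular time grid first. The sketch below is a hedged illustration; the file layout and column names (timestamp, variable, value) are assumptions rather than the dataset's documented schema:

```python
import pandas as pd

# Hypothetical: one fault-experiment recording in long (event) format with
# columns "timestamp", "variable", "value".
events = pd.read_csv("data/dig_twin/exp_coolant/run_01.csv", parse_dates=["timestamp"])

# Pivot to wide format (one column per machine variable), keeping the last
# published value when several updates share a timestamp.
wide = (events.sort_values("timestamp")
        .groupby(["timestamp", "variable"])["value"].last()
        .unstack("variable"))

# Values are only published on change, so forward-fill onto a regular grid.
# Because publish times are not exact change times, fine-grained temporal
# ordering across variables remains approximate.
resampled = wide.resample("100ms").last().ffill()

print(resampled.shape)
```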
The authors gratefully acknowledge the contributions of:
During the preparation of the dataset, the author(s) used generative AI tools to enhance the dataset's applicability by structuring data in an accessible format with extensive metadata, assist in coding transformations, and draft description content. All AI-generated output was reviewed and edited under human oversight, and no original dataset content was created by AI.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Thorough knowledge of the structure of the analyzed data makes it possible to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those that work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case-study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward's algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes. Further analysis was carried out separately for male and female patients. The most optimal partitioning in the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset. No evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also assess the heterogeneity of a dataset. The case study proved that our procedure is well suited for the identification and visualization of biologically meaningful patient subgroups.
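A hedged sketch of the four-step procedure on a generic numeric matrix (not the authors' implementation; the cutoff, cluster count, and other parameters are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.decomposition import PCA
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(515, 40))  # placeholder for the clinical attribute matrix

# 1) Robust data normalization (median / IQR instead of mean / std).
Xs = RobustScaler().fit_transform(X)

# 2) Outlier detection with classical (MD) and robust (rMD) Mahalanobis distances.
diff = Xs - Xs.mean(axis=0)
md2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(np.cov(Xs, rowvar=False)), diff)
rmd2 = MinCovDet(random_state=0).fit(Xs).mahalanobis(Xs)  # squared robust distances
cutoff = chi2.ppf(0.975, df=Xs.shape[1])
outliers = (md2 > cutoff) | (rmd2 > cutoff)

# 3) Hierarchical clustering with Ward's algorithm on the remaining patients.
Z = linkage(Xs[~outliers], method="ward")
clusters = fcluster(Z, t=5, criterion="maxclust")

# 4) PCA for visualization; biplot vectors come from the component loadings.
pca = PCA(n_components=2).fit(Xs[~outliers])
scores, loadings = pca.transform(Xs[~outliers]), pca.components_.T

print(outliers.sum(), np.bincount(clusters)[1:], pca.explained_variance_ratio_)
```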
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This is an example dataset provided as part of the Supplementary Material for our manuscript "3D quantification of vascular-like structures in z-stack confocal images" in STAR Protocols. The dataset provides an example raw confocal image stack, demonstrates the data visualisation at major steps throughout the protocol, and includes the output received from WinFiber3D.
This dataset includes a list of chemicals used to create the ChromGenius retention time prediction model used for validation of non-targeted compounds. The list of identified non-targeted compounds in the samples is also provided. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: By viewing the analyzed spreadsheets attached to the Journal Article. Format: The original dataset contains identification information for the firefighters who participated in the controlled structure burns. The analyzed data can be made publicly available. This dataset is associated with the following publication: Wallace, A., J. Pleil, K. Oliver, D. Whitaker, S. Mentese, K. Fent, and G. Horn. Non-targeted GC/MS analysis of exhaled breath samples: Exploring human biomarkers of exogenous exposure and endogenous response from professional firefighting activity. JOURNAL OF TOXICOLOGY AND ENVIRONMENTAL HEALTH - PART A: CURRENT ISSUES. Taylor & Francis, Inc., Philadelphia, PA, USA, 82(4): 244-260, (2019).
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
This dataset was created as part of a student project and contains microCT data from additively manufactured tensile specimens with different lattice structures, images of the failure mechanisms, and other experimental data.
Information on file structuring:
The specimen designations can be found in the Excel table SampleDesignations.xlsx. Using these specimen designations, the measured values from the tensile tests can be identified, as well as the corresponding microCT recordings of the specimens. Additionally, photos and videos of the failure mechanisms are provided, partially unstructured. The material parameters are given in MaterialProperties, and an overview of the design space of the samples is given in SampleGeometry+LatticeDesignSpace.
Image: https://i.ibb.co/9by1hxX/allebrueche.jpg
Terms and conditions: https://india-data.org/terms-conditions
This dataset contains drone-captured video footage (.mp4 format) for automated building infrastructure assessment tasks. The dataset is organized into seven modules:
1. Window Detection: Identifying and segmenting windows on building facades.
2. Storey Count: Estimating and counting the number of floors (stories) in buildings.
3. Roof Area Estimation: Calculating the total area of building roofs from drone footage.
4. Roof Layout and Occupancy Estimation: Analyzing roof layouts and occupancy patterns.
5. Distance Between Adjacent Buildings: Measuring the spatial distance between neighboring buildings.
6. Crack Detection: Detecting and localizing cracks or structural damage on building surfaces.
7. Building Tilt/Slope Estimation: Estimating the tilt or slope of buildings for structural analysis.
The NDT for High Value Manufacturing of Composites project is an EPSRC fellowship in Manufacturing aimed at developing new 3D non-destructive characterisation algorithms for ultrasonic data inversion. These will map 3D fibre-tow orientation and porosity and will offer the ability to create Finite Element Analysis models of the actual as-manufactured structure to determine strength and performance.