MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Data collected from the Folha website, with dates prior to February 2024. The data encoding is UTF-8. You may still want to clean the data further, although I have already done some work in this regard. The columns you will find are: Title, Content, URL, Published, and Category. The application used to collect the data was developed by me in C#, and you can find the repository at the link below.
https://github.com/luisfcaldeira/WebScrapper
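As a rough illustration of how the data might be loaded, here is a minimal pandas sketch; the file name (folha_news.csv) is a placeholder, and the only assumptions taken from the description above are the UTF-8 encoding and the five listed columns.

```python
import pandas as pd

# Placeholder file name; use the actual file distributed with the dataset.
df = pd.read_csv("folha_news.csv", encoding="utf-8")

# Columns described above: Title, Content, URL, Published, Category.
df["Published"] = pd.to_datetime(df["Published"], errors="coerce")

# Quick sanity checks: date range (should fall before February 2024) and category counts.
print(df["Published"].min(), df["Published"].max())
print(df["Category"].value_counts().head(10))
```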
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for LLM Science Exam Kaggle Competition
Dataset Summary
https://www.kaggle.com/competitions/kaggle-llm-science-exam/data
Languages
[en, de, tl, it, es, fr, pt, id, pl, ro, so, ca, da, sw, hu, no, nl, et, af, hr, lv, sl]
Dataset Structure
Columns:
- prompt - the text of the question being asked
- A - option A; if this option is correct, then answer will be A
- B - option B; if this option is correct, then answer will be B
- C - option C; if this…
See the full description on the dataset page: https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam.
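As a sketch, the data can presumably be loaded with the Hugging Face datasets library using the repository id from the page URL above; the split name and the presence of D/E/answer columns (the usual format for this Kaggle competition) are assumptions here, not taken from the truncated column description.

```python
from datasets import load_dataset

# Repository id taken from the dataset page URL above; the split name is an assumption.
ds = load_dataset("Sangeetha/Kaggle-LLM-Science-Exam", split="train")

example = ds[0]
print(example["prompt"])
# The competition format has options A-E plus an answer column; only A-C survive
# in the truncated description above, so treat D, E, and answer as assumptions.
for option in ["A", "B", "C", "D", "E"]:
    print(option, "-", example.get(option))
print("answer:", example.get("answer"))
```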
Data for the final project of the Diploma in Machine Learning with Python, from the Data Science Institute, entitled "Analysis of deforestation in the region surrounding the town of Joaquín V. González, Salta" https://www.kaggle.com/code/mariavirginiaforcone/deforestation-analysis-main-notebook
It contains a report that summarizes the work and a folder that includes:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains electrical impedance measurements of freshly excised tissue samples from the breast. The data is sourced from the UCI Machine Learning Repository.
Impedance measurements were taken at the following frequencies: 15.625, 31.25, 62.5, 125, 250, 500, and 1000 kHz. These measurements, when plotted in the (real, -imaginary) plane, constitute the impedance spectrum from which the breast tissue features are computed. The dataset can be used for classification into either the original 6 classes or into 4 classes obtained by merging the fibro-adenoma, mastopathy, and glandular classes, which are hard to discriminate.
This dataset is suitable for classification tasks. The impedance measurements can be used to predict the type of breast tissue.
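A minimal classification sketch follows, assuming the data has been exported to a CSV with a "Class" label column; the file name, column names, and the three-letter class codes used for merging are assumptions, not taken from the dataset files.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumed file and column names; adjust to the actual export of the UCI Breast Tissue data.
df = pd.read_csv("breast_tissue.csv")

# Optional 4-class variant: merge the fibro-adenoma, mastopathy, and glandular classes,
# which the description above notes are hard to discriminate. Class codes are assumptions.
merge = {"fad": "fad+mas+gla", "mas": "fad+mas+gla", "gla": "fad+mas+gla"}
y = df["Class"].replace(merge)   # drop this line to keep the original 6 classes
X = df.drop(columns=["Class"])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```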
If you use this dataset, please cite it as follows:
S, JP and Jossinet, J. (2010). Breast Tissue. UCI Machine Learning Repository. https://doi.org/10.24432/C5P31H.
@misc{misc_breast_tissue_192,
author = "S, JP and Jossinet, J",
title = "Breast Tissue",
year = 2010,
howpublished = "UCI Machine Learning Repository",
note = "DOI: https://doi.org/10.24432/C5P31H"
}
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.
It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
The columns in this dataset are the sepal length, sepal width, petal length, and petal width of each flower (in centimetres), along with its species.
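The same data ships with scikit-learn, so the separability note above can be checked with a short sketch like the one below (an illustration, not part of the original dataset card).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# A linear model classifies setosa perfectly but mixes up a few versicolor/virginica
# samples, consistent with the separability note above.
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5))
```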
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Campeonato Brasileiro de futebol’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adaoduque/campeonato-brasileiro-de-futebol on 28 January 2022.
--- Dataset description provided by original source is as follows ---
18 years of the Brazilian football championship (Campeonato Brasileiro)
In total, 7,645 matches from 2003 to 2021
https://github.com/adaoduque/Brasileirao_Dataset
--- Original source retains full ownership of the source dataset ---
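Below is a hedged pandas sketch of how one might verify the match count described above; the CSV file name and the date column name are assumptions, so check the GitHub repository for the actual schema.

```python
import pandas as pd

# File and column names are assumptions; see the repository above for the real schema.
matches = pd.read_csv("campeonato-brasileiro-full.csv", parse_dates=["data"], dayfirst=True)

# The description mentions 7,645 matches from 2003 to 2021.
print(len(matches))
print(matches["data"].dt.year.value_counts().sort_index())
```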
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Shail_2604
Released under Apache 2.0
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
By Jonathan Ortiz [source]
This College Completion dataset provides invaluable insight into the success and progress of college students in the United States. It contains graduation rates, race data, and more to offer a comprehensive view of college completion in America. The data is sourced from two primary sources: the National Center for Education Statistics (NCES) Integrated Postsecondary Education Data System (IPEDS) and the Voluntary System of Accountability's Student Success and Progress rate.
At four-year institutions, the graduation figures come from IPEDS for first-time, full-time degree-seeking students at the undergraduate level, who entered college six years earlier at four-year institutions or three years earlier at two-year institutions. Colleges also report how many students completed their program within 100 percent and 150 percent of normal time, which for four-year institutions corresponds to graduation within four or six years, respectively. Students reported as being of two or more races are included in totals but not shown separately.
For race and ethnicity, NCES classifies student demographics into seven categories: White, non-Hispanic; Black, non-Hispanic; American Indian/Alaskan Native; Asian/Pacific Islander; unknown race or ethnicity; and nonresident. Since 2009, two new categories have been added: Native Hawaiian or Other Pacific Islander (combined here with Asian) and students belonging to two or more races. Classifications for graduation data from 2008 may also differ because of variations in the time frame examined and the groupings used by particular colleges; students who cannot be identified in National Student Clearinghouse records are not counted against those institutions.
For efficiency, the dataset offers measures such as "Awards per 100 Full-Time Undergraduate Students", which counts all undergraduate completions reported by an institution, including associate degrees and certificates for programs shorter than four years. Also worth considering are expenditure categories, Pell grant percentage, endowment values, average student aid amounts, and the number of full-time faculty devoted to instruction, research, and public service.
To quantify outcomes, a median estimated SAT score is provided, derived from the 25th or 75th percentile figures and used only where the reported figures cover at least 90 percent of incoming students. Last but not least, average student aid is the amount of aid granted by the institution divided across the students who received it in that year.
Together, these measures offer a holistic overview of institutional performance, potential deficits, and more.
This dataset contains data on student success, graduation rates, race and gender demographics, an efficiency measure to compare colleges across states and more. It is a great source of information to help you better understand college completion and student success in the United States.
In this guide we’ll explain how to use the data so that you can find out the best colleges for students with certain characteristics or focus on your target completion rate. We’ll also provide some useful tips for getting the most out of this dataset when seeking guidance on which institutions offer the highest graduation rates or have a good reputation for success in terms of completing programs within normal timeframes.
Before getting into specifics about interpreting this dataset, it is important to understand that each row represents a particular institution, with fields such as its state, level (two-year vs. four-year), control (public vs. private), name, and website. The remaining columns contain demographic and performance information such as the rate of awarding degrees compared with other institutions in its sector, race/ethnicity makeup, full-time faculty percentage, median SAT score among first-time students, and awards/grants compared with the national and state averages (where applicable, depending on institution location), and more.
When using this dataset, our suggestion is that you begin by forming a hypothesis or research question concerning student completion at a given school based upon observable characteristics like financ...
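As an illustration of the kind of question suggested above, here is a hedged pandas sketch; the file name and column names (control, level, grad_150_value, chronname, state) are assumptions based on the description, not verified against the actual files.

```python
import pandas as pd

# File and column names below are assumptions; check the dataset files before relying on them.
inst = pd.read_csv("cc_institution_details.csv")

# Example question: which public four-year institutions have the highest share of
# students graduating within 150 percent of normal time (six years)?
public_4yr = inst[(inst["control"] == "Public") & (inst["level"] == "4-year")]
top = public_4yr.sort_values("grad_150_value", ascending=False)
print(top[["chronname", "state", "grad_150_value"]].head(10))
```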
This dataset was created by DA
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Dresden Surgical Anatomy Dataset includes semantic segmentations for eight abdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands), the abdominal wall, and two vascular structures (inferior mesenteric artery, intestinal veins) as seen in laparoscopic views. The dataset was collected from 32 surgeries, with the majority of patients (26/32) being male, an average age of 63 years, and a mean BMI of 26.75 kg/m². All patients had clinical reasons for the procedures. The surgeries were conducted using a Da Vinci® Xi/X Endoscope with an 8mm diameter, 30° angled camera (Intuitive Surgical, Item code 470057), and recorded in MPEG-4 format at 1920 × 1080 pixel resolution, with each surgery lasting between two and ten hours. A medical student with two years of experience in robot-assisted rectal surgery (MC, FMR) used the Surgery Workflow Toolbox [Annotate] version 2.2.0 (b<>com, Cesson-Sévigné, France) to annotate the surgical processes. To ensure diversity, videos from at least 20 surgeries were selected for each anatomical structure, with up to 100 equidistant frames randomly chosen per organ. Consequently, the dataset contains at least 1,000 annotated images for each organ or structure, covering at least 20 patients. The pixel-wise segmentation was performed using 3D Slicer 4.11.20200930 (with the SlicerRT extension), an open-source medical imaging software, and was done manually with a stylus on a tablet computer. Additionally, weak labels were used to indicate the visibility of anatomical structures in each image, annotated by a medical student with experience in minimally invasive surgery and reviewed by a second annotator.
Paper Link: The Dresden Surgical Anatomy Dataset
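For orientation, a small sketch of reading one frame/mask pair is shown below; the directory layout, file names, and binary-mask assumption are placeholders, so consult the dataset documentation for the actual structure.

```python
import numpy as np
from PIL import Image

# Placeholder paths; the actual dataset layout and naming may differ.
frame = np.array(Image.open("liver/frame_0001.png"))
mask = np.array(Image.open("liver/mask_0001.png"))

# Fraction of pixels annotated as the target structure (a binary mask is assumed here).
coverage = (mask > 0).mean()
print(f"frame shape {frame.shape}, annotated fraction {coverage:.3f}")
```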
This dataset was created by Roberto Barberá
Released under Data files © Original Authors
This dataset was created by Sheila Dias da Silva
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Francisco Alessandri
Released under CC0: Public Domain
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
With the trend toward audiobooks growing, I gathered this data to understand how the audiobook market has grown over the years. From authors to release dates, the data captures the important details of audiobooks from 1998 to 2025 (including pre-planned releases).
I have yet to find a great audiobooks dataset, hence the urge to make one that provides information on the basics and history of audiobooks. I plan to improve the dataset with more details in the near future.
The uncleaned data, audible_uncleaned.csv, is exactly the raw data I scraped from Audible.in. The cleaned one, audible_cleaned.csv, reflects a few basic data-cleaning steps.
The data was collected using web scraping with the following tools:
- re
- Beautiful Soup
- Selenium
Beautiful Soup and Selenium were used together to gather most of the data. The code can be reused; you can find it here: https://github.com/snehangsude/audible_scraper
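The general pattern of combining these tools looks roughly like the sketch below: Selenium renders the page, Beautiful Soup parses it, and re extracts fields from the text. The URL and CSS selectors are placeholders, not taken from the linked repository.

```python
import re
from bs4 import BeautifulSoup
from selenium import webdriver

# Placeholder URL and selectors; see the linked repository for the real scraping logic.
driver = webdriver.Chrome()
driver.get("https://www.audible.in/search")
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

for item in soup.select("li.productListItem"):      # placeholder selector
    title_tag = item.select_one("h3")
    if title_tag is None:
        continue
    text = item.get_text(" ", strip=True)
    stars = re.search(r"([\d.]+) out of 5 stars", text)  # example use of re on scraped text
    print(title_tag.get_text(strip=True), stars.group(1) if stars else "no rating")
```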
Columns:
- name: Name of the audiobook
- author: Author of the audiobook
- narrator: Narrator of the audiobook
- time: Length of the audiobook
- releasedate: Release date of the audiobook
- language: Language of the audiobook
- stars: No. of stars the audiobook received
- price: Price of the audiobook in INR
- ratings: No. of reviews received by the audiobook

This dataset was created by Éderson de Almeida Pedro
Ebac Python - Supporting material for the final project. Credit analysis.
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by yaso
Released under CC0: Public Domain
This dataset was created by Sarah da Silva Silveira
This dataset was created by Guilherme H da Silva
This dataset was created by Zoen de Loi