MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Data collected from the Folha website, with dates prior to February 2024. The data encoding is UTF-8. You may still want to clean the data further, although I have already done some work in this regard. The columns you will find are: Title, Content, URL, Published, and Category. The application used to collect the data was developed by me in C#, and you can find the repository at the link below.
https://github.com/luisfcaldeira/WebScrapper
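As a rough illustration of how the data might be loaded, here is a minimal pandas sketch; the file name (folha_news.csv) is a placeholder, and the only assumptions taken from the description above are the UTF-8 encoding and the five listed columns.

```python
import pandas as pd

# Placeholder file name; use the actual file distributed with the dataset.
df = pd.read_csv("folha_news.csv", encoding="utf-8")

# Columns described above: Title, Content, URL, Published, Category.
df["Published"] = pd.to_datetime(df["Published"], errors="coerce")

# Quick sanity checks: date range (should fall before February 2024) and category counts.
print(df["Published"].min(), df["Published"].max())
print(df["Category"].value_counts().head(10))
```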
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for LLM Science Exam Kaggle Competition
Dataset Summary
https://www.kaggle.com/competitions/kaggle-llm-science-exam/data
Languages
[en, de, tl, it, es, fr, pt, id, pl, ro, so, ca, da, sw, hu, no, nl, et, af, hr, lv, sl]
Dataset Structure
Columns:
- prompt - the text of the question being asked
- A - option A; if this option is correct, then answer will be A
- B - option B; if this option is correct, then answer will be B
- C - option C; if this…
See the full description on the dataset page: https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam.
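As a sketch, the data can presumably be loaded with the Hugging Face datasets library using the repository id from the page URL above; the split name and the presence of D/E/answer columns (the usual format for this Kaggle competition) are assumptions here, not taken from the truncated column description.

```python
from datasets import load_dataset

# Repository id taken from the dataset page URL above; the split name is an assumption.
ds = load_dataset("Sangeetha/Kaggle-LLM-Science-Exam", split="train")

example = ds[0]
print(example["prompt"])
# The competition format has options A-E plus an answer column; only A-C survive
# in the truncated description above, so treat D, E, and answer as assumptions.
for option in ["A", "B", "C", "D", "E"]:
    print(option, "-", example.get(option))
print("answer:", example.get("answer"))
```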
Data for the final project of the Diploma in Machine Learning with Python, from the Data Science Institute, entitled "Analysis of deforestation in the region surrounding the town of Joaquín V. González, Salta" https://www.kaggle.com/code/mariavirginiaforcone/deforestation-analysis-main-notebook
It contains a report that summarizes the work and a folder that includes:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains electrical impedance measurements of freshly excised tissue samples from the breast. The data is sourced from the UCI Machine Learning Repository.
Impedance measurements were taken at the following frequencies: 15.625, 31.25, 62.5, 125, 250, 500, and 1000 kHz. These measurements, when plotted in the (real, -imaginary) plane, constitute the impedance spectrum from which the breast tissue features are computed. The dataset can be used for classification into either the original 6 classes or into 4 classes obtained by merging the fibro-adenoma, mastopathy, and glandular classes, which are hard to discriminate.
This dataset is suitable for classification tasks. The impedance measurements can be used to predict the type of breast tissue.
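A minimal classification sketch follows, assuming the data has been exported to a CSV with a "Class" label column; the file name, column names, and the three-letter class codes used for merging are assumptions, not taken from the dataset files.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumed file and column names; adjust to the actual export of the UCI Breast Tissue data.
df = pd.read_csv("breast_tissue.csv")

# Optional 4-class variant: merge the fibro-adenoma, mastopathy, and glandular classes,
# which the description above notes are hard to discriminate. Class codes are assumptions.
merge = {"fad": "fad+mas+gla", "mas": "fad+mas+gla", "gla": "fad+mas+gla"}
y = df["Class"].replace(merge)   # drop this line to keep the original 6 classes
X = df.drop(columns=["Class"])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```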
If you use this dataset, please cite it as follows:
S, JP and Jossinet, J. (2010). Breast Tissue. UCI Machine Learning Repository. https://doi.org/10.24432/C5P31H.
@misc{misc_breast_tissue_192,
author = "S, JP and Jossinet, J",
title = "Breast Tissue",
year = 2010,
howpublished = "UCI Machine Learning Repository",
note = "DOI: https://doi.org/10.24432/C5P31H"
}
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.
It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
The columns in this dataset are the sepal length, sepal width, petal length, and petal width of each flower (in centimetres), along with its species.
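The same data ships with scikit-learn, so the separability note above can be checked with a short sketch like the one below (an illustration, not part of the original dataset card).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# A linear model classifies setosa perfectly but mixes up a few versicolor/virginica
# samples, consistent with the separability note above.
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5))
```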
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Campeonato Brasileiro de futebol’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adaoduque/campeonato-brasileiro-de-futebol on 28 January 2022.
--- Dataset description provided by original source is as follows ---
18 years of the Brazilian football championship (Campeonato Brasileiro)
In total, 7,645 matches from 2003 to 2021
https://github.com/adaoduque/Brasileirao_Dataset
--- Original source retains full ownership of the source dataset ---
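Below is a hedged pandas sketch of how one might verify the match count described above; the CSV file name and the date column name are assumptions, so check the GitHub repository for the actual schema.

```python
import pandas as pd

# File and column names are assumptions; see the repository above for the real schema.
matches = pd.read_csv("campeonato-brasileiro-full.csv", parse_dates=["data"], dayfirst=True)

# The description mentions 7,645 matches from 2003 to 2021.
print(len(matches))
print(matches["data"].dt.year.value_counts().sort_index())
```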
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Shail_2604
Released under Apache 2.0
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
By Jonathan Ortiz [source]
This College Completion dataset provides invaluable insight into the success and progress of college students in the United States. It contains graduation rates, race data, and more to offer a comprehensive view of college completion in America. The data is sourced from two primary sources: the National Center for Education Statistics (NCES) Integrated Postsecondary Education Data System (IPEDS) and the Voluntary System of Accountability's Student Success and Progress rate.
At four-year institutions, the graduation figures come from IPEDS for first-time, full-time degree-seeking students at the undergraduate level, who entered college six years earlier at four-year institutions or three years earlier at two-year institutions. Colleges also report how many students completed their program within 100 percent and 150 percent of normal time, which for four-year institutions corresponds to graduation within four or six years, respectively. Students reported as being of two or more races are included in totals but not shown separately.
For race and ethnicity, NCES classifies student demographics into seven categories: White, non-Hispanic; Black, non-Hispanic; American Indian/Alaskan Native; Asian/Pacific Islander; unknown race or ethnicity; and nonresident. Since 2009, two new categories have been added: Native Hawaiian or Other Pacific Islander (combined here with Asian) and students belonging to two or more races. Classifications for graduation data from 2008 may also differ because of variations in the time frame examined and the groupings used by particular colleges; students who cannot be identified in National Student Clearinghouse records are not counted against those institutions.
For efficiency, the dataset offers measures such as "Awards per 100 Full-Time Undergraduate Students", which counts all undergraduate completions reported by an institution, including associate degrees and certificates for programs shorter than four years. Also worth considering are expenditure categories, Pell grant percentage, endowment values, average student aid amounts, and the number of full-time faculty devoted to instruction, research, and public service.
To quantify outcomes, a median estimated SAT score is provided, derived from the 25th or 75th percentile figures and used only where the reported figures cover at least 90 percent of incoming students. Last but not least, average student aid is the amount of aid granted by the institution divided across the students who received it in that year.
Together, these measures offer a holistic overview of institutional performance, potential deficits, and more.
This dataset contains data on student success, graduation rates, race and gender demographics, an efficiency measure to compare colleges across states and more. It is a great source of information to help you better understand college completion and student success in the United States.
In this guide we’ll explain how to use the data so that you can find out the best colleges for students with certain characteristics or focus on your target completion rate. We’ll also provide some useful tips for getting the most out of this dataset when seeking guidance on which institutions offer the highest graduation rates or have a good reputation for success in terms of completing programs within normal timeframes.
Before getting into specifics about interpreting this dataset, it is important to understand that each row represents a particular institution, with fields such as its state, level (two-year vs. four-year), control (public vs. private), name, and website. The remaining columns contain demographic and performance information such as the rate of awarding degrees compared with other institutions in its sector, race/ethnicity makeup, full-time faculty percentage, median SAT score among first-time students, and awards/grants compared with the national and state averages (where applicable, depending on institution location), and more.
When using this dataset, our suggestion is that you begin by forming a hypothesis or research question concerning student completion at a given school based upon observable characteristics like financ...
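As an illustration of the kind of question suggested above, here is a hedged pandas sketch; the file name and column names (control, level, grad_150_value, chronname, state) are assumptions based on the description, not verified against the actual files.

```python
import pandas as pd

# File and column names below are assumptions; check the dataset files before relying on them.
inst = pd.read_csv("cc_institution_details.csv")

# Example question: which public four-year institutions have the highest share of
# students graduating within 150 percent of normal time (six years)?
public_4yr = inst[(inst["control"] == "Public") & (inst["level"] == "4-year")]
top = public_4yr.sort_values("grad_150_value", ascending=False)
print(top[["chronname", "state", "grad_150_value"]].head(10))
```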
This dataset was created by DA
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Dresden Surgical Anatomy Dataset includes semantic segmentations for eight abdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands), the abdominal wall, and two vascular structures (inferior mesenteric artery, intestinal veins) as seen in laparoscopic views. The dataset was collected from 32 surgeries, with the majority of patients (26/32) being male, an average age of 63 years, and a mean BMI of 26.75 kg/m². All patients had clinical reasons for the procedures. The surgeries were conducted using a Da Vinci® Xi/X Endoscope with an 8mm diameter, 30° angled camera (Intuitive Surgical, Item code 470057), and recorded in MPEG-4 format at 1920 × 1080 pixel resolution, with each surgery lasting between two and ten hours. A medical student with two years of experience in robot-assisted rectal surgery (MC, FMR) used the Surgery Workflow Toolbox [Annotate] version 2.2.0 (b<>com, Cesson-Sévigné, France) to annotate the surgical processes. To ensure diversity, videos from at least 20 surgeries were selected for each anatomical structure, with up to 100 equidistant frames randomly chosen per organ. Consequently, the dataset contains at least 1,000 annotated images for each organ or structure, covering at least 20 patients. The pixel-wise segmentation was performed using 3D Slicer 4.11.20200930 (with the SlicerRT extension), an open-source medical imaging software, and was done manually with a stylus on a tablet computer. Additionally, weak labels were used to indicate the visibility of anatomical structures in each image, annotated by a medical student with experience in minimally invasive surgery and reviewed by a second annotator.
Paper Link: The Dresden Surgical Anatomy Dataset
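For orientation, a small sketch of reading one frame/mask pair is shown below; the directory layout, file names, and binary-mask assumption are placeholders, so consult the dataset documentation for the actual structure.

```python
import numpy as np
from PIL import Image

# Placeholder paths; the actual dataset layout and naming may differ.
frame = np.array(Image.open("liver/frame_0001.png"))
mask = np.array(Image.open("liver/mask_0001.png"))

# Fraction of pixels annotated as the target structure (a binary mask is assumed here).
coverage = (mask > 0).mean()
print(f"frame shape {frame.shape}, annotated fraction {coverage:.3f}")
```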
This dataset was created by Roberto Barberá
Released under Data files © Original Authors
This dataset was created by Sheila Dias da Silva
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Francisco Alessandri
Released under CC0: Public Domain
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
With the trend toward audiobooks growing, I gathered this data to understand how the audiobook market has grown over the years. From authors to release dates, the data captures the important details of audiobooks from 1998 to 2025 (including pre-planned releases).
I have yet to find a great audiobooks dataset, hence the urge to make one that provides information on the basics and history of audiobooks. I plan to improve the dataset with more details in the near future.
The uncleaned data, audible_uncleaned.csv, is exactly the raw data I scraped from Audible.in. The cleaned one, audible_cleaned.csv, reflects a few basic data-cleaning steps.
The data was collected using web scraping with the following tools:
- re
- Beautiful Soup
- Selenium
Beautiful Soup and Selenium were used together to gather most of the data. The code can be reused; you can find it here: https://github.com/snehangsude/audible_scraper
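The general pattern of combining these tools looks roughly like the sketch below: Selenium renders the page, Beautiful Soup parses it, and re extracts fields from the text. The URL and CSS selectors are placeholders, not taken from the linked repository.

```python
import re
from bs4 import BeautifulSoup
from selenium import webdriver

# Placeholder URL and selectors; see the linked repository for the real scraping logic.
driver = webdriver.Chrome()
driver.get("https://www.audible.in/search")
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

for item in soup.select("li.productListItem"):      # placeholder selector
    title_tag = item.select_one("h3")
    if title_tag is None:
        continue
    text = item.get_text(" ", strip=True)
    stars = re.search(r"([\d.]+) out of 5 stars", text)  # example use of re on scraped text
    print(title_tag.get_text(strip=True), stars.group(1) if stars else "no rating")
```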
Columns:
- name: Name of the audiobook
- author: Author of the audiobook
- narrator: Narrator of the audiobook
- time: Length of the audiobook
- releasedate: Release date of the audiobook
- language: Language of the audiobook
- stars: No. of stars the audiobook received
- price: Price of the audiobook in INR
- ratings: No. of reviews received by the audiobook

This dataset was created by Éderson de Almeida Pedro
Ebac Python - Supporting material for the final project. Credit analysis.
CC0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by yaso
Released under CC0: Public Domain
This dataset was created by Sarah da Silva Silveira
This dataset was created by Guilherme H da Silva
This dataset was created by Zoen de Loi