6 datasets found

Data from: arXiv Dataset
kaggle.com
Updated Jul 5, 2025
Cite
Cornell University (2025). arXiv Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7548853
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7548853
Dataset updated
Jul 5, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Cornell University
License
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
Description
About ArXiv

For nearly 30 years, arXiv has served the public and research communities by providing open access to scholarly articles, from the vast branches of physics to the many subdisciplines of computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics. This rich corpus offers significant, but sometimes overwhelming, depth.

In these times of unique global challenges, efficient extraction of insights from data is essential. To help make the arXiv more accessible, we present a free, open pipeline on Kaggle to the machine-readable arXiv dataset: a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more.

Our hope is to empower new use cases that can lead to the exploration of richer machine learning techniques that combine multi-modal features towards applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.

The dataset is freely available via Google Cloud Storage buckets (more info here). Stay tuned for weekly updates to the dataset!

ArXiv is a collaboratively funded, community-supported resource founded by Paul Ginsparg in 1991 and maintained and operated by Cornell University.

The release of this dataset was featured further in a Kaggle blog post here.

See here for more information.

ArXiv On Kaggle

Metadata

This dataset is a mirror of the original arXiv data. Because the full dataset is rather large (1.1TB and growing), this dataset provides only a metadata file in JSON format. This file contains an entry for each paper, containing:
- id: arXiv ID (can be used to access the paper, see below)
- submitter: Who submitted the paper
- authors: Authors of the paper
- title: Title of the paper
- comments: Additional info, such as number of pages and figures
- journal-ref: Information about the journal the paper was published in
- doi: Digital Object Identifier of the paper (https://www.doi.org)
- abstract: The abstract of the paper
- categories: Categories / tags in the arXiv system
- versions: A version history
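
As a minimal sketch of reading this metadata in Python, the snippet below assumes the file uses a JSON Lines layout (one JSON object per line) and that `categories` is a single space-separated string of tags; both are assumptions to verify against the downloaded file. The helper names are illustrative and the records in `sample` are made up.

```python
import json
from typing import Iterable, Iterator


def iter_papers(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one metadata record per non-empty line (assumes JSON Lines layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def in_category(paper: dict, category: str) -> bool:
    """Check whether `category` is one of the paper's tags.

    Assumes `categories` is a space-separated string such as "math.CO cs.CG".
    """
    return category in paper.get("categories", "").split()


# Made-up records that use the field names listed above.
sample = [
    '{"id": "0000.00001", "title": "Example A", "categories": "hep-ph"}',
    '{"id": "0000.00002", "title": "Example B", "categories": "math.CO cs.CG"}',
]
matching_ids = [p["id"] for p in iter_papers(sample) if in_category(p, "cs.CG")]
print(matching_ids)  # ['0000.00002']
```

Because a Python file object iterates line by line, the same function can stream the real file without loading it all into memory: open it with `open(path, encoding="utf-8")` and pass the handle to `iter_papers`, where `path` is whatever the downloaded file is called.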

You can access each paper directly on arXiv using these links:
- https://arxiv.org/abs/{id}: Page for this paper, including its abstract and further links
- https://arxiv.org/pdf/{id}: Direct link to download the PDF
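
Both link patterns are plain string templates, so building them from a record's `id` field takes one line each. A minimal sketch; the function names are illustrative and the example ID is arbitrary:

```python
def abs_url(arxiv_id: str) -> str:
    """Abstract page, following the https://arxiv.org/abs/{id} pattern."""
    return f"https://arxiv.org/abs/{arxiv_id}"


def pdf_url(arxiv_id: str) -> str:
    """Direct PDF download, following the https://arxiv.org/pdf/{id} pattern."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


print(abs_url("2003.00001"))  # https://arxiv.org/abs/2003.00001
print(pdf_url("2003.00001"))  # https://arxiv.org/pdf/2003.00001
```

Older IDs that contain a slash, such as `hep-th/9901001`, are inserted unchanged, so the slash simply becomes part of the URL path.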

Bulk access

The full set of PDFs is available for free in the GCS bucket gs://arxiv-dataset or through the Google API (JSON documentation and XML documentation).

You can use, for example, gsutil to download the data to your local machine:

```
# List files:
gsutil ls gs://arxiv-dataset/arxiv/

# Download PDFs from March 2020:
gsutil cp -r gs://arxiv-dataset/arxiv/arxiv/pdf/2003/ ./a_local_directory/

# Download all the source files:
gsutil cp -r gs://arxiv-dataset/arxiv/ ./a_local_directory/
```
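
The March 2020 example suggests that PDFs are grouped into folders named after the `YYMM` prefix of new-style arXiv IDs (`2003` for March 2020). Assuming every month follows that layout, which is worth confirming by listing the bucket first, the sketch below maps an ID to its folder and builds the matching copy command without running it. `month_prefix` and `gsutil_copy_command` are hypothetical helper names; old-style IDs such as `hep-th/9901001` are rejected because their layout is not described here.

```python
import re
import shlex

# Assumed layout, generalized from the March 2020 example (YYMM = 2003).
BUCKET_PDF_ROOT = "gs://arxiv-dataset/arxiv/arxiv/pdf"

# New-style IDs: YYMM, a dot, 4 or 5 digits, and an optional version suffix.
NEW_STYLE_ID = re.compile(r"^(\d{4})\.\d{4,5}(v\d+)?$")


def month_prefix(arxiv_id: str) -> str:
    """Return the (assumed) bucket folder that holds a new-style ID's PDF."""
    match = NEW_STYLE_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return f"{BUCKET_PDF_ROOT}/{match.group(1)}/"


def gsutil_copy_command(arxiv_id: str, dest: str) -> str:
    """Build, but do not run, a recursive copy of that month's folder."""
    return shlex.join(["gsutil", "cp", "-r", month_prefix(arxiv_id), dest])


print(gsutil_copy_command("2003.00001", "./a_local_directory/"))
# gsutil cp -r gs://arxiv-dataset/arxiv/arxiv/pdf/2003/ ./a_local_directory/
```

Note that the command copies the whole month folder, not a single paper; `shlex.join` is used so that unusual destination paths are quoted safely if you paste the result into a shell.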

Update Frequency

We're automatically updating the metadata as well as the GCS bucket on a weekly basis.

License

Creative Commons CC0 1.0 Universal Public Domain Dedication applies to the metadata in this dataset. See https://arxiv.org/help/license for further details and licensing on individual papers.

Acknowledgements

The original data is maintained by arXiv; huge thanks to the team for building and maintaining this dataset.

We're using https://github.com/mattbierbaum/arxiv-public-datasets to pull the original data, thanks to Matt Bierbaum for providing this tool.
‘COVID-19's Impact on Educational Stress’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘COVID-19's Impact on Educational Stress’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-s-impact-on-educational-stress-49b5/4f12e21a/?iid=019-206&v=presentation
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘COVID-19's Impact on Educational Stress’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/bsoyka3/educational-stress-due-to-the-coronavirus-pandemic on 28 January 2022.

--- Dataset description provided by original source is as follows ---

The survey collecting this information is still open for responses here.

Context

I just made this public survey because I want someone to be able to do something fun or insightful with the data that's been gathered. You can fill it out too!

Content

Each row represents a response to the survey. A few things have been done to sanitize the raw responses:
- Column names and options have been renamed to make them easier to work with without much loss of meaning.
- Responses from non-students have been removed.
- Responses with ages greater than or equal to 22 have been removed.
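
The two row filters amount to a single keep/drop predicate. This is a minimal sketch with hypothetical field names (`is_student`, `age`); the survey's real column names are listed in its column descriptions.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical fields for illustration; not the dataset's actual column names.
    is_student: bool
    age: int


def keep(response: Response) -> bool:
    """Keep students only, and only ages below 22 (ages >= 22 are dropped)."""
    return response.is_student and response.age < 22


raw = [Response(True, 17), Response(False, 19), Response(True, 22), Response(True, 21)]
cleaned = [r for r in raw if keep(r)]
print(len(cleaned))  # 2
```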

Take a look at the column description for each column to see what exactly it represents.

Acknowledgements

This dataset wouldn't exist without the help of others. I'd like to thank the following people for their contributions:
- Every student who responded to the survey with valid responses
- @radcliff on GitHub for providing the list of countries and abbreviations used in the survey and dataset
- Giovanna de Vincenzo for providing the list of US states used in the survey and dataset
- Simon Migaj for providing the image used for the survey and this dataset

--- Original source retains full ownership of the source dataset ---
Forestry Planting Spaces
data.cityofnewyork.us
catalog.data.gov
Updated Mar 5, 2025
Cite
Department of Parks and Recreation (DPR) (2025). Forestry Planting Spaces [Dataset]. https://data.cityofnewyork.us/Environment/Forestry-Planting-Spaces/82zj-84is
Available download formats: csv, xml, application/rdfxml, application/rssxml, tsv, kml, application/geo+json, kmz
Dataset updated
Mar 5, 2025
Dataset authored and provided by
Department of Parks and Recreation (DPR)
Description
Record of Forestry planting spaces for NYC Parks & Recreation.

Tree Points and Planting Spaces form the basis of ForMS 2.0’s data inventory and are the core entities that all Service Requests, Inspections, and Work Orders are associated with. The system has built-in rules to ensure that every Tree Point has a Planting Space and that each Planting Space has no more than one active Tree Point at a given time. Locations where one tree was removed and another replanted appear in ForMS 2.0 as a single Planting Space associated with one retired Tree Point (that has a removal Work Order) and one active Tree Point.
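
The two integrity rules lend themselves to a simple check over an exported table. The sketch below is illustrative only: `TreePoint` and its fields are hypothetical stand-ins for whatever the data dictionary actually defines, and it assumes each Tree Point carries an active/retired flag.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreePoint:
    # Hypothetical fields for illustration; see the data dictionary for real names.
    tree_id: int
    planting_space_id: Optional[int]
    active: bool


def rule_violations(trees: List[TreePoint]) -> List[str]:
    """Report breaches of the two ForMS 2.0 rules described above."""
    problems = []
    # Rule 1: every Tree Point has a Planting Space.
    for tree in trees:
        if tree.planting_space_id is None:
            problems.append(f"tree {tree.tree_id} has no planting space")
    # Rule 2: at most one active Tree Point per Planting Space.
    active_counts = Counter(
        tree.planting_space_id
        for tree in trees
        if tree.active and tree.planting_space_id is not None
    )
    for space_id, count in active_counts.items():
        if count > 1:
            problems.append(f"space {space_id} has {count} active trees")
    return problems


# A replanted location: one retired and one active Tree Point in the same space.
replanted = [TreePoint(1, 10, active=False), TreePoint(2, 10, active=True)]
print(rule_violations(replanted))  # []
```

The replanted example mirrors the case described above: the retired Tree Point does not count toward the one-active-tree limit, so no violation is reported.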

User guide: https://docs.google.com/document/d/1PVPWFi-WExkG3rvnagQDoBbqfsGzxCKNmR6n678nUeU/edit?usp=sharing

Data dictionary: https://docs.google.com/spreadsheets/d/1yMfZgcsrvx9M0b3-ZdEQ3WCk2dFxgitCWytTrJSwEAs/edit?usp=sharing
18 excel spreadsheets by species and year giving reproduction and growth...
catalog.data.gov
data.wu.ac.at
Updated Aug 17, 2024
Cite
U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
Dataset updated
Aug 17, 2024
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Excel spreadsheets by species (the 4-letter code is the abbreviation for the genus and species used in the study; the year, 2010 or 2011, is the year the data were collected; SH indicates data for Science Hub; the date is the date of file preparation). The data in each file are described in a read-me file, which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read-me description of the columns in the data set for chemical analysis. In this file, each row is an herbicide treatment and a sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk, D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee, and M. Plocher. Plant reproduction is altered by simulated herbicide drift to constructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
Future Asteroids
kaggle.com
Updated Dec 7, 2022
Cite
The Devastator (2022). Future Asteroids [Dataset]. https://www.kaggle.com/datasets/thedevastator/investigate-near-earth-asteroids-track-close-app
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 7, 2022
Dataset provided by
Kaggle
Authors
The Devastator
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Future Asteroids

All known future asteroids poised to pass near Earth, some dangerous (PHO)

By Mark Di Marco [source]

About this dataset

This fascinating dataset captures the ever-changing world of Near Earth Asteroids (NEAs) that are either on their way toward us or have recently passed by. It offers insight into how often asteroids fly by our planet and how close they can get. For each NEA it includes data such as its known name, the date and time of its close approach, its distance from Earth in both astronomical units and Lunar Distances, its velocities relative to us and to the sun, and other essential properties that help paint a picture of these celestial objects. So join us on this exploration and take a journey through time into our cosmos with these asteroids!

More Datasets

For more datasets, click here.

How to use the dataset

Introduction

This dataset provides information about Near Earth Asteroids that will make a close approach to Earth within the next 12 months, or have made a close approach within the last 12 months. The columns include characteristics such as distance from Earth and relative velocity, among others. To gain more insight into Near Earth Asteroids, follow the steps below:

Download the Dataset

Download this Investigate Near-Earth Asteroid – Track Close Approaches to Earth! dataset from Kaggle. With this download you’ll receive two CSV files: future.csv and all.csv. The first file (future) covers asteroids making a close approach in the next 12 months and those that made one in the last 12 months, while all covers asteroids making close approaches at later times (further than twelve months away).

Analyze, Interpret & Visualize

Once you’ve downloaded the data files onto your machine, open them in Microsoft Excel or Google Sheets to begin analyzing the asteroid data. Use the sorting and filtering tools in either spreadsheet program to go through each column, noting classifications, minimum distances, and so on, and look for correlations or conclusions you can draw about these objects and our current space environment. After exploring the patterns, move on to data visualization: tools such as Tableau or Looker can help you build interactive charts and graphs of attributes like distances and composition classifications across both CSV files. Finally, use these charts to tell the stories you found, leading readers deeper into their own analysis of NEAs and presenting your findings in visuals suited to a broader audience.
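
If you prefer a script to a spreadsheet for the sorting step, Python's standard `csv` module is enough. This is a minimal sketch: the header `Distance Nominal (LD)` and the rows in `sample` are made-up stand-ins, so check the real header of future.csv and adjust `column` accordingly.

```python
import csv
import io
from typing import Dict, List


def sort_by_numeric_column(csv_text: str, column: str) -> List[Dict[str, str]]:
    """Return CSV rows sorted ascending by `column`.

    Rows whose cell is empty, missing, or not a plain number are skipped.
    A KeyError means `column` does not match any header in the file.
    """
    parsed = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # skip cells that cannot be read as a number
        parsed.append((value, row))
    return [row for _, row in sorted(parsed, key=lambda pair: pair[0])]


# Made-up rows with a hypothetical header.
sample = """Object,Distance Nominal (LD)
obj-a,12.5
obj-b,3.1
obj-c,n/a
"""
closest_first = sort_by_numeric_column(sample, "Distance Nominal (LD)")
print([row["Object"] for row in closest_first])  # ['obj-b', 'obj-a']
```

For the real file, read it with `open("future.csv", newline="", encoding="utf-8")` and pass its text to the same function.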

Find Trends & Patterns

The Future spreadsheet lists all known asteroids along with their nominal distance (Distance Nominal (LD)), composition classification (GK), relative speed through space (VRelative (km/s)), Vinfinity (km/s), and N Sigma, the standard deviation of the orbital path relative to Earth. Compared side by side, and combined with the visualizations created earlier in the workflow, these metrics become understandable to almost anyone regardless of background, and they show how much each one matters when assessing the risk posed to society at a given moment, along with current yearly trends collated from the analyzed sheets, helping…
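
Distances appear both in Lunar Distances (LD) and astronomical units (au), so a small conversion helper makes them comparable. In this sketch the au value is the exact IAU definition, while 384,400 km for one LD is an approximate mean Earth-Moon distance; that constant is an assumption, and the data source may use a slightly different value.

```python
# Exact by IAU 2012 definition.
AU_KM = 149_597_870.7
# Approximate mean Earth-Moon distance; an assumed value, not taken from the dataset.
LD_KM = 384_400.0


def au_to_ld(distance_au: float) -> float:
    """Convert astronomical units to Lunar Distances."""
    return distance_au * AU_KM / LD_KM


def ld_to_km(distance_ld: float) -> float:
    """Convert Lunar Distances to kilometers."""
    return distance_ld * LD_KM


print(round(au_to_ld(0.05), 2))  # 19.46
print(ld_to_km(2.0))  # 768800.0
```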

Research Ideas

Use the data to build an accurate 3D-printed model of a NEA at different scales, depending on the size and shape it describes

Build a computer simulation which simulates close approaches of NEAs and the risk they pose to Earth

Develop an interactive map which displays current positions of NEAs and radar detection for confirmed threats

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) - You are free to: - Share - copy and redistribute the material in any mediu...

Covid19arData COVID-19 Argentina data

kaggle.com

Updated Jul 2, 2025

Cite

Vladimiro Bellini (2025). Covid19arData COVID-19 Argentina data [Dataset]. https://www.kaggle.com/vladimirobellini/covid19ardata/discussion

Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Dataset updated

Jul 2, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Vladimiro Bellini

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Area covered

Argentina

Description

Exported spreadsheet

To access the dynamic (live) spreadsheet: https://docs.google.com/spreadsheets/d/16-bnsDdmmgtSxdWbVMboIHo5FRuz76DBxsz_BbsEVWA/edit?usp=sharing

Context

Repository created by Sistemas Mapache to provide open data based on the official daily reports on the COVID-19 situation in Argentina.

Data with finer territorial segmentation from provincial sources are also included.

The historical data come from official sources and are not mixed with unofficial sources.

Data Dictionary

Column Name	Description
fecha	date to which the data correspond
dia_inicio	number of days since case 1
dia_cuarentena_dnu260	number of days since the quarantine under DNU 260
osm_admin_level_2	administrative name in OpenStreetMap at country level
osm_admin_level_4	administrative name in OpenStreetMap at province level
osm_admin_level_8	administrative name in OpenStreetMap at city level
tot_casosconf	total confirmed infected cases; a running total that accumulates confirmed cases row by row
nue_casosconf_diff	new infected cases for the day
tot_fallecidos	total deaths; a running total that accumulates deaths row by row
nue_fallecidos_diff	new deaths for the day
tot_recuperados	cumulative total of recovered cases
tot_test_negativos	cumulative total of negative tests
tot_test	cumulative total of tests
transmision_tipo	type of transmission as of the date
informe_link	URL of the report the data come from
transmision_tipo	Region shapefile in WKT
observacion	notes on the data or on differences between reports
covid19argentina_admin_level_4	province format required by covid19argentina.com
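
Because the tot_* columns are running totals and the nue_*_diff columns are their day-over-day changes, one can be rebuilt from the other, which makes a quick consistency check. A minimal sketch, assuming the rows for a single region have already been extracted and sorted by fecha, and that the series starts before the first case; the helper names are hypothetical.

```python
from typing import List


def daily_from_cumulative(totals: List[int]) -> List[int]:
    """Turn a running total (e.g. tot_casosconf) into per-day values
    (e.g. nue_casosconf_diff).

    Assumes one row per day for one region, sorted by fecha, so the
    first day's value equals its running total.
    """
    daily = []
    previous = 0
    for total in totals:
        daily.append(total - previous)  # a negative value signals a downward revision
        previous = total
    return daily


def mismatched_days(totals: List[int], reported_daily: List[int]) -> List[int]:
    """Row indexes where the reported daily column disagrees with the running total."""
    derived = daily_from_cumulative(totals)
    return [i for i, (d, r) in enumerate(zip(derived, reported_daily)) if d != r]


totals = [1, 1, 3, 6]
print(daily_from_cumulative(totals))  # [1, 0, 2, 3]
print(mismatched_days(totals, [1, 0, 2, 4]))  # [3]
```

Rows flagged this way are natural candidates to compare against the observacion column, which records differences between reports.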

Data from: arXiv Dataset

arXiv dataset and metadata of 1.7M+ scholarly papers across STEM

82 scholarly articles cite this dataset (View in Google Scholar)
