https://creativecommons.org/publicdomain/zero/1.0/
For nearly 30 years, ArXiv has served the public and research communities by providing open access to scholarly articles, from the vast branches of physics to the many subdisciplines of computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics. This rich corpus of information offers significant, but sometimes overwhelming depth.
In these times of unique global challenges, efficient extraction of insights from data is essential. To help make the arXiv more accessible, we present a free, open pipeline on Kaggle to the machine-readable arXiv dataset: a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more.
Our hope is to empower new use cases that can lead to the exploration of richer machine learning techniques that combine multi-modal features towards applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.
The dataset is freely available via Google Cloud Storage buckets (more info here). Stay tuned for weekly updates to the dataset!
ArXiv is a collaboratively funded, community-supported resource founded by Paul Ginsparg in 1991 and maintained and operated by Cornell University.
The release of this dataset was featured further in a Kaggle blog post here.
This dataset is a mirror of the original ArXiv data. Because the full dataset is rather large (1.1 TB and growing), this dataset provides only a metadata file in JSON format. This file contains an entry for each paper, with the following fields:
- id: ArXiv ID (can be used to access the paper; see below)
- submitter: Who submitted the paper
- authors: Authors of the paper
- title: Title of the paper
- comments: Additional info, such as number of pages and figures
- journal-ref: Information about the journal the paper was published in
- doi: Digital Object Identifier (see https://www.doi.org)
- abstract: The abstract of the paper
- categories: Categories / tags in the ArXiv system
- versions: A version history
You can access each paper directly on ArXiv using these links:
- https://arxiv.org/abs/{id}: Page for the paper, including its abstract and further links
- https://arxiv.org/pdf/{id}: Direct link to download the PDF
The full set of PDFs is available for free in the GCS bucket gs://arxiv-dataset
or through Google API (json documentation and xml documentation).
You can use, for example, gsutil to download the data to your local machine:
```
# copy the PDFs for one month (directories are named YYMM, e.g. 2003 = March 2020)
gsutil cp -r gs://arxiv-dataset/arxiv/arxiv/pdf/2003/ ./a_local_directory/

# copy everything (over 1 TB)
gsutil cp -r gs://arxiv-dataset/arxiv/ ./a_local_directory/
```
We're automatically updating the metadata as well as the GCS bucket on a weekly basis.
Creative Commons CC0 1.0 Universal Public Domain Dedication applies to the metadata in this dataset. See https://arxiv.org/help/license for further details and licensing on individual papers.
The original data is maintained by ArXiv; huge thanks to the team for building and maintaining this dataset.
We're using https://github.com/mattbierbaum/arxiv-public-datasets to pull the original data, thanks to Matt Bierbaum for providing this tool.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID-19's Impact on Educational Stress’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/bsoyka3/educational-stress-due-to-the-coronavirus-pandemic on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The survey collecting this information is still open for responses here.
I just made this public survey because I want someone to be able to do something fun or insightful with the data that's been gathered. You can fill it out too!
Each row represents a response to the survey. A few things have been done to sanitize the raw responses:
- Column names and options have been renamed to make them easier to work with without much loss of meaning.
- Responses from non-students have been removed.
- Responses with ages greater than or equal to 22 have been removed.
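The last two sanitization rules can be reproduced with a simple filter. A minimal pandas sketch, using hypothetical column names (the actual dataset's headers may differ):

```
import pandas as pd

# Hypothetical raw responses; "is_student" and "age" are assumed column names.
raw = pd.DataFrame({
    "is_student": [True, True, False, True],
    "age": [16, 23, 17, 19],
    "stress_level": [7, 5, 4, 8],
})

# Keep only students younger than 22, matching the sanitization described above.
clean = raw[raw["is_student"] & (raw["age"] < 22)].reset_index(drop=True)
```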
Take a look at the column description for each column to see what exactly it represents.
This dataset wouldn't exist without the help of others. I'd like to thank the following people for their contributions: - Every student who responded to the survey with valid responses - @radcliff on GitHub for providing the list of countries and abbreviations used in the survey and dataset - Giovanna de Vincenzo for providing the list of US states used in the survey and dataset - Simon Migaj for providing the image used for the survey and this dataset
--- Original source retains full ownership of the source dataset ---
Record of Forestry planting spaces for NYC Parks & Recreation.
Tree Points and Planting Spaces form the basis of ForMS 2.0's data inventory and are the core entities that all Service Requests, Inspections, and Work Orders are associated with. The system has built-in rules to ensure that every Tree Point has a Planting Space and that each Planting Space has no more than one active Tree Point at a given time. Locations where one tree has been removed and another replanted will appear in ForMS 2.0 as a single Planting Space associated with one retired Tree Point (with a removal Work Order) and one active Tree Point.
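As a rough illustration of the one-active-Tree-Point-per-Planting-Space rule (hypothetical field names, not the actual ForMS schema):

```
from collections import Counter

# Hypothetical records: (planting_space_id, tree_point_id, is_active)
tree_points = [
    ("PS-1", "TP-1", False),  # removed tree: retired point with a removal Work Order
    ("PS-1", "TP-2", True),   # replanted tree: the one active point for PS-1
    ("PS-2", "TP-3", True),
]

# Count active Tree Points per Planting Space.
active_counts = Counter(ps for ps, _, active in tree_points if active)

# The rule: no Planting Space may have more than one active Tree Point.
violations = [ps for ps, n in active_counts.items() if n > 1]
```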
User guide: https://docs.google.com/document/d/1PVPWFi-WExkG3rvnagQDoBbqfsGzxCKNmR6n678nUeU/edit?usp=sharing
Data dictionary: https://docs.google.com/spreadsheets/d/1yMfZgcsrvx9M0b3-ZdEQ3WCk2dFxgitCWytTrJSwEAs/edit?usp=sharing
Excel spreadsheets by species (the 4-letter code abbreviates the genus and species used in the study; the year, 2010 or 2011, is the year the data were collected; SH indicates data for Science Hub; the date is the date of file preparation). The data in each file are described in a readme, which is the first worksheet in the file; each row in a species spreadsheet is for one plot (plant), and the data themselves are in the data worksheet. One file includes a readme description of the columns in the chemical-analysis dataset; in that file, one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk, D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee, and M. Plocher. Plant reproduction is altered by simulated herbicide drift to constructed plant communities. Environmental Toxicology and Chemistry, 36(10): 2799-2813 (2017).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
By Mark Di Marco [source]
This dataset tracks the ever-changing world of Near Earth Asteroids (NEAs) that are on their way to us or have recently passed by. This real-time data offers insight into how often asteroids fly by our planet and how close they can get. For each NEA we've included its name, the date and time of its close approach, its distance from Earth in both astronomical units and Lunar Distances, its velocity relative to Earth and to the Sun, and other essential properties of these celestial travelers. So join us on this exploration and take a journey through our cosmos with these asteroids!
This dataset provides information about Near Earth Asteroids that will make a close approach to Earth within the next 12 months, or have made one within the last 12 months. The columns include characteristics such as distance from Earth and relative velocity. To gain more insight into NEAs, follow the steps below:
Download the dataset from Kaggle. You'll receive two CSV files: future.csv and all.csv. future.csv covers asteroids making a close approach in the next 12 months (and those that made one in the last 12 months), while all.csv also covers close approaches at other times. Once the files are on your machine, open them with Microsoft Excel or Google Sheets and sort through each column (classification, minimum distance, and so on), looking for correlations and conclusions you can draw about these objects. Then move on to visualization: tools such as Tableau or Looker can produce interactive charts and graphs depicting attributes like approach distance and classification, turning your analysis into visuals suitable for a broader audience.
Find trends and patterns. The future spreadsheet lists known asteroids with their nominal distance in Lunar Distances (LD), classification, relative velocity (km/s), velocity at infinity (km/s), and the N-sigma uncertainty of the orbital path relative to Earth. Comparing these metrics across entries makes it possible to assess the risk each object poses and to spot yearly trends in close approaches, in a way that is understandable to almost anyone regardless of background.
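A minimal pandas sketch of this kind of analysis, using hypothetical column names (check future.csv's actual headers before adapting it):

```
import pandas as pd

# Hypothetical NEA records; "dist_ld" and "v_rel_km_s" are assumed column names.
neas = pd.DataFrame({
    "name": ["2023 AB", "2010 XC", "99942 Apophis"],
    "dist_ld": [3.2, 0.8, 10.1],     # nominal approach distance in Lunar Distances
    "v_rel_km_s": [7.4, 12.9, 5.1],  # velocity relative to Earth
})

# Flag approaches closer than the Moon (< 1 LD) and rank all objects by distance.
close = neas[neas["dist_ld"] < 1.0]
ranked = neas.sort_values("dist_ld")
```

Sorting and thresholding like this is the spreadsheet workflow above expressed in code, and the resulting frames feed directly into a plotting or dashboard tool.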
- Use the data to build an accurate 3D-printed model of a NEA at different scales, depending on the size and shape it describes
- Build a computer simulation which simulates close approaches of NEAs and the risk they pose to Earth
- Develop an interactive map which displays current positions of NEAs and radar detection for confirmed threats
If you use this dataset in your research, please credit the original authors.
License: Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) - You are free to: - Share - copy and redistribute the material in any mediu...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To access the live spreadsheet: https://docs.google.com/spreadsheets/d/16-bnsDdmmgtSxdWbVMboIHo5FRuz76DBxsz_BbsEVWA/edit?usp=sharing
Repository created by Sistemas Mapache with the goal of providing open data from the official daily reports on the COVID-19 situation in Argentina.
Data with finer territorial segmentation from provincial sources are also added.
The historical data come from official sources and are not mixed with unofficial sources.
Column Name | Description |
---|---|
fecha | date the data correspond to |
dia_inicio | number of days since case 1 |
dia_cuarentena_dnu260 | number of days since the quarantine declared by decree DNU 260 |
osm_admin_level_2 | OpenStreetMap administrative name at country level |
osm_admin_level_4 | OpenStreetMap administrative name at province level |
osm_admin_level_8 | OpenStreetMap administrative name at city level |
tot_casosconf | total confirmed cases; running row-by-row sum of confirmed cases |
nue_casosconf_diff | new confirmed cases for the day |
tot_fallecidos | total deaths; running row-by-row sum of deaths |
nue_fallecidos_diff | new deaths for the day |
tot_recuperados | cumulative total of recovered cases |
tot_test_negativos | cumulative total of negative tests |
tot_test | cumulative total of tests |
transmision_tipo | type of transmission as of the date |
informe_link | URL of the report the data come from |
transmision_tipo | region shapefile in WKT |
observacion | observations related to the data, or discrepancies between reports |
covid19argentina_admin_level_4 | province format required by covid19argentina.com |
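The cumulative and daily columns are related by a simple difference: nue_casosconf_diff is the day-over-day change in tot_casosconf. A minimal pandas sketch of that relationship, using the column names from the table (with made-up values):

```
import pandas as pd

# Made-up values; column names follow the data dictionary above.
df = pd.DataFrame({
    "fecha": pd.to_datetime(["2020-03-05", "2020-03-06", "2020-03-07"]),
    "tot_casosconf": [1, 2, 8],  # running cumulative total of confirmed cases
})

# Daily new cases = difference of the cumulative total;
# the first day has no previous row, so its "new" count is the total itself.
df["nue_casosconf_diff"] = (
    df["tot_casosconf"].diff().fillna(df["tot_casosconf"]).astype(int)
)
```

The same recipe applies to tot_fallecidos / nue_fallecidos_diff.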