12 datasets found

Coronavirus (Covid-19) Data in the United States
nytimes.com
openicpsr.org
+4more
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Explore at:
Dataset provided by
New York Times
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Explore Bike Share Data
kaggle.com
zip
Updated Jun 3, 2021
Cite
Shaltout (2021). Explore Bike Share Data [Dataset]. https://www.kaggle.com/shaltout/explore-bike-share-data
Explore at:
Available download formats: zip (26232124 bytes)
Dataset updated
Jun 3, 2021
Authors
Shaltout
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Bike Share Data

Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

In this project, you will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

The Datasets

Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:

- Start Time (e.g., 2017-01-01 00:07:57)
- End Time (e.g., 2017-01-01 00:20:53)
- Trip Duration (in seconds - e.g., 776)
- Start Station (e.g., Broadway & Barry Ave)
- End Station (e.g., Sedgwick St & North Ave)
- User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

- Gender
- Birth Year

Data for the first 10 rides in the new_york_city.csv file

The original files are much larger and messier, and you don't need to download them, but they can be accessed here if you'd like to see them (Chicago, New York City, Washington). These files had more columns and they differed in format in many cases. Some data wrangling has been performed to condense these files to the above core six columns to make your analysis and the evaluation of your Python skills more straightforward. In the Data Wrangling course that comes later in the Data Analyst Nanodegree program, students learn how to wrangle the dirtiest, messiest datasets, so don't worry, you won't miss out on learning this important skill!

Statistics Computed

You will learn about bike share use in Chicago, New York City, and Washington by computing a variety of descriptive statistics. In this project, you'll write code to provide the following information:

1 Popular times of travel (i.e., occurs most often in the start time)

- most common month
- most common day of week
- most common hour of day

2 Popular stations and trip

- most common start station
- most common end station
- most common trip from start to end (i.e., most frequent combination of start station and end station)

3 Trip duration

- total travel time
- average travel time

4 User info

- counts of each user type
- counts of each gender (only available for NYC and Chicago)
- earliest, most recent, most common year of birth (only available for NYC and Chicago)

The Files

To answer these questions using Python, you will need to write a Python script. To help guide your work in this project, a template with helper code and comments is provided in a bikeshare.py file, and you will do your scripting in there also. You will need the three city dataset files too:

- chicago.csv
- new_york_city.csv
- washington.csv

All four of these files are zipped up in the Bikeshare file in the resource tab in the sidebar on the left side of this page. You may download and open up that zip file to do your project work on your local machine.
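
As a rough illustration of the statistics listed above, here is a minimal sketch that uses only the Python standard library. It is not the project's bikeshare.py template; the function names and the sample rows are assumptions made for this example (the first row reuses the sample values quoted in the column list, the other two are invented), and it assumes the input has at least one row.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Illustrative rows only. The first row reuses the example values quoted
# above; the other two are made up for this sketch.
SAMPLE_CSV = """Start Time,End Time,Trip Duration,Start Station,End Station,User Type
2017-01-01 00:07:57,2017-01-01 00:20:53,776,Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber
2017-01-02 08:15:00,2017-01-02 08:25:00,600,Broadway & Barry Ave,Sedgwick St & North Ave,Customer
2017-02-06 08:40:10,2017-02-06 08:50:10,600,Clark St & Lake St,Broadway & Barry Ave,Subscriber
"""


def most_common(counter):
    """Return the single most frequent key of a non-empty Counter."""
    return counter.most_common(1)[0][0]


def describe_trips(csv_file):
    """Compute the descriptive statistics listed above from an open CSV file."""
    months, days, hours = Counter(), Counter(), Counter()
    starts, ends, trips, user_types = Counter(), Counter(), Counter(), Counter()
    durations = []
    for row in csv.DictReader(csv_file):
        start = datetime.strptime(row["Start Time"], "%Y-%m-%d %H:%M:%S")
        months[start.strftime("%B")] += 1
        days[start.strftime("%A")] += 1
        hours[start.hour] += 1
        starts[row["Start Station"]] += 1
        ends[row["End Station"]] += 1
        trips[(row["Start Station"], row["End Station"])] += 1
        user_types[row["User Type"]] += 1
        durations.append(float(row["Trip Duration"]))
    return {
        "most_common_month": most_common(months),
        "most_common_day": most_common(days),
        "most_common_hour": most_common(hours),
        "most_common_start_station": most_common(starts),
        "most_common_end_station": most_common(ends),
        "most_common_trip": most_common(trips),
        "total_travel_time": sum(durations),
        "average_travel_time": sum(durations) / len(durations),
        "user_type_counts": dict(user_types),
    }


stats = describe_trips(io.StringIO(SAMPLE_CSV))
for name, value in stats.items():
    print(f"{name}: {value}")
```

To run the same function on one of the city files, pass an open file handle instead of the in-memory sample, for example `describe_trips(open("chicago.csv", newline=""))`. The Gender and Birth Year statistics could be added the same way for the two files that carry those columns.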
Johns Hopkins COVID-19 Case Tracker
data.world
kaggle.com
csv, zip
Updated Dec 3, 2025
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Explore at:
Available download formats: zip, csv
Dataset updated
Dec 3, 2025
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Area covered
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1st, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

- Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
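
The hotspot queries above rely on a 7-day rolling average of new cases per capita. The following is a minimal Python sketch of that calculation, not AP's actual query; the function name, the per-100,000 scaling inside it, and the sample numbers are assumptions made for illustration.

```python
def rolling_new_cases_per_100k(cumulative_cases, population, window=7):
    """Return the rolling average of daily new cases per 100,000 people.

    cumulative_cases: cumulative counts, one per day, oldest first.
    population: population of the area (hypothetical value in the example).
    """
    # Daily new cases are the differences between consecutive cumulative counts.
    new_cases = [
        later - earlier
        for earlier, later in zip(cumulative_cases, cumulative_cases[1:])
    ]
    rates = []
    for end in range(window, len(new_cases) + 1):
        average = sum(new_cases[end - window:end]) / window
        rates.append(average * 100_000 / population)
    return rates


# Made-up cumulative counts for nine days in an area of 100,000 people.
example_rates = rolling_new_cases_per_100k(
    [0, 10, 20, 30, 40, 50, 60, 70, 90], population=100_000
)
print(example_rates)
```

Because the result is scaled by population, a handful of cases in a very small county can produce a large rate, which is why the note above recommends reading raw caseloads alongside the ranking.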

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data

- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
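
One practical consequence of the snapshot behaviour described above is that a cumulative series can occasionally decrease from one day to the next. A minimal sketch for spotting such days follows; the function name and the sample values are hypothetical and not part of the AP or Johns Hopkins tooling.

```python
def find_downward_revisions(dates, cumulative):
    """Return (date, change) pairs for days where the cumulative count fell."""
    revisions = []
    for index in range(1, len(cumulative)):
        change = cumulative[index] - cumulative[index - 1]
        if change < 0:
            revisions.append((dates[index], change))
    return revisions


# Made-up series: the source revised its total downward on the third day.
example_dates = ["2020-05-01", "2020-05-02", "2020-05-03", "2020-05-04"]
example_counts = [100, 120, 115, 130]
print(find_downward_revisions(example_dates, example_counts))
```

Days flagged this way would show up as negative daily new-case values, so they are worth reviewing before computing rolling averages.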

Attribution

This data should be credited to Johns Hopkins University COVID-19 tracking project
Westchester Medical Center
health.data.ny.gov
csv, xlsx, xml
Updated Nov 22, 2025
+ more versions
Cite
New York State Department of Health (2025). Westchester Medical Center [Dataset]. https://health.data.ny.gov/Health/Westchester-Medical-Center/hsv3-bdmk
Explore at:
Available download formats: xlsx, xml, csv
Dataset updated
Nov 22, 2025
Authors
New York State Department of Health
Area covered
Westchester County
Description
This data includes the name and location of active food service establishments and the violations that were found at the time of the inspection. Active food service establishments include only establishments that are currently operating. This dataset excludes inspections conducted in New York City (https://data.cityofnewyork.us/Health/Restaurant-Inspection-Results/4vkw-7nck), Suffolk County (http://apps.suffolkcountyny.gov/health/Restaurant/intro.html) and Erie County (http://www.healthspace.com/erieny). Inspections are a “snapshot” in time and are not always reflective of the day-to-day operations and overall condition of an establishment. Occasionally, remediation may not appear until the following month due to the timing of the updates. Update frequencies and availability of historical inspection data may vary from county to county. Some counties provide this information on their own websites and information found there may be updated more frequently. This dataset is refreshed on a monthly basis. The inspection data contained in this dataset was not collected in a manner intended for use as a restaurant grading system, and should not be construed or interpreted as such. Any use of this data to develop a restaurant grading system is not supported or endorsed by the New York State Department of Health. Historical inspection data through 2005 is also available. Inactive (closed) establishments can be found at: https://health.data.ny.gov/Health/Food-Service-Establishment-Inspections-Beginning-2/aaxz-j6pj. For more information, visit http://www.health.ny.gov/regulations/nycrr/title_10/part_14/subpart_14-1.htm or go to the “About” tab.
For Hire Vehicles (FHV) - Active
data.cityofnewyork.us
s.cnmilf.com
+2more
csv, xlsx, xml
Updated Dec 2, 2025
+ more versions
Cite
Taxi and Limousine Commission (TLC) (2025). For Hire Vehicles (FHV) - Active [Dataset]. https://data.cityofnewyork.us/Transportation/For-Hire-Vehicles-FHV-Active/8wbx-tsch
Explore at:
Available download formats: csv, xml, xlsx
Dataset updated
Dec 2, 2025
Dataset provided by
New York City Taxi and Limousine Commission http://www.nyc.gov/tlc
Authors
Taxi and Limousine Commission (TLC)
Description
PLEASE NOTE: This dataset, which includes all TLC licensed for-hire vehicles which are in good standing and able to drive, is updated every day in the evening between 4-7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date' should show either today or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_for_hire_vehicle_active_and_inactive.csv

TLC authorized For-Hire vehicles that are active. This list is accurate to the date and time represented in the Last Date Updated and Last Time Updated fields. For inquiries about the contents of this dataset, please email licensinginquiries@tlc.nyc.gov.
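
The freshness rule described above (the 'Last Update Date' should be today or yesterday) can be expressed as a small check. This is a sketch only; the function name is hypothetical, and parsing the date out of the downloaded file is left out because the field's exact format is not stated here.

```python
from datetime import date, timedelta


def is_list_current(last_update, today):
    """Return True if last_update is today's or yesterday's date."""
    return last_update in (today, today - timedelta(days=1))


# Example with fixed dates so the result does not depend on when it is run.
print(is_list_current(date(2025, 12, 1), today=date(2025, 12, 2)))
print(is_list_current(date(2025, 11, 28), today=date(2025, 12, 2)))
```

In practice `today` would be `date.today()`; if the check fails, the note above suggests downloading the most recent list from the linked CSV file.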
Bike share systems for three major cities
kaggle.com
zip
Updated Jan 31, 2023
Cite
amgad mahrous (2023). Bike share systems for three major cities [Dataset]. https://www.kaggle.com/datasets/amgadmahrous/bike-share-systems-for-three-major-cities
Explore at:
Available download formats: zip (26232124 bytes)
Dataset updated
Jan 31, 2023
Authors
amgad mahrous
Description
Bike Share Data

Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

In this project, provided by Udacity, you will use data provided by Motivate, a bike-share system provider for many major cities in the United States, to uncover bike-share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

The Datasets

Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core of six (6) columns:

- Start Time (e.g., 2017-01-01 00:07:57)
- End Time (e.g., 2017-01-01 00:20:53)
- Trip Duration (in seconds - e.g., 776)
- Start Station (e.g., Broadway & Barry Ave)
- End Station (e.g., Sedgwick St & North Ave)
- User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

- Gender
- Birth Year
Coronavirus (Covid-19) Data in the United States
kaggle.com
zip
Updated Apr 19, 2020
+ more versions
Cite
Wing (2020). Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.kaggle.com/gniwnyc/nytimescovid19usdataset
Explore at:
Available download formats: zip (610420 bytes)
Dataset updated
Apr 19, 2020
Authors
Wing
Area covered
United States
Description
Copyright 2020 by The New York Times Company

Coronavirus (Covid-19) Data in the United States

[ U.S. Data (Raw CSV) | U.S. State-Level Data (Raw CSV) | U.S. County-Level Data (Raw CSV) ]

The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

United States Data

Data on cumulative coronavirus cases and deaths can be found in three files, one for each of these geographic levels: U.S., states and counties.

Each row of data reports cumulative counts based on our best reporting up to the moment we publish an update. We do our best to revise earlier entries in the data when we receive new information. If a county is not listed for a date, then there were zero reported confirmed cases and deaths.

State and county files contain FIPS codes, a standard geographic identifier, to make it easier for an analyst to combine this data with other data sets like a map file or population data.

Download all the data or clone this repository by clicking the green "Clone or download" button above.

U.S. National-Level Data

The daily number of cases and deaths nationwide, including states, U.S. territories and the District of Columbia, can be found in the us.csv file. (Raw CSV file here.)

date,cases,deaths
2020-01-21,1,0
...

State-Level Data

State-level data can be found in the states.csv file. (Raw CSV file here.)

date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
...

County-Level Data

County-level data can be found in the counties.csv file. (Raw CSV file here.)

date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
...

In some cases, the geographies where cases are reported do not map to standard county boundaries. See the list of geographic exceptions for more detail on these.
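
Since the state and county files carry FIPS codes for joining with other data, here is a minimal sketch of such a join in Python. The population table, its value, and the function name are placeholders invented for this example; only the CSV header and the single data row come from the description above. FIPS codes are kept as strings so that codes with leading zeros survive, and rows without a matching code (such as the geographic exceptions) get no rate.

```python
import csv
import io

# Header and first row as shown in the county-level sample above.
COUNTIES_CSV = """date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
"""

# Placeholder population figure, NOT a real estimate.
POPULATION_BY_FIPS = {"53061": 800_000}


def cases_per_100k(csv_file, population_by_fips):
    """Join county rows to a population table by FIPS code."""
    joined = []
    for row in csv.DictReader(csv_file):
        population = population_by_fips.get(row["fips"])
        rate = None
        if population:
            rate = int(row["cases"]) * 100_000 / population
        joined.append({**row, "population": population, "cases_per_100k": rate})
    return joined


county_rows = cases_per_100k(io.StringIO(COUNTIES_CSV), POPULATION_BY_FIPS)
print(county_rows[0]["county"], county_rows[0]["cases_per_100k"])
```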

Methodology and Definitions

The data is the product of dozens of journalists working across several time zones to monitor news conferences, analyze data releases and seek clarification from public officials on how they categorize cases.

It is also a response to a fragmented American public health system in which overwhelmed public servants at the state, county and territorial level have sometimes struggled to report information accurately, consistently and speedily. On several occasions, officials have corrected information hours or days after first reporting it. At times, cases have disappeared from a local government database, or officials have moved a patient first identified in one state or county to another, often with no explanation. In those instances, which have become more common as the number of cases has grown, our team has made every effort to update the data to reflect the most current, accurate information while ensuring that every known case is counted.

When the information is available, we count patients where they are being treated, not necessarily where they live.

In most instances, the process of recording cases has been straightforward. But because of the patchwork of reporting methods for this data across more than 50 state and territorial governments and hundreds of local health departments, our journalists sometimes had to make difficult interpretations about how to count and record cases.

For those reasons, our data will in some cases not exactly match with the information reported by states and counties. Those differences include these cases: When the federal government arranged flights to the United States for Americans exposed to the coronavirus in China and Japan, our team recorded those cases in the states where the patients subsequently were treated, even though local health departments generally did not. When a resident of Florida died in Los Angeles, we recorded her death as having occurred in California rather than Florida, though officials in Florida counted her case in their own records. And when officials in some states reported new cases without immediately identifying where the patients were being treated, we attempted to add informati...
Top 15 States by Estimated Number of Homeless People in 2024
consumershield.com
csv
Updated Jun 9, 2025
Cite
ConsumerShield Research Team (2025). Top 15 States by Estimated Number of Homeless People in 2024 [Dataset]. https://www.consumershield.com/articles/how-many-homeless-us
Explore at:
Available download formats: csv
Dataset updated
Jun 9, 2025
Dataset authored and provided by
ConsumerShield Research Team
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
United States
Description
The graph displays the top 15 states by estimated number of homeless people in the United States for the year 2024. The x-axis represents U.S. states, while the y-axis shows the number of homeless individuals in each state. California has the highest homeless population with 187,084 individuals, followed by New York with 158,019, while Hawaii places last in this dataset with 11,637. The bar graph shows that California and New York have notably higher counts than the other states listed, indicating regional disparities in homelessness levels across the country.
Travellers to Canada from the United States by state of origin, top 15...
www150.statcan.gc.ca
ouvert.canada.ca
+1more
Updated Jan 19, 2016
+ more versions
Cite
Government of Canada, Statistics Canada (2016). Travellers to Canada from the United States by state of origin, top 15 states of origin [Dataset]. http://doi.org/10.25318/2410004001-eng
Explore at:
Unique identifier
https://doi.org/10.25318/2410004001-eng
Dataset updated
Jan 19, 2016
Dataset provided by
Statistics Canada https://statcan.gc.ca/en
Government of Canada http://www.gg.ca/
Area covered
Canada
Description
This table contains 45 series, with data for years 2014 - 2014 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 item: Canada) State of origin (15 items: New York; Washington; Michigan; California; ...) Traveller characteristics (3 items: Trips; Nights; Spending in Canada).
Simple Flight Scheduling Optimization Dataset
kaggle.com
zip
Updated Sep 8, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
agrover112 (2022). Simple Flight Scheduling Optimization Dataset [Dataset]. https://www.kaggle.com/datasets/agrover112/simple-flight-scheduling-optimization-dataset
Explore at:
Available download formats: zip (1208 bytes)
Dataset updated
Sep 8, 2022
Authors
agrover112
License
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
The dataset was introduced by Toby Segaran in his book Programming Collective Intelligence.

The columns are: Departure airport code, Arrival airport code, Time of Arrival (24h), Time of Departure (24h), Cost (USD)

Problem Definition

Planning a trip for a group of people from different locations all arriving at the same place is always a challenge, and it makes for an interesting optimization problem. In our situation, group members are from all over the country and wish to meet up at a particular location, say New York. They will all arrive on the same day and leave on the same day, and they would like to share transportation to and from the airport. There are dozens of flights per day to New York from any of the family members’ locations, all leaving at different times.
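
To make the optimization problem concrete, here is a minimal Python sketch that scores a schedule and picks the cheapest one by exhaustive search. Everything in it is an assumption for illustration: the flights are invented rather than taken from the dataset, the cost function (ticket prices plus one unit per minute spent waiting for the last arrival) is a simplification, and brute force stands in for the optimization methods a real solution would use on a larger search space.

```python
from itertools import product

# Hypothetical inbound flights: origin -> list of (depart, arrive, price_usd).
FLIGHTS = {
    "BOS": [("08:00", "09:30", 120), ("10:00", "11:30", 90)],
    "DAL": [("06:00", "10:00", 200), ("07:30", "11:45", 150)],
}


def to_minutes(hhmm):
    """Convert a 24h 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def schedule_cost(choice):
    """Total ticket price plus minutes everyone waits for the last arrival."""
    total_price = sum(price for _, _, price in choice)
    latest_arrival = max(to_minutes(arrive) for _, arrive, _ in choice)
    waiting = sum(latest_arrival - to_minutes(arrive) for _, arrive, _ in choice)
    return total_price + waiting


def best_schedule(flights):
    """Try every combination of one flight per origin and keep the cheapest."""
    origins = sorted(flights)
    best = min(product(*(flights[origin] for origin in origins)), key=schedule_cost)
    return dict(zip(origins, best)), schedule_cost(best)


plan, cost = best_schedule(FLIGHTS)
print(plan, cost)
```

Exhaustive search is only workable for a few travellers and flights, since the number of combinations grows multiplicatively with each additional origin.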

For more information and examples check out github.com/Agrover112/fliscopt/examples
CNN-DailyMail News Text Summarization
kaggle.com
zip
Updated Oct 23, 2021
+ more versions
Cite
Gowri Shankar Penugonda (2021). CNN-DailyMail News Text Summarization [Dataset]. https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail/code
Explore at:
Available download formats: zip (527738644 bytes)
Dataset updated
Oct 23, 2021
Authors
Gowri Shankar Penugonda
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Card for CNN / DailyMail Dataset

Dataset Summary

The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.

Supported Tasks and Leaderboards

'summarization': Versions 2.0.0 and 3.0.0 of the CNN / DailyMail Dataset can be used to train a model for abstractive and extractive summarization (Version 1.0.0 was developed for machine reading and comprehension and abstractive question answering). The model performance is measured by how high the output summary's ROUGE score for a given article is when compared to the highlight as written by the original article author. Zhong et al (2020) report a ROUGE-1 score of 44.41 when testing a model trained for extractive summarization. See the Papers With Code leaderboard for more models.
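
For readers unfamiliar with the metric mentioned above, the following is a simplified sketch of a ROUGE-1 F1 score: unigram overlap between a generated summary and the reference highlight. It is an illustration only, with a hypothetical function name and whitespace tokenization; it is not the official scorer, which applies its own tokenization and options, and it returns values between 0 and 1 rather than on the 0 to 100 scale used in reported results.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((candidate_counts & reference_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(candidate_counts.values())
    recall = overlap / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat", "the cat sat on the mat"))
```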

Languages

The BCP-47 code for English as generally spoken in the United States is en-US and the BCP-47 code for English as generally spoken in the United Kingdom is en-GB. It is unknown if other varieties of English are represented in the data.

Dataset Structure

Data Instances

For each instance, there is a string for the article, a string for the highlights, and a string for the id. See the CNN / Daily Mail dataset viewer to explore more examples.

{'id': '0054d6d30dbcad772e20b22771153a2a9cbeaf62', 'article': "(CNN) -- An American woman died aboard a cruise ship that docked at Rio de Janeiro on Tuesday, the same ship on which 86 passengers previously fell ill, according to the state-run Brazilian news agency, Agencia Brasil. The American tourist died aboard the MS Veendam, owned by cruise operator Holland America. Federal Police told Agencia Brasil that forensic doctors were investigating her death. The ship's doctors told police that the woman was elderly and suffered from diabetes and hypertension, according the agency. The other passengers came down with diarrhea prior to her death during an earlier part of the trip, the ship's doctors said. The Veendam left New York 36 days ago for a South America tour.", 'highlights': "The elderly woman suffered from diabetes and hypertension, ship's doctors say . Previously, 86 passengers had fallen ill on the ship, Agencia Brasil says ."}

The average token count for the articles and the highlights are provided below:

Feature | Mean Token Count
Article | 781
Highlights | 56

Data Fields

id: a string containing the hexadecimal-formatted SHA1 hash of the URL where the story was retrieved from

article: a string containing the body of the news article

highlights: a string containing the highlight of the article as written by the article author
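
As a small illustration of the id field, the sketch below computes a hexadecimal SHA1 digest of a URL with Python's hashlib. The URL is a made-up placeholder and the function name is hypothetical; the exact URL string and encoding hashed by the dataset creators are not stated here, so UTF-8 is an assumption.

```python
import hashlib


def story_id(url):
    """Return the 40-character hexadecimal SHA1 digest of a URL string."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


# Placeholder URL used only to show the shape of the output.
example_id = story_id("https://example.com/news/story.html")
print(example_id, len(example_id))
```

The digest is always 40 hexadecimal characters, which matches the shape of the id in the sample instance above.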

Data Splits

The CNN/DailyMail dataset has 3 splits: train, validation, and test. Below are the statistics for Version 3.0.0 of the dataset.

Dataset Split | Number of Instances in Split
Train | 287,113
Validation | 13,368
Test | 11,490

Dataset Creation

Curation Rationale

Version 1.0.0 aimed to support supervised neural methodologies for machine reading and question answering with a large amount of real natural language training data. It released about 313k unique articles and nearly 1M Cloze-style questions to go with the articles. Versions 2.0.0 and 3.0.0 changed the structure of the dataset to support summarization rather than question answering. Version 3.0.0 provided a non-anonymized version of the data, whereas both previous versions were preprocessed to replace named entities with unique identifier labels.

Source Data

Initial Data Collection and Normalization

The data consists of news articles and...
Illegal Dumpsites
kaggle.com
zip
Updated Oct 4, 2022
satya phani (2022). Illegal Dumpsites [Dataset]. https://www.kaggle.com/datasets/phanipulagala619/illegal-dumpsites
Explore at:
Available download formats: zip (7978389 bytes)
Dataset updated
Oct 4, 2022
Authors
satya phani
Description
The problem

Each year we produce more and more waste. Dumps are often found in places without an address and without an easy way to report them, so getting rid of them can be next to impossible. A small part of the waste gets recycled, but a huge amount of trash still ends up in illegal dumps, which are everywhere: in our cities, nature, rivers, and oceans. There were 55,000 reports of illegal dumping made in 110 countries.

Every day, an average of 1 kilogram of waste is generated per person around the world, which amounts to 2.7 billion tonnes of waste every year. This is enough waste to fill 285,000 trucks. If we were to put them in a row, the line would go from New York to London.
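
The two waste figures above can be checked against each other with a little arithmetic. The sketch below uses only the numbers stated in the text (1 kg per person per day and 2.7 billion tonnes per year) and derives the world population they jointly imply; the 365-day year and the 1,000 kg per tonne conversion are the only added assumptions.

```python
# Figures taken from the text above.
KG_PER_PERSON_PER_DAY = 1.0
ANNUAL_WASTE_TONNES = 2.7e9

# Assumptions of this sketch: metric tonnes and a 365-day year.
KG_PER_TONNE = 1000
DAYS_PER_YEAR = 365

annual_waste_kg = ANNUAL_WASTE_TONNES * KG_PER_TONNE
implied_population = annual_waste_kg / (KG_PER_PERSON_PER_DAY * DAYS_PER_YEAR)
print(f"Implied world population: {implied_population / 1e9:.1f} billion")
```

The result is roughly 7.4 billion people, so the daily and yearly figures are consistent with each other.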

Illegal dumping and health

In addition to economic and ecological damage, illegal dumping can have detrimental health effects for people living nearby. Dumpsites are a breeding ground for insects like mosquitoes and flies, but also for animals that carry diseases, like rats, skunks, and opossums.

Depending on the country, the life-threatening diseases that these insects and animals can carry include dengue fever, yellow fever, encephalitis, and malaria. Also, living in a community that has visible dumpsites could wear on mental health.

Datasets are extracted from:

TrashOut: Reports of illegal dumping provided by users through the TrashOut mobile app. For each report, a number of features are recorded; the most relevant for this analysis were location (latitude and longitude, city, country, and continent), date, picture, size, and type of waste.

Open Street Maps (OSM): Geospatial dataset and information on the cities' road networks, including the type of road (e.g. motorway, primary, residential).

Socioeconomic Data and Applications Center (SEDAC): Population density on a 1 km grid, from which we also calculated the population density gradient to account for population density in the neighboring cells.

FourSquare: Information about nearby venues.

World Bank Indicators, World Bank’s “What a Waste 2.0”, Eurostat, European Commission Directorate-General for Environment: Datasets for socio-economic indicators.

Non-dumpsites Control Dataset: We generated our own control dataset, which was required to train the model on where dumpsites do not occur. For every TrashOut dumpsite location, we selected a pseudo-random location 1 km away and assigned it as a potential non-dumpsite location.
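
The control-point step described above could be sketched as follows. The source does not say how the pseudo-random location was chosen, so this example assumes a uniformly random compass bearing and a spherical Earth with a mean radius of 6371 km. The function names and the sample coordinates are hypothetical, not taken from the dataset.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def destination_point(lat_deg, lon_deg, bearing_deg, distance_km):
    """Point reached by travelling distance_km from (lat, lon) along a bearing."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    lon2 = (lon2 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-180, 180)
    return math.degrees(lat2), math.degrees(lon2)


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance between two points, used here as a sanity check."""
    lat1, lat2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dlat = lat2 - lat1
    dlon = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def control_location(lat_deg, lon_deg, rng, distance_km=1.0):
    """Pick a candidate non-dumpsite point at a random bearing, distance_km away."""
    bearing = rng.uniform(0.0, 360.0)
    return destination_point(lat_deg, lon_deg, bearing, distance_km)


rng = random.Random(42)  # seeded so the sketch is reproducible
dumpsite = (48.1486, 17.1077)  # hypothetical dumpsite coordinates
control = control_location(dumpsite[0], dumpsite[1], rng)
distance = haversine_km(dumpsite[0], dumpsite[1], control[0], control[1])
print(control, round(distance, 6))
```

A real pipeline would probably also check that the sampled point does not coincide with another reported dumpsite; the source does not mention such a check, so it is left out here.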