12 datasets found
  1. Coronavirus (Covid-19) Data in the United States

    • nytimes.com
    • openicpsr.org
    • +4 more
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
    Dataset provided by
    New York Times
    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  2. Explore Bike Share Data

    • kaggle.com
    zip
    Updated Jun 3, 2021
    Cite
    Shaltout (2021). Explore Bike Share Data [Dataset]. https://www.kaggle.com/shaltout/explore-bike-share-data
    Available download formats: zip (26232124 bytes)
    Dataset updated
    Jun 3, 2021
    Authors
    Shaltout
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Bike Share Data

    Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

    Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

    In this project, you will use data provided by Motivate, a bike share system provider for many major cities in the United States, to uncover bike share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

    The Datasets

    Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:

    • Start Time (e.g., 2017-01-01 00:07:57)
    • End Time (e.g., 2017-01-01 00:20:53)
    • Trip Duration (in seconds, e.g., 776)
    • Start Station (e.g., Broadway & Barry Ave)
    • End Station (e.g., Sedgwick St & North Ave)
    • User Type (Subscriber or Customer)

    The Chicago and New York City files also have the following two columns:

    • Gender
    • Birth Year

    Data for the first 10 rides in the new_york_city.csv file

    The original files are much larger and messier, and you don't need to download them, but they can be accessed here if you'd like to see them (Chicago, New York City, Washington). These files had more columns and they differed in format in many cases. Some data wrangling has been performed to condense these files to the above core six columns to make your analysis and the evaluation of your Python skills more straightforward. In the Data Wrangling course that comes later in the Data Analyst Nanodegree program, students learn how to wrangle the dirtiest, messiest datasets, so don't worry, you won't miss out on learning this important skill!

    Statistics Computed

    You will learn about bike share use in Chicago, New York City, and Washington by computing a variety of descriptive statistics. In this project, you'll write code to provide the following information:

    1. Popular times of travel (i.e., occurs most often in the start time)

      • most common month
      • most common day of week
      • most common hour of day

    2. Popular stations and trip

      • most common start station
      • most common end station
      • most common trip from start to end (i.e., most frequent combination of start station and end station)

    3. Trip duration

      • total travel time
      • average travel time

    4. User info

      • counts of each user type
      • counts of each gender (only available for NYC and Chicago)
      • earliest, most recent, most common year of birth (only available for NYC and Chicago)

    The Files

    To answer these questions using Python, you will need to write a Python script. To help guide your work in this project, a template with helper code and comments is provided in a bikeshare.py file, and you will do your scripting in there also. You will need the three city dataset files too:

    • chicago.csv
    • new_york_city.csv
    • washington.csv

    All four of these files are zipped up in the Bikeshare file in the resource tab in the sidebar on the left side of this page. You may download and open up that zip file to do your project work on your local machine.
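
    The statistics listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the contents of the bikeshare.py template: the sample rows, the `most_common` helper, and the `describe_trips` function are all hypothetical, and only the six core column names come from the description.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows using the six core columns described above.
SAMPLE_CSV = """Start Time,End Time,Trip Duration,Start Station,End Station,User Type
2017-01-01 00:07:57,2017-01-01 00:20:53,776,Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber
2017-01-02 08:15:00,2017-01-02 08:25:00,600,Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber
2017-02-03 08:40:10,2017-02-03 09:00:10,1200,Clark St & Elm St,Broadway & Barry Ave,Customer
"""


def most_common(values):
    """Return the most frequent value in an iterable (first seen wins ties)."""
    return Counter(values).most_common(1)[0][0]


def describe_trips(rows):
    """Compute a few of the descriptive statistics listed above."""
    starts = [datetime.strptime(r["Start Time"], "%Y-%m-%d %H:%M:%S") for r in rows]
    durations = [float(r["Trip Duration"]) for r in rows]
    return {
        "most_common_month": most_common(s.month for s in starts),
        "most_common_hour": most_common(s.hour for s in starts),
        "most_common_start_station": most_common(r["Start Station"] for r in rows),
        "most_common_trip": most_common(
            (r["Start Station"], r["End Station"]) for r in rows
        ),
        "total_travel_time": sum(durations),
        "average_travel_time": sum(durations) / len(durations),
        "user_type_counts": dict(Counter(r["User Type"] for r in rows)),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
stats = describe_trips(rows)
print(stats)
```

    To run the same functions on a real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for one of the city CSVs. The Gender and Birth Year statistics would need a guard, since the Washington file lacks those columns.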

  3. Johns Hopkins COVID-19 Case Tracker

    • data.world
    • kaggle.com
    csv, zip
    Updated Dec 3, 2025
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Available download formats: zip, csv
    Dataset updated
    Dec 3, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      • Johns Hopkins has reconciled Ohio's historical deaths data with the state.

    Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    @(https://datawrapper.dwcdn.net/nRyaf/15/)

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data

    • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
    • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
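
    Because the timeseries holds cumulative snapshots, daily figures and per-100,000 rates have to be derived from them. The sketch below shows one way to do that and how a downward revision surfaces as a negative daily value. The function names, the sample counts, and the population figure are hypothetical and are not taken from the AP or Johns Hopkins files.

```python
def daily_new_counts(cumulative):
    """Convert cumulative snapshots into day-over-day changes.

    A negative value means the source revised its cumulative total downward.
    """
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def rate_per_100k(count, population):
    """Scale a raw count to a rate per 100,000 residents."""
    return count * 100_000 / population


# Hypothetical cumulative case counts for one county over five days.
cumulative_cases = [10, 14, 21, 19, 30]
population = 50_000  # hypothetical population estimate

new_cases = daily_new_counts(cumulative_cases)
latest_rate = rate_per_100k(cumulative_cases[-1], population)
print(new_cases, latest_rate)
```

    The fourth value in `new_cases` is negative because the hypothetical source lowered its cumulative total that day, which is the kind of inconsistency the caveat above describes.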

    Attribution

    This data should be credited to Johns Hopkins University's COVID-19 tracking project.

  4. Westchester Medical Center

    • health.data.ny.gov
    csv, xlsx, xml
    Updated Nov 22, 2025
    + more versions
    Cite
    New York State Department of Health (2025). Westchester Medical Center [Dataset]. https://health.data.ny.gov/Health/Westchester-Medical-Center/hsv3-bdmk
    Available download formats: xlsx, xml, csv
    Dataset updated
    Nov 22, 2025
    Authors
    New York State Department of Health
    Area covered
    Westchester County
    Description

    This data includes the name and location of active food service establishments and the violations that were found at the time of the inspection. Active food service establishments include only establishments that are currently operating. This dataset excludes inspections conducted in New York City (https://data.cityofnewyork.us/Health/Restaurant-Inspection-Results/4vkw-7nck), Suffolk County (http://apps.suffolkcountyny.gov/health/Restaurant/intro.html) and Erie County (http://www.healthspace.com/erieny). Inspections are a “snapshot” in time and are not always reflective of the day-to-day operations and overall condition of an establishment. Occasionally, remediation may not appear until the following month due to the timing of the updates. Update frequencies and availability of historical inspection data may vary from county to county. Some counties provide this information on their own websites and information found there may be updated more frequently. This dataset is refreshed on a monthly basis. The inspection data contained in this dataset was not collected in a manner intended for use as a restaurant grading system, and should not be construed or interpreted as such. Any use of this data to develop a restaurant grading system is not supported or endorsed by the New York State Department of Health. Historical inspection data through 2005 is also available. Inactive (closed) establishments can be found at: https://health.data.ny.gov/Health/Food-Service-Establishment-Inspections-Beginning-2/aaxz-j6pj. For more information, visit http://www.health.ny.gov/regulations/nycrr/title_10/part_14/subpart_14-1.htm or go to the “About” tab.

  5. For Hire Vehicles (FHV) - Active

    • data.cityofnewyork.us
    • s.cnmilf.com
    • +2 more
    csv, xlsx, xml
    Updated Dec 2, 2025
    + more versions
    Cite
    Taxi and Limousine Commission (TLC) (2025). For Hire Vehicles (FHV) - Active [Dataset]. https://data.cityofnewyork.us/Transportation/For-Hire-Vehicles-FHV-Active/8wbx-tsch
    Available download formats: csv, xml, xlsx
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    New York City Taxi and Limousine Commission (http://www.nyc.gov/tlc)
    Authors
    Taxi and Limousine Commission (TLC)
    Description

    PLEASE NOTE: This dataset, which includes all TLC licensed for-hire vehicles which are in good standing and able to drive, is updated every day in the evening between 4-7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date' should show either today or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_for_hire_vehicle_active_and_inactive.csv

    TLC authorized For-Hire vehicles that are active. This list is accurate to the date and time represented in the Last Date Updated and Last Time Updated fields. For inquiries about the contents of this dataset, please email licensinginquiries@tlc.nyc.gov.

  6. Bike share systems for three major cities

    • kaggle.com
    zip
    Updated Jan 31, 2023
    Cite
    amgad mahrous (2023). Bike share systems for three major cities [Dataset]. https://www.kaggle.com/datasets/amgadmahrous/bike-share-systems-for-three-major-cities
    Available download formats: zip (26232124 bytes)
    Dataset updated
    Jan 31, 2023
    Authors
    amgad mahrous
    Description

    Bike Share Data

    Over the past decade, bicycle-sharing systems have been growing in number and popularity in cities across the world. Bicycle-sharing systems allow users to rent bicycles on a very short-term basis for a price. This allows people to borrow a bike from point A and return it at point B, though they can also return it to the same location if they'd like to just go for a ride. Regardless, each bike can serve several users per day.

    Thanks to the rise in information technologies, it is easy for a user of the system to access a dock within the system to unlock or return bicycles. These technologies also provide a wealth of data that can be used to explore how these bike-sharing systems are used.

    In this project, provided by Udacity, you will use data provided by Motivate, a bike-share system provider for many major cities in the United States, to uncover bike-share usage patterns. You will compare the system usage between three large cities: Chicago, New York City, and Washington, DC.

    The Datasets

    Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core of six (6) columns:

    • Start Time (e.g., 2017-01-01 00:07:57)
    • End Time (e.g., 2017-01-01 00:20:53)
    • Trip Duration (in seconds, e.g., 776)
    • Start Station (e.g., Broadway & Barry Ave)
    • End Station (e.g., Sedgwick St & North Ave)
    • User Type (Subscriber or Customer)

    The Chicago and New York City files also have the following two columns:

    • Gender
    • Birth Year

  7. Coronavirus (Covid-19) Data in the United States

    • kaggle.com
    zip
    Updated Apr 19, 2020
    + more versions
    Cite
    Wing (2020). Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.kaggle.com/gniwnyc/nytimescovid19usdataset
    Available download formats: zip (610420 bytes)
    Dataset updated
    Apr 19, 2020
    Authors
    Wing
    Area covered
    United States
    Description

    Copyright 2020 by The New York Times Company

    Coronavirus (Covid-19) Data in the United States

    [ U.S. Data (Raw CSV) | U.S. State-Level Data (Raw CSV) | U.S. County-Level Data (Raw CSV) ]

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

    United States Data

    Data on cumulative coronavirus cases and deaths can be found in three files, one for each of these geographic levels: U.S., states and counties.

    Each row of data reports cumulative counts based on our best reporting up to the moment we publish an update. We do our best to revise earlier entries in the data when we receive new information. If a county is not listed for a date, then there were zero reported confirmed cases and deaths.

    State and county files contain FIPS codes, a standard geographic identifier, to make it easier for an analyst to combine this data with other data sets like a map file or population data.

    Download all the data or clone this repository by clicking the green "Clone or download" button above.

    U.S. National-Level Data

    The daily number of cases and deaths nationwide, including states, U.S. territories and the District of Columbia, can be found in the us.csv file. (Raw CSV file here.)

    date,cases,deaths
    2020-01-21,1,0
    ...

    State-Level Data

    State-level data can be found in the states.csv file. (Raw CSV file here.)

    date,state,fips,cases,deaths
    2020-01-21,Washington,53,1,0
    ...

    County-Level Data

    County-level data can be found in the counties.csv file. (Raw CSV file here.)

    date,county,state,fips,cases,deaths
    2020-01-21,Snohomish,Washington,53061,1,0
    ...

    In some cases, the geographies where cases are reported do not map to standard county boundaries. See the list of geographic exceptions for more detail on these.
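
    A minimal sketch of reading the county-level layout with the standard library follows. The first data row is the one quoted above; the other rows, the `load_county_rows` and `latest_by_fips` helpers, and the choice to treat an empty FIPS field as a non-standard geography are assumptions made for illustration. FIPS codes are kept as strings so that leading zeros survive a join with map or population data.

```python
import csv
import io

# First data row is from the description; the other two are hypothetical.
SAMPLE = """date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
2020-01-22,Snohomish,Washington,53061,1,0
2020-03-01,Example Area,Example State,,5,1
"""


def load_county_rows(text):
    """Parse county rows, keeping FIPS as a string and counts as integers."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": rec["date"],
            "county": rec["county"],
            "state": rec["state"],
            "fips": rec["fips"] or None,  # assumed: blank means no standard county
            "cases": int(rec["cases"]),
            "deaths": int(rec["deaths"]),
        })
    return rows


def latest_by_fips(rows):
    """Return the most recent cumulative row for each FIPS code."""
    latest = {}
    for row in rows:
        if row["fips"] is None:
            continue
        current = latest.get(row["fips"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if current is None or row["date"] > current["date"]:
            latest[row["fips"]] = row
    return latest


rows = load_county_rows(SAMPLE)
latest = latest_by_fips(rows)
print(latest["53061"])
```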

    Methodology and Definitions

    The data is the product of dozens of journalists working across several time zones to monitor news conferences, analyze data releases and seek clarification from public officials on how they categorize cases.

    It is also a response to a fragmented American public health system in which overwhelmed public servants at the state, county and territorial level have sometimes struggled to report information accurately, consistently and speedily. On several occasions, officials have corrected information hours or days after first reporting it. At times, cases have disappeared from a local government database, or officials have moved a patient first identified in one state or county to another, often with no explanation. In those instances, which have become more common as the number of cases has grown, our team has made every effort to update the data to reflect the most current, accurate information while ensuring that every known case is counted.

    When the information is available, we count patients where they are being treated, not necessarily where they live.

    In most instances, the process of recording cases has been straightforward. But because of the patchwork of reporting methods for this data across more than 50 state and territorial governments and hundreds of local health departments, our journalists sometimes had to make difficult interpretations about how to count and record cases.

    For those reasons, our data will in some cases not exactly match with the information reported by states and counties. Those differences include these cases: When the federal government arranged flights to the United States for Americans exposed to the coronavirus in China and Japan, our team recorded those cases in the states where the patients subsequently were treated, even though local health departments generally did not. When a resident of Florida died in Los Angeles, we recorded her death as having occurred in California rather than Florida, though officials in Florida counted her case in their own records. And when officials in some states reported new cases without immediately identifying where the patients were being treated, we attempted to add informati...

  8. Top 15 States by Estimated Number of Homeless People in 2024

    • consumershield.com
    csv
    Updated Jun 9, 2025
    Cite
    ConsumerShield Research Team (2025). Top 15 States by Estimated Number of Homeless People in 2024 [Dataset]. https://www.consumershield.com/articles/how-many-homeless-us
    Available download formats: csv
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    ConsumerShield Research Team
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    The graph displays the top 15 states by estimated number of homeless people in the United States for the year 2024. The x-axis represents U.S. states, while the y-axis shows the number of homeless individuals in each state. California has the highest homeless population with 187,084 individuals, followed by New York with 158,019, while Hawaii places last in this dataset with 11,637. The bar graph highlights significant differences across states, with California and New York showing notably higher counts than the others, indicating regional disparities in homelessness levels across the country.

  9. Travellers to Canada from the United States by state of origin, top 15...

    • www150.statcan.gc.ca
    • ouvert.canada.ca
    • +1 more
    Updated Jan 19, 2016
    + more versions
    Cite
    Government of Canada, Statistics Canada (2016). Travellers to Canada from the United States by state of origin, top 15 states of origin [Dataset]. http://doi.org/10.25318/2410004001-eng
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Government of Canada (http://www.gg.ca/)
    Area covered
    Canada
    Description

    This table contains 45 series, with data for years 2014 - 2014 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 item: Canada) State of origin (15 items: New York; Washington; Michigan; California; ...) Traveller characteristics (3 items: Trips; Nights; Spending in Canada).

  10. Simple Flight Scheduling Optimization Dataset

    • kaggle.com
    zip
    Updated Sep 8, 2022
    Cite
    agrover112 (2022). Simple Flight Scheduling Optimization Dataset [Dataset]. https://www.kaggle.com/datasets/agrover112/simple-flight-scheduling-optimization-dataset
    Available download formats: zip (1208 bytes)
    Dataset updated
    Sep 8, 2022
    Authors
    agrover112
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The dataset was introduced by Toby Segaran in his book Programming Collective Intelligence.

    The columns are: Departure airport code, Arrival airport code, Time of Arrival (24h), Time of Departure (24h), Cost (USD)

    Problem Definition

    Planning a trip for a group of people from different locations all arriving at the same place is always a challenge, and it makes for an interesting optimization problem. In our situation, group members are from all over the country and wish to meet up at a particular location, say New York. They will all arrive on the same day and leave on the same day, and they would like to share transportation to and from the airport. There are dozens of flights per day to New York from any of the family members’ locations, all leaving at different times.

    For more information and examples check out github.com/Agrover112/fliscopt/examples
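
    One way to make the problem concrete is a cost function that scores a choice of one inbound flight per traveller. The sketch below is a simplified objective written for illustration, not the fliscopt API or the book's exact cost function: the `Flight` tuple, the `arrival_cost` function, the waiting penalty, and the sample flights are all hypothetical. It adds ticket prices to the minutes each traveller waits at the airport for the last arrival, since the group wants to share transportation.

```python
from typing import NamedTuple


class Flight(NamedTuple):
    origin: str
    destination: str
    departure: str  # 24h "HH:MM"
    arrival: str    # 24h "HH:MM"
    cost: int       # USD


def minutes(hhmm):
    """Convert a 24h "HH:MM" string to minutes after midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def arrival_cost(flights, waiting_cost_per_minute=1.0):
    """Ticket prices plus a penalty for time spent waiting for the last arrival."""
    latest = max(minutes(f.arrival) for f in flights)
    waiting = sum(latest - minutes(f.arrival) for f in flights)
    tickets = sum(f.cost for f in flights)
    return tickets + waiting * waiting_cost_per_minute


# Hypothetical selection: one inbound flight per group member.
chosen = [
    Flight("BOS", "LGA", "08:04", "10:11", 95),
    Flight("DAL", "LGA", "06:12", "10:22", 230),
    Flight("ORD", "LGA", "08:25", "10:34", 157),
]
total = arrival_cost(chosen)
print(total)
```

    An optimizer would search over the possible flight selections for the one that minimizes this value; outbound flights could be scored the same way against the earliest departure.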

  11. CNN-DailyMail News Text Summarization

    • kaggle.com
    zip
    Updated Oct 23, 2021
    + more versions
    Cite
    Gowri Shankar Penugonda (2021). CNN-DailyMail News Text Summarization [Dataset]. https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail/code
    Available download formats: zip (527738644 bytes)
    Dataset updated
    Oct 23, 2021
    Authors
    Gowri Shankar Penugonda
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Card for CNN Dailymail Dataset

    Dataset Summary

    The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.

    Supported Tasks and Leaderboards

    Languages

    The BCP-47 code for English as generally spoken in the United States is en-US and the BCP-47 code for English as generally spoken in the United Kingdom is en-GB. It is unknown if other varieties of English are represented in the data.

    Dataset Structure

    Data Instances

    For each instance, there is a string for the article, a string for the highlights, and a string for the id. See the CNN / Daily Mail dataset viewer to explore more examples.

    {'id': '0054d6d30dbcad772e20b22771153a2a9cbeaf62',
     'article': "(CNN) -- An American woman died aboard a cruise ship that docked at Rio de Janeiro on Tuesday, the same ship on which 86 passengers previously fell ill, according to the state-run Brazilian news agency, Agencia Brasil. The American tourist died aboard the MS Veendam, owned by cruise operator Holland America. Federal Police told Agencia Brasil that forensic doctors were investigating her death. The ship's doctors told police that the woman was elderly and suffered from diabetes and hypertension, according the agency. The other passengers came down with diarrhea prior to her death during an earlier part of the trip, the ship's doctors said. The Veendam left New York 36 days ago for a South America tour.",
     'highlights': "The elderly woman suffered from diabetes and hypertension, ship's doctors say .\nPreviously, 86 passengers had fallen ill on the ship, Agencia Brasil says ."}
    

    The average token count for the articles and the highlights are provided below:

    Feature      Mean Token Count
    Article      781
    Highlights   56

    Data Fields

    • id: a string containing the hexadecimal-formatted SHA1 hash of the URL the story was retrieved from
    • article: a string containing the body of the news article
    • highlights: a string containing the highlight of the article as written by the article author
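
    The id field described above can be illustrated with Python's hashlib. This is a sketch under assumptions: the card does not say how the URL was encoded or normalized before hashing, so UTF-8 encoding of the raw URL string is a guess, and the `story_id` and `looks_like_story_id` helpers and the example URL are hypothetical.

```python
import hashlib
import re


def story_id(url):
    """Hex-encoded SHA1 digest of a URL (UTF-8 encoding is an assumption)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def looks_like_story_id(value):
    """Check that a value has the shape of a SHA1 hex digest: 40 lowercase hex characters."""
    return re.fullmatch(r"[0-9a-f]{40}", value) is not None


example_id = story_id("https://example.com/story.html")  # hypothetical URL
print(example_id, looks_like_story_id(example_id))
```

    The shape check is a cheap way to validate ids when loading the data, for example the id shown in the instance above.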

    Data Splits

    The CNN/DailyMail dataset has 3 splits: train, validation, and test. Below are the statistics for Version 3.0.0 of the dataset.

    Dataset Split    Number of Instances in Split
    Train            287,113
    Validation       13,368
    Test             11,490

    Dataset Creation

    Curation Rationale

    Version 1.0.0 aimed to support supervised neural methodologies for machine reading and question answering with a large amount of real natural language training data. It released about 313k unique articles and nearly 1M Cloze-style questions to go with the articles. Versions 2.0.0 and 3.0.0 changed the structure of the dataset to support summarization rather than question answering. Version 3.0.0 provided a non-anonymized version of the data, whereas both previous versions were preprocessed to replace named entities with unique identifier labels.

    Source Data

    Initial Data Collection and Normalization

    The data consists of news articles and...

  12. Illegal Dumpsites

    • kaggle.com
    zip
    Updated Oct 4, 2022
    satya phani (2022). Illegal Dumpsites [Dataset]. https://www.kaggle.com/datasets/phanipulagala619/illegal-dumpsites
    Available download formats: zip (7978389 bytes)
    Dataset updated: Oct 4, 2022
    Authors: satya phani
    Description

    The problem

    Each year we produce more and more waste. Dumps are often found in places that have no address and no easy way to report them, so getting rid of them can be next to impossible. A small part of the waste gets recycled, but a huge amount of trash still ends up in illegal dumps, which are everywhere: in our cities, nature, rivers, and oceans. There were 55,000 reports of illegal dumping made in 110 countries.

    Every day, an average of 1 kilogram of waste is generated per person around the world, which adds up to 2.7 billion tonnes of waste every year. This is enough waste to fill 285,000 trucks. If we were to put them in a row, the line would go from New York to London.

    Illegal dumping and health

    In addition to economic and ecological damage, illegal dumping can have detrimental health effects on people living nearby. Dumpsites are a breeding ground for insects such as mosquitoes and flies, and also for disease-carrying animals such as rats, skunks, and opossums.

    Depending on the country, the life-threatening diseases that these insects and animals can spread include dengue fever, yellow fever, encephalitis, and malaria. Living in a community that has visible dumpsites can also take a toll on mental health.

    Datasets are extracted from:

    • TrashOut: Reports of illegal dumpsites provided by users through the TrashOut mobile app. For each report, a number of features are recorded; the most relevant for this analysis were location (latitude and longitude, city, country, and continent), date, picture, size, and type of waste.
    • OpenStreetMap (OSM): Geospatial dataset and information on the cities' road networks, including the type of roads (e.g. motorway, primary, residential, etc.)
    • Socioeconomic Data and Applications Center (SEDAC): Population density on a 1 km grid, from which we also calculated the population density gradient to account for population density in the neighboring cells
    • FourSquare: Information about nearby venues
    • World Bank Indicators, World Bank’s “What a Waste 2.0”, Eurostat, European Commission Directorate-General for Environment: Datasets for socio-economic indicators
    • Non-dumpsites Control Dataset: We generated our own control dataset, which was required to train the model on where dumpsites do not occur. For every TrashOut dumpsite location, we selected a pseudo-random location 1 km away and assigned it as a potential non-dumpsite location.
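
    The control-point step described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that "pseudo-random" means a uniformly random compass bearing, uses a spherical-Earth approximation, and introduces hypothetical names (offset_point, control_point, haversine_km) and a hypothetical example coordinate.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def offset_point(lat_deg, lon_deg, distance_km, bearing_deg):
    """Return the (lat, lon) reached by travelling distance_km along bearing_deg."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    # Normalize longitude to the range [-180, 180) degrees.
    lon2 = (lon2 + 3 * math.pi) % (2 * math.pi) - math.pi
    return math.degrees(lat2), math.degrees(lon2)


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg)
    )
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def control_point(lat_deg, lon_deg, rng, distance_km=1.0):
    """Pick a candidate non-dumpsite location distance_km away.

    Assumption: the direction is drawn uniformly at random.
    """
    bearing = rng.uniform(0.0, 360.0)
    return offset_point(lat_deg, lon_deg, distance_km, bearing)


# Hypothetical dumpsite coordinate, for illustration only.
rng = random.Random(0)
dump_lat, dump_lon = 48.1486, 17.1077
ctrl_lat, ctrl_lon = control_point(dump_lat, dump_lon, rng)
print(ctrl_lat, ctrl_lon, haversine_km(dump_lat, dump_lon, ctrl_lat, ctrl_lon))
```

    Seeding the generator keeps the control set reproducible. A real pipeline would probably also need to check that a sampled point does not coincide with another reported dumpsite, which this sketch does not do.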
    