13 datasets found
  1. US Broadband Usage Across Counties

    • kaggle.com
    zip
    Updated Jan 6, 2023
    Cite
    The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes/code
    Explore at:
    Available download formats: zip (46127 bytes)
    Dataset updated
    Jan 6, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Broadband Usage Across Counties

    Utilizing Microsoft's Data to Estimate Access

    By Amber Thomas [source]

    About this dataset

    This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.

    According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the gap between estimated availability and actual use by providing more accurate usage numbers at the county and zip code levels. Who gets counted as having access is vastly important -- it determines who gets included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions around the country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.

    This dataset includes aggregated information for locations with fewer than 20 devices, for increased accuracy when estimating broadband usage in the United States. Others can use it to develop solutions that improve internet access, or to accurately label problem areas where no real or reliable connectivity exists, in communities large and small throughout the US mainland. Please review the license terms before using these data so that you adhere to the stipulations of Microsoft's Open Use of Data Agreement v1.0, for professional and educational uses alike.

    How to use the dataset

    How to Use the US Broadband Usage Dataset

    This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

    The dataset contains six columns:
    - County – The name of the county for which usage statistics are provided.
    - Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/divisions within states, as reported by the US Census Bureau in 2018[2].
    - Population (Households) – Estimated number of households, defined according to [3], based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
    - Average Throughput (Mbps) – Average download speed in Mbps, derived from data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
    - Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
    - Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
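
    As a rough illustration of how the columns described above might be used, the following Python sketch flags areas where a large share of devices connect below 3 Mbps. The header names (`County`, `Zip Code`, `Percent Fast`, `Percent Slow`), the sample rows, and the 50% threshold are assumptions made for illustration; the actual headers in broadband_data_2020October.csv may differ.

```python
import csv
import io

# Hypothetical sample: headers mirror the column descriptions above,
# but the names and all values are invented for illustration.
SAMPLE = """County,Zip Code,Percent Fast,Percent Slow
Alpha County,00001,62.5,4.0
Beta County,00002,10.0,55.5
Gamma County,00003,30.0,20.0
"""


def slow_areas(csv_text, threshold=50.0):
    """Return (county, zip) pairs whose 'Percent Slow' value meets the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["County"], row["Zip Code"])
        for row in reader
        if float(row["Percent Slow"]) >= threshold
    ]


flagged = slow_areas(SAMPLE)
print(flagged)
```

    Zip codes are kept as strings here so that leading zeros are not lost.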

    Research Ideas

    • Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
    • Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
    • Establishing public-private partnerships between local governments and telecom providers. Both need better data about gaps in service coverage and usage levels to make decisions about investments in new infrastructure buildouts that improve connectivity options for rural communities.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: broadband_data_2020October.csv

  2. Company Records - Dataset - CRO

    • opendata.cro.ie
    Updated Dec 1, 2024
    + more versions
    Cite
    cro.ie (2024). Company Records - Dataset - CRO [Dataset]. https://opendata.cro.ie/dataset/companies
    Explore at:
    Dataset updated
    Dec 1, 2024
    Dataset provided by
    Companies Registration Office
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides a structured and machine-readable register of all companies recorded by the Companies Registration Office (CRO) in Ireland. It includes a daily snapshot of company records, covering both currently registered companies and historical records of dissolved or closed entities. The dataset aligns with the European Union’s Open Data Directive (Directive (EU) 2019/1024) and the Implementing Regulation (EU) 2023/138, which designates company and company ownership data as a high-value dataset. Updated daily, it ensures timely access to corporate information and is available for bulk download and API access under the Creative Commons Attribution 4.0 (CC BY 4.0) licence, allowing unrestricted reuse with appropriate attribution. By increasing transparency, accountability, and economic innovation, this dataset supports public sector initiatives, research, and digital services development.

  3. Financial Statements - Dataset - CRO

    • opendata.cro.ie
    Updated Feb 13, 2025
    Cite
    cro.ie (2025). Financial Statements - Dataset - CRO [Dataset]. https://opendata.cro.ie/dataset/financial-statements
    Explore at:
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Companies Registration Office
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides a structured and machine-readable collection of financial statements filed with the Companies Registration Office (CRO) in Ireland. It currently includes financial statements for the year 2022, with additional years to be added as they become available. The dataset aligns with the European Union’s Open Data Directive (Directive (EU) 2019/1024) and the Implementing Regulation (EU) 2023/138, which designates company and company ownership data as a high-value dataset. It is available for bulk download and API access under the Creative Commons Attribution 4.0 (CC BY 4.0) licence, allowing unrestricted reuse with appropriate attribution. By increasing transparency and enabling data-driven insights, this dataset supports public sector initiatives, financial analysis, and digital services development.

    The API endpoints can be accessed using these links:
    - Query - https://opendata.cro.ie/api/3/action/datastore_search
    - Query (via SQL) - https://opendata.cro.ie/api/3/action/datastore_search_sql
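
    The two endpoint paths above match the CKAN DataStore API. Assuming the portal behaves like a standard CKAN instance, request URLs could be built as in the sketch below. The `resource_id` value is a hypothetical placeholder (the real identifier has to be looked up on the portal), and the helper function names are invented for this example. No request is sent; the code only constructs the URLs.

```python
from urllib.parse import urlencode

BASE = "https://opendata.cro.ie/api/3/action"


def datastore_search_url(resource_id, limit=5, q=None):
    """Build a datastore_search URL using the standard CKAN parameters."""
    params = {"resource_id": resource_id, "limit": limit}
    if q is not None:
        params["q"] = q  # optional full-text query
    return f"{BASE}/datastore_search?{urlencode(params)}"


def datastore_search_sql_url(sql):
    """Build a datastore_search_sql URL; CKAN expects the statement in 'sql'."""
    return f"{BASE}/datastore_search_sql?{urlencode({'sql': sql})}"


# Hypothetical placeholder, not a real resource on the portal.
RESOURCE_ID = "00000000-0000-0000-0000-000000000000"

print(datastore_search_url(RESOURCE_ID, limit=3))
print(datastore_search_sql_url(f'SELECT * FROM "{RESOURCE_ID}" LIMIT 3'))
```

    In the SQL variant the resource identifier is used as the table name, which is why it is wrapped in double quotes.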

  4. Location of companies in Wallonia - Series

    • gimi9.com
    • datasets.ai
    Updated Feb 13, 2025
    Cite
    (2025). Location of companies in Wallonia - Series [Dataset]. https://gimi9.com/dataset/eu_https-geodata-wallonie-be-id-f279a447-8249-4205-9d07-c01a6a4b7e2e/
    Explore at:
    Dataset updated
    Feb 13, 2025
    Area covered
    Wallonia
    Description

    This data layer locates, as point features, the head offices and places of business of companies within the Walloon territory. In order to promote economic activities, a business database was set up in 2012. This "Companies" database is intended to highlight the skills and know-how of companies with at least one place of business in Wallonia. It is maintained by the Directorate for Business Networks (Service public de Wallonie Économie, Emploi, Recherche). The database lists companies active in different sectors, namely: industry; metal subcontracting; plastic-elastomeric subcontracting; the environment; business services. This series consists of two layers of point spatial data that distinguish the locations of head offices from those of places of business. A place of business is linked to a single head office; a head office may include one or more places of business. Each office, whether a head office or a place of business, has a data sheet publicly accessible via the "Companies in Wallonia" web platform (see Associated resources). This sheet contains the name, address, legal form, contact information, etc. Some of the information in the sheet is also reflected as attributes of the data layers. The web platform also offers the possibility to perform simple or advanced searches in the "Companies" database. Data are updated monthly and the data sheets are reviewed annually. The "Companies" database is used by many economic players: business leaders, buyers, salespeople, banks, various associations, inter-municipal associations, public interest bodies, administrations, as well as students and job seekers.

  5. Type Approval Certificates published by the National Measurement Office -...

    • ckan.publishing.service.gov.uk
    Updated Aug 30, 2013
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2013). Type Approval Certificates published by the National Measurement Office - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/type-approval-certificates-published-by-the-national-measurement-office
    Explore at:
    Dataset updated
    Aug 30, 2013
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The National Measurement Office provides a range of type approval services designed to enable manufacturers to gain access to European and global markets for weighing and other regulated measuring instruments. Where a manufacturer's instrument complies with the regulation, European Directives and International Recommendations (OIML), a certificate of conformity is issued. These data sets show which companies have been issued a certificate and which product it relates to.

  6. Manufacturing and Energy Supply Chain

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated May 2, 2025
    + more versions
    Cite
    Office of Manufacturing and Energy Supply Chains (2025). Manufacturing and Energy Supply Chain [Dataset]. https://catalog.data.gov/dataset/manufacturing-and-energy-supply-chain
    Explore at:
    Dataset updated
    May 2, 2025
    Dataset provided by
    Office of Manufacturing and Energy Supply Chains
    Description

    The Office of Manufacturing and Energy Supply Chains is responsible for strengthening and securing manufacturing and energy supply chains needed to modernize the nation’s energy infrastructure and support a clean and equitable energy transition. The office is catalyzing the development of an energy sector industrial base through targeted investments that establish and secure domestic clean energy supply chains and manufacturing, and by engaging with private-sector companies, other Federal agencies, and key stakeholders to collect, analyze, respond to, and share data about energy supply chains to inform future decision making and investment. The office manages programs that develop clean domestic manufacturing and workforce capabilities, with an emphasis on opportunities for small and medium enterprises and communities in energy transition. The Office of Manufacturing and Energy Supply Chains coordinates closely with the Office of Clean Energy Demonstrations for the management of major demonstration projects, and across all of DOE’s programs on manufacturing and supply chain issues, including with the Advanced Manufacturing Office in the Office of Energy Efficiency and Renewable Energy.

  7. Data from: UK business: activity, size and location

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Sep 24, 2025
    Cite
    Office for National Statistics (2025). UK business: activity, size and location [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/business/activitysizeandlocation/datasets/ukbusinessactivitysizeandlocation
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Sep 24, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    Numbers of enterprises and local units produced from a snapshot of the Inter-Departmental Business Register (IDBR) taken on 14 March 2025.

  8. Office of the General Treasurer Data Portal

    • catalog.civicdataecosystem.org
    Updated Jun 24, 2012
    Cite
    (2012). Office of the General Treasurer Data Portal [Dataset]. https://catalog.civicdataecosystem.org/dataset/office-of-the-general-treasurer-data-portal
    Explore at:
    Dataset updated
    Jun 24, 2012
    Description

    AI Generated Summary: This data portal serves as a central repository for publications and datasets, providing tools for publishing, sharing, finding, and utilizing data. Built on the CKAN open-source platform, it aims to make data accessible to the public, particularly for governmental and organizational data publishers. The portal facilitates data storage and provides robust data APIs.

    About: This site is a clearinghouse for all publications and datasets produced by the office. Be sure to contact us if you have any questions or comments about data published on the site. CKAN is the world’s leading open-source data portal platform. CKAN is a complete out-of-the-box software solution that makes data accessible and usable – by providing tools to streamline publishing, sharing, finding and using data (including storage of data and provision of robust data APIs). CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. CKAN is used by governments and user groups worldwide and powers a variety of official and community data portals including portals for local, national and international government, such as the UK’s data.gov.uk and the European Union’s publicdata.eu, the Brazilian dados.gov.br, Dutch and Netherland government portals, as well as city and municipal sites in the US, UK, Argentina, Finland and elsewhere. CKAN: http://ckan.org/

  9. Registered Business Locations - San Francisco

    • data.sfgov.org
    • s.cnmilf.com
    • +2more
    Updated Nov 27, 2025
    + more versions
    Cite
    City and County of San Francisco (2025). Registered Business Locations - San Francisco [Dataset]. https://data.sfgov.org/widgets/g8m3-pdis
    Explore at:
    Available download formats: application/geo+json, kmz, kml, xml, xlsx, csv
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    City and County of San Francisco
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Area covered
    San Francisco
    Description

    NEW!: Use the new Business Account Number lookup tool.

    SUMMARY This dataset includes the locations of businesses that pay taxes to the City and County of San Francisco. Each registered business may have multiple locations and each location is a single row. The Treasurer & Tax Collector’s Office collects this data through business registration applications, account update/closure forms, and taxpayer filings. Business locations marked as “Administratively Closed” have not filed or communicated with TTX for 3 years, or were marked as closed following a notification from another City and County Department.

    The data is collected to help enforce the Business and Tax Regulations Code including, but not limited to: Article 6, Article 12, Article 12-A, and Article 12-A-1. http://sftreasurer.org/registration.

    HOW TO USE THIS DATASET

    • System migration in 2014: When the City transitioned to a new system in 2014, only active business accounts were migrated. As a result, any businesses that had already closed by that point were not included in the current dataset.
    • 2018 account cleanup: In 2018, TTX did a major cleanup of dormant and unresponsive accounts and closed approximately 40,000 inactive businesses.

    To learn more about using this dataset watch this video. To update your listing or look up your BAN see this FAQ: Registered Business Locations Explainer

  • 🦈 Shark Tank India dataset 🇮🇳

    • kaggle.com
    zip
    Updated Oct 5, 2025
    Cite
    Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
    Explore at:
    zip(45970 bytes)Available download formats
    Dataset updated
    Oct 5, 2025
    Authors
    Satya Thirumani
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Shark Tank India Data set.

    Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

    All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

    Here is the data dictionary for (Indian) Shark Tank season's dataset.

    • Season Number - Season number
    • Startup Name - Company name or product name
    • Episode Number - Episode number within the season
    • Pitch Number - Overall pitch number
    • Season Start - Season first aired date
    • Season End - Season last aired date
    • Original Air Date - Episode original/first aired date, on OTT/TV
    • Episode Title - Episode title in SonyLiv
    • Anchor - Name of the episode presenter/host
    • Industry - Industry name or type
    • Business Description - Business Description
    • Company Website - Company Website URL
    • Started in - Year in which startup was started/incorporated
    • Number of Presenters - Number of presenters
    • Male Presenters - Number of male presenters
    • Female Presenters - Number of female presenters
    • Transgender Presenters - Number of transgender/LGBTQ presenters
    • Couple Presenters - Are presenters wife/husband ? 1-yes, 0-no
    • Pitchers Average Age - All pitchers average age, <30 young, 30-50 middle, >50 old
    • Pitchers City - Presenter's town/city or place where company head office exists
    • Pitchers State - Indian state pitcher hails from or state where company head office exists
    • Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue
    • Monthly Sales - Total monthly sales, in lakhs
    • Gross Margin - Gross margin/profit of company, in percentages
    • Net Margin - Net margin/profit of company, in percentages
    • EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization
    • Cash Burn - In loss in current year; burning/paying money from their pocket (yes/no)
    • SKUs - Stock Keeping Units or number of varieties, at the time of pitch
    • Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch
    • Bootstrapped - Startup is bootstrapped or not (yes/no)
    • Part of Match off - Competition between two similar brands, pitched at same time
    • Original Ask Amount - Original Ask Amount, in lakhs INR
    • Original Offered Equity - Original Offered Equity, in percentages
    • Valuation Requested - Valuation Requested, in lakhs INR
    • Received Offer - Received offer or not, 1-received, 0-not received
    • Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected
    • Total Deal Amount - Total Deal Amount, in lakhs INR
    • Total Deal Equity - Total Deal Equity, in percentages
    • Total Deal Debt - Total Deal debt/loan amount, in lakhs INR
    • Debt Interest - Debt interest rate, in percentages
    • Deal Valuation - Deal Valuation, in lakhs INR
    • Number of sharks in deal - Number of sharks involved in deal
    • Deal has conditions - Deal has conditions or not? (yes or no)
    • Royalty Percentage - Royalty percentage, if it's royalty deal
    • Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs
    • Advisory Shares Equity - Deal with Advisory shares or equity, in percentages
    • Namita Investment Amount - Namita Investment Amount, in lakhs INR
    • Namita Investment Equity - Namita Investment Equity, in percentages
    • Namita Debt Amount - Namita Debt Amount, in lakhs INR
    • Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR
    • Vineeta Investment Equity - Vineeta Investment Equity, in percentages
    • Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR
    • Anupam Investment Amount - Anupam Investment Amount, in lakhs INR
    • Anupam Investment Equity - Anupam Investment Equity, in percentages
    • Anupam Debt Amount - Anupam Debt Amount, in lakhs INR
    • Aman Investment Amount - Aman Investment Amount, in lakhs INR
    • Aman Investment Equity - Aman Investment Equity, in percentages
    • Aman Debt Amount - Aman Debt Amount, in lakhs INR
    • Peyush Investment Amount - Peyush Investment Amount, in lakhs INR
    • Peyush Investment Equity - Peyush Investment Equity, in percentages
    • Peyush Debt Amount - Peyush Debt Amount, in lakhs INR
    • Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR
    • Ritesh Investment Equity - Ritesh Investment Equity, in percentages
    • Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR
    • Amit Investment Amount - Amit Investment Amount, in lakhs INR
    • Amit Investment Equity - Amit Investment Equity, in percentages
    • Amit Debt Amount - Amit Debt Amount, in lakhs INR
    • Guest Investment Amount - Guest Investment Amount, in lakhs INR
    • Guest Investment Equity - Guest Investment Equity, in percentages
    • Guest Debt Amount - Guest Debt Amount, in lakhs INR
    • Invested Guest Name - Name of the guest(s) who invested in deal
    • All Guest Names - Name of all guests, who are present in episode
    • Namita Present - Whether Namita present in episode or not
    • Vineeta Present - Whether Vineeta present in episode or not
    • Anupam ...
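
    The data dictionary above expresses amounts in lakhs INR and equity in percentages. As a small worked example of the usual arithmetic linking an ask amount, an equity stake, and the implied valuation, here is a Python sketch. Whether the dataset's "Valuation Requested" and "Deal Valuation" fields were computed exactly this way is an assumption; the function names and sample figures are hypothetical.

```python
LAKH = 100_000  # 1 lakh = 100,000 INR


def implied_valuation_lakhs(amount_lakhs, equity_percent):
    """Valuation implied by exchanging `equity_percent` of a company for `amount_lakhs`."""
    if equity_percent <= 0:
        raise ValueError("equity_percent must be positive")
    return amount_lakhs * 100 / equity_percent


def lakhs_to_inr(amount_lakhs):
    """Convert an amount in lakhs to plain INR."""
    return amount_lakhs * LAKH


# Invented example: asking 50 lakhs for 5% equity.
valuation = implied_valuation_lakhs(50, 5)
print(valuation, lakhs_to_inr(valuation))
```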
  • Train Passenger Density Analysis

    • kaggle.com
    zip
    Updated Jun 12, 2024
    Cite
    Lily-Rose Church (2024). Train Passenger Density Analysis [Dataset]. https://www.kaggle.com/datasets/lilyrosec/train-passenger-density-analysis
    Explore at:
    zip(5676 bytes)Available download formats
    Dataset updated
    Jun 12, 2024
    Authors
    Lily-Rose Church
    Description

    A proof-of-concept dataset, based on publicly available Southeastern data, for finding the least busy train journey given some day and time flexibility. I put this together for my personal benefit - as a person with autism, I often struggle with travel anxiety during my work commute, and wanted to find a way to minimise this distress. This concept could be developed further by adding data for all train journeys for the train provider, as well as any other companies providing similar data.

  • TMDB Movies Dataset 2025

    • kaggle.com
    zip
    Updated Sep 23, 2025
    Cite
    Imaad Mahmood (2025). TMDB Movies Dataset 2025 [Dataset]. https://www.kaggle.com/datasets/imaadmahmood/tmdb-movies-dataset-2025
    Explore at:
    zip(10343 bytes)Available download formats
    Dataset updated
    Sep 23, 2025
    Authors
    Imaad Mahmood
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    2025 Movies Dataset

    Overview:

    This dataset contains detailed information on 40 popular movies released between 1998 and 2025, curated for data scientists, researchers, and entertainment enthusiasts. It includes a wide range of genres such as Action, Comedy, Drama, Thriller, Animation, Romance, Science Fiction, Horror, and more, covering blockbuster franchises like Mission: Impossible, Despicable Me, and Godzilla, as well as critically acclaimed films like The Departed and indie hits like Talk to Me. The dataset is designed to support analyses of box office performance, genre trends, audience reception, and production insights, making it ideal for predictive modeling, visualization, and trend analysis in the 2025 entertainment landscape.

    Content:

    The dataset comprises 40 rows, each representing a unique movie, with the following columns:

    • id - Unique identifier for the movie.
    • title - Movie title.
    • release_date - Release date (YYYY-MM-DD).
    • genres - Comma-separated list of genres (e.g., "Action,Thriller,Crime").
    • budget - Production budget in USD (0 if unavailable).
    • revenue - Worldwide box office revenue in USD (0 if unavailable).
    • runtime - Movie duration in minutes.
    • vote_average - Average user rating (out of 10).
    • vote_count - Number of user votes.
    • popularity - Popularity score based on user engagement.
    • original_language - Primary language of the movie (e.g., "en" for English, "ja" for Japanese).
    • production_countries - Comma-separated list of countries involved in production.
    • production_companies - Comma-separated list of production companies.
    • cast - Comma-separated list of up to five lead actors.
    • director - Director of the movie.
    • overview - Brief summary of the movie's plot.
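
    Two details of the column descriptions above affect parsing: multi-valued fields are comma-separated strings, and a budget or revenue of 0 means "unavailable" rather than zero. The Python sketch below shows one way to handle both. The sample rows are invented, only a subset of the columns is included, and the function name and the `revenue_to_budget` field are hypothetical additions for illustration.

```python
import csv
import io

# Invented rows using a subset of the documented column names.
SAMPLE = '''id,title,genres,budget,revenue
1,Example Heist,"Action,Thriller,Crime",1000000,3500000
2,Streaming Only,"Drama",0,0
'''


def parse_movies(csv_text):
    """Parse rows, splitting genres and mapping 0 budget/revenue to None."""
    movies = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        budget = int(row["budget"]) or None    # 0 means unavailable
        revenue = int(row["revenue"]) or None  # 0 means unavailable
        movies.append({
            "title": row["title"],
            "genres": [g.strip() for g in row["genres"].split(",") if g.strip()],
            "budget": budget,
            "revenue": revenue,
            # Only defined when both figures are available.
            "revenue_to_budget": revenue / budget if budget and revenue else None,
        })
    return movies


for movie in parse_movies(SAMPLE):
    print(movie)
```

    Mapping 0 to `None` keeps unavailable figures out of averages and ratios instead of silently treating them as real zeros.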

    Data Quality:

    • Completeness - All 40 movies have complete data for most fields. Budget and revenue may be 0 where data is unavailable (e.g., for some Netflix releases).
    • Consistency - Genres, countries, companies, and cast are formatted as comma-separated strings for ease of use. Special characters in overviews are preserved for accuracy.
    • Source - Data is sourced from reliable entertainment databases, ensuring accuracy as of August 2025.
    • Usability - Structured for Kaggle with clear column names and consistent formatting, targeting a usability score of 9.0+.

    Potential Use Cases:

    • Box Office Analysis - Explore correlations between budget, revenue, and vote_average to predict financial success.
    • Genre Trends - Analyze genre popularity over time or across regions using production_countries and release_date.
    • Audience Reception - Investigate relationships between vote_average, vote_count, and popularity for insights into viewer preferences.
    • Production Insights - Study the impact of production companies or directors on movie performance.
    • Machine Learning - Build models to predict revenue, vote_average, or popularity based on features like genres, cast, or runtime.

    Why This Dataset?

    This dataset is crafted to align with 2025 entertainment trends, offering a diverse mix of recent releases and iconic films. Its comprehensive fields and clean structure make it ideal for both beginners and advanced users on Kaggle. Whether you're visualizing box office trends, predicting audience ratings, or exploring global cinema patterns, this dataset provides a robust foundation for impactful analyses.

    Acknowledgements:

    Data compiled from TMDB via its API (using an API key) and curated for Kaggle submission. Feedback is welcome to enhance future versions!

  • IMDB & TMDB Movie Metadata Big Dataset (over 1M)

    • kaggle.com
    zip
    Updated Aug 5, 2024
    Cite
    Shubham Chandra (2024). IMDB & TMDB Movie Metadata Big Dataset (over 1M) [Dataset]. https://www.kaggle.com/datasets/shubhamchandra235/imdb-and-tmdb-movie-metadata-big-dataset-1m
    Explore at:
    zip(416807108 bytes)Available download formats
    Dataset updated
    Aug 5, 2024
    Authors
    Shubham Chandra
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Title: IMDB & TMDB Movie Metadata Big Dataset (>1M)

    Subtitle: A Comprehensive Dataset Featuring Detailed Metadata of Movies (IMDB, TMDB). Over 1M Rows & 42 Features: Metadata, Ratings, Genres, Cast, Crew, Sentiment Analysis and many more...

    Detailed Description:

    Overview: This comprehensive dataset merges the extensive film data available from both IMDB and TMDB, offering a rich resource for movie enthusiasts, data scientists, and researchers. With over 1 million rows and 42 detailed features, this dataset provides in-depth information about a wide variety of movies, spanning different genres, periods, and production backgrounds.

    File Information:

    1. File Size: ≈ 1GB
    2. Format: CSV (Comma-Separated Values)

    Column Descriptors/Key Features:

    1. ID: Unique identifier for each movie.
    2. Title: The official title of the movie.
    3. Vote Average: Average rating received by the movie.
    4. Vote Count: Number of votes the movie has received.
    5. Status: Current status of the movie (e.g., Released, Post-Production).
    6. Release Date: Official release date of the movie.
    7. Revenue: Box office revenue generated by the movie.
    8. Runtime: Duration of the movie in minutes.
    9. Adult: Indicates if the movie is for adults.
    10. Genres: List of genres the movie belongs to.
    11. Overview Sentiment: Sentiment analysis of the movie's overview text.
    12. Cast: List of main actors in the movie.
    13. Crew: List of key crew members, including directors, producers, and writers.
    14. Genres List: Detailed genres in list format.
    15. Keywords: List of relevant keywords associated with the movie.
    16. Director of Photography: Name of the cinematographer.
    17. Producers: Names of the producers.
    18. Music Composer: Name of the music composer.

    Additional Features:

    1. Unnamed 0: Index column.
    2. Star1, Star2, Star3, Star4: Names of the top-billed stars.
    3. Writer: Name(s) of the writer(s).
    4. Original Language: Original language of the movie.
    5. Original Title: Original title if different from the main title.
    6. Popularity: Popularity score of the movie.
    7. Budget: Budget allocated for the movie.
    8. Tagline: Promotional tagline of the movie.
    9. Production Companies: Companies involved in the production.
    10. Production Countries: Countries where the movie was produced.
    11. Spoken Languages: Languages spoken in the movie.
    12. Homepage: Official website of the movie.
    13. IMDB ID: Unique identifier on IMDB.
    14. TMDB ID: Unique identifier on TMDB.
    15. Video: Indicates if there is a video associated.
    16. Poster Path: Path to the movie poster image.
    17. Backdrop Path: Path to the backdrop image.
    18. Release Year: Year the movie was released.
    19. Collection Name: Name of the collection the movie belongs to.
    20. Collection ID: Unique identifier for the collection.
    21. Genres ID: Unique identifier for the genres.
    22. Original Language Code: Code for the original language.
    23. Overview: Brief summary of the movie.
    24. All Combined Keywords: Combined keywords in a single field.
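
    Because the file is roughly 1GB, it may be more practical to stream it row by row than to load it all at once. The sketch below is one way to do that with the standard csv module, averaging ratings per release year. The header names (vote_average, vote_count, release_year) and the unnamed leading index column are assumptions based on the descriptions above; the actual CSV header may differ, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Invented sample standing in for the real file. The empty first header
# cell mimics an exported index column ("Unnamed 0").
SAMPLE = """,title,vote_average,vote_count,release_year
0,A,8.0,100,2020
1,B,6.0,50,2020
2,C,7.5,0,2021
3,D,,10,2021
4,E,5.0,20,2021
"""


def mean_vote_by_year(lines, min_votes=1):
    """Stream CSV rows and return {year: mean vote_average}.

    Rows with missing or malformed values, or with fewer than
    `min_votes` votes, are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # year -> [rating sum, row count]
    for row in csv.DictReader(lines):
        try:
            year = int(float(row["release_year"]))
            votes = int(float(row["vote_count"]))
            rating = float(row["vote_average"])
        except (KeyError, TypeError, ValueError):
            continue
        if votes < min_votes:
            continue
        totals[year][0] += rating
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


averages = mean_vote_by_year(io.StringIO(SAMPLE))
print(averages)
```

    For the real file, the same function can be given an open file handle instead of the in-memory sample, for example one opened with newline="" and encoding="utf-8"; only one row is held in memory at a time.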

    Potential Use Cases:

    - Sentiment Analysis: Analyze audience sentiment towards movies based on reviews and ratings.
    - Recommendation Systems: Build models to recommend movies based on user preferences and viewing history.
    - Market Analysis: Study trends in the movie industry, including genre popularity and revenue patterns.
    - Content Analysis: Investigate the thematic content and diversity of movies over time.
    - Data Visualization: Create visual representations of movie data to uncover hidden insights.

    The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes/code
    US Broadband Usage Across Counties

    Utilizing Microsoft's Data to Estimate Access

    Explore at:
    zip(46127 bytes)Available download formats
    Dataset updated
    Jan 6, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    By Amber Thomas [source]

    About this dataset

    This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
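
    The throughput estimate described above amounts to dividing the amount of data transferred by the time the transfer took. The following is a minimal sketch of that arithmetic, not Microsoft's actual method; the function name and the example figures are hypothetical.

```python
def throughput_mbps(package_bytes: int, download_seconds: float) -> float:
    """Return throughput in megabits per second for a single download."""
    if download_seconds <= 0:
        raise ValueError("download_seconds must be positive")
    megabits = package_bytes * 8 / 1_000_000  # bytes -> bits -> megabits
    return megabits / download_seconds


# A hypothetical 50 MB package that took 20 seconds to download.
speed = throughput_mbps(50_000_000, 20.0)
print(speed)  # 20.0
```

    A device measured at 20 Mbps in this way would fall below a 25 Mbps broadband threshold even if broadband service is nominally available at its location.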

    According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the gap between estimated availability and actual use by providing more accurate usage numbers downscaled to the county and zip code levels. Who gets counted as having access matters a great deal: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions of people around the country could remain invisible if these numbers aren't accurately reported or properly used in decision-making.

    This dataset includes aggregated information about locations with fewer than 20 devices for increased accuracy when estimating broadband usage in the United States. Others can use it to develop solutions that improve internet access or to accurately label problem areas where no real or reliable connectivity exists, in communities large and small throughout the US mainland. Please review the license terms before using these data, whether for professional or educational purposes, so that you comply with the stipulations of Microsoft's Open Use of Data Agreement v1.0.

    How to use the dataset

    This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

    The dataset contains six columns:

    - County – The name of the county for which usage statistics are provided.
    - Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected, within that county or within a metropolitan area, micropolitan area, or division as reported by the US Census Bureau in 2018[2].
    - Population (Households) – Estimated number of households, defined according to [3], based on data from the US Census Bureau American Community Survey's 5-Year Estimates[4].
    - Average Throughput (Mbps) – Average download speed in Mbps, derived from data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
    - Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
    - Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
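
    One common first step with columns like these is to roll zip-code rows up to the county level, weighting each zip code's average throughput by its household count. The sketch below assumes hypothetical header names (county, zip_code, households, avg_throughput_mbps, pct_fast, pct_slow) and invented sample rows; the header in broadband_data_2020October.csv should be checked before adapting it.

```python
import csv
import io

# Invented sample rows; header names are assumptions, not the real file's.
SAMPLE = """county,zip_code,households,avg_throughput_mbps,pct_fast,pct_slow
Adams,10001,1000,30.0,0.60,0.10
Adams,10002,3000,10.0,0.20,0.40
Baker,20001,2000,50.0,0.80,0.05
"""


def weighted_throughput_by_county(csv_text):
    """Return {county: household-weighted mean throughput in Mbps}."""
    totals = {}  # county -> [sum(households * throughput), sum(households)]
    for row in csv.DictReader(io.StringIO(csv_text)):
        households = float(row["households"])
        acc = totals.setdefault(row["county"], [0.0, 0.0])
        acc[0] += households * float(row["avg_throughput_mbps"])
        acc[1] += households
    # Counties with zero recorded households are left out to avoid dividing by zero.
    return {county: s / n for county, (s, n) in totals.items() if n > 0}


result = weighted_throughput_by_county(SAMPLE)
print(result)
```

    Weighting by households keeps a sparsely populated zip code from pulling the county figure as strongly as a densely populated one, which a plain average of the rows would allow.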

    Research Ideas

    • Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
    • Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
    • Establishing public-private partnerships between local governments and telecom providers. Both need better data about gaps in service coverage and usage levels in order to decide on investments in new infrastructure buildouts that improve connectivity for rural communities.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    See the dataset description for more information.

    Columns

    File: broadband_data_2020October.csv
