By Amber Thomas [source]
This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This data set aims to address the gap between estimated availability and actual use by providing more accurate usage numbers downscaled to county and zip code levels. Who gets counted as having access is critically important: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions of people around the country could remain invisible if these numbers aren't accurately reported or properly used in decision-making processes.
This dataset includes aggregated information for locations with fewer than 20 devices, for increased accuracy when estimating broadband usage in the United States. Others can use it to develop solutions that improve internet access, or to accurately label problem areas where no real or reliable connectivity exists in communities large and small throughout the US mainland. Please review the license terms before using these data, for professional or educational purposes alike, so that you adhere to the stipulations of Microsoft's Open Use of Data Agreement v1.0.
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area, micropolitan area, or state division as reported by the US Census Bureau in 2018 [2].
- Population (Households) – Estimated number of households, defined according to [3], based on data from the US Census Bureau American Community Survey's 5-Year Estimates [4].
- Average Throughput (Mbps) – Average download speed in Mbps, derived from data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc. [5]
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships between local governments and telecom providers. Both need better data about gaps in service coverage and usage levels in order to decide on investments in new infrastructure buildouts that improve connectivity options for rural communities.
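As a rough illustration of how the columns described above might be used, the following Python sketch flags zip codes with a high share of slow connections and computes a household-weighted average throughput per county. The header names and the three rows are invented for the example; the real header row of broadband_data_2020October.csv may differ and should be checked before reuse. Weighting by households is also an assumption, not a method stated by the dataset authors.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; header names are assumed, not taken
# from the real CSV file.
SAMPLE = """County,Zip Code,Households,Average Throughput (Mbps),Percent Fast,Percent Slow
Alpha,00001,1000,40.0,60.0,5.0
Alpha,00002,3000,20.0,30.0,25.0
Beta,00003,500,5.0,2.0,70.0
"""


def underserved_zips(rows, slow_threshold=20.0):
    """Return zip codes where the share of slow machines exceeds the threshold."""
    return [r["Zip Code"] for r in rows if float(r["Percent Slow"]) > slow_threshold]


def weighted_throughput_by_county(rows):
    """Household-weighted mean of average throughput, grouped by county."""
    totals = defaultdict(lambda: [0.0, 0.0])  # county -> [weighted sum, households]
    for r in rows:
        households = float(r["Households"])
        totals[r["County"]][0] += households * float(r["Average Throughput (Mbps)"])
        totals[r["County"]][1] += households
    return {county: s / h for county, (s, h) in totals.items() if h > 0}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened file handle and adjust the column names to match its header.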
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: broadband_data_2020October.csv
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a structured and machine-readable register of all companies recorded by the Companies Registration Office (CRO) in Ireland. It includes a daily snapshot of company records, covering both currently registered companies and historical records of dissolved or closed entities. The dataset aligns with the European Union’s Open Data Directive (Directive (EU) 2019/1024) and the Implementing Regulation (EU) 2023/138, which designates company and company ownership data as a high-value dataset. Updated daily, it ensures timely access to corporate information and is available for bulk download and API access under the Creative Commons Attribution 4.0 (CC BY 4.0) licence, allowing unrestricted reuse with appropriate attribution. By increasing transparency, accountability, and economic innovation, this dataset supports public sector initiatives, research, and digital services development.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a structured and machine-readable collection of financial statements filed with the Companies Registration Office (CRO) in Ireland. It currently includes financial statements for the year 2022, with additional years to be added as they become available. The dataset aligns with the European Union’s Open Data Directive (Directive (EU) 2019/1024) and the Implementing Regulation (EU) 2023/138, which designates company and company ownership data as a high-value dataset. It is available for bulk download and API access under the Creative Commons Attribution 4.0 (CC BY 4.0) licence, allowing unrestricted reuse with appropriate attribution. By increasing transparency and enabling data-driven insights, this dataset supports public sector initiatives, financial analysis, and digital services development. The API endpoints are:
- Query: https://opendata.cro.ie/api/3/action/datastore_search
- Query (via SQL): https://opendata.cro.ie/api/3/action/datastore_search_sql
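The two endpoints follow the CKAN DataStore API, where `datastore_search` takes a `resource_id` (plus optional `limit` and `q`) and `datastore_search_sql` takes a `sql` parameter. A minimal Python sketch that only builds the request URLs is shown here; the resource ID is a placeholder, since the description does not give one, and the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://opendata.cro.ie/api/3/action"


def datastore_search_url(resource_id: str, limit: int = 5, q: Optional[str] = None) -> str:
    """Build a datastore_search URL; resource_id must come from the portal."""
    params = {"resource_id": resource_id, "limit": limit}
    if q:
        params["q"] = q  # optional full-text query
    return f"{BASE}/datastore_search?{urlencode(params)}"


def datastore_search_sql_url(sql: str) -> str:
    """Build a datastore_search_sql URL for a read-only SQL statement."""
    return f"{BASE}/datastore_search_sql?{urlencode({'sql': sql})}"
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, and the JSON response parsed with `json.load`. Whether a given resource is loaded into the DataStore, and which fields it exposes, has to be checked on the portal itself.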
This data layer locates, as points, the head offices and operating sites of companies within the Walloon territory. In order to promote economic activities, a business database was set up in 2012. This "Companies" database is intended to highlight the skills and know-how of companies with at least one place of business in Wallonia. It is maintained by the Directorate for Business Networks (Service public de Wallonie Économie, Emploi, Recherche). This database lists companies active in different sectors, namely: - Industry; - Metal subcontracting; - Plastic-elastomeric subcontracting; - The environment; - Business services. This series consists of two layers of point spatial data that distinguish the locations of head offices from those of operating sites. An operating site is linked to a single head office; a head office may include one or more operating sites. Each site, whether a head office or an operating site, is the subject of a data sheet publicly accessible via the "Companies in Wallonia" web platform (see Associated resources). This sheet contains the name, address, legal form, contact information, etc. Some of the information in the sheet is also reflected as attributes of the data layers. The web platform also offers the possibility to perform simple or advanced searches in the "Companies" database. Data are updated monthly and the data sheets are reviewed annually. The "Companies" database is used by many economic players: business leaders, buyers, salespeople, banks, various associations, inter-municipal associations, public interest bodies, administrations, as well as students and job seekers.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The National Measurement Office provides a range of type approval services designed to enable manufacturers to gain access to European and global markets for weighing and other regulated measuring instruments. Where a manufacturer's instrument complies with the regulation, European Directives and International Recommendations (OIML), a certificate of conformity is issued. These data sets show which companies have been issued a certificate and which product it relates to.
The Office of Manufacturing and Energy Supply Chains is responsible for strengthening and securing manufacturing and energy supply chains needed to modernize the nation’s energy infrastructure and support a clean and equitable energy transition. The office is catalyzing the development of an energy sector industrial base through targeted investments that establish and secure domestic clean energy supply chains and manufacturing, and by engaging with private-sector companies, other Federal agencies, and key stakeholders to collect, analyze, respond to, and share data about energy supply chains to inform future decision making and investment. The office manages programs that develop clean domestic manufacturing and workforce capabilities, with an emphasis on opportunities for small and medium enterprises and communities in energy transition. The Office of Manufacturing and Energy Supply Chains coordinates closely with the Office of Clean Energy Demonstrations for the management of major demonstration projects, and across all of DOE’s programs on manufacturing and supply chain issues, including with the Advanced Manufacturing Office in the Office of Energy Efficiency and Renewable Energy.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Numbers of enterprises and local units produced from a snapshot of the Inter-Departmental Business Register (IDBR) taken on 14 March 2025.
AI Generated Summary: This data portal serves as a central repository for publications and datasets, providing tools for publishing, sharing, finding, and utilizing data. Built on the CKAN open-source platform, it aims to make data accessible to the public, particularly for governmental and organizational data publishers. The portal facilitates data storage and provides robust data APIs.
About: This site is a clearinghouse for all publications and datasets produced by the office. Be sure to contact us if you have any questions or comments about data published on the site. CKAN is the world’s leading open-source data portal platform. CKAN is a complete out-of-the-box software solution that makes data accessible and usable – by providing tools to streamline publishing, sharing, finding and using data (including storage of data and provision of robust data APIs). CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. CKAN is used by governments and user groups worldwide and powers a variety of official and community data portals including portals for local, national and international government, such as the UK’s data.gov.uk and the European Union’s publicdata.eu, the Brazilian dados.gov.br, Dutch and Netherland government portals, as well as city and municipal sites in the US, UK, Argentina, Finland and elsewhere. CKAN: http://ckan.org/
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
NEW!: Use the new Business Account Number lookup tool.
SUMMARY This dataset includes the locations of businesses that pay taxes to the City and County of San Francisco. Each registered business may have multiple locations and each location is a single row. The Treasurer & Tax Collector’s Office collects this data through business registration applications, account update/closure forms, and taxpayer filings. Business locations marked as “Administratively Closed” have not filed or communicated with TTX for 3 years, or were marked as closed following a notification from another City and County Department.
The data is collected to help enforce the Business and Tax Regulations Code including, but not limited to: Article 6, Article 12, Article 12-A, and Article 12-A-1. http://sftreasurer.org/registration.
HOW TO USE THIS DATASET
To learn more about using this dataset watch this video. To update your listing or look up your BAN see this FAQ: Registered Business Locations Explainer
https://creativecommons.org/publicdomain/zero/1.0/
Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.
All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.
Here is the data dictionary for (Indian) Shark Tank season's dataset.
A proof of concept dataset, based on publicly available Southeastern data, for finding the least busy train journey based on some day and time flexibility. I put this together for my personal benefit - as a person with autism, I often struggle with travel anxiety during my work commute, and wanted to find a way to minimise this distress. This concept could be developed further by adding data for all train journeys for the train provider, as well as any other companies providing similar data.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed information on 40 popular movies released between 1998 and 2025, curated for data scientists, researchers, and entertainment enthusiasts. It includes a wide range of genres such as Action, Comedy, Drama, Thriller, Animation, Romance, Science Fiction, Horror, and more, covering blockbuster franchises like Mission: Impossible, Despicable Me, and Godzilla, as well as critically acclaimed films like The Departed and indie hits like Talk to Me. The dataset is designed to support analyses of box office performance, genre trends, audience reception, and production insights, making it ideal for predictive modeling, visualization, and trend analysis in the 2025 entertainment landscape.

The dataset comprises 40 rows, each representing a unique movie, with the following columns:
- **id:** Unique identifier for the movie.
- **title:** Movie title.
- **release_date:** Release date (YYYY-MM-DD).
- **genres:** Comma-separated list of genres (e.g., "Action,Thriller,Crime").
- **budget:** Production budget in USD (0 if unavailable).
- **revenue:** Worldwide box office revenue in USD (0 if unavailable).
- **runtime:** Movie duration in minutes.
- **vote_average:** Average user rating (out of 10).
- **vote_count:** Number of user votes.
- **popularity:** Popularity score based on user engagement.
- **original_language:** Primary language of the movie (e.g., "en" for English, "ja" for Japanese).
- **production_countries:** Comma-separated list of countries involved in production.
- **production_companies:** Comma-separated list of production companies.
- **cast:** Comma-separated list of up to five lead actors.
- **director:** Director of the movie.
- **overview:** Brief summary of the movie's plot.

- **Completeness:** All 40 movies have complete data for most fields. Budget and revenue may be 0 where data is unavailable (e.g., for some Netflix releases).
- **Consistency:** Genres, countries, companies, and cast are formatted as comma-separated strings for ease of use. Special characters in overviews are preserved for accuracy.
- **Source:** Data is sourced from reliable entertainment databases, ensuring accuracy as of August 2025.
- **Usability:** Structured for Kaggle with clear column names and consistent formatting, targeting a usability score of 9.0+.

- **Box Office Analysis:** Explore correlations between budget, revenue, and vote_average to predict financial success.
- **Genre Trends:** Analyze genre popularity over time or across regions using production_countries and release_date.
- **Audience Reception:** Investigate relationships between vote_average, vote_count, and popularity for insights into viewer preferences.
- **Production Insights:** Study the impact of production companies or directors on movie performance.
- **Machine Learning:** Build models to predict revenue, vote_average, or popularity based on features like genres, cast, or runtime.

This dataset is crafted to align with 2025 entertainment trends, offering a diverse mix of recent releases and iconic films. Its comprehensive fields and clean structure make it ideal for both beginners and advanced users on Kaggle. Whether you're visualizing box office trends, predicting audience ratings, or exploring global cinema patterns, this dataset provides a robust foundation for impactful analyses.
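Two details of the column description matter before any analysis: genres arrive as one comma-separated string, and a budget or revenue of 0 means "unavailable" rather than zero dollars. The Python sketch below shows one way to handle both, assuming the CSV header matches the column names listed above. The three rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Made-up rows that only mimic the documented column layout; not real data.
SAMPLE = '''id,title,genres,budget,revenue
1,Movie A,"Action,Thriller",100,250
2,Movie B,"Comedy",0,40
3,Movie C,"Action,Comedy",50,0
'''


def load_movies(text):
    """Parse rows, splitting genres and mapping 0 budget/revenue to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["genres"] = [g.strip() for g in row["genres"].split(",") if g.strip()]
        for col in ("budget", "revenue"):
            value = float(row[col])
            # The description says 0 marks unavailable data, so treat it as missing.
            row[col] = value if value > 0 else None
        rows.append(row)
    return rows


def revenue_to_budget(rows):
    """Revenue/budget ratio for movies where both figures are available."""
    return {
        r["title"]: r["revenue"] / r["budget"]
        for r in rows
        if r["budget"] and r["revenue"]
    }


movies = load_movies(SAMPLE)
```

Skipping the missing values this way keeps movies with unknown budgets from appearing as infinitely profitable or as total losses in a box office analysis.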
Data compiled from TMDB via its API and curated for Kaggle submission. Feedback is welcome to enhance future versions!
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Title: IMDB & TMDB Movie Metadata Big Dataset (>1M)
Subtitle: A Comprehensive Dataset Featuring Detailed Metadata of Movies (IMDB, TMDB). Over 1M Rows & 42 Features: Metadata, Ratings, Genres, Cast, Crew, Sentiment Analysis and many more...
Detailed Description:
Overview: This comprehensive dataset merges the extensive film data available from both IMDB and TMDB, offering a rich resource for movie enthusiasts, data scientists, and researchers. With over 1 million rows and 42 detailed features, this dataset provides in-depth information about a wide variety of movies, spanning different genres, periods, and production backgrounds.
File Information:
1. File Size: ≈ 1GB
2. Format: CSV (Comma-Separated Values)

Column Descriptors/Key Features:
1. ID: Unique identifier for each movie.
2. Title: The official title of the movie.
3. Vote Average: Average rating received by the movie.
4. Vote Count: Number of votes the movie has received.
5. Status: Current status of the movie (e.g., Released, Post-Production).
6. Release Date: Official release date of the movie.
7. Revenue: Box office revenue generated by the movie.
8. Runtime: Duration of the movie in minutes.
9. Adult: Indicates if the movie is for adults.
10. Genres: List of genres the movie belongs to.
11. Overview Sentiment: Sentiment analysis of the movie's overview text.
12. Cast: List of main actors in the movie.
13. Crew: List of key crew members, including directors, producers, and writers.
14. Genres List: Detailed genres in list format.
15. Keywords: List of relevant keywords associated with the movie.
16. Director of Photography: Name of the cinematographer.
17. Producers: Names of the producers.
18. Music Composer: Name of the music composer.
Additional Features:
Potential Use Cases:
- Sentiment Analysis: Analyze audience sentiment towards movies based on reviews and ratings.
- Recommendation Systems: Build models to recommend movies based on user preferences and viewing history.
- Market Analysis: Study trends in the movie industry, including genre popularity and revenue patterns.
- Content Analysis: Investigate the thematic content and diversity of movies over time.
- Data Visualization: Create visual representations of movie data to uncover hidden insights.
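Because the file is roughly 1GB, it can help to process it row by row instead of loading everything into memory. The Python sketch below computes the mean of one numeric column in a single streaming pass. The column name `vote_average` is an assumption about the CSV header, which the description does not spell out, and `stream_mean` is a hypothetical helper.

```python
import csv
import io


def stream_mean(fileobj, column):
    """Mean of a numeric column, read one row at a time; None if no valid values."""
    total, count = 0.0, 0
    for row in csv.DictReader(fileobj):
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # skip blank cells
        try:
            total += float(raw)
        except ValueError:
            continue  # skip cells that are not numbers
        count += 1
    return total / count if count else None


# Tiny in-memory stand-in for the real file, used only to show the call.
demo = io.StringIO("id,vote_average\n1,6.0\n2,\n3,8.0\n4,n/a\n")
demo_mean = stream_mean(demo, "vote_average")
```

For the real file, pass a handle from `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.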