The FCC political ads public inspection files dataset contains political ad file information that broadcast stations have uploaded to their public inspection files, which are housed on the FCC website. The data includes all political ad files provided by TV and radio broadcast stations, dating back to 2012, when the FCC started requiring digital uploads of files to its website. Broadcasters are required to maintain this data in their public inspection files for two years, after which they are permitted to remove it from the FCC website. The files are uploaded to the FCC’s website as PDFs and are not machine-readable. However, this dataset includes a content_info table with manual annotations of some data fields, such as advertiser, gross spend and ad air dates, plus a link to a copy of the PDF on Google Cloud Storage. The manual annotations, which are included only for a subset of the PDFs, come from either ProPublica’s Free the Files effort or from Google and are an experimental dataset. This dataset is a work in progress, with additional PDFs continually annotated. All tables in this dataset are updated monthly. For more information about the dataset, visit the FCC website. To provide feedback on this dataset, please contact padl-feedback@googlegroups.com. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Cyclistic, a bike-sharing company, wants to analyze its user data to find the main differences in behavior between its two types of users: Casual Riders, who pay for each ride, and Annual Members, who pay a yearly subscription to the service.
Key objectives: 1. Identify The Business Task - Cyclistic wants to analyze the data to find the key differences between Casual Riders and Annual Members. The goal of this project is to reach out to casual riders and incentivize them to pay for the annual subscription.
Key objectives: 1. Download Data And Store It Appropriately - I downloaded the data as .csv files and saved them in their own folder to keep everything organized. I then uploaded those files into BigQuery for cleaning and analysis. For this project I downloaded all of 2022 and up to May of 2023, as this was the most recent data I had access to.
Identify How It's Organized
Sort and Filter The Data and Determine The Credibility of The Data
Key objectives: 1. Clean The Data and Prepare The Data For Analysis - I used some simple SQL to confirm that no rider-type values were missing, that no information was repeated, and that there were no misspellings in the data.
--Confirms there are no misspellings of member or casual, so no rows drop out of later results.
SELECT
DISTINCT member_casual
FROM
table

--Shows how many casual riders and members used the service; the counts should add up to the number of rows in the dataset.
SELECT
member_casual AS member_type,
COUNT(*) AS total_riders
FROM
table
GROUP BY
member_type

--Lists the distinct ride IDs; if their count matches the number of rows, every ride has a unique ID.
SELECT
DISTINCT ride_id
FROM
table

--Confirms there are no typos in the types of bikes, so no data will be missing from results.
SELECT
DISTINCT rideable_type
FROM
table
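The cleaning checks above can be tried end to end on a toy table. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for BigQuery; the `rides` table name and its four rows (including the bike-type labels) are made up for illustration, and only the column names come from the queries above. It also adds a GROUP BY ... HAVING query, because SELECT DISTINCT on its own lists values without flagging which ones repeat.

```python
import sqlite3

# In-memory stand-in for the BigQuery table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_id TEXT, rideable_type TEXT, member_casual TEXT)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?)",
    [
        ("A1", "classic_bike", "member"),
        ("A2", "electric_bike", "casual"),
        ("A3", "classic_bike", "casual"),
        ("A3", "docked_bike", "member"),  # deliberate duplicate ride_id
    ],
)

# Distinct rider types: anything other than the two expected labels is a typo.
rider_types = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT member_casual FROM rides")
)

# Per-type counts should add up to the total number of rows.
counts = dict(
    conn.execute("SELECT member_casual, COUNT(*) FROM rides GROUP BY member_casual")
)
total_rows = conn.execute("SELECT COUNT(*) FROM rides").fetchone()[0]

# ride_id values that occur more than once.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        "SELECT ride_id FROM rides GROUP BY ride_id HAVING COUNT(*) > 1"
    )
]

print(rider_types, counts, total_rows, duplicate_ids)
```

An empty `duplicate_ids` list is what a clean table would produce; here the planted duplicate is reported instead.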
Key objectives: 1. Aggregate Your Data So It's Useful and Accessible - I wrote some SQL to combine all the data from the different files I had uploaded to BigQuery.
SELECT rideable_type, started_at, ended_at, member_casual FROM table 1
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 2
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 3
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 4
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 5
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 6
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 7
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 8
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 9
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 10
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 11
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 12
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 13
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 14
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 15
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 16
UNION ALL
SELECT rideable_type, started_at, ended_at, member_casual FROM table 17
--This shows how many casual and annual members used bikes
SELECT
member_casual AS member_type,
COUNT(*) AS total_riders
FROM
Aggregate Data Table
GROUP BY
member_type
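The aggregation step can be sketched without typing one SELECT per file by hand. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of BigQuery; the table names `trips_2022_01`, `trips_2022_02` and `all_trips`, and the three sample rows, are hypothetical stand-ins for the 17 uploaded files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = "rideable_type, started_at, ended_at, member_casual"

# Hypothetical monthly tables standing in for the uploaded files; rows are invented.
monthly_rows = {
    "trips_2022_01": [
        ("classic_bike", "2022-01-01 08:00:00", "2022-01-01 08:15:00", "member"),
    ],
    "trips_2022_02": [
        ("electric_bike", "2022-02-03 17:30:00", "2022-02-03 18:10:00", "casual"),
        ("classic_bike", "2022-02-04 09:00:00", "2022-02-04 09:20:00", "member"),
    ],
}
for name, rows in monthly_rows.items():
    conn.execute(f"CREATE TABLE {name} ({columns})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?)", rows)

# Build one SELECT per table and join them with UNION ALL, which keeps
# every row (a plain UNION would silently drop duplicate rows).
union_sql = " UNION ALL ".join(
    f"SELECT {columns} FROM {name}" for name in monthly_rows
)
conn.execute(f"CREATE TABLE all_trips AS {union_sql}")

# Same count-by-rider-type query as above, run against the combined table.
rider_counts = dict(
    conn.execute(
        "SELECT member_casual AS member_type, COUNT(*) AS total_riders "
        "FROM all_trips GROUP BY member_type"
    )
)
print(rider_counts)
```

Generating the statement from a list of table names keeps the column list in one place, so a typo such as a missing space in one table name cannot slip into a single branch of the union.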
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the sale by owners, importers or consignees, of articles which infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry or prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked or misbranded goods, goods sold at unfairly low prices, other antitrust violations such as price fixing or market division, or goods that violate a standard applicable to them.
The US International Trade Commission's 337Info Unfair Import Investigations Information System contains data on investigations conducted under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations
"US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.
Banner photo by João Silas on Unsplash
This data set is a random 10,000,000-row subset of the NYC yellow taxi cab trips data set from the Google BigQuery public datasets.
The data was not cleaned or altered in any way before being uploaded to Kaggle. Cleaning is left to the notebook creator.
This data will not be updated in the future and will remain "as-is".
This data was pulled at random using "ORDER BY RAND() LIMIT 10000000".
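The sampling clause can be tried on a small table. This is a minimal sketch, assuming Python's built-in sqlite3 module rather than BigQuery, so the function is spelled RANDOM() instead of RAND(); the `trips` table, its `trip_id` column and its 1,000 generated rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, trip_distance REAL)")
# Generated placeholder rows, not real taxi data.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1000)],
)

sample_size = 10
# Sort every row by a random key, then keep the first sample_size rows.
sample = conn.execute(
    "SELECT trip_id, trip_distance FROM trips ORDER BY RANDOM() LIMIT ?",
    (sample_size,),
).fetchall()

print(len(sample))
```

Because each row gets its own random sort key, the result is a sample without replacement. The approach sorts the whole table, so it becomes slow and expensive as the source grows.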
This dataset was extracted and uploaded for the purpose of experimenting with and learning regression models for price prediction. It leaves a lot of room for data cleaning and outlier handling, and it is large enough for more realistic model training, testing, and validation.
| column | type | nullable | description |
|---|---|---|---|
| vendor_id | text | required | A code indicating the TPEP provider that provided the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc |
| pickup_datetime | datetime | nullable | The date and time when the meter was engaged. |
| dropoff_datetime | datetime | nullable | The date and time when the meter was disengaged. |
| passenger_count | integer | nullable | The number of passengers in the vehicle. This is a driver-entered value |
| trip_distance | numeric | nullable | The elapsed trip distance in miles reported by the taximeter. |
| rate_code | string | nullable | The final rate code in effect at the end of the trip. 1= Standard rate 2=JFK 3=Newark 4=Nassau or Westchester 5=Negotiated fare 6=Group ride |
| store_and_fwd_flag | string | nullable | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store and forward trip |
| payment_type | string | nullable | A numeric code signifying how the passenger paid for the trip. 1= Credit card 2= Cash 3= No charge 4= Dispute 5= Unknown 6= Voided trip |
| fare_amount | numeric | nullable | The time-and-distance fare calculated by the meter |
| extra | numeric | nullable | Miscellaneous extras and surcharges. Currently, this only includes the \$0.50 and \$1 rush hour and overnight charges. |
| mta_tax | numeric | nullable | \$0.50 MTA tax that is automatically triggered based on the metered rate in use |
| tip_amount | numeric | nullable | Tip amount – This field is automatically populated for credit card tips. Cash tips are not included |
| tolls_amount | numeric | nullable | Total amount of all tolls paid in the trip. |
| imp_surcharge | numeric | nullable | \$0.30 improvement surcharge assessed trips at the flag drop. The improvement surcharge began being levied in 2015. |
| total_amount | numeric | nullable | The total amount charged to passengers. Does not include cash tips |
| pickup_location_id | string | nullable | TLC Taxi Zone in which the taximeter was engaged |
| dropoff_location_id | string | nullable | TLC Taxi Zone in which the taximeter was disengaged |
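Several columns in the table store codes rather than labels. The sketch below shows one way to decode them before analysis; the mappings are copied from the column descriptions above, while the `decode` helper and its "Unrecognized code" fallback are illustrative assumptions, not part of the dataset.

```python
# Code-to-label mappings copied from the column descriptions in the table above.
PAYMENT_TYPES = {
    "1": "Credit card",
    "2": "Cash",
    "3": "No charge",
    "4": "Dispute",
    "5": "Unknown",
    "6": "Voided trip",
}
RATE_CODES = {
    "1": "Standard rate",
    "2": "JFK",
    "3": "Newark",
    "4": "Nassau or Westchester",
    "5": "Negotiated fare",
    "6": "Group ride",
}


def decode(code, mapping):
    """Return the label for a code, accepting ints or strings (hypothetical helper)."""
    return mapping.get(str(code).strip(), "Unrecognized code")


print(decode(1, PAYMENT_TYPES))
print(decode("2", RATE_CODES))
print(decode("99", RATE_CODES))
```

Keeping an explicit fallback label makes unexpected codes visible during cleaning instead of raising an error or passing through unnoticed.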