https://research.csiro.au/dap/licences/csiro-data-licence/
A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
Various population statistics, including structured demographics data.
https://data.gov.tw/license
Miaoli County's tax-exempt vehicle statistics table for ROC year 110 (2021)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains per-game NBA player stats for the 2022-2023 regular season. Note that some player names appear more than once because of team changes.
More than 500 rows and 30 columns. Column descriptions are listed below.
Data from Basketball Reference. Image from Clutch Points.
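Because players who changed teams appear on several rows, it can help to list those names before aggregating. The following is a minimal sketch using only the Python standard library; the column name "Player" and the sample rows are assumptions for illustration, not taken from the dataset.

```python
import csv
import io
from collections import Counter

def players_with_multiple_rows(csv_text, name_column="Player"):
    """Return the names that appear on more than one row, with their row counts.

    name_column is a hypothetical header; adjust it to the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[name_column] for row in reader)
    return {name: n for name, n in counts.items() if n > 1}

# Tiny made-up sample, not real data from the dataset.
sample = """Player,Tm,PTS
Player A,AAA,10.0
Player A,BBB,12.5
Player B,CCC,8.1
"""

print(players_with_multiple_rows(sample))
```

How to treat the repeated rows (keep one per team, or combine them) depends on the analysis, so the sketch only reports them.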
The Health Statistics and Health Research Database is Estonia's largest set of health-related statistics and survey results, administered by the National Institute for Health Development. Use of the database is free of charge.
The database consists of eight main areas divided into sub-areas. The data tables in the sub-areas are assigned unique codes. The data tables can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). A detailed database user manual is available for download (.pdf).
The database is constantly updated with new data. Dates of updating the existing data tables and adding new data are provided in the release calendar. The date of the last update to each table is provided after the title of the table in the list of data tables.
A contact person for each sub-area is listed under the "Definitions and Methodology" link of that sub-area. Contact this person for additional information about the published data, further questions, and data requests.
Read more about the publication of health statistics by the National Institute for Health Development in Health Statistics Dissemination Principles.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
This dataset contains a single CSV file with lifetime statistics for NBA players. The data includes various box score stats and personal information for each player's career.
Data Fields
The CSV file contains the following columns:
- FULL_NAME: The player's full name
- AST: Total career assists
- BLK: Total career blocks
- DREB: Total career defensive rebounds
- FG3A: Total 3-point field goal attempts
- FG3M: Total 3-point field goals made
- FG3_PCT: 3-point field…

See the full description on the dataset page: https://huggingface.co/datasets/Hatman/NBA-Player-Career-Stats.
http://1.data.gov.hk/en/terms-and-conditions
Please visit https://www.censtatd.gov.hk/en/EIndexbySubject.html?scode=340&pcode=FA100070 for the historical issues, related publications, concepts, methods, definitions of terms, and notes of this dataset. Users can download, distribute, and reproduce the data free of charge for both commercial and non-commercial purposes, subject to the Terms and Conditions of Use as stipulated under DATA.GOV.HK.
Data tables containing aggregated information about vehicles in the UK are also available.
A number of changes were introduced to these data files in the 2022 release to help meet the needs of our users and to provide more detail.
Fuel type has been added to:
Historic UK data has been added to:
A new datafile, df_VEH0520, has been added.
We welcome any feedback on the structure of our data files, their usability, or any suggestions for improvements; please contact vehicles statistics.
CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).
When using as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 58.1 MB) https://assets.publishing.service.gov.uk/media/68494aca74fe8fe0cbb4676c/df_VEH0120_GB.csv
Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 34.1 MB) https://assets.publishing.service.gov.uk/media/68494acb782e42a839d3a3ac/df_VEH0120_UK.csv
Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 24.8 MB) https://assets.publishing.service.gov.uk/media/68494ad774fe8fe0cbb4676d/df_VEH0160_GB.csv
Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 8.26 MB) https://assets.publishing.service.gov.uk/media/68494ad7aae47e0d6c06e078/df_VEH0160_UK.csv
Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
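The schemas above are "wide": a few identifying columns followed by one count column per quarter. For analysis in Python it is often easier to reshape this into one row per vehicle group and quarter. The sketch below uses only the standard library and streams the rows, which may help with the larger files. The quarter header format ("2023Q4") and the sample row are assumptions for illustration, and the code assumes every count cell is a plain integer, which the real files may not guarantee.

```python
import csv
import io

# Identifier columns named in the schemas above; LicenceStatus is absent
# from the df_VEH0160 files, which the code tolerates.
ID_COLUMNS = ["BodyType", "Make", "GenModel", "Model", "Fuel", "LicenceStatus"]

def wide_to_long(csv_text, id_columns=ID_COLUMNS):
    """Yield one dict per (vehicle group, quarter) from a wide CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column that is not an identifier is treated as a quarter column.
    quarter_columns = [c for c in reader.fieldnames if c not in id_columns]
    for row in reader:
        ids = {c: row[c] for c in id_columns if c in row}
        for quarter in quarter_columns:
            # Assumes integer counts; real files may need extra cleaning.
            yield {**ids, "Quarter": quarter, "Vehicles": int(row[quarter])}

# Made-up sample row; the quarter labels are hypothetical.
sample = """BodyType,Make,GenModel,Model,Fuel,LicenceStatus,2023Q4,2023Q3
Cars,EXAMPLE,EXAMPLE GEN,EXAMPLE MODEL,Petrol,Licensed,120,115
"""

rows = list(wide_to_long(sample))
print(rows[0])
```

For a file on disk, the same loop can be driven by `csv.DictReader(open(path, newline=""))` instead of an in-memory string.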
In order to keep the datafile df_VEH0124 to a reasonable size, it has been split into 2 halves; 1 covering makes starting with A to M, and the other covering makes starting with N to Z.
df_VEH0124_AM: <a class="govuk-link" href="https://assets.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.
The centerpiece of this repository is usage_of_programming_languages_2011-2023.csv. This CSV file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate of the popularity of programming languages, this dataset was created using 3 vastly different sources.
The dataset was created using the GitHub repository above. As input data, three public datasets were used.
Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It contains metadata (no code) for all GitHub repositories with more than 5 stars.
Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows the monthly share of programming-related Google searches per language from 2004 to 2023.
Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It contains the results of the yearly Stack Overflow developer survey from 2011 to 2023.
All these datasets were downloaded on 12.12.2023. They are all included in the GitHub repository above.
The dataset contains a column for the year followed by one column per language, giving its usage in percent. Additionally, vertical bar charts and pie charts for each year, plus a line graph for each language over the whole timespan, are provided as PNGs.
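Given the layout described above (one year column, then one percentage column per language), a single language's trend can be pulled out with a few lines of standard-library Python. This is a sketch only: the header name "year" and the numbers in the sample are assumptions for illustration, not values from the dataset.

```python
import csv
import io

def language_series(csv_text, language, year_column="year"):
    """Map each year to the usage percentage of one language.

    year_column is a hypothetical header name; check the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row[year_column]): float(row[language]) for row in reader}

# Made-up sample with invented percentages.
sample = """year,Python,Java
2011,10.0,20.0
2012,11.5,19.0
"""

python_trend = language_series(sample, "Python")
print(python_trend)
```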
The languages that are going to be considered for the project can be seen here:
- Python
- C
- C++
- Java
- C#
- JavaScript
- PHP
- SQL
- Assembly
- Scratch
- Fortran
- Go
- Kotlin
- Delphi
- Swift
- Rust
- Ruby
- R
- COBOL
- F#
- Perl
- TypeScript
- Haskell
- Scala
This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/.
TLDR: You are free to share, adapt, and create derivative works from this dataset as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbL.
Thanks go out to
- stackoverflow https://insights.stackoverflow.com/survey for providing the data from the yearly stackoverflow developer survey.
- the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master for providing google search data.
- Peter Elmers, for crawling metadata on github repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.
TEMPO-Online provides the following functions and services:
- Free access to statistical information.
- Export of tables in .csv and .xls formats, and printing of tables.

What is the content of TEMPO-Online? The National Institute of Statistics offers a statistical database, TEMPO-Online, that gives access to a wide range of information. The database contains:
- Approximately 1100 statistical indicators, divided into socio-economic fields and sub-fields;
- Metadata associated with the statistical indicators (definition, starting and ending year of the time series, the last period of data loading, statistical methodology, the last update);
- Detailed indicators at the level of statistical characteristic groups and/or sub-groups (e.g. the total number of employees at the end of the year by employee category, activities of the national economy (sections), sex, area, and county);
- Time series from 1990 to the present, with monthly, quarterly, semi-annual, and annual frequency, at national, development region, county, and commune level.

Search by keywords: The keyword search finds objects (tables with statistical variables divided into time series). The search returns results based on the matrix code and on keywords in the title or definition of a matrix, shown as a list of matching objects. To search for a keyword, use the search section in the menu bar on the left.

Tables: The tables that result from a query have a flexible structure. For instance, the user may select variables and attributes through the query interface, as needed. The user can save the resulting table in .csv or .xls format, or print it.

Note: to access tables at locality level (very large), the user has to select each county with its respective localities, so that access is faster and technical blockages are avoided.
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note that the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. 
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
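The overlap prioritization described above (GAP Status Code first, then public access in the order Closed, Restricted, Open, Unknown, then geodatabase load order) can be illustrated as a sort key. This is a minimal sketch of that ordering idea, not the released "PADUS3_0_CreateVectorAnalysisFileScript.zip" script; the field names (gap_status, access, load_order) and the sample records are hypothetical.

```python
# Rank of each public access value, following the order stated in the text.
ACCESS_RANK = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}

def priority_key(record):
    """Lower tuples win: GAP status, then access rank, then load order."""
    return (record["gap_status"], ACCESS_RANK[record["access"]], record["load_order"])

def winning_designation(overlapping):
    """Pick the record that would be kept where designations overlap."""
    return min(overlapping, key=priority_key)

# Hypothetical overlapping records with invented attribute values.
overlap = [
    {"name": "National Forest", "gap_status": 2, "access": "Open", "load_order": 1},
    {"name": "Wilderness", "gap_status": 1, "access": "Open", "load_order": 2},
]

print(winning_designation(overlap)["name"])  # Wilderness
```

Using a tuple as the key keeps the three criteria in one place, so a later criterion only matters when the earlier ones tie.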
This dataset consists of the Premier League team stats for seasons 2022/2023, 2021/2022 and 2020/2021. The data was scraped from fbref.com and formatted into a CSV file.
Columns:
- date = Date of the match
- time = Kick-off time of the match
- comp = Competition of the match (e.g. English Premier League)
- round = The match week the match took place in
- day = The day the match took place on (e.g. Monday, Tuesday)
- venue = Whether the team played at a Home, Away, or Neutral venue
- result = Whether the team won, lost, or drew (W, L, D)
- gf = How many goals the team scored
- ga = How many goals the team conceded
- opponent = Who the team faced that day
- xg = Expected goals
- xa = Expected goals allowed
- poss = Possession
- attendance = How many people attended the match
- captain = Captain of the team for the match
- formation = Formation the team used for the match
- referee = The referee for the match
- match report = Please ignore
- notes = Please ignore
- sh = Shots total
- sot = Shots on target
- dist = Average distance per shot
- fk = Shots from free kicks
- pk = Penalty kicks made
- pkatt = Penalty kicks attempted
- season = The year the season took place (e.g. for the 2022/2023 season, the year is 2023)
- team = The team the stats belong to (e.g. Manchester City)
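As one way to use the columns above, the result, team, and season fields are enough to rebuild league points (3 for a win, 1 for a draw, 0 for a loss). The sketch below assumes every row is a league match and that the headers match the column names exactly; the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Standard league scoring: win 3, draw 1, loss 0.
POINTS = {"W": 3, "D": 1, "L": 0}

def points_by_team_season(csv_text):
    """Sum league points per (team, season) from the result column."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[(row["team"], row["season"])] += POINTS[row["result"]]
    return dict(totals)

# Made-up sample rows, not taken from the dataset.
sample = """team,season,result,gf,ga
Team A,2023,W,2,0
Team A,2023,D,1,1
Team B,2023,L,0,2
"""

table = points_by_team_season(sample)
print(table)
```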
This dataset is for basic data analysis. Student statisticians or data analysts (like myself) could use it as a basic learning point. Even ML students could use it to predict future prices and speeds of computers.
Unfortunately, this dataset doesn't come with dates (which are a pain to work with anyway), but the computers are in order from earliest to latest.
I will be uploading another version with this and a more detailed CSV that has the computer name, date, and other stats. This dataset is free to use for any purpose.
This is simply to gain understanding in analyzing data. At least for me.
price, speed, hd, ram, screen, cd, multi, premium, ads, trend
The largest computer CSV? Maybe? Maybe I'm scraping it right now? Who knows? ;)
http://data.gov.hk/en/terms-and-conditions
Please visit https://www.censtatd.gov.hk/en/EIndexbySubject.html?scode=180&pcode=B1130303 for the historical issues, related publications, concepts, methods, definitions of terms, and notes of this dataset. Users can download, distribute, and reproduce the data free of charge for both commercial and non-commercial purposes, subject to the Terms and Conditions of Use as stipulated under DATA.GOV.HK.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Basque Youth Observatory is an instrument of the Basque Government that provides a global, ongoing view of the situation and evolution of young people, making it possible to evaluate the impact of youth-related actions carried out in the CAPV (Autonomous Community of the Basque Country) by the different administrations. The Basque Youth Observatory regularly publishes more than 100 statistical indicators that can be consulted at euskadi.eus, along with other research and reports. Statistics are provided in various formats (CSV, Excel).
https://data.gov.tw/license
The "Statistics of Import and Export Trade Volume of Each Park" dataset helps the public understand the import and export volumes of each park and their growth trends. The information is updated every month and is provided in CSV format for free download and use by the public. The dataset covers the import and export trade volume of parks such as Nanzih, Kaohsiung, Taichung, Zhonggang, Pingtung, and other parks (Lingguang, Chenggong, Gaoruan), with main fields including "Park, Import and Export (This Month, Year-to-Date)", "Export (This Month, Year-to-Date)", "Import (This Month, Year-to-Date)", and other important information.
http://data.gov.hk/en/terms-and-conditions
Please visit https://www.censtatd.gov.hk/en/EIndexbySubject.html?scode=240&pcode=FA100034 for the historical issues, related publications, concepts, methods, definitions of terms, and notes of this dataset. Users can download, distribute, and reproduce the data free of charge for both commercial and non-commercial purposes, subject to the Terms and Conditions of Use as stipulated under DATA.GOV.HK.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Basque Youth Observatory is an instrument of the Basque Government that provides a global, ongoing view of the situation and evolution of young people, making it possible to evaluate the impact of youth-related actions carried out in the CAPV (Autonomous Community of the Basque Country) by the different administrations. The Basque Youth Observatory regularly publishes more than 100 statistical indicators that can be consulted at euskadi.eus, along with other research and reports. Statistics are provided in various formats (CSV, Excel).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example data to understand the implementation of K Means
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains OTT + Video Streaming Platforms - Revenue and User Stats 2011-21
OTT stands for “over-the-top,” which refers to any TV or video content that's streamed over the internet. This includes any web or app-based streaming service, like Netflix, YouTube, Disney Plus and many more. There's a wide range of OTT platforms, including Netflix, Disney+, Hulu, Amazon Prime Video, Peacock, CuriosityStream, Pluto TV, and many more. Unlike OTT platforms, YouTube is a social video platform that was originally designed to allow everyday consumers to share moments caught on video. YouTube has attempted to enter the OTT market a number of times with limited success, since the market clearly sees YouTube as a place for free content.
| | File | File Type |
| -- | ---------------------------- | --------- |
| 1 | LibrarySize.csv | CSV file |
| 2 | MinuteSharing.csv | CSV file |
| 3 | AppUsage.csv | CSV file |
| 4 | NumSubscribers.csv | CSV file |
| 5 | Revenue.csv | CSV file |
| 6 | Revenue.csv | CSV file |
| 7 | AdRevenue.csv | CSV file |
| 8 | LiveTVSubscribers.csv | CSV file |
| 9 | NumSubscribers.csv | CSV file |
| 10 | Profit.csv | CSV file |
| 11 | Revenue.csv | CSV file |
| 12 | SubscriptionRevenue.csv | CSV file |
| 13 | Valuation.csv | CSV file |
| 14 | ContentSpend.csv | CSV file |
| 15 | NumSubscribers.csv | CSV file |
| 16 | NumSubscribersByRegion.csv | CSV file |
| 17 | Profit.csv | CSV file |
| 18 | Revenue.csv | CSV file |
| 19 | RevenueByRegion.csv | CSV file |
| 20 | Revenue.csv | CSV file |
| 21 | Users.csv | CSV file |
| 22 | AdRevenue.csv | CSV file |
| 23 | ConcurrentViewers.csv | CSV file |
| 24 | HoursWatched.csv | CSV file |
| 25 | MostViewedGamesOnTwitch.csv | CSV file |
| 26 | Revenue.csv | CSV file |
| 27 | TwitchAgeDemographics.csv | CSV file |
| 28 | TwitchGenderDemographics.csv | CSV file |
| 29 | TwitchStreamers.csv | CSV file |
| 30 | AppUsage.csv | CSV file |
| 31 | NumSubscribers.csv | CSV file |
| 32 | Revenue.csv | CSV file |
| 33 | AppUsage.csv | CSV file |
| 34 | NumSubscribers.csv | CSV file |
| 35 | Revenue.csv | CSV file |
| 36 | TopPlatforms.csv | CSV file |
| 37 | PremiumSubscribers.csv | CSV file |
| 38 | Revenue.csv | CSV file |
| 39 | Users.csv | CSV file |
!kaggle datasets download -d azminetoushikwasi/ott-video-streaming-platforms-revenue-and-users