Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 2021-2022 football player stats per 90 minutes. Only players of Premier League, Ligue 1, Bundesliga, Serie A and La Liga are listed.
More than 2,500 rows and 143 columns. Column descriptions are listed below.
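For readers unfamiliar with the convention, a per-90 figure rescales a raw total to a full 90-minute match. A minimal sketch, assuming a raw total and the minutes played are available; the function name and the sample numbers are illustrative and are not taken from the dataset:

```python
def per_90(total: float, minutes_played: float) -> float:
    """Rescale a raw total to a per-90-minutes rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total / minutes_played * 90.0


# Illustrative numbers only: 12 goals scored in 1,800 minutes.
goals_per_90 = per_90(12, 1800)
print(round(goals_per_90, 2))
```

Because the dataset already stores per-90 values, a helper like this would mainly be useful for converting them back to totals or for comparing against other sources.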
https://creativecommons.org/publicdomain/zero/1.0/
This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis.
Note that the Winter and Summer Games were held in the same year up until 1992. After that, the Games were staggered so that the Winter Games occur on a four-year cycle starting in 1994, with the Summer Games in 1996, then Winter in 1998, and so on. A common mistake when analyzing this data is to assume that the Summer and Winter Games have always been staggered.
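One way to avoid that mistake is to check which years hosted both seasons before grouping by year. The sketch below assumes each record can be reduced to a (year, season) pair with seasons labeled "Summer" and "Winter"; the helper name and the small sample are illustrative, not rows copied from the file:

```python
from collections import defaultdict


def years_with_both_seasons(records):
    """Return the sorted years in which both Summer and Winter Games appear."""
    seasons_by_year = defaultdict(set)
    for year, season in records:
        seasons_by_year[year].add(season)
    return sorted(
        year
        for year, seasons in seasons_by_year.items()
        if {"Summer", "Winter"} <= seasons
    )


# Illustrative (year, season) pairs around the 1992 to 1994 change.
sample = [
    (1988, "Summer"), (1988, "Winter"),
    (1992, "Summer"), (1992, "Winter"),
    (1994, "Winter"),
    (1996, "Summer"),
]
print(years_with_both_seasons(sample))
```

Any year returned by this helper needs the season as part of the grouping key, otherwise Summer and Winter athletes are counted together.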
The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:
The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.
This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Welcome to the NBA Statistics Repository for teams and players. This repository contains a rich and diverse dataset spanning from 1996 to 2023, drawn from NBA game statistics. It's ideal for data analysts, basketball fans, researchers, and anyone interested in the detailed numbers behind the sport.
This repository contains a series of CSV files detailing the performances of teams and players from 1996 to 2023. A list of these files is provided below:
- player_index.csv: An index of all players with general information.
- player_stats_advanced_po.csv and player_stats_advanced_rs.csv: Advanced statistics for players during playoffs (po) and regular season (rs).
- player_stats_defense_po.csv and player_stats_defense_rs.csv: Defensive statistics for players during the playoffs and regular season.
- player_stats_misc_po.csv and player_stats_misc_rs.csv: Miscellaneous player statistics for the playoffs and regular season.
- player_stats_scoring_po.csv and player_stats_scoring_rs.csv: Scoring statistics for players during the playoffs and regular season.
- player_stats_traditional_po.csv and player_stats_traditionnal_rs.csv: Traditional player statistics during the playoffs and regular season.
- player_stats_usage_po.csv and player_stats_usage_rs.csv: Player usage statistics during the playoffs and regular season.
- team_stats_advanced_po.csv and team_stats_advanced_rs.csv: Advanced team statistics during the playoffs and regular season.
- team_stats_defense_po.csv and team_stats_defense_rs.csv: Defensive team statistics during the playoffs and regular season.
- team_stats_four_factors_po.csv and team_stats_four_factors_rs.csv: Four factors team statistics during the playoffs and regular season.
- team_stats_misc_po.csv and team_stats_misc_rs.csv: Miscellaneous team statistics during the playoffs and regular season.
- team_stats_opponent_po.csv and team_stats_opponent_rs.csv: Team opponent statistics during the playoffs and regular season.
- team_stats_scoring_po.csv and team_stats_scoring_rs.csv: Scoring team statistics during the playoffs and regular season.
- team_stats_traditional_po.csv and team_stats_traditional_rs.csv: Traditional team statistics during the playoffs and regular season.

To use this data, simply clone this repository and use any software capable of reading CSV files, such as Excel, R, or Python (with pandas).
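The file names in the list follow an entity, category, phase pattern, which can be captured in a small helper. This is a sketch under that assumption: the function name is hypothetical, and the list spells one file `player_stats_traditionnal_rs.csv`, so it is worth checking the repository's actual file names before relying on the pattern.

```python
VALID_ENTITIES = {"player", "team"}
VALID_PHASES = {"po", "rs"}  # po = playoffs, rs = regular season


def stats_filename(entity: str, category: str, phase: str) -> str:
    """Build a file name such as 'team_stats_advanced_po.csv'."""
    if entity not in VALID_ENTITIES:
        raise ValueError(f"unknown entity: {entity!r}")
    if phase not in VALID_PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return f"{entity}_stats_{category}_{phase}.csv"


print(stats_filename("team", "four_factors", "rs"))
```

The resulting name can then be passed to whichever CSV reader is in use.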
Contributions to this repo are welcome. If you have additional data to add or corrections to make, please feel free to open a pull request.
These data are released under the MIT License. See the LICENSE file for more information.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/CVJZHB
This dataset represents a group of paper records (a "series") within the Harvard School of Public Health Harvard Prevention Research Center records, 1992-2003 (inclusive), 1994-2003 (bulk), which can be accessed on-site at the Center for the History of Medicine at the Francis A. Countway Library of Medicine in Boston, Massachusetts. The series consists of raw data surveys completed by middle and high school students in Boston, Massachusetts, during the Harvard Prevention Research Center's Play Across Boston study. Surveys concern youth physical activity and participation in community physical fitness programs. The bulk of the surveys were administered during the 2002-2003 school year. An earlier version of the survey (omitting a question regarding neighborhood of residence) was administered at two schools during the 2001-2002 school year. The series also includes student lists and parental consent forms for student participation. Survey topics include: students' demographics; height and weight; health or physical conditions that hinder physical activity; neighborhood of residence; weekly time spent engaged in physical activity; participation in sports teams, lessons, camps, or other organized physical activities; issues that hinder participation in physical activity; types of locations and facilities visited to engage in physical activity; availability of physical activities near home; proximity of home to physical fitness locations and facilities; proximity of home to retail locations selling healthy or unhealthy foods; methods of transportation to and from school; participation in physical education classes; students' self-assessed sports ability; television viewing habits; parents' education and exercise habits; whether a doctor or nurse discussed health, nutrition, and exercise habits at a recent physical examination; and the likelihood of changing health habits as a result of nutrition and exercise advice from a doctor or nurse. 
Attached to the “Student Surveys, 2002-2003” dataset are electronic dataset files created from the original paper surveys using Survey Monkey. Because the complete dataset includes identifying information for subjects, only summary data is provided. To illustrate potential analysis that can be conducted using the complete dataset, a filtered summary dataset is also provided comparing responses for subjects from three geographically diverse neighborhoods with high response rates (Roxbury, Allston-Brighton, and Charlestown). The summary dataset and the comparison summary dataset are both provided in three filetype choices: Microsoft Excel (XLS), portable document format (PDF), and Microsoft Powerpoint (PPT). In addition to these files, a methodology is also included explaining the process used by the archivist to convert the paper surveys into a digital dataset. Additional data and associated records are accessible onsite at the Center for the History of Medicine per the conditions governing access described below.

Conditions Governing Access to the Complete Electronic Dataset: Researchers may apply for access to the complete dataset. Please contact the Center for History of Medicine's Public Services for more information.

Conditions Governing Access to Original Collection Materials: The series represented by this dataset includes health information that is restricted for 80 years from the date of record creation. Researchers should contact Public Services for more information.

The Harvard School of Public Health Harvard Prevention Research Center records were processed with grant funding from the Andrew W. Mellon Foundation, as awarded and administered by the Council on Library and Information Resources (CLIR) in 2016. View the Harvard Prevention Research Center Records finding aid for a full collection inventory of both paper and digital records, and for more information about accessing and using the collection.
https://datos.madrid.es/egob/catalogo/aviso-legal
List of urban furniture elements approved for installation on public roads within the municipality of Madrid. For approval purposes, the General Ordinance of Urban Furniture divides the municipal territory into three zones:

- Zone 1: The area delimited by the Cerca and Arrabal de Felipe II, the historical parks and gardens of interest defined in the PGOUM 1997, and the Paseo del Prado-Castellana axis between the Glorieta de Atocha and the Plaza de San Juan de la Cruz.
- Zone 2: The rest of Specific Planning Area (SPA) 00.01, together with the SPAs of the Historical Colonies and the historic centres of the peripheral districts, as defined in the planning regulations of the PGOUM 1997.
- Zone 3: The rest of the urban land.

Source: Geoportal of the Madrid City Council.

The dataset consists of several Excel files, so that it can be consulted in a simple and intuitive way:

- Municipal conservation furniture: municipally maintained elements installed in urbanization works, such as benches, bollards, and litter bins.
- Private conservation furniture: privately maintained elements, except for the elements of café terraces. Includes newsstands, flower kiosks, lottery kiosks, telephone booths, islands at vehicle crossings, etc.
- Elements of children's areas: play equipment in municipally maintained children's areas.
- Elements of seniors' areas: equipment in municipally maintained areas for older adults.
- Elements of sports areas: exercise equipment in municipally maintained sports areas.

Other datasets related to street furniture are also available on this portal:

- Urban furniture. Equipment in seniors' areas
- Urban furniture. Equipment in sports areas
- Urban furniture. Equipment in children's areas
The table below lists links to ad hoc statistical analyses on the Community Life Survey that have not been included in our standard publications.
- [Community Life Survey: Estimates on volunteering trends in England, 2013/14 to 2023/24 (ODS, 25.1 KB)](https://assets.publishing.service.gov.uk/media/690cd9c3f5db1b22dad3e6cc/Community_Life_Survey_-_Estimates_on_Volunteering_trends_in_England.ods)
- [Community Life Survey: Reasons for pride/lack of pride in local area by age group, 2023/24 (ODS, 15.1 KB)](https://assets.publishing.service.gov.uk/media/67ae2b74e270ceae39f9e1c0/Community_Life_Survey_-_reasons_for_pride_in_local_area_2023_24.ods)
- [Community Life Survey: Feeling able to influence decisions affecting the local area by citizenship and household income, 2019/20 to 2021/22 (ODS, 10.9 KB)](https://assets.publishing.service.gov.uk/media/64771a925f7bb700127fa20c/Community_Life_Survey_-_Influencing_local_decisions.ods)
- [Community Life Survey: Strength of community variables by Output Area Classification, 2017/18 to 2020/21 (ODS, 111 KB)](https://assets.publishing.service.gov.uk/media/6436be8c877741000c68d874/Community_Life_Survey_-_Strength_of_community_variables_by_Output_Area_Classifications_2017_18_to_2020_21.ods)
- [Community Life Survey: Volunteering in the Heritage Sector, 2021/22 (ODS, 10.8 KB)](https://assets.publishing.service.gov.uk/media/6423fb862fa8480013ec0e2c/Community_Life_Survey_-_Volunteering_in_the_Heritage_Sector.ods)
- [Community Life Survey: Further estimates on volunteering trends in England (ODS, 62.2 KB)](https://assets.publishing.service.gov.uk/media/62a9bd3e8fa8f50390d45147/CLS_ad_hoc_Volunteering_Final_220609_.ods)
- [Community Life Survey: Formal volunteering in groups, clubs or organisations, 2019/20 to 2020/21 (MS Excel Spreadsheet, 67.5 KB)](https://assets.publishing.service.gov.uk/media/618cde40e90e070440c8b97e/CLS_Ad_hoc_-_member_of_public_-_Nov_2021.xlsx)
- [Community Life Survey: Feeling of being able to influence decisions that affect your local area, 2020/2021 (MS Excel Spreadsheet, 70.6 KB)](https://assets.publishing.service.gov.uk/media/618cddd28fa8f50379269bef/ONS_ad_hoc_Nov_2021.xlsx)
- [Community Life Survey: Further estimates of levels of loneliness in London and England 2017-18 (MS Excel Spreadsheet, 83.3 KB)](https://assets.publishing.service.gov.uk/media/5d010983e5274a3cf94f84ea/Community_Life_Survey_further_estimates_of_levels_of_loneliness_in_London_and_England_2017-18.xlsx)
- [Community Life Survey: Frequency of chatting to neighbours 2017-18](https://assets.publishing.service.gov.uk/media/5d010905e5274a3cfb11188d/Community_Life_Survey_Frequency_of_chatting_to_neighbours_2017-18.xlsx)
The Sport Fish Division of the Alaska Department of Fish and Game conducts an annual mail survey to estimate sport fish harvest and catch by fishery, area, region in which the fish were caught, and species. These data give decision-makers the information needed to maintain, protect, and improve recreational fisheries.

The survey was first implemented in 1977, capturing harvest-only metrics, and was expanded in 1990 to include fish caught and released as well as harvest. The current version of the survey was last revised in 2011 and is designed to capture guided and non-guided fishing activity throughout the state of Alaska. The Alaska Sport Fishing Survey database made data from 1996 to 2016 available through a query application. Data from prior years are not included in this dataset but can be found in historic publications (https://www.adfg.alaska.gov/sf/sportfishingsurvey/index.cfm?ADFG=main.historic).

In accordance with guidelines established by Mills and Howe, estimates based on fewer than 12 responses should not be used other than to document that sport fishing occurred. Estimates based on 12 to 29 responses can be useful for indicating relative orders of magnitude and for assessing long-term trends. Estimates based on 30 or more responses are generally usable. More information about the survey is available at the Alaska Sport Fishing Survey webpage (http://www.adfg.alaska.gov/sf/sportfishingsurvey/).

This dataset contains the original Excel file, an R script that reformats this file, and 3 CSV files derived from the original file. Also included are two ADFG reports, in PDF format, which explain how these data are gathered and processed by ADFG.
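The response-count guidelines can be applied as a simple filter before any estimate is used. A minimal sketch, assuming the number of responses behind each estimate is known; the function name and the returned labels are illustrative and are not part of the ADFG files:

```python
def estimate_reliability(responses: int) -> str:
    """Classify an estimate by the number of survey responses behind it."""
    if responses < 0:
        raise ValueError("responses cannot be negative")
    if responses < 12:
        # Only documents that sport fishing occurred.
        return "presence only"
    if responses < 30:
        # Relative orders of magnitude and long-term trends.
        return "magnitude and trends"
    return "generally usable"


for n in (5, 12, 29, 30):
    print(n, estimate_reliability(n))
```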
https://github.com/etalab/licence-ouverte/blob/master/LO.md
Official database of sports facilities in France

Data ES is the database of sports facilities and places of practice maintained by the Ministry responsible for Sports. It is updated daily. The data stems from a legal obligation under which "every owner of a sports facility is required to declare it to the administration so that a census of facilities can be established" (Code du Sport L312-2). Since 2005, this scheme has helped document and inform the development of sports practice in France. Today, with more than 330,000 places of practice recorded in metropolitan France and the overseas territories, DATA ES is a comprehensive reference, updated daily. Each recorded facility is associated with a unique national code, ensuring precise and consistent identification.

Data ES - Complet
This dataset is the complete aggregation of sports facilities, their associated installations, and the sports activities practiced at each site. It is the main file for territorial or cartographic analyses at the national level.

Methodology and definitions
Key information
Update frequency: daily
Format: CSV, JSON, via API
License: Licence Ouverte / Etalab 2.0
Source: Ministry of Sports
Contact: contact-equipements[AT]sports.gouv.fr
https://creativecommons.org/publicdomain/zero/1.0/
The datasets provided include the player data for Career Mode from FIFA 15 to FIFA 23. The data allows multiple comparisons of the same players across the last 9 versions of the video game.
Some ideas for possible analyses:
Historical comparison between Messi and Ronaldo (which skill attributes changed the most over time, compared to real-life stats);
Ideal budget to create a competitive team (at the level of the top n teams in Europe), and the point at which a larger budget no longer buys significantly better players for the 11-player lineup. An extra is the same comparison using the Potential attribute for the lineup instead of the Overall attribute;
Sample analysis of the top n% of players (e.g. the top 5%) to see whether important attributes such as Agility, BallControl, or Strength have been popular or not across the FIFA versions.
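The top n% idea can be sketched without any particular library. The example assumes each player is a dictionary with an `overall` rating; the `name` key, the helper name, and the sample values are hypothetical placeholders rather than fields confirmed from the files:

```python
import math


def top_fraction(players, fraction, key="overall"):
    """Return the best-rated share of players, keeping at least one."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(players, key=lambda p: p[key], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical sample rows for illustration only.
sample_players = [
    {"name": "A", "overall": 80},
    {"name": "B", "overall": 91},
    {"name": "C", "overall": 85},
    {"name": "D", "overall": 88},
]
best_half = top_fraction(sample_players, 0.5)
print([p["name"] for p in best_half])
```

Running the same selection per FIFA version and then averaging an attribute over the selected players would give the cross-version comparison described in the third idea.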
Every player, coach, and team available in FIFA 15, 16, 17, 18, 19, 20, 21, 22, and also FIFA 23
All FIFA updates from 10th September 2015 until 13th January 2023
110 attributes for players, 8 attributes for coaches, and 54 attributes for teams
URL of the scraped players, coaches, and teams
Player positions, with the role in the club and in the national team
Player attributes with statistics such as Attacking, Skills, Defense, Mentality, GK Skills, etc.
Player personal data like Nationality, Club, DateOfBirth, Wage, Salary, etc.
Team data regarding their coaches, their overall value, and tactics
Updates from the previous FIFA 22 dataset are as follows:
Inclusion of FIFA 23 data
Inclusion of all FIFA updates for each version from 15 to 23
Addition of coaches and teams data
Data provided only with the 2nd yearly FIFA update - like done in the previous datasets - is still available in files called 'male_players (legacy)' and 'female_players (legacy)'
Players data for each gender is now combined in a single file for all FIFA versions and updates
Only CSV files uploaded (Excel files excluded)
Other minor changes (e.g. field in players' data called 'sofifa_id' has been updated to 'player_id', since teams and coaches are now included too)
Data has been scraped from the publicly available website sofifa.com.
As described in https://sofifa.com/robots.txt, there is no limitation at the time of scraping for collecting data for FIFA players, coaches, and teams.
Limitations to scraping the website only relate to player comparisons and API.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file contains multiple sheets: a README sheet; a Time Series sheet with conference-level performance, online media mentions, and net sentiment; and additional sheets showing annual game-level performance by conference. (XLSX)
Raw GPS data collected from NRL referees during the 2018 NRL season were obtained from the NRL Referees Association and shared with the researcher. The raw GPS data .csv files were then processed in Microsoft Excel and presented in a Master of Science thesis.
For enquiries about the dataset please email David Boyle: dboyle@nrl.com.au
Any aspiring data scientist looks at everything through the lens of data, even when relaxing with friends, watching live cricket, and cheering for a favorite team.
It includes ODI, Test, and T20 statistics for all players in all three categories (batting, bowling, and fielding).
We wouldn't be here without the help of cricket. Thank you to all the great cricketers for their wonderful contributions.
Archery is my favorite sport; I enjoy it because it keeps me calm and focused. I have practiced archery for over 4 years: 2 years in the UK at my local club, Epping Archers, and 2.5 years in Hong Kong with Target X.
Recently I decided to take the opportunity to enter the open competition at 18 meters with a 6-ring 80cm target face. To qualify for further competitions you need to pass the minimum score requirement. For example, in the beginners' competition at 18 meters, I would have to score a minimum of 600 points to qualify for the 30-meter tournaments.
The type of bow I use is a recurve bow, the type most recognisable from the Olympics. The target face is 80cm and has 6 scoring rings worth 10, 9, 8, 7, 6, and 5 points, with 10 being the highest score per arrow. The lowest score is 5, and an arrow that lands outside the rings scores zero points and is recorded as 'M', a miss. The competition consists of 72 arrows in total, split into 2 rounds of 36 arrows. Each round has 6 ends, and each end consists of 6 arrows. The maximum score per arrow is 10 points, so 72 arrows give a maximum of 720 points. The minimum score to qualify for the 18-meter shoot is 600 points.
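The scoring arithmetic described above can be written out as a quick check. This is a sketch using only the numbers stated in the text; the constant and function names are illustrative:

```python
ARROWS_PER_END = 6
ENDS_PER_ROUND = 6
ROUNDS = 2
MAX_POINTS_PER_ARROW = 10
QUALIFYING_SCORE = 600

arrows_per_round = ARROWS_PER_END * ENDS_PER_ROUND  # 6 ends of 6 arrows
total_arrows = arrows_per_round * ROUNDS            # 2 rounds
max_score = total_arrows * MAX_POINTS_PER_ARROW


def qualifies(score: int) -> bool:
    """True when a competition score meets the 600-point minimum."""
    return score >= QUALIFYING_SCORE


# Average points per arrow needed to reach the qualifying score.
average_needed = QUALIFYING_SCORE / total_arrows
print(total_arrows, max_score, round(average_needed, 2))
```

The average shows that qualifying requires a little over 8 points per arrow across all 72 arrows.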
I am using this opportunity to complete an analysis that calculates the average, maximum, and minimum points scored. This will be followed by using R programming, Tableau, and an Excel spreadsheet to produce graphs, charts, and dashboards.
The data collected are my personal bests, recorded on a handwritten sheet, then transferred to a spreadsheet and saved in CSV format. The time period is from 13th Dec 2020 to 28th Sep 2021, with a total of 13 games played during this time. Because of Covid-19 social distancing, the games were not played on a consistent schedule (i.e. 1 game every Sunday week).
The dataset has an index column, which is my primary key, along with the day and date. Each game represents all 72 arrows, the rounds are recorded as 1 and 2, and the ends are numbered 1 to 12.
The shots_1, shot_2 and so on are the actual scores, with 'M' recorded as the number 0. The total column gives the points scored in each end. The hits column gives the number of arrows that hit the 6-ring target face; any arrow that does not land on the target face is not counted, as it is a miss. The tens column gives the number of times I hit a 10.
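The total, hits, and tens values for an end can be derived from its six shot scores. A minimal sketch, assuming the shots for one end are available as a list of integers with misses stored as 0; the function name and the sample end are illustrative:

```python
VALID_SCORES = {0, 5, 6, 7, 8, 9, 10}  # 0 stands for 'M', a miss


def summarize_end(shots):
    """Return the total, hits, and tens for one end of arrows."""
    for score in shots:
        if score not in VALID_SCORES:
            raise ValueError(f"invalid arrow score: {score!r}")
    return {
        "total": sum(shots),
        "hits": sum(1 for score in shots if score > 0),
        "tens": shots.count(10),
    }


# Illustrative end: one miss and one 10.
print(summarize_end([10, 9, 0, 7, 5, 8]))
```

Recomputing these values from the shot columns is a cheap way to check the handwritten totals after transcription.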
Thank you to Target X for maintaining social distance in order to keep the club running.
This dataset was created by David Gladson
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains IPL match statistics, player details, team aliases, and team data from 2008 to the latest season.
It is useful for cricket analytics, performance tracking, and predictive modeling.
This dataset is compiled using publicly available IPL match data from Cricsheet.
Cricsheet provides structured ball-by-ball cricket data in JSON format for various tournaments, including IPL.
🔗 Cricsheet Official Website: https://cricsheet.org/
The player and team details have been enriched using additional sources like ESPN Cricinfo and IPL official stats.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset is complete! Data was updated daily during the Olympics.
You can support the dataset via the upvote button!
The Paris 2024 Olympic Summer Games dataset provides comprehensive information about the Summer Olympics held in 2024. It covers various aspects of the event, including participating countries, athletes, sports disciplines, medal standings, and key event details. More about the Olympic Games on the official site Olympics Paris 2024 and Wiki.
| Table | Description | Note |
|---|---|---|
| athletes.csv | personal information about all athletes | released |
| coaches.csv | personal information about all coaches | released |
| events.csv | all events that took place | released |
| medals.csv | all medal holders | released |
| medals_total.csv | all medals (grouped by country) | released |
| medalists.csv | all medalists | released |
| nocs.csv | all nocs (code, country, country_long) | released |
| schedule.csv | day-by-day schedule of all events | released |
| schedule_preliminary.csv | preliminary schedule of all events | released |
| teams.csv | all teams | released |
| technical_officials.csv | all technical_officials (referees, judges, jury members) | released |
| results | all results | released |
| torch_route.csv | torch relay places | released |
| vanues.csv | all Olympic venues | released |
I am very thankful to Luca Fontana, zenzombie, and others for their efforts in helping me make the dataset better. Luca Fontana manually checked the medalist.csv table, and zenzombie covered the dataset with tests.
If you have any questions or suggestions please start a discussion.
https://creativecommons.org/publicdomain/zero/1.0/
Comprehensive dataset containing detailed information on batting and bowling performances, as well as the schedule and results of matches from the ICC Cricket World Cup 2023. The dataset covers player statistics, match details, and more, providing a rich resource for cricket enthusiasts, analysts, and data scientists interested in exploring the dynamics of the tournament.
Content
- batting_summary.csv: Player-wise batting statistics.
- bowling_summary.csv: Player-wise bowling statistics.
- matches_schedule_results.csv: Schedule and results of World Cup 2023 matches.
Excel file showing the all-time table of the Chilean Primera División, consecutive years spent in the Primera División, the number of times clubs have been relegated to Primera B, the all-time table of Chilean teams in the Copa Libertadores and Copa Sudamericana, which clubs are the most followed on the social network Instagram, and the honours list (palmarés) of Chilean teams.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains all the data of the players listed in the NBA 2K25 game developed by 2K. The dataset is made up of three main parts: the player's biodata, the attributes, and the badges. At this point, no data is missing because of web scraper connection issues, since I added a validation step to my code. I will soon be adding data for the hot zones of every player.
Please kindly upvote this dataset! 😉
I do not own any part of the data; the data is scraped from https://www.2kratings.com/. Hopefully this dataset can help you in some way, especially in learning about data science or statistics.
If you have any question or suggestion, please let me know. Thanks a lot!
Columns explanation:
1. name : Player's full name (suffix included).
2. nationality_1 : Player's main nationality.
3. nationality_2 : Player's second nationality, for players with multiple nationalities.
4. team : Player's current NBA team.
5. jersey : Player's current jersey number.
6. position_1 : Player's main position in the team lineup.
7. position_2 : Player's secondary position in the team lineup.
8. archetype : Player's type/characteristics of gameplay in a game.
9. height_feet : Player's height in feet and inches (separated by ' and ").
10. height_cm : Player's height in centimeters.
11. weight_lbs : Player's weight in pounds.
12. weight_kg : Player's weight in kilograms.
13. wingspan_feet : Player's wingspan in feet and inches (separated by ' and ").
14. wingspan_cm : Player's wingspan in centimeters.
15. season_salary : Player's salary in a season (numerical value).
16. years_in_the_nba : Player's professional years in the NBA.
17. birthdate : Player's birthdate (format : "Month dd, Year").
18. hometown : Player's hometown (format : "City, State/Country").
19. prior_to_nba : Player's location or school prior to the NBA (city, state, university, or country).
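The text-formatted columns above need parsing before numeric work. The sketch below assumes height_feet and wingspan_feet values look like `6'9"` and that birthdate uses full English month names; both helper names and the sample strings are illustrative assumptions, not values taken from the dataset:

```python
import re
from datetime import date, datetime

# Matches strings such as 6'9" (feet, apostrophe, inches, double quote).
_FEET_INCHES = re.compile(r"""^\s*(\d+)'\s*(\d+)"\s*$""")


def feet_inches_to_cm(text: str) -> float:
    """Convert a feet-and-inches string to centimeters."""
    match = _FEET_INCHES.match(text)
    if match is None:
        raise ValueError(f"unrecognized height format: {text!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    return round((feet * 12 + inches) * 2.54, 2)


def parse_birthdate(text: str) -> date:
    """Parse a 'Month dd, Year' string; %B relies on an English locale."""
    return datetime.strptime(text, "%B %d, %Y").date()


print(feet_inches_to_cm("6'9\""))
print(parse_birthdate("January 5, 2000"))
```

Comparing the converted value against height_cm or wingspan_cm is a simple consistency check on the scraped rows.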
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a collection of Arabic texts covering the modern Arabic language used in newspaper articles. The text contains alphabetic, numeric, and symbolic words. The presence of numeric and symbolic words in this dataset can be used to test the efficiency and robustness of many Arabic text classification and document indexing methods.
The dataset consists of 111,728 documents (cf. Table 1) and 319,254,124 words (cf. Table 2) structured in text files, and collected from 3 Arabic online newspapers: Assabah [9], Hespress [10] and Akhbarona [11], using a semi-automatic web crawling process. The documents in the dataset are categorized into 5 classes: sport, politic, culture, economy and diverse. The number of documents and words varies from one class to another (cf. Tables 1-2).
BINIZ, mohamed (2018), “DataSet for Arabic Classification”, Mendeley Data, V2, doi: 10.17632/v524p5dhpj.2