11 datasets found
  1. Adventure Works 2022 CSVs

    • kaggle.com
    zip
    Updated Nov 2, 2022
    Cite
    Algorismus (2022). Adventure Works 2022 CSVs [Dataset]. https://www.kaggle.com/datasets/algorismus/adventure-works-in-excel-tables
    Explore at:
    zip (567646 bytes)
    Dataset updated
    Nov 2, 2022
    Authors
    Algorismus
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Adventure Works 2022 dataset

    How was this dataset created?

    On the official website, the dataset is available through a SQL Server instance (localhost) and as CSVs used via Power BI Desktop running in a virtual lab (virtual machine). The first two data-import steps of the lab were executed in the virtual lab, and the resulting Power BI tables were then copied out as CSVs. Records were added up to the year 2022 as required.

    How might this dataset help you?

    This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop in order to follow the lab instructions in the training material on the official website. It is also useful if you want to work through the Power BI Desktop Sales Analysis example from Microsoft's PL-300 learning path.

    How do you use this dataset?

    Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two data-import steps of the PL-300 Microsoft Power BI Data Analyst exam lab.

  2. FitBit Fitness Tracker Data (revised)

    • kaggle.com
    zip
    Updated Dec 17, 2022
    Cite
    duart2688 (2022). FitBit Fitness Tracker Data (revised) [Dataset]. https://www.kaggle.com/duart2688/fitabase-data-cleaned-using-sql
    Explore at:
    zip (12763010 bytes)
    Dataset updated
    Dec 17, 2022
    Authors
    duart2688
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors/preferences.

    Main modifications

    This is the list of manipulations performed on the original dataset published by Möbius. All of the cleaning and rearrangement was performed in BigQuery, using SQL functions.

    1) After taking a closer look at the source dataset, I realized that for my case study I did not need some of the tables contained in the original archive. I therefore decided not to import dailyCalories_merged.csv, dailyIntensities_merged.csv, and dailySteps_merged.csv, as they proved redundant: their content can be found in the dailyActivity_merged.csv file. In addition, the files minutesCaloriesWide_merged.csv, minutesIntensitiesWide_merged.csv, and minuteStepsWide_merged.csv were not imported, as they present the same data contained in other files in a wide format. Hence, only the long-format files containing the same data were imported into the BigQuery database.

    2) To be able to compare and measure the correlation among different variables based on hourly records, I created a new table with a LEFT JOIN on the columns Id and ActivityHour. I repeated the same JOIN on the tables with minute-level records. This produced two new tables: hourly_activity.csv and minute_activity.csv.
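    For illustration, a minimal sketch in BigQuery Standard SQL of the hourly JOIN described above. The dataset name (fitbit) and the value columns (StepTotal, Calories, TotalIntensity) are assumptions based on the file names, not the author's actual queries:

    CREATE TABLE fitbit.hourly_activity AS
    SELECT
      s.Id,
      s.ActivityHour,
      s.StepTotal,        -- from hourlySteps
      c.Calories,         -- from hourlyCalories
      i.TotalIntensity    -- from hourlyIntensities
    FROM fitbit.hourlySteps AS s
    LEFT JOIN fitbit.hourlyCalories AS c
      ON s.Id = c.Id AND s.ActivityHour = c.ActivityHour
    LEFT JOIN fitbit.hourlyIntensities AS i
      ON s.Id = i.Id AND s.ActivityHour = i.ActivityHour;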

    3) To validate most of the columns containing DATE and DATETIME values that had been imported as the STRING data type, I used the PARSE_DATE() and PARSE_DATETIME() functions. While importing heartrate_seconds_merged.csv, hourlyCalories_merged.csv, hourlyIntensities_merged.csv, hourlySteps_merged.csv, minutesCaloriesNarrow_merged.csv, minuteIntensitiesNarrow_merged.csv, minuteMETsNarrow_merged.csv, minuteSleep_merged.csv, minuteSteps_merged.csv, sleepDay_merge.csv, and weigthLog_Info_merged.csv into BigQuery, it was necessary to import the DATETIME and DATE columns as STRING, because the original syntax used in the CSV files could not be recognized as a valid DATETIME data type, due to the "AM" and "PM" text at the end of each value.
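    A sketch of that conversion in BigQuery Standard SQL; the %I and %p format elements handle the 12-hour clock and the trailing "AM"/"PM" text (the table name and the exact format string are assumptions):

    SELECT
      Id,
      PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', ActivityHour) AS activity_hour
    FROM fitbit.hourlyCalories;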

    Acknowledgements

    1. Möbius' version of the data set can be found here.
    2. Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa. https://zenodo.org/record/53894#.YMoUpnVKiP9-
  3. Health and Retirement Study (HRS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r. the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research. if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

    this new github repository contains five scripts:

    1992 - 2010 download HRS microdata.R
    - loop through every year and every file, download, then unzip everything in one big party

    import longitudinal RAND contributed files.R
    - create a SQLite database (.db) on the local disk
    - load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

    longitudinal RAND - analysis examples.R
    - connect to the sql database created by the 'import longitudinal RAND contributed files' program
    - create two database-backed complex sample survey objects, using a taylor-series linearization design
    - perform a mountain of analysis examples with wave weights from two different points in the panel

    import example HRS file.R
    - load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
    - parse through the IF block at the bottom of the sas importation script, blank out a number of variables
    - save the file as an R data file (.rda) for fast loading later

    replicate 2002 regression.R
    - connect to the sql database created by the 'import longitudinal RAND contributed files' program
    - create a database-backed complex sample survey object, using a taylor-series linearization design
    - exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document

    click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D

  4. Real-Estate Dashboard

    • kaggle.com
    zip
    Updated May 23, 2025
    Cite
    Ramy Elbouhy (2025). Real-Estate Dashboard [Dataset]. https://www.kaggle.com/datasets/ramyelbouhy/real-estate-dashboard/data
    Explore at:
    zip (10488043 bytes)
    Dataset updated
    May 23, 2025
    Authors
    Ramy Elbouhy
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Introduction

    Objective:

    Improve understanding of real estate performance.

    Leverage data to support business decisions.

    Scope:

    Track property sales, visits, and performance metrics.

    Technical Steps

    Step 1: Creating an Azure SQL Database

    Action: Provisioned an Azure SQL Database to host real estate data.

    Why Azure?: Scalability, security, and integration with Power BI.

    Step 2: Importing Data

    Action: Imported datasets (properties, visits, sales, agents, etc.) into the SQL database.

    Tools Used: SQL Server Management Studio (SSMS) and Azure Data Studio.

    Step 3: Data Transformation in SQL

    Normalized Data: Ensured data consistency by normalizing the formats of dates and categorical fields.

    Calculated Fields:

    Time on Market: DATEDIFF function to calculate the difference between listing and sale dates.

    Conversion Rate: Aggregated sales and visits data using COUNT and SUM to calculate conversion rates per agent and property.

    Buyer Segmentation: Identified first-time vs repeat buyers using JOINs and COUNT functions.

    Data Cleaning: Removed duplicates, handled null values, and standardized city names and property types.
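    A hedged T-SQL sketch of the Step 3 calculated fields described above; the table and column names (Properties, Sales, Visits, ListingDate, SaleDate, and so on) are illustrative assumptions, not the actual schema:

    -- Time on Market: days between listing and sale.
    SELECT
      p.PropertyId,
      DATEDIFF(day, p.ListingDate, s.SaleDate) AS TimeOnMarket
    FROM Properties AS p
    JOIN Sales AS s ON s.PropertyId = p.PropertyId;

    -- Conversion Rate per agent: sales divided by visits.
    SELECT
      v.AgentId,
      COUNT(DISTINCT s.SaleId) * 1.0
        / NULLIF(COUNT(DISTINCT v.VisitId), 0) AS ConversionRate
    FROM Visits AS v
    LEFT JOIN Sales AS s ON s.AgentId = v.AgentId
    GROUP BY v.AgentId;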

    Step 4: Connecting Power BI to Azure SQL

    Action: Established a live connection to Azure SQL Database in Power BI.

    Benefit: Real-time data updates and efficient analysis.

    Step 5: Data Modeling in Power BI

    Relationships:

    Defined relationships between tables (e.g., Sales, Visits, Properties, Agents) using primary and foreign keys.

    Utilized active and inactive relationships for dynamic calculations like time-based comparisons.

    Calculated Columns and Measures:

    Time on Market: Created a calculated measure using DATEDIFF.

    Conversion Rates: Used DIVIDE and CALCULATE for accurate per-agent and per-property analysis.

    Step 6: Creating Visualizations

    Key Visuals:

    Sales Heatmap by City: Geographic visualization to highlight sales performance.

    Conversion Rates: Bar charts and line graphs for trend analysis.

    Time on Market: Boxplots and histograms for distribution insights.

    Buyer Segmentation: Pie charts and bar graphs to show buyer profiles.

    Step 7: Building Dashboards

    Structure:

    Page 1: Overview (Key Metrics and Sales Heatmap).

    Page 2: Performance Analysis (Conversion Rates, Time on Market).

    Page 3: Buyer Insights (First-Time vs Repeat Buyers, Property Distribution).

    Insights Gained

    Insight 1: Sales Performance by City

    Cities with the highest sales volume.

    Cities with low performance, requiring further investigation.

    Insight 2: Conversion Rates

    Agents with the highest conversion rate.

    Certain properties (e.g., luxury villas) outperform others in conversion.

    Insight 3: Time on Market

    Average time on market.

    Insight 4: Buyer Trends

    Repeat Buyers make up 60% of purchases.

    First-Time Buyers prefer apartments over villas.

    Recommendations

    Recommendation 1: Focus on High-Performing Cities

    Recommendation 2: Support Low-Performing Areas

    Investigate challenges to develop targeted marketing strategies.

    Enhance Conversion Rates

    Train agents based on techniques used by top performers.

    Prioritize marketing for properties with high conversion rates.

    Engage First-Time Buyers

    Create specific campaigns for apartments to attract first-time buyers.

    Offer financial guidance programs to boost their confidence.

    Summary:

    Built a robust data solution from Azure SQL to Power BI.

    Derived actionable insights that can drive real estate growth.

  5. Qualisign: Software Metrics and GoF Design Patterns of the Maven Central Repository

    • data.niaid.nih.gov
    Updated Sep 24, 2020
    Cite
    Aichberger, Johann (2020). Qualisign: Software Metrics and GoF Design Patterns of the Maven Central Repository [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3731871
    Explore at:
    Dataset updated
    Sep 24, 2020
    Authors
    Aichberger, Johann
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains software metric and design pattern data for around 100,000 projects from the Maven Central repository. The data was collected and analyzed as part of my master's thesis "Mining Software Repositories for the Effects of Design Patterns on Software Quality" (https://www.overleaf.com/read/vnfhydqxmpvx, https://zenodo.org/record/4048275).

    The included qualisign.* files all contain the same data in different formats:
    - qualisign.sql: standard SQL format (exported using "pg_dump --inserts ...")
    - qualisign.psql: PostgreSQL plain format (exported using "pg_dump -Fp ...")
    - qualisign.csql: PostgreSQL custom format (exported using "pg_dump -Fc ...")

    create-tables.sql has to be executed before importing one of the qualisign.* files. Once qualisign.*sql has been imported, create-views.sql can be executed to preprocess the data, thereby creating materialized views that are more appropriate for data analysis purposes.

    Software metrics were calculated using CKJM extended: http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm/

    Included software metrics are (21 total):
    - AMC: Average Method Complexity
    - CA: Afferent Coupling
    - CAM: Cohesion Among Methods
    - CBM: Coupling Between Methods
    - CBO: Coupling Between Objects
    - CC: Cyclomatic Complexity
    - CE: Efferent Coupling
    - DAM: Data Access Metric
    - DIT: Depth of Inheritance Tree
    - IC: Inheritance Coupling
    - LCOM: Lack of Cohesion of Methods (Chidamber and Kemerer)
    - LCOM3: Lack of Cohesion of Methods (Constantine and Graham)
    - LOC: Lines of Code
    - MFA: Measure of Functional Abstraction
    - MOA: Measure of Aggregation
    - NOC: Number of Children
    - NOM: Number of Methods
    - NOP: Number of Polymorphic Methods
    - NPM: Number of Public Methods
    - RFC: Response for Class
    - WMC: Weighted Methods per Class

    In the qualisign.* data, these metrics are only available on the class level. create-views.sql additionally provides averages of these metrics on the package and project levels.
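    A sketch (PostgreSQL) of the kind of aggregation create-views.sql performs; the table and column names here are assumptions for illustration, not the dataset's actual schema:

    CREATE MATERIALIZED VIEW package_metrics AS
    SELECT
      project_id,
      package_name,
      AVG(wmc) AS avg_wmc,  -- Weighted Methods per Class
      AVG(loc) AS avg_loc   -- Lines of Code
    FROM class_metrics
    GROUP BY project_id, package_name;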

    Design patterns were detected using SSA: https://users.encs.concordia.ca/~nikolaos/pattern_detection.html

    Included design patterns are (15 total): Adapter, Bridge, Chain of Responsibility, Command, Composite, Decorator, Factory Method, Observer, Prototype, Proxy, Singleton, State, Strategy, Template Method, Visitor.

    The code to generate the dataset is available at: https://github.com/jaichberg/qualisign

    The code to perform quality analysis on the dataset is available at: https://github.com/jaichberg/qualisign-analysis

  6. Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE

    • data.niaid.nih.gov
    Updated Aug 3, 2024
    Cite
    Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V. (2024). Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_400614
    Explore at:
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    Ryerson University
    Authors
    Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V.
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present three defect rediscovery datasets mined from Bugzilla. The datasets capture data for three groups of open source software projects: Apache, Eclipse, and KDE. They contain information about approximately 914 thousand defect reports over a period of 18 years (1999-2017), capturing the inter-relationships among duplicate defects.

    File Descriptions

    apache.csv - Apache Defect Rediscovery dataset

    eclipse.csv - Eclipse Defect Rediscovery dataset

    kde.csv - KDE Defect Rediscovery dataset

    apache.relations.csv - Inter-relations of rediscovered defects of Apache

    eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse

    kde.relations.csv - Inter-relations of rediscovered defects of KDE

    create_and_populate_neo4j_objects.cypher - Populates Neo4j graphDB by importing all the data from the CSV files. Note that you have to set dbms.import.csv.legacy_quote_escaping configuration setting to false to load the CSV files as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping

    create_and_populate_mysql_objects.sql - Populates MySQL RDBMS by importing all the data from the CSV files
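    For illustration, a hedged MySQL sketch of the kind of CSV import create_and_populate_mysql_objects.sql performs; the table definition is an assumption based on the attributes mentioned in this description (bug_status, resolution, priority, severity), not the script's actual schema:

    CREATE TABLE apache (
      bug_id     INT PRIMARY KEY,
      bug_status VARCHAR(32),
      resolution VARCHAR(32),
      priority   VARCHAR(16),
      severity   VARCHAR(16)
    );

    LOAD DATA LOCAL INFILE 'apache.csv'
    INTO TABLE apache
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;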

    rediscovery_db_mysql.zip - For your convenience, we also provide full backup of the MySQL database

    neo4j_examples.txt - Sample Neo4j queries

    mysql_examples.txt - Sample MySQL queries

    rediscovery_eclipse_6325.png - Output of Neo4j example #1

    distinct_attrs.csv - Distinct values of bug_status, resolution, priority, severity for each project

  7. US National Flight Data 2015 - 2020

    • kaggle.com
    zip
    Updated Feb 18, 2021
    Cite
    BingeCode (2021). US National Flight Data 2015 - 2020 [Dataset]. https://www.kaggle.com/bingecode/us-national-flight-data-2015-2020
    Explore at:
    zip (890115594 bytes)
    Dataset updated
    Feb 18, 2021
    Authors
    BingeCode
    License

    https://www.usa.gov/government-works/

    Area covered
    United States
    Description

    Context

    This data set was retrieved from the Transtats webpage of the Bureau of Transportation Statistics of the US Department of Transportation. The data was cleaned and made ready for use in a university project whose goal was to compare different database engines in terms of performance, among other criteria.

    NOTE: December 2020 was not included in the data set since it was not made available by the BTS as of today, 18th Feb 2021.

    Content

    The data is split into one CSV file per year from 2015 to 2020; flights.csv contains all the data in a single file. The CSV files contain no column headers. The columns are as follows:

    'YEAR', 'MONTH', 'DAY_OF_MONTH', 'DAY_OF_WEEK', 'OP_UNIQUE_CARRIER', 'ORIGIN_CITY_NAME',
    'ORIGIN_STATE_ABR', 'DEST_CITY_NAME', 'DEST_STATE_ABR', 'CRS_DEP_TIME', 'DEP_DELAY_NEW',
    'CRS_ARR_TIME', 'ARR_DELAY_NEW', 'CANCELLED', 'CANCELLATION_CODE', 'AIR_TIME', 'DISTANCE'
    

    NOTE: The headers were removed to make it easy to import the data directly into SQL tables.
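    Because the files ship without headers, a load can map columns purely by position. A minimal MySQL-flavored sketch (the column types are assumptions; the project itself compared several database engines):

    CREATE TABLE flights (
      year SMALLINT, month TINYINT, day_of_month TINYINT, day_of_week TINYINT,
      op_unique_carrier VARCHAR(8),
      origin_city_name VARCHAR(64), origin_state_abr CHAR(2),
      dest_city_name VARCHAR(64), dest_state_abr CHAR(2),
      crs_dep_time SMALLINT, dep_delay_new DECIMAL(7,2),
      crs_arr_time SMALLINT, arr_delay_new DECIMAL(7,2),
      cancelled TINYINT, cancellation_code CHAR(1),
      air_time DECIMAL(7,2), distance DECIMAL(7,2)
    );

    -- No IGNORE 1 LINES clause needed: the CSVs have no header row.
    LOAD DATA LOCAL INFILE 'flights.csv'
    INTO TABLE flights
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';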

    Other

    If you have any questions about how I retrieved/cleaned the data or anything about my project, feel free to check out my Github repository or shoot me a message.

  8. Export time comparison between PFB and Gen3.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Michael Lukowski; Andrew Prokhorenkov; Robert L. Grossman (2023). Export time comparison between PFB and Gen3. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010944.t003
    Explore at:
    xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael Lukowski; Andrew Prokhorenkov; Robert L. Grossman
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats.

  9. International Student Mobility 2020-2023

    • kaggle.com
    zip
    Updated Nov 17, 2025
    Cite
    Daniela Rivas (2025). International Student Mobility 2020-2023 [Dataset]. https://www.kaggle.com/datasets/danielarivasu/international-student-mobility
    Explore at:
    zip (2794 bytes)
    Dataset updated
    Nov 17, 2025
    Authors
    Daniela Rivas
    License

    https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets

    Description

    This dataset contains information on international student mobility and economic indicators for multiple countries and territories during 2020-2023. In addition, it includes data on Importing Countries and Exporting Countries, which refer to countries that receive (import) or send (export) international students.

    The data was obtained from official open-data sources, mainly the UNESCO Institute for Statistics (UIS) and the World Bank open data portal. The original files were downloaded as CSV directly from these platforms. After downloading, the datasets were merged, filtered, and cleaned using SQL and Excel (for example, removing duplicates and selecting specific years such as 2023 and 2024). No web scraping was used; all data comes from publicly available official databases.
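    A hedged SQL sketch of the cleaning steps described above (deduplication plus year filtering); the table and column names are illustrative assumptions:

    CREATE TABLE mobility_clean AS
    SELECT DISTINCT
      country_code,
      country_name,
      year,
      inbound_students,
      outbound_students
    FROM mobility_raw
    WHERE year BETWEEN 2020 AND 2023;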

    Conclusions / Insights

    • Countries with negative average net flows, such as China and India, are net exporters of students, meaning they send more students abroad than they receive.

    • Countries with positive average net flows, such as the United States and other high-income countries, are net importers of students, attracting more international students than they send.

    • There appears to be a relationship between GDP per capita (PPP, 2023 USD) and student mobility patterns: countries with higher PPP tend to attract more international students, while countries with lower PPP tend to send more students abroad.

    • This dataset can be used to analyze trends in international student mobility, compare countries’ economic contexts, and identify patterns between student flows and national wealth.

    • The classification of countries as Importing or Exporting provides a quick way to group and compare countries in terms of international student dynamics.

    Practical application: Universities and educational institutions can use this data to better understand potential target markets, identify countries that send or receive more students, and develop strategies for recruitment and international collaborations.

  10. Cleaned Contoso Dataset

    • kaggle.com
    zip
    Updated Aug 27, 2023
    Cite
    Bhanu (2023). Cleaned Contoso Dataset [Dataset]. https://www.kaggle.com/datasets/bhanuthakurr/cleaned-contoso-dataset
    Explore at:
    zip (487695063 bytes)
    Dataset updated
    Aug 27, 2023
    Authors
    Bhanu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data was imported from the BAK file found here into SQL Server, and then individual tables were exported as CSVs. A Jupyter Notebook containing the code used to clean the data can be found here.

    Version 6 includes some additional cleaning and structuring to address issues noticed after importing into Power BI. The changes were made by adding code to the Python notebook and exporting a newly cleaned dataset, such as adding a MonthNumber column for sorting by month, and similarly a WeekDayNumber column.

    Cleaning was done in Python, with SQL Server also used to quickly inspect the data. Headers were added separately, ensuring no data loss. The data was cleaned of NaNs, garbage values, and other problematic columns.
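    A sketch (T-SQL) of deriving the sort-key columns mentioned above on the SQL Server side; the table and column names (DimDate, DateKey) are assumptions:

    SELECT
      DateKey,
      DATENAME(month, DateKey)   AS MonthName,
      MONTH(DateKey)             AS MonthNumber,   -- sorts months chronologically
      DATENAME(weekday, DateKey) AS WeekDayName,
      DATEPART(weekday, DateKey) AS WeekDayNumber  -- sorts weekdays
    FROM DimDate;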

  11. 2020 NFL Statistics (Active and Retired Players)

    • kaggle.com
    zip
    Updated Feb 8, 2021
    Cite
    Trevor Youngquist (2021). 2020 NFL Statistics (Active and Retired Players) [Dataset]. https://www.kaggle.com/datasets/trevyoungquist/2020-nfl-stats-active-and-retired-players
    Explore at:
    zip (3930921 bytes)
    Dataset updated
    Feb 8, 2021
    Authors
    Trevor Youngquist
    Description

    2020 NFL Stats Web Scrape

    This dataset consists of basic statistics and career statistics provided by the NFL on their official website (http://www.nfl.com) for all players, active and retired.

    Summary

    All of the data was web scraped using Python code, which can be found and downloaded here: https://github.com/ytrevor81/NFL-Stats-Web-Scrape

    Explanation of Data

    Before we go into the specifics, it's important to note that in the basic statistics and career statistics CSV files, all players are assigned a 'Player_Id'. This is the same ID used by the official NFL website to identify each player. This is useful, for example, when importing these CSV files into a SQL database for an app, as sketched below.
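    A hedged sketch of such an import, keyed on Player_Id; the table layout and the passing-stats column names are illustrative assumptions, not the actual CSV schemas:

    CREATE TABLE active_player_basic_stats (
      player_id VARCHAR(64) PRIMARY KEY,
      full_name VARCHAR(128),
      position  VARCHAR(8)
      -- remaining basic-stats columns omitted for brevity
    );

    -- Every career-stats table carries the same Player_Id, so categories join cleanly:
    SELECT b.full_name, p.pass_yds
    FROM active_player_basic_stats AS b
    JOIN active_player_passing_stats AS p
      ON p.player_id = b.player_id;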

    1. The first main group of stats is the basic stats provided for each player. This data is stored in the CSV files titled Active_Player_Basic_Stats.csv and Retired_Player_Basic_Stats.csv.

    The data pulled for each player in Active_Player_Basic_Stats.csv is as follows:
    a. Player ID
    b. Full Name
    c. Position
    d. Number
    e. Current Team
    f. Height
    g. Height
    h. Weight
    i. Experience
    j. Age
    k. College

    The data pulled for each player in Retired_Player_Basic_Stats.csv differs slightly from the previous data set. The data is as follows:
    a. Player ID
    b. Full Name
    c. Position
    f. Height
    g. Height
    h. Weight
    j. College
    k. Hall of Fame Status

    2. The second main group of stats gathered for each player is their career statistics. Because players occupy a wide variety of positions, the career statistics are divided into categories. The stats for active and retired players are structured the same, but are stored in separate CSV files (ActivePlayer_(category)_Stats.csv and RetiredPlayer_(category)_Stats.csv). The following are the career statistics categories and accompanying CSV file names:
    a. Defensive Stats - ..._Defense_Stats.csv
    b. Fumbles Stats - ..._Fumbles_Stats.csv
    c. Kick Returns Stats - ..._KickReturns_Stats.csv
    d. Field Goal Kicking Stats - ..._Kicking_Stats.csv
    e. Passing Stats - ..._Passing_Stats.csv
    f. Punt Returns Stats - ..._PuntReturns_Stats.csv
    g. Punting Stats - ..._Punting_Stats.csv
    h. Receiving Stats - ..._Receiving_Stats.csv
    i. Rushing Stats - ..._Rushing_Stats.csv
