8 datasets found
  1. Tidy Tuesday Work

    • osf.io
    Updated Sep 18, 2023
    Cite
    Brendan Jamieson (2023). Tidy Tuesday Work [Dataset]. https://osf.io/7hvd3
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    Center For Open Science
    Authors
    Brendan Jamieson
    Description

    No description was included in this Dataset collected from the OSF

  2. Beach Volleyball

    • kaggle.com
    Updated May 18, 2020
    Cite
    Jesse Mostipak (2020). Beach Volleyball [Dataset]. https://www.kaggle.com/jessemostipak/beach-volleyball/metadata
    Dataset updated
    May 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jesse Mostipak
    Description

    Beach Volleyball

    The data this week comes from Adam Vagnar, who also blogged about this dataset. There's a LOT of data here: match-level results, player details, and match-level statistics for some matches. Throughout this dataset, all matches are played 2 vs 2, so there are columns for 2 winners (1 team) and 2 losers (1 team). The data is relatively clean and ready for analysis, although there are some duplicated columns and the data is wide because each team has two players.
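The wide layout described above (two winner columns and two loser columns per match) is often easier to analyze after reshaping to one row per player per match. The following is a minimal sketch of that reshaping in plain Python. The column names (`match_id`, `w_player1`, `w_player2`, `l_player1`, `l_player2`) and the sample rows are assumptions made for illustration and should be checked against the data dictionary.

```python
# Made-up sample rows in a wide layout; the column names are assumptions.
WIDE_ROWS = [
    {"match_id": 1, "w_player1": "A", "w_player2": "B",
     "l_player1": "C", "l_player2": "D"},
    {"match_id": 2, "w_player1": "C", "w_player2": "D",
     "l_player1": "E", "l_player2": "F"},
]


def wide_to_long(rows):
    """Turn one row per match into one row per player per match."""
    long_rows = []
    for row in rows:
        # "w" columns hold the winning team, "l" columns the losing team.
        for prefix, outcome in (("w", "win"), ("l", "loss")):
            for slot in (1, 2):
                long_rows.append({
                    "match_id": row["match_id"],
                    "player": row[f"{prefix}_player{slot}"],
                    "outcome": outcome,
                })
    return long_rows


long_rows = wide_to_long(WIDE_ROWS)
```

Each match yields four rows, so per-player summaries (for example, wins per player) become a simple filter and count over `long_rows`.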

    Check out the data dictionary, or Wikipedia for some longer-form details around what the various match statistics mean.

    Most of the data is from the international FIVB tournaments but about 1/3 is from the US-centric AVP.

    The FIVB Beach Volleyball World Tour (known between 2003 and 2012 as the FIVB Beach Volleyball Swatch World Tour for sponsorship reasons) is the worldwide professional beach volleyball tour for both men and women organized by the Fédération Internationale de Volleyball (FIVB). The World Tour was introduced for men in 1989 while the women first competed in 1992.

    Winning the World Tour is considered to be one of the highest honours in international beach volleyball, being surpassed only by the World Championships, and the Beach Volleyball tournament at the Summer Olympic Games.

    FiveThirtyEight examined the disadvantage of serving in beach volleyball, although they used Olympic-level data. Again, Adam Vagnar also covered this data on his blog.

    What is Tidy Tuesday?

    TidyTuesday is a weekly data project aimed at the R ecosystem. As this project was born out of the R4DS Online Learning Community and the R for Data Science textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem. However, any code-based methodology is welcome - just please remember to share the code used to generate the results.

    Join the R4DS Online Learning Community in the weekly #TidyTuesday event! Every week we post a raw dataset, a chart or article related to that dataset, and ask you to explore the data. While the dataset will be “tamed”, it will not always be tidy!

    We will have many sources of data and want to emphasize that no causation is implied. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our guidelines are to use the data provided to practice your data tidying and plotting techniques. Participants are invited to consider for themselves what nuancing factors might underlie these relationships.

    The intent of Tidy Tuesday is to provide a safe and supportive forum for individuals to practice their wrangling and data visualization skills independent of drawing conclusions. While we understand that the two are related, the focus of this practice is purely on building skills with real-world data.

  3. Broadway Weekly Grosses

    • kaggle.com
    Updated Apr 29, 2020
    Cite
    Jesse Mostipak (2020). Broadway Weekly Grosses [Dataset]. https://www.kaggle.com/jessemostipak/broadway-weekly-grosses/code
    Dataset updated
    Apr 29, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jesse Mostipak
    Description

    Context

    #TidyTuesday is a weekly data project aimed at the R ecosystem. As this project was born out of the R4DS Online Learning Community and the R for Data Science textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem. However, any code-based methodology is welcome - just please remember to share the code used to generate the results.

    Content

    This data comes from Playbill. Weekly box office grosses comprise data on revenue and attendance figures for theatres that are part of The Broadway League, an industry association for, you guessed it, Broadway theatre.

    CPI data is from the U.S. Bureau of Labor Statistics. There are many, many measures of CPI, so the one used here is "All items less food and energy in U.S. city average, all urban consumers, seasonally adjusted" (table CUSR0000SA0L1E).

    Acknowledgements

    Huge thanks to Alex Cookson who provided ALL of this week's data, cleaning script, and readme! You can check out his recent blog post on the same data here, and explore all of the raw data and other details on Alex's GitHub.

  4. Tuskegee Airmen for Tidy Tuesday

    • kaggle.com
    Updated Feb 16, 2022
    Cite
    Katie Press (2022). Tuskegee Airmen for Tidy Tuesday [Dataset]. https://www.kaggle.com/datasets/katiepress/Tuskegee-airmen/suggestions
    Dataset updated
    Feb 16, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Katie Press
    Description

    Dataset

    This dataset was created by Katie Press

    Contents

  5. ‘Animal Crossing Reviews’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 21, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Animal Crossing Reviews’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-animal-crossing-reviews-39f4/9af6d545/?iid=021-327&v=presentation
    Dataset updated
    Nov 21, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Animal Crossing Reviews’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jessemostipak/animal-crossing on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context and Content

    The data this week comes from the VillagerDB and Metacritic. VillagerDB brings info about villagers, items, crafting, accessories, including links to their images. Metacritic brings user and critic reviews of the game (scores and raw text).

    Per Wikipedia:

    Animal Crossing: New Horizons is a 2020 life simulation video game developed and published by Nintendo for the Nintendo Switch. It is the fifth main series title in the Animal Crossing series. New Horizons was released in all regions on March 20, 2020.

    New Horizons sees the player assuming the role of a customizable character who moves to a deserted island after purchasing a package from Tom Nook, a tanuki character who has appeared in every entry in the Animal Crossing series. Taking place in real-time, the player can explore the island in a nonlinear fashion, gathering and crafting items, catching insects and fish, and developing the island into a community of anthropomorphic animals.

    Animal Crossing as explained by a Polygon opinion piece.

    With just a few design twists, the work behind collecting hundreds or even thousands of items over weeks and months becomes an exercise of mindfulness, predictability, and agency that many players find soothing instead of annoying.

    Games that feature gentle progression give us a sense of progress and achievability, teaching us that putting in a little work consistently while taking things one step at a time can give us some fantastic results. It’s a good life lesson, as well as a way to calm yourself and others, and it’s all achieved through game design.

    Some potential context for user_reviews.tsv from 538 and a point of potential strife via Animal Crossing World, and lastly a spoiler article analyzing the reviews in R by Boon Tan.

    PS there is an easter egg somewhere in the readme - something to do with... turnips.

    Acknowledgements

    The data was downloaded and cleaned by Thomas Mock for #TidyTuesday during the week of May 4th, 2020. You can see the code used to clean the data in the #TidyTuesday GitHub repository.

    Inspiration

    Potential Analyses:

    • Reviews: Sentiment analysis, text analysis, scores, date effect
    • Villagers/Items: Gender, species, sayings, personality, price, recipe, what about a star sign based off the birthday column?

    --- Original source retains full ownership of the source dataset ---

  6. Volcano Eruptions

    • kaggle.com
    Updated May 11, 2020
    Cite
    Jesse Mostipak (2020). Volcano Eruptions [Dataset]. https://www.kaggle.com/jessemostipak/volcano-eruptions/discussion
    Dataset updated
    May 11, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jesse Mostipak
    Description

    Context

    The data this week comes from The Smithsonian Institution.

    Axios put together a lovely plot of volcano eruptions since Krakatoa (after 1883) by elevation and type.

    For more information about volcanoes, check out the Wikipedia article on volcanoes; for the VEI (Volcanic Explosivity Index) specifically, see its own Wikipedia article. Lastly, Google Earth has an interactive site on "10,000 Years of Volcanoes"!

    Content

    Per Wikipedia:

    A volcano is a rupture in the crust of a planetary-mass object, such as Earth, that allows hot lava, volcanic ash, and gases to escape from a magma chamber below the surface.

    Earth's volcanoes occur because its crust is broken into 17 major, rigid tectonic plates that float on a hotter, softer layer in its mantle. Therefore, on Earth, volcanoes are generally found where tectonic plates are diverging or converging, and most are found underwater.

    Erupting volcanoes can pose many hazards, not only in the immediate vicinity of the eruption. One such hazard is that volcanic ash can be a threat to aircraft, in particular those with jet engines where ash particles can be melted by the high operating temperature; the melted particles then adhere to the turbine blades and alter their shape, disrupting the operation of the turbine. Large eruptions can affect temperature as ash and droplets of sulfuric acid obscure the sun and cool the Earth's lower atmosphere (or troposphere); however, they also absorb heat radiated from the Earth, thereby warming the upper atmosphere (or stratosphere). Historically, volcanic winters have caused catastrophic famines.

    VEI (Volcanic Explosivity Index): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4476084%2F27b6f67938591a3fd463bc11dcafd797%2Fvei.svg?generation=1589216563726306&alt=media

    Volcanic eruptions can also affect the global climate. A Nature article has open-access data for a specific time period of eruptions, along with temperature anomalies and tree growth. More details can be found from NASA and UCAR. A summary of the paywalled Nature article can be found via the Smithsonian.

    The researchers detected 238 eruptions from the past 2,500 years, they report today in Nature. About half were in the mid- to high-latitudes in the northern hemisphere, while 81 were in the tropics. (Because of the rotation of the Earth, material from tropical volcanoes ends up in both Greenland and Antarctica, while material from northern volcanoes tends to stay in the north.) The exact sources of most of the eruptions are as yet unknown, but the team was able to match their effects on climate to the tree ring records.

    The analysis not only reinforces evidence that volcanoes can have long-lasting global effects, but it also fleshes out historical accounts, including what happened in the sixth-century Roman Empire. The first eruption, in late 535 or early 536, injected large amounts of sulfate and ash into the atmosphere. According to historical accounts, the atmosphere had dimmed by March 536, and it stayed that way for another 18 months.

    Tree rings, and people of the time, recorded cold temperatures in North America, Asia and Europe, where summer temperatures dropped by 2.9 to 4.5 degrees Fahrenheit below the average of the previous 30 years. Then, in 539 or 540, another volcano erupted. It spewed 10 percent more aerosols into the atmosphere than the huge eruption of Tambora in Indonesia in 1815, which caused the infamous “year without a summer”. More misery ensued, including the famines and pandemics. The same eruptions may have even contributed to a decline in the Maya empire, the authors say.

    There are additional datasets from the Nature article available as Excel files, but they are a bit more complicated - feel free to explore at your own discretion! If you use any of the Nature data, please cite w/ DOI: https://doi.org/10.1038/nature14565.

    Acknowledgements

    The data was downloaded and cleaned by [T...

  7. NASA Meteorites Dataset

    • kaggle.com
    Updated Oct 12, 2023
    Cite
    Sujay Kapadnis (2023). NASA Meteorites Dataset [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/meteorites-dataset
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sujay Kapadnis
    Description

    This week's dataset is a dataset all about meteorites, where they fell and when they fell! Data comes from the Meteoritical Society by way of NASA. H/t to #TidyTuesday community member Malin Axelsson for sharing this data as an issue on GitHub!

    If you want to find out more about meteorite classifications, Malin was kind enough to share a Wikipedia article as well!

    Data Dictionary

    meteorites.csv

    variable     class      description
    name         character  Meteorite name
    id           double     Meteorite numerical ID
    name_type    character  Name type, either valid or relict, where relict = a meteorite that cannot be assigned easily to a class
    class        character  Class of the meteorite, please see Wikipedia for full context
    mass         double     Mass in grams
    fall         character  Fell or Found meteorite
    year         integer    Year found
    lat          double     Latitude
    long         double     Longitude
    geolocation  character  Geolocation
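
As a rough illustration of how the data dictionary above maps onto typed values, the sketch below parses rows with Python's standard `csv` module. The sample rows are made up for the example and are not taken from meteorites.csv, and treating empty strings and `NA` as missing values is an assumption based on common R export conventions.

```python
import csv
import io

# Made-up rows laid out per the data dictionary; not real meteorite records.
SAMPLE_CSV = """name,id,name_type,class,mass,fall,year,lat,long,geolocation
Example A,1,Valid,L5,21.0,Fell,1880,50.775,6.08333,"(50.775, 6.08333)"
Example B,2,Valid,H6,NA,Found,NA,NA,NA,NA
"""


def _parse(value, cast):
    """Cast a CSV field, treating '' and 'NA' as missing (an assumption)."""
    return None if value in ("", "NA") else cast(value)


def load_meteorites(text):
    """Parse CSV text into dicts typed according to the data dictionary."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "name": raw["name"],
            "id": _parse(raw["id"], float),      # class: double
            "name_type": raw["name_type"],
            "class": raw["class"],
            "mass": _parse(raw["mass"], float),  # grams
            "fall": raw["fall"],
            "year": _parse(raw["year"], int),    # class: integer
            "lat": _parse(raw["lat"], float),
            "long": _parse(raw["long"], float),
            "geolocation": _parse(raw["geolocation"], str),
        })
    return rows


meteorites = load_meteorites(SAMPLE_CSV)
```

Keeping missing values as `None` rather than zero avoids silently skewing summaries such as mean mass or earliest year.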

    @misc{tidytuesday, title = {Tidy Tuesday: A weekly social data project}, author = {R4DS Online Learning Community}, url = {https://github.com/rfordatascience/tidytuesday}, year = {2023} }

  8. Tour de France

    • kaggle.com
    Updated Dec 31, 2020
    Cite
    PabloMonleon (2020). Tour de France [Dataset]. https://www.kaggle.com/datasets/pablomonleon/tour-de-france-historic-stages-data/code
    Dataset updated
    Dec 31, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PabloMonleon
    Area covered
    France
    Description

    The Tour de France today is an annual men's multiple stage bicycle race primarily held in France, while also occasionally passing through nearby countries. Like the other Grand Tours, it consists of 21 day-long stages over the course of 23 days.

    The race started in 1903 and it has changed over the years. Here are 3 datasets with very obvious column names with information about TDF winners, stage general info and detailed stages data for every rider on every edition. I hope you enjoy this data.

    These datasets were put together by Thomas Mock in this GitHub repository: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-04-07/readme.md

    stage_data.csv

    • edition: Race edition
    • year: Year of race
    • stage_results_id: Stage ID
    • rank: Rank of racer for stage
    • time: Time of racer
    • rider: Rider name
    • age: Age of racer
    • team: Team (NA if not on team)
    • points: Points for the stage
    • elapsed: Time elapsed stored as lubridate::period
    • bib_number: Bib number

    tdf_stages.csv

    • Stage: Stage Number
    • Date: Date of stage
    • Distance: Distance in KM
    • Origin: Origin city
    • Destination: Destination city
    • Type: Stage Type
    • Winner: Winner of the stage
    • Winner_Country: Winner's nationality

    tdf_winners.csv

    • edition: Edition of the Tour de France
    • start_date: Start date of the Tour
    • winner_name: Winner's name
    • winner_team: Winner's team (NA if not on a team)
    • distance: Distance traveled in KM across the entire race
    • time_overall: Time in hours taken by the winner to complete the race
    • time_margin: Difference in finishing time between the race winner and the runner up
    • stage_wins: Number of stage wins (note that it is possible to win the GC without winning any stages at all)
    • stages_led: Number of stages spent as the race leader (wearing the yellow jersey) by the eventual winner
    • height: Height in meters
    • weight: Weight in kg
    • age: Age as winner
    • born: Year born
    • died: Year died
    • full_name: Full name
    • nickname: Nickname
    • birth_town: Birth town
    • birth_country: Birth country
    • nationality: Nationality
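
Because tdf_winners.csv records `distance` in KM and `time_overall` in hours, a winner's average speed can be derived by dividing one by the other. The sketch below shows that arithmetic on made-up rows (the values are illustrative only, not real race data) and assumes missing fields have already been read in as `None`.

```python
# Made-up rows using the distance (KM) and time_overall (hours) fields.
WINNERS = [
    {"edition": 1, "distance": 2400.0, "time_overall": 96.0},
    {"edition": 2, "distance": 3500.0, "time_overall": None},  # time missing
]


def average_speed_kmh(row):
    """Return distance / time_overall in km/h, or None if either is missing."""
    if row["distance"] is None or row["time_overall"] is None:
        return None
    return row["distance"] / row["time_overall"]


speeds = {row["edition"]: average_speed_kmh(row) for row in WINNERS}
```

With these sample values, edition 1 works out to 2400 / 96 = 25 km/h, while edition 2 stays `None` because its time is missing.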
