MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
CSVvsVizQA
A dataset for comparing question-answering ability on CSV data versus data-visualization images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CSV file used to create the Gephi file / visualization, "All Artists at the Tate Modern". Original data set retrieved from: https://github.com/tategallery/collection
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of files used for a data visualization project for the Digital Humanities Praxis course at the Graduate Center, CUNY. The files represent raw data (csv), data used for the visualization(s) (gephi), and the visualizations themselves (pdf). A write-up on the project can be located at the GC Academic Commons site: http://dhpraxis14.commons.gc.cuny.edu/2014/11/12/its-big-data-to-me-data-viz-part-2
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Muhammad Ehab Muhammad
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CSV file used for the visualization, "Data Visualization: Claes Oldenburg and Joseph Beuys" (PDF). Original data set retrieved from: https://github.com/tategallery/collection
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Data was taken raw from https://database.lichess.org/ and converted to a readable CSV file in this notebook: https://www.kaggle.com/ironicninja/converting-raw-chess-pgn-to-readable-data.
The CSV file includes the actual moves played in each chess game, as well as features such as the time control and termination type that describe the context of the game.
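A minimal pandas sketch for exploring the file; the file and column names ("chess_games.csv", "termination", "moves") are assumptions, not necessarily the notebook's actual schema:

```python
import pandas as pd

# Hypothetical file and column names; check the conversion notebook linked
# above for the actual schema.
games = pd.read_csv("chess_games.csv")

# Distribution of termination types (e.g., checkmate, resignation, time forfeit)
print(games["termination"].value_counts())

# The recorded moves of the first game
print(games.loc[0, "moves"])
```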
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
I created these files and this analysis as part of working on a case study for the Google Data Analyst certificate.
Question investigated: Do annual members and casual riders use Cyclistic bikes differently? Why do we want to know?: Knowing bike usage/behavior by rider type will allow the Marketing, Analytics, and Executive team stakeholders to design, assess, and approve appropriate strategies that drive profitability.
I used the script noted below to clean the files and then added some additional steps to create the visualizations and complete my analysis. The additional steps are noted in the corresponding R Markdown file for this data set.
Files: most recent 1 year of data available, Divvy_Trips_2019_Q2.csv, Divvy_Trips_2019_Q3.csv, Divvy_Trips_2019_Q4.csv, Divvy_Trips_2020_Q1.csv Source: Downloaded from https://divvy-tripdata.s3.amazonaws.com/index.html
Data cleaning script: followed this script to clean and merge files https://docs.google.com/document/d/1gUs7-pu4iCHH3PTtkC1pMvHfmyQGu0hQBG5wvZOzZkA/copy
Note: Combined data set has 3,876,042 rows, so you will likely need to run R analysis on your computer (e.g., R Console) rather than in the cloud (e.g., RStudio Cloud)
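For illustration only (the original workflow was in R, per the cleaning script above), a rough Python sketch of combining the four quarterly files could look like this; column names may differ between the 2019 and 2020 exports, so harmonizing them first, as the cleaning script does, is assumed:

```python
import pandas as pd

# The four quarterly files listed above.
quarters = [
    "Divvy_Trips_2019_Q2.csv",
    "Divvy_Trips_2019_Q3.csv",
    "Divvy_Trips_2019_Q4.csv",
    "Divvy_Trips_2020_Q1.csv",
]
trips = pd.concat((pd.read_csv(f) for f in quarters), ignore_index=True)
print(len(trips))  # the note above reports 3,876,042 combined rows
```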
This was my first attempt to conduct an analysis in R and create the R Markdown file. As you might guess, it was an eye-opening experience, with both exciting discoveries and aggravating moments.
One thing I have not yet been able to figure out is how to add a legend to the map. I was able to get a legend to appear on a separate (empty) map, but not on the map you will see here.
I am also interested to see what others did with this analysis: what findings and insights did you come across?
No license specified (https://academictorrents.com/nolicensespecified)
The Hubway trip history data includes every trip taken through Nov 2013, with date, time, origin and destination stations, plus the bike number and more. Data from 2011/07 through 2013/11.
Every time a Hubway user checks a bike out from a station, the system records basic information about the trip. Those anonymous data points have been exported into the spreadsheet. Please note that all private data, including member names, have been removed from these files.
What can the data tell us? The CSV file contains data for every Hubway trip from the system launch on July 28th, 2011, through the end of September, 2012. The file contains the data points listed below for each trip. We've also posed some of the questions you could answer with this dataset; we're sure you'll have lots more of your own.
Duration - Duration of trip. What's the average trip duration for annual members vs. casual users?
Start date - Includes start date and time. What are the peak Hubway hours?
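A minimal pandas sketch for the two questions posed above, assuming hypothetical file and column names ("hubway_trips.csv", "Member type", "Duration", "Start date"):

```python
import pandas as pd

# Hypothetical file and column names for the Hubway export; adjust to the
# actual headers before running.
trips = pd.read_csv("hubway_trips.csv", parse_dates=["Start date"])

# Average trip duration for annual members vs. casual users
print(trips.groupby("Member type")["Duration"].mean())

# Peak Hubway hours, from trip start times
print(trips["Start date"].dt.hour.value_counts().sort_index())
```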
The full data and source can be found at https://emilhvitfeldt.github.io/friends/
"The goal of friends to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files."
friends.csv - Contains the scenes and lines for each character, including season and episode.
friends_emotions.csv - Contains sentiments for each scene, for the first four seasons only.
friends_info.csv - Contains information regarding each episode, such as imdb_rating, views, episode title, and directors.
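A minimal pandas sketch joining the files described above; the join keys (season, episode) and the "speaker" column are assumptions about the schema:

```python
import pandas as pd

# File names come from the description above.
lines = pd.read_csv("friends.csv")
info = pd.read_csv("friends_info.csv")

# Attach episode metadata (e.g., imdb_rating) to every line of dialogue
merged = lines.merge(info, on=["season", "episode"], how="left")
print(merged["speaker"].value_counts().head(10))  # most-featured characters
```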
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geospatial_Coordinates.csv [Postal code, latitude & longitude of data points in Toronto]
FourSquareCategories.json [Categories and category IDs from the FourSquare API]
Processed_data_for_analysis.csv [Data file after data preparation, available for analysis]
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the raw experimental data and supplementary materials for the "Asymmetry Effects in Virtual Reality Rod and Frame Test". The materials included are:
• Raw Experimental Data: older.csv and young.csv
• Mathematica Notebooks: a collection of Mathematica notebooks used for data analysis and visualization. These notebooks provide scripts for processing the experimental data, performing statistical analyses, and generating the figures used in the project.
• Unity Package: a Unity package featuring a sample scene related to the project. The scene was built using Unity’s Universal Rendering Pipeline (URP). To utilize this package, ensure that URP is enabled in your Unity project. Instructions for enabling URP can be found in the Unity URP Documentation.
Requirements:
• For Data Files: software capable of opening CSV files (e.g., Microsoft Excel, Google Sheets, or any programming language that can read CSV formats).
• For Mathematica Notebooks: Wolfram Mathematica software to run and modify the notebooks.
• For Unity Package: Unity Editor version compatible with URP (2019.3 or later recommended). URP must be installed and enabled in your Unity project.
Usage Notes:
• The dataset facilitates comparative studies between different age groups based on the collected variables.
• Users can modify the Mathematica notebooks to perform additional analyses.
• The Unity scene serves as a reference to the project setup and can be expanded or integrated into larger projects.
Citation: Please cite this dataset when using it in your research or publications.
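As a minimal sketch of the age-group comparison mentioned in the usage notes, assuming pandas and a hypothetical error_deg column (the actual layout of older.csv and young.csv is not documented here):

```python
import pandas as pd

# Load the raw data files named above and tag each row with its group.
older = pd.read_csv("older.csv").assign(group="older")
young = pd.read_csv("young.csv").assign(group="young")
data = pd.concat([older, young], ignore_index=True)

# Compare a measured variable between age groups; "error_deg" is a
# hypothetical column name, not necessarily present in the files.
print(data.groupby("group")["error_deg"].describe())
```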
This dataset was created by Equivel
Privacy policy: https://dataintelo.com/privacy-and-policy
According to our latest research, the global CSV Editor market size reached USD 1.32 billion in 2024, reflecting the growing integration of data-centric processes across numerous industries. The market is experiencing robust expansion, supported by a CAGR of 10.4% from 2025 to 2033. By the end of the forecast period in 2033, the CSV Editor market is anticipated to achieve a value of USD 3.16 billion. The primary growth factor driving this market is the increasing reliance on structured data formats for business intelligence, analytics, and process automation, which has led to a surge in demand for advanced CSV editing tools globally.
The growth of the CSV Editor market is underpinned by the digital transformation initiatives adopted by enterprises across sectors such as BFSI, healthcare, IT, and retail. As organizations generate and handle exponentially larger volumes of data, the need to efficiently manage, clean, and manipulate CSV files has become crucial. CSV Editors, which enable users to modify, validate, and visualize large datasets, are now considered essential for data-driven decision-making. The proliferation of cloud computing and the rise of big data analytics have further accentuated the importance of robust CSV editing solutions, as businesses seek to streamline workflows and enhance productivity.
Another significant growth driver is the increasing adoption of automation and artificial intelligence in data management processes. Modern CSV Editors are evolving from simple file manipulation tools to sophisticated platforms that support scripting, automation, and integration with other enterprise software. This evolution is particularly evident in industries such as healthcare and finance, where the accuracy and consistency of data are paramount. The availability of both on-premises and cloud-based deployment modes has also broadened the market’s appeal, catering to organizations with varying security and compliance requirements. Furthermore, the growing trend of remote work and distributed teams has fueled demand for web-based CSV Editors that facilitate real-time collaboration and seamless access from multiple devices.
The CSV Editor market is also benefitting from the increasing focus on data governance and regulatory compliance. As governments and regulatory bodies implement stricter data protection laws, organizations are compelled to invest in tools that ensure data integrity and traceability. CSV Editors play a pivotal role in maintaining audit trails, validating data formats, and supporting compliance with standards such as GDPR and HIPAA. This regulatory backdrop, combined with the rise in cyber threats and data breaches, has made secure and feature-rich CSV Editors a necessity for enterprises seeking to mitigate risks and safeguard sensitive information.
Regionally, North America dominates the CSV Editor market, accounting for the largest revenue share in 2024, driven by the presence of leading technology firms and widespread adoption of data management solutions. Europe follows closely, with strong demand from the BFSI and healthcare sectors. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digitization, expanding IT infrastructure, and increased investments in data analytics. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions gradually embrace digital transformation and modern data management practices.
The CSV Editor market is segmented by component into Software and Services, with software solutions representing the lion's share of the market in 2024. The software segment encompasses standalone CSV editing applications, integrated development environments (IDEs), and plug-ins that facilitate the manipulation and validation of CSV files. These solutions are in high demand due to their ability to handle large datasets, support complex data transformations, and provide user-friendly interfaces for both technical and non-technical users. The continuous evolution of software features, such as real-time collaboration, version control, and advanced data visualization, is further propelling the adoption of CSV Editor software across industries.
The services segment, while smaller in comparison, is gaining traction as organizations seek
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a tool for multi-omics data analysis that enables simultaneous visualization of up to four types of omics data on organism-scale metabolic network diagrams. The tool’s interactive web-based metabolic charts depict the metabolic reactions, pathways, and metabolites of a single organism as described in a metabolic pathway database for that organism; the charts are constructed using automated graphical layout algorithms. The multi-omics visualization facility paints each individual omics dataset onto a different “visual channel” of the metabolic-network diagram. For example, a transcriptomics dataset might be displayed by coloring the reaction arrows within the metabolic chart, while a companion proteomics dataset is displayed as reaction arrow thicknesses, and a complementary metabolomics dataset is displayed as metabolite node colors. Once the network diagrams are painted with omics data, semantic zooming provides more details within the diagram as the user zooms in. Datasets containing multiple time points can be displayed in an animated fashion. The tool will also graph data values for individual reactions or metabolites designated by the user. The user can interactively adjust the mapping from data value ranges to the displayed colors and thicknesses to provide more informative diagrams.
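As a toy illustration of the "visual channel" idea described above (not the tool's own code), a networkx/matplotlib sketch can map each omics dataset to a different visual property; all names and numbers here are invented:

```python
import matplotlib.pyplot as plt
import networkx as nx

# Reaction arrows are colored by a transcriptomics value, arrow widths come
# from a proteomics value, and metabolite node colors from a metabolomics
# value (all data invented for illustration).
G = nx.DiGraph()
G.add_edge("glucose", "G6P", transcript=2.1, protein=0.8)
G.add_edge("G6P", "F6P", transcript=0.4, protein=1.6)

edge_colors = [G[u][v]["transcript"] for u, v in G.edges]   # channel 1: arrow color
edge_widths = [2 * G[u][v]["protein"] for u, v in G.edges]  # channel 2: arrow thickness
node_colors = [1.0, 0.2, 0.6]                               # channel 3: metabolite color

pos = nx.spring_layout(G, seed=1)
nx.draw(G, pos, with_labels=True,
        edge_color=edge_colors, edge_cmap=plt.cm.viridis,
        width=edge_widths,
        node_color=node_colors, cmap=plt.cm.coolwarm)
plt.show()
```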
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This is a large-scale dataset with impedance and signal loss data recorded on volunteer test subjects using low-voltage alternating-current sine-wave signals. The signal frequencies range from 50 kHz to 20 MHz.
Applications: The intention of this dataset is to enable investigation of the human body as a signal propagation medium, and to capture how the properties of the human body (age, sex, composition, etc.), the measurement locations, and the signal frequencies affect signal loss across the human body.
Overview statistics:
Number of subjects: 30
Number of transmitter locations: 6
Number of receiver locations: 6
Number of measurement frequencies: 19
Input voltage: 1 V
Load resistance: 50 ohm and 1 megaohm
Measurement group statistics:
Height: 174.10 cm (7.15)
Weight: 72.85 kg (16.26)
BMI: 23.94 (4.70)
Body fat %: 21.53 (7.55)
Age group: 29.00 (11.25)
Male/female ratio: 50%
Included files:
experiment_protocol_description.docx - protocol used in the experiments
electrode_placement_schematic.png - schematic of placement locations
electrode_placement_photo.jpg - photo of the experiment setup on a volunteer subject
RawData - the full measurement results and experiment info sheets
all_measurements.csv - the most important results extracted to .csv
all_measurements_filtered.csv - same, but after z-score filtering
all_measurements_by_freq.csv - the most important results extracted to .csv, single frequency per row
all_measurements_by_freq_filtered.csv - same, but after z-score filtering
summary_of_subjects.csv - key statistics on the subjects from the experiment info sheets
process_json_files.py - script that creates .csv from the raw data
filter_results.py - outlier removal based on z-score
plot_sample_curves.py - visualization of a randomly selected measurement result subset
plot_measurement_group.py - visualization of the measurement group
CSV file columns:
subject_id - participant's random unique ID
experiment_id - measurement session's number for the participant
height - participant's height, cm
weight - participant's weight, kg
BMI - body mass index, computed from the values above
body_fat_% - body fat composition, as measured by bioimpedance scales
age_group - age rounded to 10 years, e.g. 20, 30, 40 etc.
male - 1 if male, 0 if female
tx_point - transmitter point number
rx_point - receiver point number
distance - distance, in relative units, between the tx and rx points. Not scaled in terms of participant's height and limb lengths!
tx_point_fat_level - transmitter point location's average fat content metric. Not scaled for each participant individually.
rx_point_fat_level - receiver point location's average fat content metric. Not scaled for each participant individually.
total_fat_level - sum of rx and tx fat levels
bias - constant term to simplify data analytics, always equal to 1.0
CSV file columns, frequency-specific:
tx_abs_Z_... - transmitter-side impedance, as computed by the process_json_files.py script from the voltage drop
rx_gain_50_f_... - experimentally measured gain on the receiver, in dB, using 50 ohm load impedance
rx_gain_1M_f_... - experimentally measured gain on the receiver, in dB, using 1 megaohm load impedance
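A minimal sketch of the kind of z-score outlier filtering performed by filter_results.py; the |z| < 3 cutoff and the column name are assumptions, not necessarily what the script actually uses:

```python
import pandas as pd

# Filter one frequency-specific column by z-score.
df = pd.read_csv("all_measurements_by_freq.csv")

col = "rx_gain_50_f_1000000"  # hypothetical frequency-specific column name
z = (df[col] - df[col].mean()) / df[col].std()
filtered = df[z.abs() < 3]
print(len(df), "->", len(filtered), "rows after filtering")
```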
Acknowledgments: The dataset collection was funded by the Latvian Council of Science, project “Body-Coupled Communication for Body Area Networks”, project No. lzp-2020/1-0358.
References: For more detailed information, see this article: J. Ormanis, V. Medvedevs, A. Sevcenko, V. Aristovs, V. Abolins, and A. Elsts. Dataset on the Human Body as a Signal Propagation Medium for Body Coupled Communication. Submitted to Elsevier Data in Brief, 2023.
Contact information: info@edi.lv
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
A dataset consisting of the latitude and longitude information for the 29 Indian states.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 4: CSV file of colon crypt bulk RNA-seq data used for GECO UMAP generation.
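A hedged sketch of how such bulk RNA-seq data might feed a UMAP embedding with the umap-learn package; the file name, preprocessing, and parameters are all assumptions, not the GECO pipeline:

```python
import numpy as np
import pandas as pd
import umap  # pip install umap-learn

# Hypothetical file name; log-transform is an assumed normalization step.
expr = pd.read_csv("colon_crypt_bulk_rnaseq.csv", index_col=0)
X = np.log1p(expr.to_numpy())

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X)
print(embedding.shape)  # one 2-D coordinate per sample
```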
The data in the CSV file named RSF is a detailed dataset used to run the models testing relationships between movement behavior and atmospheric conditions. This dataset includes: detection date and rounded hour; status (0 or 1) showing whether over-water movement occurring within nine hours was detected; number of hours since sunset; site ID of the area weather station; Motus tag ID; wind speed; temperature; precipitation; visibility; wind direction; the longitudinal and latitudinal components of wind speed and direction; pressure; the previous hour's wind speed, temperature, precipitation, visibility, wind direction, longitudinal and latitudinal wind components, and pressure; the previous night's wind speed, temperature, number of precipitation hours, visibility, wind direction, longitudinal and latitudinal wind components, and pressure; date; and the changes in wind speed, temperature, precipitation, visibility, and pressure since the previous hour and since the previous 24 hours.
The HMM data spreadsheet contains the data used for the Hidden Markov Model portion of the paper. This dataset includes: date and rounded-down hour; site ID for the weather station area; the standard deviation of signal strength across all detections within one hour; rounded-down hour; number of hours since sunset; number of hours; wind speed; temperature; date; date and time of sunset and of sunrise; sunset and sunrise hours rounded down; a binary night/day indicator based on time relative to sunset and sunrise; sine(hours since sunset * 2 * pi / 24) and cosine(hours since sunset * 2 * pi / 24); the predicted state (roosting or foraging) from the HMM fit; the probabilities that the bat is roosting and foraging; the unique identification sequence of each bat's tag; a manipulated time stamp in which all date-times share the same year (for data visualization); and the hour and minute of sunset and sunrise in numeric form (for data visualization).
The detection summary data CSV combines capture data with summarized locations and times of detection of radio-tagged eastern red bats (Lasiurus borealis), silver-haired bats (Lasionycteris noctivagans), and Seminole bats (Lasiurus seminolus) captured in the Mid-Atlantic coastal region between August and October of 2019 and 2021, and provides the unique dates on which each bat was detected at a single tower and/or the unique towers at which each bat was detected in a single night. This dataset was used for plotting and visualizing individual bat paths for the qualitative section of the paper (i.e., showing direction, pathway, movement distance, whether the bat moved at all, etc.).
Data contained include detection date, tag ID, tower receiver ID, tower receiver latitude, tower receiver longitude, species name, sex, age (adult or juvenile), weight (g), and forearm length (mm).
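The sine/cosine terms in the HMM spreadsheet encode hours-since-sunset as a circular feature with a 24-hour period; a minimal sketch, assuming hypothetical file and column names:

```python
import numpy as np
import pandas as pd

# File and column names are assumptions; the formulas match the
# description above: sine/cosine of (hours since sunset * 2 * pi / 24).
hmm = pd.read_csv("HMM.csv")
hmm["sin_hss"] = np.sin(hmm["hours_since_sunset"] * 2 * np.pi / 24)
hmm["cos_hss"] = np.cos(hmm["hours_since_sunset"] * 2 * np.pi / 24)
```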
This section does not describe the methods of read-tv software development, which can be found in the associated manuscript from JAMIA Open (JAMIO-2020-0121.R1). This section describes the methods involved in the surgical workflow disruption data collection. A curated version of this dataset, free of protected health information (PHI), was used as a use case for this manuscript.
Observer training
Trained human factors researchers conducted each observation following the completion of observer training. The researchers were two full-time research assistants based in the department of surgery at site 3 who visited the other two sites to collect data. Human factors experts guided and trained each observer in the identification and standardized collection of flow disruptions (FDs). The observers were also trained in the basic components of robotic surgery so that they could tangibly isolate and describe such disruptive events.
Comprehensive observer training was ensured with both classroom and floor train...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The city of Austin has administered a community survey for the 2015, 2016, 2017, 2018, and 2019 years (https://data.austintexas.gov/City-Government/Community-Survey/s2py-ceb7) to "assess satisfaction with the delivery of the major City Services and to help determine priorities for the community as part of the City's ongoing planning process." To directly access this dataset from the city of Austin's website, you can follow this link: https://cutt.ly/VNqq5Kd. Although we downloaded the dataset analyzed in this study from the former link, given that the city of Austin is interested in continuing to administer this survey, there is a chance that the data we used for this analysis and the data hosted on the city of Austin's website may differ in the following years. Accordingly, to ensure the replication of our findings, we recommend that researchers download and analyze the dataset we employed in our analyses, which can be accessed at the following link: https://github.com/democratizing-data-science/MDCOR/blob/main/Community_Survey.csv.
Replication Features or Variables
The community survey data has 10,684 rows and 251 columns. Of these columns, our analyses will rely on the following three indicators, which are taken verbatim from the survey: "ID", "Q25 - If there was one thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?", and "Do you own or rent your home?"
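A minimal pandas sketch for loading the replication file; reading directly from GitHub requires the raw-content form of the URL given above (this conversion is an assumption), and the exact header strings should be verified against the CSV:

```python
import pandas as pd

# The blob URL above returns an HTML page; the raw-content form is needed
# to fetch the CSV itself.
url = ("https://raw.githubusercontent.com/democratizing-data-science/"
       "MDCOR/main/Community_Survey.csv")
survey = pd.read_csv(url)
print(survey.shape)  # the description above reports 10,684 rows x 251 columns

# Column labels quoted verbatim from the description; verify the exact
# header strings in the CSV before subsetting.
cols = ["ID",
        "Q25 - If there was one thing you could share with the Mayor "
        "regarding the City of Austin (any comment, suggestion, etc.), "
        "what would it be?",
        "Do you own or rent your home?"]
subset = survey[cols]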