100+ datasets found

Data_Sheet_1_Advanced large language models and visualization tools for data...
frontiersin.figshare.com
txt
Updated Aug 8, 2024
Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
Available download formats
txt
Unique identifier
https://doi.org/10.3389/feduc.2024.1418006.s001
Dataset updated
Aug 8, 2024
Dataset provided by
Frontiers
Authors
Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction
In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study analyzes the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the generated solutions. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

Methods
This article examines how learners perform, and what their perspectives are, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, comprising students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

Results
The results show the transformative potential of approaches that integrate advanced generative AI tools like GPT with specialized frameworks such as LIDA. Participants' higher preference for these approaches indicates their superiority over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, as learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

Discussion
It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via an API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.
SHARP - Shape Analysis Research Project
catalog.data.gov
s.cnmilf.com
Updated Jul 29, 2022
National Institute of Standards and Technology (2022). SHARP - Shape Analysis Research Project [Dataset]. https://catalog.data.gov/dataset/sharp-shape-analysis-research-project-9c966
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology http://www.nist.gov/
Description
We have applied 3D shape-based retrieval to various disciplines such as computer vision, CAD/CAM, computer graphics, molecular biology and 3D anthropometry. We have organized two workshops on 3D shape retrieval and two shape retrieval contests. We have also developed 3D shape benchmarks, performance evaluation software and prototype 3D retrieval systems. We have developed a robotic map quality assessment tool in collaboration with MEL. We have also developed different shape descriptors to represent 3D human bodies and heads efficiently, and carried out other work related to 3D anthropometry. Finally, we have also done some work in structural bioinformatics and bio-image analysis and retrieval.
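Shape descriptors of the kind mentioned above turn a 3D object into a fixed-length vector, so retrieval reduces to comparing vectors. As a hedged illustration, and not a description of NIST's own descriptors, the sketch below implements the well-known D2 shape distribution (a normalized histogram of distances between random point pairs) over a point cloud and ranks shapes by L1 distance. The point-cloud input, bin count, sample size and toy shapes are all assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def d2_descriptor(points: Sequence[Point], bins: int = 16,
                  samples: int = 2000, seed: int = 0) -> List[float]:
    """D2 shape distribution: normalized histogram of random point-pair distances."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)  # fixed seed keeps descriptors reproducible
    dists = []
    for _ in range(samples):
        i, j = rng.sample(range(len(points)), 2)  # two distinct points
        dists.append(math.dist(points[i], points[j]))
    max_d = max(dists) or 1.0  # divide by the largest distance for scale invariance
    hist = [0] * bins
    for d in dists:
        hist[min(int(d / max_d * bins), bins - 1)] += 1
    return [count / samples for count in hist]


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Dissimilarity between two descriptors; 0 means identical histograms."""
    return sum(abs(x - y) for x, y in zip(h1, h2))


def retrieve(query: Sequence[Point], database: Dict[str, Sequence[Point]]) -> List[str]:
    """Rank database shapes from most to least similar to the query."""
    q = d2_descriptor(query)
    return sorted(database, key=lambda name: l1_distance(q, d2_descriptor(database[name])))


# Toy shapes: corners of a unit cube, a cube twice as large, and points on a line.
cube = [(float(x), float(y), float(z)) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
big_cube = [(2 * x, 2 * y, 2 * z) for x, y, z in cube]
line = [(float(i), 0.0, 0.0) for i in range(10)]
print(retrieve(cube, {"line": line, "big_cube": big_cube}))  # → ['big_cube', 'line']
```

Normalizing by the largest sampled distance is what lets the larger cube match the unit cube; a real retrieval system would sample points from mesh surfaces rather than use a handful of vertices.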
Monday Coffee SQL Data Analysis Project
kaggle.com
Updated Nov 15, 2024
Najir 0123 (2024). Monday Coffee SQL Data Analysis Project [Dataset]. https://www.kaggle.com/datasets/najir0123/monday-coffee-sql-data-analysis-project/suggestions
Dataset updated
Nov 15, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
Najir 0123
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Najir 0123

Released under MIT

Contents
student data analysis
kaggle.com
Updated Nov 17, 2023
maira javeed (2023). student data analysis [Dataset]. https://www.kaggle.com/datasets/mairajaveed/student-data-analysis
Dataset updated
Nov 17, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
maira javeed
Description
In this project, we aim to analyze and gain insights into the performance of students based on various factors that influence their academic achievements. We have collected data related to students' demographic information, family background, and their exam scores in different subjects.

Key Objectives:

Performance Evaluation: Evaluate and understand the academic performance of students by analyzing their scores in various subjects.

Identifying Underlying Factors: Investigate factors that might contribute to variations in student performance, such as parental education, family size, and student attendance.

Visualizing Insights: Create data visualizations to present the findings effectively and intuitively.

Dataset Details:

The dataset used in this analysis contains information about students, including their age, gender, parental education, lunch type, and test scores in subjects like mathematics, reading, and writing.

Analysis Highlights:

We will perform a comprehensive analysis of the dataset, including data cleaning, exploration, and visualization to gain insights into various aspects of student performance.

By employing statistical methods and machine learning techniques, we will determine the significant factors that affect student performance.
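As a hedged starting point for the exploration step described above, the sketch below averages a score column within each level of a grouping column using only the standard library. The column names (`parental_education`, `math_score`) and the four records are hypothetical, not taken from the dataset. A difference between group means is descriptive only; judging whether a factor is significant would still need the statistical tests the description mentions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping


def mean_score_by_group(rows: Iterable[Mapping[str, object]],
                        group_col: str, score_col: str) -> Dict[str, float]:
    """Average a numeric score column within each level of a categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[str(row[group_col])].append(float(row[score_col]))
    return {level: mean(scores) for level, scores in groups.items()}


# Toy records with hypothetical column names, for illustration only.
records = [
    {"parental_education": "high school", "math_score": 62},
    {"parental_education": "high school", "math_score": 70},
    {"parental_education": "bachelor's degree", "math_score": 74},
    {"parental_education": "bachelor's degree", "math_score": 80},
]

print(mean_score_by_group(records, "parental_education", "math_score"))
```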

Why This Matters:

Understanding the factors that influence student performance is crucial for educators, policymakers, and parents. This analysis can help in making informed decisions to improve educational outcomes and provide support where it is most needed.

Acknowledgments:

We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.

Please Note:

This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.
Space Data Analytics Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Dataintelo (2025). Space Data Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/space-data-analytics-market
Available download formats
pdf, csv, pptx
Dataset updated
Jan 7, 2025
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Space Data Analytics Market Outlook

The global space data analytics market size was valued at approximately $3.2 billion in 2023 and is projected to reach around $11.8 billion by 2032, reflecting a robust CAGR of 15.6% over the forecast period. Driven by the increasing deployment of satellites and growing advancements in machine learning and data analytics technologies, the market is poised for substantial growth. The convergence of these technologies allows for more efficient data collection, processing, and utilization, which fuels the demand for space data analytics across various sectors.
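Checking the headline figures shows which base year the quoted CAGR assumes. The sketch below backs out the rate from a start and end value. Treating 2023 as the base year (nine compounding years to 2032) is our assumption; with it, the quoted 15.6% is reproduced. Over only the eight years of the stated 2024 - 2032 coverage period, the same endpoints would imply roughly 17.7%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: about $3.2B in 2023 and about $11.8B by 2032.
rate = cagr(3.2, 11.8, 2032 - 2023)
print(f"{rate:.1%}")  # → 15.6%
```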

The primary growth factor for the space data analytics market is the exponential increase in satellite deployments. Governments and private entities are launching satellites for diverse purposes such as communication, navigation, earth observation, and scientific research. This surge in satellite launches generates vast amounts of data that require sophisticated analytical tools to process and interpret. Consequently, the need for advanced analytics solutions to convert raw satellite data into actionable insights is driving the market forward. Additionally, advancements in artificial intelligence (AI) and machine learning (ML) are enhancing the capabilities of space data analytics, making them more accurate and efficient.

Another significant growth driver is the escalating demand for real-time data and analytics in various industries. Sectors such as agriculture, defense, and environmental monitoring increasingly rely on satellite data for applications like precision farming, border surveillance, and climate change assessment. The ability to obtain real-time data from satellites and analyze it promptly allows organizations to make informed decisions swiftly, thereby improving operational efficiency and outcomes. Furthermore, the growing awareness about the advantages of space data analytics in proactive decision-making is expanding its adoption across multiple sectors.

Moreover, international collaborations and government initiatives aimed at space exploration and satellite launches are propelling the market. Many countries are investing heavily in space missions and satellite projects, creating a fertile ground for the space data analytics market to thrive. These investments are accompanied by supportive regulatory frameworks and funding for research and development, further encouraging innovation and growth in the sector. Additionally, the commercialization of space activities and the emergence of private space enterprises are opening new avenues for market expansion.

Artificial Intelligence in Space is revolutionizing the way we approach space exploration and data analysis. By integrating AI technologies with space missions, scientists and researchers can process vast amounts of data more efficiently and accurately. This integration allows for real-time decision-making and predictive analytics, which are crucial for successful space missions. AI's ability to learn and adapt makes it an invaluable tool for navigating the complex and unpredictable environment of space. As AI continues to evolve, its applications in space exploration are expected to expand, offering new possibilities for understanding our universe and enhancing the capabilities of space data analytics.

From a regional perspective, North America holds the largest market share due to the presence of leading space agencies, like NASA, and prominent private space companies, such as SpaceX and Blue Origin. Europe follows closely, driven by robust investments in space research and development by the European Space Agency (ESA). The Asia Pacific region is expected to witness the fastest growth rate, attributed to increasing satellite launches by countries like China and India, alongside growing investments in space technology and analytics within the region.

Component Analysis

The space data analytics market can be segmented by component into software, hardware, and services. The software segment commands a significant share of the market due to the development of sophisticated analytics tools and platforms. These software solutions are crucial for processing and interpreting the vast amounts of data collected from satellites. Advanced algorithms and AI-powered analytics enable users to extract meaningful insights from raw data, driving the adoption of these solutions across various sectors. The continuous innovation in software capabilities, such as enhanced visualization t
EMEC Wildlife Data Analysis Project - Cleansed Data
dtechtive.com
find.data.gov.scot
zip
Updated Jan 7, 2020
Marine Scotland (2020). EMEC Wildlife Data Analysis Project - Cleansed Data [Dataset]. https://dtechtive.com/datasets/19784
Available download formats
zip (4.8759 MB), zip (1.2569 MB), zip (0.0296 MB), zip (0.0665 MB)
Dataset updated
Jan 7, 2020
Dataset provided by
Marine Directorate https://www.gov.scot/about/how-government-is-run/directorates/marine-scotland/
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
European Marine Energy Centre (EMEC) wildlife observation data has been collected from the marine renewable test sites at Billia Croo and Fall of Warness in Orkney. These data have been processed and cleansed and used in reports prepared by EMEC, for SNH and Marine Scotland.
Water Analysis Project Summary Sheet Updated 4-14-2023
catalog.data.gov
data.wa.gov
Updated Apr 22, 2023
data.wa.gov (2023). Water Analysis Project Summary Sheet Updated 4-14-2023 [Dataset]. https://catalog.data.gov/dataset/water-analysis-project-summary-sheet-2-02-2023
Dataset updated
Apr 22, 2023
Dataset provided by
data.wa.gov
Description
Provides access to content and methods developed for the 2022 State of Salmon Water Analysis Project as produced under RCO contract #22-1587 with SBGH-Partners, dated April 18, 2022.
An experiment on the reliability analysis of megaproject sustainability
data.mendeley.com
narcis.nl
Updated Jan 5, 2021
Zhen Chen (2021). An experiment on the reliability analysis of megaproject sustainability [Dataset]. http://doi.org/10.17632/gy2h2ybtjg.1
Unique identifier
https://doi.org/10.17632/gy2h2ybtjg.1
Dataset updated
Jan 5, 2021
Authors
Zhen Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Hypothesis: Reliability can be adopted to quantitatively measure the sustainability of megaprojects.

Presentation: This dataset presents two scenario-based examples to establish an initial reliability assessment of megaproject sustainability. Data were derived from the author’s assumptions about the differences between scenarios A and B. This Microsoft Excel file contains two sheets: a comparison of the two scenarios using a Fault Tree Analysis model, and a correlation analysis between reliability and unavailability.

Notable findings: This exploratory experiment found that reliability can be used to quantitatively measure megaproject sustainability, and that there is a negative correlation between reliability and unavailability among 11 related events associated with sustainability goals in the life cycle of a megaproject.

Interpretation: Results from analyzing the data in the two sheets can help inform decision making on megaproject sustainability. For example, the reliability of achieving sustainability goals can be enhanced by decreasing the unavailability, or the failure, at individual work stages of megaproject delivery.

Implication: This dataset file can be used to perform reliability analysis in other experiments to assess megaproject sustainability.
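The link between lower stage unavailability and higher reliability follows from how fault trees combine basic events. As a generic sketch, not a reconstruction of the author's spreadsheet, assume independent basic events with unavailabilities q_i: an OR gate fires if any input fails, P = 1 - Π(1 - q_i); an AND gate fires only if all inputs fail, P = Π q_i; and system reliability is 1 - P(top event). The three stages, their OR-gate arrangement and the probabilities below are hypothetical.

```python
from math import prod
from typing import Sequence


def or_gate(unavailabilities: Sequence[float]) -> float:
    """Top event occurs if any independent input event occurs."""
    return 1 - prod(1 - q for q in unavailabilities)


def and_gate(unavailabilities: Sequence[float]) -> float:
    """Top event occurs only if every independent input event occurs."""
    return prod(unavailabilities)


def reliability(top_event_probability: float) -> float:
    """Probability that the top (failure) event does not occur."""
    return 1 - top_event_probability


# Hypothetical unavailabilities at three work stages; failure at any stage is
# assumed to fail the sustainability goal, so the stages feed an OR gate.
baseline = [0.10, 0.05, 0.08]
improved = [0.05, 0.05, 0.08]  # halve the unavailability of the first stage

print(round(reliability(or_gate(baseline)), 4))  # → 0.7866
print(round(reliability(or_gate(improved)), 4))  # → 0.8303
```

Reducing the unavailability of a single stage raises system reliability, which matches the direction of the relationship described in the Interpretation paragraph.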
student-performance-data
kaggle.com
Updated Jun 14, 2025
Muhammad Azam (2025). student-performance-data [Dataset]. http://doi.org/10.34740/kaggle/dsv/12160820
Unique identifier
https://doi.org/10.34740/kaggle/dsv/12160820
Dataset updated
Jun 14, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Muhammad Azam
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Student Performance Data

This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.

The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.

Highlights:

Clean, structured format suitable for immediate use
Designed for beginner to intermediate-level data analysis
Valuable for classification, regression, and data storytelling projects

File Format:

Type: CSV (Comma-Separated Values)
Encoding: UTF-8
Structure: Each row represents a student record
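Given the stated format (CSV, UTF-8, one student per row), a standard-library loader is enough to get started. The path passed to `load_file` (for example a hypothetical `student_performance.csv`) and the column names in the in-memory sample are illustrative assumptions; check the actual header before relying on any of them.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_student_records(handle: TextIO) -> List[Dict[str, str]]:
    """Parse CSV rows into dicts keyed by the header row (values stay strings)."""
    return list(csv.DictReader(handle))


def load_file(path: str) -> List[Dict[str, str]]:
    # newline="" is what the csv module expects; UTF-8 matches the stated encoding.
    with open(path, newline="", encoding="utf-8") as handle:
        return read_student_records(handle)


# In-memory stand-in for the real file, with hypothetical columns.
sample = io.StringIO("student_id,attendance,final_grade\n1,0.92,B\n2,0.75,C\n")
rows = read_student_records(sample)
print(len(rows), rows[0]["final_grade"])  # → 2 B
```

Values arrive as strings, so numeric columns need explicit conversion before modelling.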

Applications:

Student performance prediction
Educational policy planning
Identification of performance gaps and influencing factors
Exploratory data analysis and visualization
‘Your Voice Your Choice Project Ideas’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 26, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Your Voice Your Choice Project Ideas’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-your-voice-your-choice-project-ideas-cd5f/255c66f6/?iid=004-172&v=presentation
Dataset updated
Jan 26, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Your Voice Your Choice Project Ideas’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/ec9400b9-59fd-4a96-8994-9867942296ea on 26 January 2022.

--- Dataset description provided by original source is as follows ---

A program of Seattle Department of Neighborhoods, this is a list of street or park improvement ideas submitted by community members as a part of Your Voice Your Choice Participatory Budgeting. Ideas were vetted by project development teams made up of community members who volunteered to evaluate each project. Seattle Parks and Recreation and Seattle Department of Transportation also reviewed the projects for feasibility. The results and evaluation, along with location are provided in the set. The list will be finalized and ready for the community to vote (by council district) beginning June 3.

--- Original source retains full ownership of the source dataset ---
HMP Data Analysis and Coordination Center
rrid.site
dknet.org
Updated Jul 27, 2025
(2022). HMP Data Analysis and Coordination Center [Dataset]. http://identifiers.org/RRID:SCR_004919/resolver?q=*&i=rrid
Unique identifier
https://identifiers.org/RRID:SCR_004919 https://identifiers.org/RRID:SCR_004919/resolver?q=*&i=rrid
Dataset updated
Jul 27, 2025
Description
Common repository for diverse human microbiome datasets and minimum reporting standards for the Common Fund Human Microbiome Project.
Data Analytics Market Report
marketreportanalytics.com
doc, pdf, ppt
Updated Mar 19, 2025
Market Report Analytics (2025). Data Analytics Market Report [Dataset]. https://www.marketreportanalytics.com/reports/data-analytics-market-11041
Available download formats
ppt, doc, pdf
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global data analytics market, valued at $262.08 billion in 2025, is experiencing robust growth, projected to expand at a Compound Annual Growth Rate (CAGR) of 13.63% from 2025 to 2033. This surge is driven by several key factors. The exponential increase in data volume generated by businesses and individuals necessitates sophisticated analytical tools to extract meaningful insights. Furthermore, the rising adoption of cloud-based solutions offers scalability, cost-effectiveness, and accessibility, fueling market expansion. Advanced analytics techniques like artificial intelligence (AI) and machine learning (ML) are enhancing predictive capabilities, enabling businesses to make data-driven decisions and gain a competitive edge. The increasing focus on data security and privacy regulations also presents opportunities for specialized data analytics solutions focused on compliance and risk management. Significant market segments include cloud deployment, which dominates due to its flexibility and scalability, and services, which are crucial for implementation, maintenance, and ongoing support. Leading companies like Microsoft, Amazon, and Salesforce are strategically investing in R&D and acquisitions to consolidate their market positions and extend their offerings. Despite this positive outlook, the market faces some challenges. High implementation costs and the need for skilled professionals can hinder widespread adoption, especially among smaller businesses. Data integration complexities and concerns about data quality can also impede the successful implementation of data analytics projects. Competition among established players and emerging startups is intensifying, driving the need for continuous innovation and adaptation. 
However, the long-term outlook for the data analytics market remains optimistic, fueled by ongoing technological advancements, increasing data volumes, and a growing awareness of the value of data-driven decision-making across various industries. The market's regional segmentation reveals strong growth across North America and Europe, driven by early adoption and strong technological infrastructure, while APAC is poised for significant expansion in the coming years.
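The report gives a 2025 base and a CAGR but no end value; compounding forward shows what those two figures jointly imply for 2033. This is our arithmetic on the quoted numbers, not a figure stated in the report, and it assumes eight full years of growth at a constant rate.

```python
def project_value(base_value: float, annual_rate: float, years: int) -> float:
    """Compound a base value forward: base * (1 + rate) ** years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return base_value * (1 + annual_rate) ** years


# Figures quoted above: $262.08B in 2025, CAGR of 13.63% from 2025 to 2033.
implied_2033 = project_value(262.08, 0.1363, 2033 - 2025)
print(f"${implied_2033:.0f}B")  # → $728B
```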
Data from: Snake River Plain Geothermal Play Fairway Analysis Project Active...
catalog.data.gov
data.openei.org
Updated Jan 20, 2025
Boise State University (2025). Snake River Plain Geothermal Play Fairway Analysis Project Active Source Seismic Data [Dataset]. https://catalog.data.gov/dataset/snake-river-plain-geothermal-play-fairway-analysis-project-active-source-seismic-data-ad84f
Dataset updated
Jan 20, 2025
Dataset provided by
Boise State University
Area covered
Snake River Plain, Snake River
Description
This archive contains seismic shot field records for 10 profiles located in Camas Prairie, Idaho. The eight numbered .sgy files were acquired using a seismic land streamer system with an accelerated weight drop source and 72 geophones. These 10-Hz geophones were mounted on base plates and dragged behind the seismic source. Shots were acquired every 4 meters along the length of lines 500 West, 550 West, 600 West, 700 West, 800 West, 900 West, 200 South and 200 North. The objective was to map stratigraphy and structures related to geothermal fluid flow in the upper few hundred meters. A readme file is included with descriptions of individual files. The line names refer to roads, which are numbered according to their distance from the county seat (the town of Fairfield) along the main highways. For example, 500 West indicates that this north-south street crosses the main road 5 miles west of town. The included geologic, topographic, and aerial maps show the labeled seismic lines, while the regional map shows only the line geometry and regional faulting.
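The road-based line names encode a distance from Fairfield. The sketch below converts a name such as 500 West (with or without the space) into a distance in miles and a direction. Generalizing the "hundreds = miles" rule from the single 500 West example is an assumption, as is treating the direction word as the only suffix a name can carry.

```python
import re
from typing import Tuple

# Assumption: "<number><direction>" with the number in hundredths of a mile,
# generalized from the single "500 West = 5 miles west" example.
LINE_NAME = re.compile(r"^\s*(\d+)\s*(north|south|east|west)\s*$", re.IGNORECASE)


def line_offset_miles(name: str) -> Tuple[float, str]:
    """Convert a line name such as '500West' or '200 South' to (miles, direction)."""
    match = LINE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized line name: {name!r}")
    number, direction = match.groups()
    return int(number) / 100, direction.capitalize()


for name in ["500West", "550 West", "200South"]:
    print(name, line_offset_miles(name))
```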
ghtorrent-projects Dataset
zenodo.org
data.niaid.nih.gov
bin, txt
Updated Jul 17, 2021
Marios Papachristou (2021). ghtorrent-projects Dataset [Dataset]. http://doi.org/10.5281/zenodo.5111043
Available download formats
txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.5111043
Dataset updated
Jul 17, 2021
Dataset provided by
Zenodo http://zenodo.org/
Authors
Marios Papachristou
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A hypergraph dataset mined from the GHTorrent project is presented. The dataset contains two files:

1. project_members.txt: Contains GitHub projects with at least 2 contributors and the corresponding contributors (as a hyperedge). The format of the data is:

2. num_followers.txt: Contains all GitHub users and their number of followers.

The artifact also contains the SQL queries used to obtain the data from GHTorrent (schema).
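In this hypergraph, each project is a hyperedge joining its contributors. Because the description's format line is empty, the sketch below deliberately skips file parsing and works on in-memory hyperedges; the project and user names are hypothetical. It computes two common summaries: node degree (projects per contributor) and the clique expansion, which links every pair of co-contributors.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Set


def node_degrees(hyperedges: Dict[str, Set[str]]) -> Counter:
    """Number of hyperedges (projects) each node (contributor) belongs to."""
    return Counter(member for members in hyperedges.values() for member in members)


def clique_expansion(hyperedges: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Pairwise graph: connect every two nodes that share at least one hyperedge."""
    edges: Set[FrozenSet[str]] = set()
    for members in hyperedges.values():
        for u, v in combinations(sorted(members), 2):
            edges.add(frozenset((u, v)))
    return edges


# Hypothetical projects, each with at least two contributors.
projects = {
    "proj_a": {"alice", "bob", "carol"},
    "proj_b": {"bob", "dave"},
}
print(node_degrees(projects)["bob"])    # → 2
print(len(clique_expansion(projects)))  # → 4
```

The clique expansion loses the information that alice, bob and carol worked together as a group, which is the reason to keep the data as a hypergraph in the first place.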
Project Finance Deals
lseg.com
Updated Nov 25, 2024
LSEG (2024). Project Finance Deals [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/deals-data/capital-raising-new-issuance/project-finance-deals
Available download formats
csv, pdf, python, text, user interface
Dataset updated
Nov 25, 2024
Dataset provided by
London Stock Exchange Group http://www.londonstockexchangegroup.com/
Authors
LSEG
License
https://www.lseg.com/en/policies/website-disclaimer
Description
Explore LSEG's Project Finance Deals Data, providing loan information and league tables to the global deal-making community.
Data from: The iratebirds Citizen Science Project: a Dataset on Birds’...
figshare.com
docx
Updated May 30, 2023
Anna Haukka; Aleksi Lehikoinen; Stefano Mammola; William Morris; Andrea Santangeli (2023). The iratebirds Citizen Science Project: a Dataset on Birds’ Visual Aesthetic Attractiveness to Humans [Dataset]. http://doi.org/10.6084/m9.figshare.20170082.v2
Available download formats
docx
Unique identifier
https://doi.org/10.6084/m9.figshare.20170082.v2
Dataset updated
May 30, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Anna Haukka; Aleksi Lehikoinen; Stefano Mammola; William Morris; Andrea Santangeli
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The iratebirds database contains comprehensive data on the visual aesthetic attractiveness of bird taxonomic units as perceived by humans (taxonomy following the eBird/Clements integrated checklist v. 2019). The data were collected through the iratebirds.app website, a citizen science project in which users rated the appearance of birds on a linear scale from 1 to 10. The ratings were based on photographs of the birds available from the Macaulay Library database. Each rating score of a bird species or subspecies is based on several photographs of the same bird species. The application code is openly available on GitHub at https://github.com/luomus/iratebirds. The application was promoted globally from August 2020 to April 2021, to as wide an audience as possible, using social media, traditional media, collaborators and email lists.

The iratebirds database is based on 408 207 ratings from 6 212 users. It consists of raw visual aesthetic attractiveness rating data as well as complementary data from an online survey that collected demographic information from a subset of 2 785 users who scored the birds. The online survey gives information on these users’ birding skills, nature connectedness, profession, home country, age and gender. In addition, the scores for birds’ visual aesthetic attractiveness to humans have been modelled with hierarchical models to obtain overall average scores for the bird species and subspecies. More details on the data are found in this file’s section “Methodological information” as well as in the publication Haukka, A. et al. (2023), The iratebirds Citizen Science Project: a Dataset on Birds’ Visual Aesthetic Attractiveness to Humans, Scientific Data. The full database "iratebirds_raw_data_taxonomy_photoinfo_ratings_survey_251022.csv" includes all the data related to the photographs scored (e.g. place and location of the photograph, and its quality), the species and subspecies names (following the eBird/Clements integrated checklist v. 2019), the raw scores given by the users, user details (e.g. language used) and an internal user ID, and, for the users who took the online survey, detailed demographic information (e.g. home country) and other information related to their knowledge of and connection to nature and birds. The modelled rating scores database "iratebirds_final_predictions_average_fullmodel_subsetmodel_151122.csv" includes the visual aesthetic attractiveness of birds, as perceived by humans, calculated in three different ways. The most appropriate score can be chosen according to the specific research needs, but in general we recommend using the scores from the full model (ii).
The three different measures are i) raw visual aesthetic attractiveness for each bird species (or subspecies); ii) full model: visual aesthetic attractiveness corrected for the language group of the scorer and the quality of the photo scored; iii) subset model: visual aesthetic attractiveness corrected as in ii) plus other user-specific factors (related to bird and nature knowledge and connections, home country, age and gender). The file also gives information on how many photos were used for scoring each bird and how many users have scored the species. The subset model iii) covers only a subset of all the species. For sexually dichromatic species, the data on visual aesthetic attractiveness are also available at the species level and at the within-species sex level, in the file "iratebirds_pred_ratings_species_and_sex_level_120123.csv".
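Measure i) is the simplest of the three: a per-taxon average of the raw 1 to 10 ratings. The hedged sketch below computes it, together with the number of ratings, from (taxon, score) pairs; the pairs are invented for illustration and the input shape is an assumption rather than the CSV's actual layout. Measures ii) and iii) require the hierarchical models in the authors' R scripts and are not reproduced here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def raw_attractiveness(ratings: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[float, int]]:
    """Map each taxon to (mean raw score, number of ratings); no photo or user corrections."""
    by_taxon = defaultdict(list)
    for taxon, score in ratings:
        if not 1 <= score <= 10:
            raise ValueError(f"score outside the 1 to 10 scale: {score}")
        by_taxon[taxon].append(score)
    return {taxon: (mean(scores), len(scores)) for taxon, scores in by_taxon.items()}


# Invented ratings for illustration only.
sample = [("Species A", 7), ("Species A", 9), ("Species B", 4)]
print(raw_attractiveness(sample))
```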

All database files are provided both as .csv and .xlsx files. The data and code to reproduce the analyses, figures and tables presented in Haukka et al. 2023, The iratebirds Citizen Science Project: a Dataset on Birds’ Visual Aesthetic Attractiveness to Humans (Scientific Data, doi: https://doi.org/10.1038/s41597-023-02169-0), are included in the files 'iratebirds_raw_data_taxonomy_photoinfo_ratings_survey_251022.csv', 'Haukka_et_al_Scientific_Data_modelling.R', 'Haukka_et_al_Scientific_Data_Figure.R' and 'Haukka_et_al_Scientific_Data_Tables.R'. Detailed information on data processing and models can be found in the publication Haukka et al. 2023, The iratebirds Citizen Science Project: a Dataset on Birds’ Visual Aesthetic Attractiveness to Humans, Scientific Data, doi: https://doi.org/10.1038/s41597-023-02169-0.
‘PoroTomo Project: Brady's Geothermal Field, Subtask 3.5: GPS Data Analysis’...
analyst-2.ai
Updated Jan 26, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘PoroTomo Project: Brady's Geothermal Field, Subtask 3.5: GPS Data Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-porotomo-project-brady-s-geothermal-field-subtask-3-5-gps-data-analysis-33f0/5075b4bf/?iid=002-865&v=presentation
Dataset updated
Jan 26, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘PoroTomo Project: Brady's Geothermal Field, Subtask 3.5: GPS Data Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/c6e77e20-1cab-4053-894a-b0edcb1df117 on 26 January 2022.

--- Dataset description provided by original source is as follows ---

Links to GPS RINEX data not previously reported, plus links to station web pages, which include the most up-to-date time series.

--- Original source retains full ownership of the source dataset ---
Portal Project Teaching Database
figshare.com
txt
Updated May 30, 2023
Morgan Ernest; James Brown; Thomas Valone; Ethan P. White (2023). Portal Project Teaching Database [Dataset]. http://doi.org/10.6084/m9.figshare.1314459.v10
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1314459.v10
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Morgan Ernest; James Brown; Thomas Valone; Ethan P. White
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Portal Project Teaching Database is a simplified version of the Portal Project Database designed for teaching. It provides a real-world example of life-history, population, and ecological data. It has sufficient complexity to teach many aspects of data analysis and management, but many complexities have been removed so that students can focus on the core ideas and skills being taught. The database is currently available in CSV, JSON, and SQLite formats. This database is not designed for research, as it intentionally removes some of the real-world complexities. The original database is published at Ecological Archives (http://esapubs.org/archive/ecol/E090/118/), and that version of the database should be used for research purposes. The Python code used to convert the original database into this teaching version is included as 'create_portal_teach_dataset.py'. Suggested changes or additions to this dataset can be requested or contributed in the project GitHub repository (https://github.com/weecology/portal-teachingdb).
Composed Encrypted Malicious Traffic Dataset for machine learning based...
data.mendeley.com
Updated Oct 12, 2021
Zihao Wang (2021). Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis. [Dataset]. http://doi.org/10.17632/ztyk4h3v6s.2
Unique identifier
https://doi.org/10.17632/ztyk4h3v6s.2
Dataset updated
Oct 12, 2021
Authors
Zihao Wang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a traffic dataset containing balanced amounts of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. It is a secondary CSV feature dataset composed from five public traffic datasets according to three criteria.

The first criterion is to combine public datasets that are widely used in existing work and contain both encrypted malicious and legitimate traffic, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset.

The second criterion is to ensure data balance, i.e., balance between malicious and legitimate network traffic and a similar amount of traffic contributed by each individual dataset. Comparable amounts of malicious and legitimate traffic are therefore extracted from each selected public dataset by random sampling. We also ensured that no single selected public dataset contributes much more traffic than the others.

The third criterion is that the dataset includes encrypted malicious and legitimate traffic from both conventional devices and IoT devices. These devices are increasingly deployed and operate in the same environments, such as offices, homes, and other smart city settings.
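The balancing criterion can be illustrated with a short per-source, per-class random sampling sketch. This is not the author's pipeline:

- The record fields (`source`, `label`), the cap `per_cell` and the toy sources "A" and "B" are hypothetical.
- Real flows would carry feature columns as well.

```python
import random
from collections import defaultdict


def balanced_sample(records, per_cell, seed=0):
    """Draw up to `per_cell` records at random for every (source, label)
    pair, so no source or class dominates the composed dataset."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for record in records:
        cells[(record["source"], record["label"])].append(record)
    sample = []
    for key in sorted(cells):
        group = cells[key]
        # A source with fewer records than `per_cell` in a class
        # contributes everything it has for that class.
        sample.extend(rng.sample(group, min(per_cell, len(group))))
    return sample


# Toy input (hypothetical): source "B" has only 5 malicious flows.
records = (
    [{"source": "A", "label": "malicious"} for _ in range(50)]
    + [{"source": "A", "label": "legitimate"} for _ in range(200)]
    + [{"source": "B", "label": "malicious"} for _ in range(5)]
    + [{"source": "B", "label": "legitimate"} for _ in range(100)]
)
sample = balanced_sample(records, per_cell=20)
print(len(sample))  # prints: 65
```

Capping each cell at the smaller of `per_cell` and the available count mirrors the situation where one source has limited encrypted malicious traffic. That source then contributes less than the others rather than being oversampled.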

Based on these criteria, five public datasets were selected. After data pre-processing, details of each selected public dataset and of the final composed dataset are shown in the “Dataset Statistic Analysis Document”. The document summarizes:

- the malicious and legitimate traffic size selected from each public dataset;
- the proportion of traffic selected from each public dataset relative to the total traffic size of the composed dataset (% w.r.t. the composed dataset);
- the proportion of encrypted traffic selected from each public dataset (% of the selected public dataset);
- the total traffic size of the composed dataset.

The table shows that each public dataset contributes approximately 20% of the composed dataset, except for CICDS-2012, which has a limited amount of encrypted malicious traffic. This balances the contributions of the individual datasets and reduces bias towards traffic from any single dataset during learning. The malicious and legitimate traffic sizes are also almost equal, which achieves class balance.

The datasets were prepared for encrypted malicious traffic detection. Because the dataset is intended for machine learning model training, sample train and test sets are also provided. The train and test sets are split at a ratio of 1:4, with stratification applied during the split. These sets can be used directly for machine or deep learning model training on selected features.
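A stratified split keeps the class proportions of the full dataset in both the train and test sets. The description does not say which side of "1:4" is the test set. This sketch assumes one part test to four parts train (`test_fraction=0.2`), and the field names are hypothetical. In practice, scikit-learn's `train_test_split` with its `stratify` argument serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key="label", test_fraction=0.2, seed=0):
    """Split records into train/test sets, sampling each class separately so
    both sets keep the class proportions of the input."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy, class-balanced input with hypothetical field names.
records = [{"id": i, "label": "malicious"} for i in range(100)] + [
    {"id": i, "label": "legitimate"} for i in range(100, 200)
]
train_set, test_set = stratified_split(records)
print(len(train_set), len(test_set))  # prints: 160 40
```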
‘Capital Project Detail Data - Milestones’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Capital Project Detail Data - Milestones’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-capital-project-detail-data-milestones-05cf/59cf11cb/?iid=002-390&v=presentation
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Capital Project Detail Data - Milestones’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/fefc93a0-7cf1-4b24-be55-b39765cc9113 on 13 February 2022.

--- Dataset description provided by original source is as follows ---

This dataset contains capital commitment plan data by managing agency, project identification number and project schedules. The dataset is updated three times a year during the Preliminary, Executive and Adopted Capital Commitment Plans.

--- Original source retains full ownership of the source dataset ---
Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001

Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv


Available download formats: txt

Unique identifier

https://doi.org/10.3389/feduc.2024.1418006.s001

Dataset updated

Aug 8, 2024

Dataset provided by

Frontiers

Authors

Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction
In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to the computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the generated solutions. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

Methods
This article examines how learners perform, and what their perspectives are, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, comprising students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a short Data Analytics session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

Results
The results show the transformative potential of approaches that integrate advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings also suggest that integrating LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

Discussion
It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure the connection between LIDA and GPT via an API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.


Data_Sheet_1_Advanced large language models and visualization tools for data...

SHARP - Shape Analysis Research Project

Monday Coffee SQL Data Analysis Project

Dataset

Contents

student data analysis

Space Data Analytics Market Report | Global Forecast From 2025 To 2033

Space Data Analytics Market Outlook

Component Analysis

EMEC Wildlife Data Analysis Project - Cleansed Data

Water Analysis Project Summary Sheet Updated 4-14-2023

An experiment on the reliability analysis of megaproject sustainability

student-performance-data

‘Your Voice Your Choice Project Ideas’ analyzed by Analyst-2

HMP Data Analysis and Coordination Center

Data Analytics Market Report

Data from: Snake River Plain Geothermal Play Fairway Analysis Project Active...

ghtorrent-projects Dataset

Project Finance Deals

Data from: The iratebirds Citizen Science Project: a Dataset on Birds’...

‘PoroTomo Project: Brady's Geothermal Field, Subtask 3.5: GPS Data Analysis’...

Portal Project Teaching Database

Composed Encrypted Malicious Traffic Dataset for machine learning based...

‘Capital Project Detail Data - Milestones’ analyzed by Analyst-2

Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv