17 datasets found

Most Popular Programming Languages 2004-2024
kaggle.com
zip
Updated Sep 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Muhammad Roshan Riaz (2024). Most Popular Programming Languages 2004-2024 [Dataset]. https://www.kaggle.com/datasets/muhammadroshaanriaz/most-popular-programming-languages-2004-2024/code
Explore at:
zip(3491 bytes)Available download formats
Dataset updated
Sep 15, 2024
Authors
Muhammad Roshan Riaz
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset contains the following columns:

Month: The date (in year-month format) when the data was recorded. Python Worldwide(%): The percentage of global popularity for Python during that month. JavaScript Worldwide(%): The percentage of global popularity for JavaScript. Java Worldwide(%): The percentage of global popularity for Java. C# Worldwide(%): The percentage of global popularity for C#. PhP Worldwide(%): The percentage of global popularity for PhP. Flutter Worldwide(%): The percentage of global popularity for Flutter. React Worldwide(%): The percentage of global popularity for React. Swift Worldwide(%): The percentage of global popularity for Swift. TypeScript Worldwide(%): The percentage of global popularity for TypeScript. Matlab Worldwide(%): The percentage of global popularity for Matlab.

Each row represents data for a particular month, starting from January 2004, tracking the popularity trends of these programming languages worldwide.
Z
Developer Expertise Dataset on JavaScript Libraries
data.niaid.nih.gov
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Montandon, João Eduardo; Silva, Luciana Lourdes; Valente, Marco Tulio (2020). Developer Expertise Dataset on JavaScript Libraries [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1484497
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
UFMG
IFMG
Authors
Montandon, João Eduardo; Silva, Luciana Lourdes; Valente, Marco Tulio
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains an anonymized list of surveyed developers who provided their expertise level on three popular JavaScript libraries:

ReactJS, a library for building enriched web interfaces

MongoDB, a driver for accessing MongoDB databased

Socket.IO, a library for realtime communication
Computer language popularity
kaggle.com
Updated Oct 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
LIUYUMING1 (2025). Computer language popularity [Dataset]. https://www.kaggle.com/datasets/liuyuming1/computer-language-popularity/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 6, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
LIUYUMING1
Description
As shown in the chart, Python ranks first with a usage rate of 28.7%, demonstrating its continued advantage in the fields of data science and artificial intelligence. JavaScript follows closely at 19.3%, reflecting its widespread use in front-end and full-stack development. Traditional languages such as Java and C# still maintain a stable market share, while emerging languages like Go and Rust show significant growth potential. Overall, the popularity of programming languages is closely related to technological trends. The leading positions of Python and JavaScript indicate a shift in development focus towards data-driven and web-oriented directions. In the future, with the further development of cloud computing and artificial intelligence, the usage of emerging languages such as Go and Rust is expected to continue increasing.
t
Programming Language Ecosystem Project TU Wien
test.researchdata.tuwien.at
csv, text/markdown
Updated Jun 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Valentin Futterer; Valentin Futterer; Valentin Futterer; Valentin Futterer (2024). Programming Language Ecosystem Project TU Wien [Dataset]. http://doi.org/10.70124/gnbse-ts649
Explore at:
text/markdown, csvAvailable download formats
Unique identifier
https://doi.org/10.70124/gnbse-ts649
Dataset updated
Jun 25, 2024
Dataset provided by
TU Wien
Authors
Valentin Futterer; Valentin Futterer; Valentin Futterer; Valentin Futterer
License
Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Time period covered
Dec 12, 2023
Area covered
Vienna
Description
About Dataset
This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.
The centerpiece of this repository is the usage_of_programming_languages_2011-2023.csv. This csv file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate on the popularity of programming languages, this dataset was created using 3 vastly different sources.

About Data collection methodology
The dataset was created using the github repository above. As input data, three public datasets where used.
github_metadata
Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It shows metadata information (no code) of all github repositories with more than 5 stars.
PYPL_survey_2004-2023
Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows from 2004 to 2023 for each month the share of programming related google searches per language.
stack_overflow_developer_survey
Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows from 2011 to 2023 the results of the yearly stackoverflow developer survey.
All these datasets were downloaded on the 12.12.2023. The datasets are all in the github repository above

Description of the data
The dataset contains a column for the year and then many columns for the different languages, denoting their usage in percent. Additionally, vertical barcharts and piecharts for each year plus a line graph for each language over the whole timespan as png's are provided.

The languages that are going to be considered for the project can be seen here:
- Python
- C
- C++
- Java
- C#
- JavaScript
- PHP
- SQL
- Assembly
- Scratch
- Fortran
- Go
- Kotlin
- Delphi
- Swift
- Rust
- Ruby
- R
- COBOL
- F#
- Perl
- TypeScript
- Haskell
- Scala

License
This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/ license.
TLDR: You are free to share, adapt, and create derivative works from this dataser as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbl.

Acknowledgments
Thanks go out to
- stackoverflow https://insights.stackoverflow.com/survey for providing the data from the yearly stackoverflow developer survey.
- the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master for providing google search data.
- Peter Elmers, for crawling metadata on github repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.
Stack Overflow tags
kaggle.com
zip
Updated Jan 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abid Ali Awan (2021). Stack Overflow tags [Dataset]. https://www.kaggle.com/datasets/kingabzpro/stack-overflow-tags/code
Explore at:
zip(273306 bytes)Available download formats
Dataset updated
Jan 6, 2021
Authors
Abid Ali Awan
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.htmlhttp://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Context

How can we tell what programming languages and technologies are used by the most people? How about what languages are growing and which are shrinking, so that we can tell which are most worth investing time in?

One excellent source of data is Stack Overflow, a programming question and answer site with more than 16 million questions on programming topics. By measuring the number of questions about each technology, we can get an approximate sense of how many people are using it. We're going to use open data from the Stack Exchange Data Explorer to examine the relative popularity of languages like R, Python, Java and Javascript have changed over time.

Content

Each Stack Overflow question has a tag, which marks a question to describe its topic or technology. For instance, there's a tag for languages like R or Python, and for packages like ggplot2 or pandas.

We'll be working with a dataset with one observation for each tag in each year. The dataset includes both the number of questions asked in that tag in that year, and the total number of questions asked in that year.

Acknowledgements

DataCamp
Data from: E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects...
zenodo.org
bin, txt
Updated May 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sergio Di Meglio; Sergio Di Meglio; Valeria Pontillo; Valeria Pontillo; Coen De roover; Coen De roover; Luigi Libero Lucio Starace; Luigi Libero Lucio Starace; Sergio Di Martino; Sergio Di Martino; Ruben Opdebeeck; Ruben Opdebeeck (2025). E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects [Dataset]. http://doi.org/10.5281/zenodo.14221860
Explore at:
txt, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14221860
Dataset updated
May 20, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sergio Di Meglio; Sergio Di Meglio; Valeria Pontillo; Valeria Pontillo; Coen De roover; Coen De roover; Luigi Libero Lucio Starace; Luigi Libero Lucio Starace; Sergio Di Martino; Sergio Di Martino; Ruben Opdebeeck; Ruben Opdebeeck
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT
End-to-End (E2E) testing is a comprehensive approach to validating the functionality of a software application by testing its entire workflow from the user’s perspective, ensuring that all integrated components work together as expected. It is crucial for ensuring the quality and reliability of applications, especially in the web domain, which is often bound by Service Level Agreements (SLAs). This testing involves two key activities:
Graphical User Interface (GUI) testing, which simulates user interactions through browsers, and performance testing, which evaluates system workload handling. Despite its importance, E2E testing is often neglected, and the lack of reliable datasets for Web GUI and performance testing has slowed research progress. This paper addresses these limitations by constructing E2EGit, a comprehensive dataset, cataloging non-trivial open-source web projects on GITHUB that adopt GUI or performance testing.
The dataset construction process involved analyzing over 5k non-trivial web repositories based on popular programming languages (JAVA, JAVASCRIPT TYPESCRIPT PYTHON) to identify: 1) GUI tests based on popular browser automation frameworks (SELENIUM PLAYWRIGHT, CYPRESS, PUPPETEER), 2) performance tests written with the most popular open-source tools (JMETER, LOCUST). After analysis, we identified 472 repositories using web GUI testing, with over 43,000 tests, and 84 repositories using performance testing, with 410 tests.

DATASET DESCRIPTION
The dataset is provided as an SQLite database, whose structure is illustrated in Figure 3 (in the paper), which consists of five tables, each serving a specific purpose.
The repository table contains information on 1.5 million repositories collected using the SEART tool on May 4. It includes 34 fields detailing repository characteristics. The
non_trivial_repository table is a subset of the previous one, listing repositories that passed the two filtering stages described in the pipeline. For each repository, it specifies whether it is a web repository using JAVA, JAVASCRIPT, TYPESCRIPT, or PYTHON frameworks. A repository may use multiple frameworks, with corresponding fields (e.g., is web java) set to true, and the field web dependencies listing the detected web frameworks. For Web GUI testing, the dataset includes two additional tables; gui_testing_test _details, where each row represents a test file, providing the file path, the browser automation framework used, the test engine employed, and the number of tests implemented in the file. gui_testing_repo_details, aggregating data from the previous table at the repository level. Each of the 472 repositories has a row summarizing
the number of test files using frameworks like SELENIUM or PLAYWRIGHT, test engines like JUNIT, and the total number of tests identified. For performance testing, the performance_testing_test_details table contains 410 rows, one for each test identified. Each row includes the file path, whether the test uses JMETER or LOCUST, and extracted details such as the number of thread groups, concurrent users, and requests. Notably, some fields may be absent—for instance, if external files (e.g., CSVs defining workloads) were unavailable, or in the case of Locust tests, where parameters like duration and concurrent users are specified via the command line.

To cite this article refer to this citation:

@inproceedings{di2025e2egit,
title={E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects},
author={Di Meglio, Sergio and Starace, Luigi Libero Lucio and Pontillo, Valeria and Opdebeeck, Ruben and De Roover, Coen and Di Martino, Sergio},
booktitle={2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)},
pages={10--15},
year={2025},
organization={IEEE/ACM}
}

This work has been partially supported by the Italian PNRR MUR project PE0000013-FAIR.
Salaries of developers in Ukraine
kaggle.com
zip
Updated Nov 17, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mysha Rysh (2022). Salaries of developers in Ukraine [Dataset]. https://www.kaggle.com/datasets/mysha1rysh/salaries-of-developers-in-ukraine
Explore at:
zip(24303 bytes)Available download formats
Dataset updated
Nov 17, 2022
Authors
Mysha Rysh
Area covered
Ukraine
Description
This data was collected by the team https://dou.ua/ . This resource is very popular in Ukraine. It provides salary statistics, shows current vacancies and publishes useful articles related to the life of an IT specialist. This dataset was taken from the public repository https://github.com/devua/csv/tree/master/salaries . This dataset will include the following data for each of the developer: salary, position (f.e. Junior, Middle), experience, city, tech (f.e C#/.NET, JavaScript, Python). I think this dataset will be useful to our community. Thank you.
Multi-language Open Source Code Identifier Dataset
kaggle.com
zip
Updated Jul 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bharat Mane (2025). Multi-language Open Source Code Identifier Dataset [Dataset]. https://www.kaggle.com/datasets/bharatmane/multi-language-open-source-code-identifier-dataset/data
Explore at:
zip(3690401 bytes)Available download formats
Dataset updated
Jul 8, 2025
Authors
Bharat Mane
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset was created to support research and tool development in the areas of code readability, identifier naming, program comprehension, and code mining. It contains 362,886 unique identifier names (including classes, functions/methods, and variables) extracted from 21 widely used and actively maintained open-source projects.

Projects were carefully selected from four major programming language ecosystems: Java, Python, C#, and JavaScript/TypeScript. The repositories span popular libraries and frameworks in domains such as data science, web development, backend systems, dependency injection, and more. These projects are widely recognized as benchmarks in their respective communities, ensuring that the dataset represents industry best practices in naming and code style.

Context & Motivation: Good identifier naming is fundamental for code readability and maintainability, yet cross-language empirical datasets are rare. This dataset enables comparative studies of naming conventions, training and benchmarking of AI models, and reproducible research on identifier readability. It is designed to be both a large-scale resource and a realistic reflection of naming in production-quality code.

Sources: - commons-lang, guava, hibernate-orm, logging-log4j2, spring-framework - django, flask, numpy, pandas, requests - Autofac, Dapper, Hangfire, IdentityServer, NLog - react, vue, d3, lodash, express, angular, angular-cli, ngx-bootstrap, TypeScript, NestJS

Each identifier is labelled with its project, language, type, and name. We encourage use for academic research, code intelligence, machine learning, and developer education.
T
mnist
tensorflow.org
universe.roboflow.com
+4more
Updated Jun 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/mnist
Explore at:
Dataset updated
Jun 1, 2024
Description
The MNIST database of handwritten digits.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('mnist', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.

https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png" alt="Visualization" width="500px">
h
oop
huggingface.co
Updated Jul 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CodeAI (2024). oop [Dataset]. https://huggingface.co/datasets/codeai-dteam/oop
Explore at:
Dataset updated
Jul 8, 2024
Dataset authored and provided by
CodeAI
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
MultiOOP: A Multi-Language Object-Oriented Programming Benchmark for Large Language Models

Dataset Summary

MultiOOP is a multi-language object-oriented programming benchmark designed to establish fair and robust evaluations for intelligent code generation by large language models (LLMs). It addresses major imbalances in existing benchmarks by covering six popular programming languages: Python, PHP, C++, C#, Java, and JavaScript. The benchmark features 267 tasks per… See the full description on the dataset page: https://huggingface.co/datasets/codeai-dteam/oop.
d
Imagery and Map Services
catalog.data.gov
data.cityofnewyork.us
Updated Nov 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.cityofnewyork.us (2024). Imagery and Map Services [Dataset]. https://catalog.data.gov/dataset/imagery-and-map-services
Explore at:
Dataset updated
Nov 1, 2024
Dataset provided by
data.cityofnewyork.us
Description
The Department of Information Technology and Telecommunications, GIS Unit, has created a series of Map Tile Services for use in public web mapping & desktop applications. The link below describes the Basemap, Labels, & Aerial Photographic map services, as well as, how to utilize them in popular JavaScript web mapping libraries and desktop GIS applications. A showcase application, NYC Then&Now (https://maps.nyc.gov/then&now/) is also included on this page.

Job Descriptions 2025 – Tech & Non-Tech Roles

kaggle.com

Updated Aug 31, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Aditya Raj Srivastava (2025). Job Descriptions 2025 – Tech & Non-Tech Roles [Dataset]. https://www.kaggle.com/datasets/adityarajsrv/job-descriptions-2025-tech-and-non-tech-roles

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Aug 31, 2025

Dataset provided by

Kaggle

Authors

Aditya Raj Srivastava

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Synthetic Job Descriptions Dataset – Tech & Non-Tech Roles (2025)

This dataset contains 1,100 synthetic job descriptions (JDs) spanning 55 diverse roles, designed to facilitate career guidance, resume building, ATS (Applicant Tracking System) simulation, and research in NLP/ML.

All job descriptions are synthetically generated based on curated references from publicly available job postings, career guides, and professional role descriptions. They are not real job postings but represent realistic expectations, responsibilities, and skills for each role.

Roles Covered

Tech Roles (Core, Popular, and Niche)

Software Engineer
Software Developer
Full Stack Developer
Backend Developer
Frontend Developer
Data Scientist
Machine Learning Engineer
AI Engineer
DevOps Engineer
Cloud Engineer
Data Analyst
Business Intelligence Analyst
QA Engineer
Test Automation Engineer
iOS Mobile App Developer
Android Mobile App Developer
Vibe Coder
UI Designer
UX Designer
Product Designer
Cybersecurity Analyst
Python Developer
Data Engineer
Network Engineer
Cloud Architect
Systems Engineer
Java Developer
.NET Developer
Web Developer
Software Tester (SDET)
Solutions Architect
Big Data Specialist
Fintech Engineer
AI Prompt Engineer
Blockchain Developer
Robotics Engineer
Javascript Developer
AR/VR Developer
IoT Engineer
Ethical Hacker
Site Reliability Engineer (SRE)
Game Developer

Non-Tech Roles (Business, Creative, Operations, and Niche)

Product Manager
Project Manager
Marketing Specialist
Digital Marketing Specialist
SEO Specialist
Content Writer
Copywriter
Business Analyst
Operations Manager
Sales Executive
Technical Writer
Market Research Analyst
Graphic Designer

Dataset Structure

Total Job Descriptions: 1,100 (20 per role)
Fields per JD:

Field	Description
JobID	Unique identifier for each job description
Title	Job role/title
ExperienceLevel	Fresher / Junior / Experienced / Lead / Senior
YearsOfExperience	Numeric range or years (e.g., 0-1, 3-5)
Skills	List of required skills (JSON array or semicolon-separated in CSV)
Responsibilities	Key responsibilities (JSON array or semicolon-separated in CSV)
Keywords	Role-specific focus areas (JSON array or semicolon-separated in CSV)

Key Features & Insights

Balanced Experience Distribution: ~50% entry-level / fresher, ~50% experienced / senior roles.
Top Skills Across Roles: Python, JavaScript, React, TensorFlow, ML/NLP, Docker, Collaboration, Problem-solving, Figma.
Keyword Trends: AI, ML, Data Analytics, Cloud, DevOps, Prompt Engineering, UX/UI.
Comprehensive Coverage: Tech roles cover core development, data, AI, cloud, security, and niche specialties, while non-tech roles cover business, creative, operations, marketing, and analytics.

Potential Use Cases

Resume Builders: Generate role-specific resumes highlighting relevant skills and responsibilities.
ATS Simulation / Scoring: Test applicant tracking systems with realistic job descriptions.
Career Analytics: Analyze trends in skills, responsibilities, and popular roles.
Machine Learning & NLP Projects: Use for text classification, skill extraction, recommendation systems, and job matching.
Educational Purposes: Ideal for learning about job role requirements across tech and non-tech domains.

File Formats

JSON: job_dataset.json – structured array of job objects.
CSV: job_dataset.csv – arrays flattened with semicolons for easy viewing in Excel or Pandas.

Licensing

Free to use, share, and adapt for research, educational, or personal projects.
Please cite the dataset in publications or projects:

Synthetic Job Descriptions Dataset (2025) – Curated & Generated by Aditya Raj Srivastava (https://www.kaggle.com/adityarajsrv)

Most demanded tech skills worldwide 2023
statista.com
Updated Nov 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Most demanded tech skills worldwide 2023 [Dataset]. https://www.statista.com/statistics/1296668/top-in-demand-tech-skills-worldwide/
Explore at:
Dataset updated
Nov 28, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2022
Area covered
Worldwide
Description
In 2023, the tech skill most in demand by recruiters was web development. This was closely followed by DevOps and database software skills. Interestingly, over ** percent of recruiters were actively seeking individuals with cybersecurity skills. Not far behind, AI/Machine learning/Deep learning ranked fourth, with approximately ** percent of respondents identifying it as their most sought-after tech skill. These preferences align with the skills that developers worldwide are keen to acquire, particularly web development and AI/Machine learning/Deep learning. AI at the forefront of IT skills Since the release of ChatGPT in late 2022, demand for AI and automation skills has increased across all sectors. In 2023, ChatGPT was the leading technology skill globally according to topic consumption on Udemy Business, experiencing a massive growth of over ***** percent in global topic consumption. In the same year, over ** percent of software developers reported using AI to help write code in the development workflow, while another ** percent said they currently use it for debugging code. Different languages for different needs JavaScript and Java, commonly used for back-end and front-end web development, were the most demanded programming languages worldwide in 2022, followed by SQL and Python. By industry, JavaScript and Java hold the fort in the IT services and aviation industries, while SQL was more popular in the healthcare sector as well as the marketing and advertising industries. Python, well suited for data science applications, was more commonly used in the manufacturing, education, and energy industries.
A Personalized Activity-based Spatiotemporal Risk Mapping Approach to...
figshare.com
tiff
Updated Mar 18, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jing Li; Xuantong Wang; Hexuan Zheng; Tong Zhang (2021). A Personalized Activity-based Spatiotemporal Risk Mapping Approach to COVID-19 Pandemic [Dataset]. http://doi.org/10.6084/m9.figshare.13517105.v1
Explore at:
tiffAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13517105.v1
Dataset updated
Mar 18, 2021
Dataset provided by
Figsharehttp://figshare.com/
Authors
Jing Li; Xuantong Wang; Hexuan Zheng; Tong Zhang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets used for this manuscript were derived from multiple sources: Denver Public Health, Esri, Google, and SafeGraph. Any reuse or redistribution of the datasets are subjected to the restrictions of the data providers: Denver Public Health, Esri, Google, and SafeGraph and should consult relevant parties for permissions.1. COVID-19 case dataset were retrieved from Denver Public Health (Link: https://storymaps.arcgis.com/stories/50dbb5e7dfb6495292b71b7d8df56d0a )2. Point of Interests (POIs) data were retrieved from Esri and SafeGraph (Link: https://coronavirus-disasterresponse.hub.arcgis.com/datasets/6c8c635b1ea94001a52bf28179d1e32b/data?selectedAttribute=naics_code) and verified with Google Places Service (Link: https://developers.google.com/maps/documentation/javascript/reference/places-service)3. The activity risk information is accessible from Texas Medical Association (TMA) (Link: https://www.texmed.org/TexasMedicineDetail.aspx?id=54216 )The datasets for risk assessment and mapping are included in a geodatabase. Per SafeGraph data sharing guidelines, raw data cannot be shared publicly. To view the content of the geodatabase, users should have installed ArcGIS Pro 2.7. The geodatabase includes the following:1. POI. Major attributes are locations, name, and daily popularity.2. Denver neighborhood with weekly COVID-19 cases and computed regional risk levels.3. Simulated four travel logs with anchor points provided. Each is a separate point layer.
h
@--> Disqusin lataaminen ei onnistu. Jos olet valvoja, katso...
hri.fi
Updated Mar 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). @--> Disqusin lataaminen ei onnistu. Jos olet valvoja, katso vianratkaisuopasta. Please enable JavaScript to view the comments powered by Disqus. [Dataset]. https://hri.fi/data/dataset/pysakointivirheet-helsingissa
Explore at:
Dataset updated
Mar 4, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Helsingissä pysäköinninvalvonnan ja poliisin kirjaamat pysäköintivirheet tammikuusta 2014 alkaen. Aineistossa on virheen tekoaika kuukauden ja vuoden tarkkuudella, osoite, jossa virhemaksu tai huomautus on annettu, virhemaksun vaihe, virheen pääsyy sekä virheen kirjaaja, postinumero ja aluetiedot. Todellinen sijainti -ominaisuus kertoo, onko kyseessä todellinen pysäköintivirheen sijainti. Aineisto viedään tietokantaan noin puolen vuoden välein. Aineistojen esikatselu kartta.hel.fi -palvelussa: Vuosi 2023 Koordinaatisto(t): ETRS-GK25 (EPSG:3879) Rajapintapalvelujen osoitteet: WFS: https://kartta.hel.fi/ws/geoserver/avoindata/wfs?request=getCapabilities Julkaistut tasot: Pysakointivirheet Pysakointivirheet-tason ominaisuustiedot ja tietotyypit: id (int): kohteen yksilöllinen tunniste kuukausi (string): kuukausi sanallisesti vuosi (int): vuosiluku osoite (string): katunimi ja mahdollinen osoitenumero virhemaksun_vaihe (string): vaihe; huomautus tai pysäköintivirhemaksu virheen_paasyy_ja_paaluokka (string): Pysäköintivirheen syy virheen_kirjaaja (string): kuka on kirjannut virheen; pysäköinnintarkastus vai poliisi easting (int): e-koordinaatti northing (int): n-koordinaatti postinumero (string): postinumero postitoimipaikka (string): postitoimipaikka suurpiiri (string): suurpiirin nimi kunta (string): kuntanimi kunta_nro (string): kuntatunnus kaupunginosa (string): kaupunginosan nimi osa_alue (string): osa-alueen nimi todellinen_sijainti (string): ominaisuustieto arvioi onko kyseessä kohteen todellinen sijainti (Kyllä) vai likimääräinen sijainti (Ei) Vuosien 2014 - 2022 tiedostoissa on seuraavat tiedot Virheen tekokuukausi = Pysäköintivirheen tai huomautuksen kirjoituskuukausi Virheen tekovuosi = Pysäköintivirheen tai huomautuksen kirjoitusvuosi Osoite = osoite, jossa virhemaksu tai huomautus on annettu. Osoitetta ei ole välttämättä kaikissa näkyvissä, esim. käsin kirjoitetut poliisien antamat virhemaksut / huomautukset tai se voi olla puutteellinen Virhemaksun vaihe = Pysäköintivirhemaksu tai huomautus Virheen pääluokka / Pääsyy = tässä voi olla yhdestä kolmeen eri virheen luokkaa Virheen kirjaaja = Pysäköinnintarkastaja tai poliisi Kaupunginosa Lisäksi vuoden 2014-2017 paikkatietoaineistossa (SHP) on postinumeroaluetieto (y, x, postinumero, postitoimipaikka, alue, kunta, kunta_nro). Lähteenä on ollut vuoden 2015 pääkaupunkiseudun postinumeroaluejako. Aineistossa on joitakin Helsingin ulkopuolelle jääviä pisteitä. Taulukkomuotoisessa aineistossa (CSV) on joitain osoitteettomia pysäköintivirheitä; niitä ei ole paikkatietoaineistoissa (SHP). Vuosien 2018-2021 paikkatietoaineistot on geokoodattu QGIS Digitransit Geocoding -lisäosalla taulukkomuotoisesta aineistosta (ks. lisätiedot kutakin resurssia klikkaamalla) ja ne sisältävät virheitä. Vuodesta 2022 on saatavilla vain taulukkomuotoinen aineisto (CSV). Vuodesta 2023 eteenpäin uudet datat on saatavilla vain WFS-rajapinnan kautta.
p
Please enable JavaScript to view the page content.Your support ID is:...
dados.prefeitura.sp.gov.br
Updated Jul 25, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2017). Please enable JavaScript to view the page content.Your support ID is: 17090373647339254903. Este desafio é para testar se você é um visitante humano. Audio is not supported in your browser. Digite o código que aparece na imagem submit Suporte ID: 17090373647339254903. [Dataset]. http://dados.prefeitura.sp.gov.br/dataset/base-de-contribuicao-popular-programa-de-metas-2017-2020
Explore at:
Dataset updated
Jul 25, 2017
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Please enable JavaScript to view the page content.Your support ID is: 17090373647339254903. Este desafio é para testar se você é um visitante humano. Audio is not supported in your browser. Digite o código que aparece na imagem submit Suporte ID: 17090373647339254903.
US Congress Members' Tweets
kaggle.com
zip
Updated Dec 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). US Congress Members' Tweets [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-congress-members-tweets
Explore at:
zip(5451625 bytes)Available download formats
Dataset updated
Dec 20, 2023
Authors
The Devastator
Area covered
United States
Description
US Congress Members' Tweets

August 2017 Tweets from US Congress Members

By Social Media Data [source]

About this dataset

In-depth Description of the Dataset

This dataset is a comprehensive compilation of tweets from members of the United States Congress, specifically focusing on the month of August in 2017. It contains a wealth of information encapsulating over one thousand account activities from various political entities. These entities include Congress representatives' personal accounts, their office accounts, campaign accounts as well as any committee and party related handles that are associated with them.

The creator behind this project, Alex Litel undertook an ambitious initiative to compile and present all daily tweets originating from both chambers (the House and Senate) using an automated process referred to as 'Tweets of Congress'. This system is programmed to systematically check Twitter at fixed intervals ensuring that every tweet within this time frame is accounted for.

In order to make these vast amounts of data manageable and easily navigable for potential users or researchers, the complete collection has been curated and presented in raw data format using Javascript Object Notation (JSON). These datasets can be found hosted on Github repositories produced daily around midnight Eastern Standard Time (EST).

Furthermore, each aspect involved in collating this dataset including its front-end portion forming the visual facade for users - along with certain mechanical aspects responsible for generating data within given repositories work harmoniously thanks to the Congressional Tweet Automator. For more insights on how each aspect functions together or individually - visit official Github repo's automation section.

Congruent with facilitating convenience for its potential users further, another 'users-filtered.JSON' dataset has been included which contains metadata pertaining to every account utilized by this project during tweet collection.

Despite offering such granulated detail about these digital interactions it's noteworthy that due to sheer size limitations there is a cutoff point where archives will stop collecting data/information making room for new incoming entries ensuring viable repository management.

Aspirants who wish to explore computational social science projects may find high value here since they can use various statistical analysis strategies like content visualization, time-series analysis, and sentiment analysis to reveal and understand underlying patterns within the tweets. Additionally, it can also be used in fields like Natural Language Processing (NLP) for various linguistic studies.

The 'Tweets of Congress' project appreciates contributions from John Otander's Pixyll theme which has been used extensively in building the front-end of the site. Furthermore much owed credit goes to the 'unitedstates/congress-legislators' project which greatly assisted in procuring data that aided creation amidst a wealth of others who have contributed.

Finally, it is vital to mention that this dataset comes under MIT license permitting any person obtaining

How to use the dataset

Exploratory Data Analysis:

Start with doing a basic exploratory data analysis (EDA) to find trends, patterns and outliers in the Tweet texts.

Analyze tweet lengths: Check if there is any noticeable trend between tweets from different members.

Examine tweet timings: Are most tweets sent during work hours or is there significant activity outside normal business hours?

Delve into the frequency of hashtags/mentions (): Identify the ratio or percentage of tweets that include other users’ handles or hashtags — this could suggest whether Congress members are conversing with constituents via Twitter versus broadcasting messages.

Sentiment analysis: Use NLP tools to perform sentiment analysis on Tweet text to gauge overall sentiments being expressed by congressmen over time.

Social Network Analysis:

Social Network Analysis (SNA) is a popular approach for identifying influential individuals in social networks like Twitter.

Graph theory techniques could be employed in identifying clusters and communities among Congress members based on who they mention in their tweets (indicating possible relationships between users).

Centrality measures can help identify influential Twitter handles that serve as important information hubs or bridges in communication paths.

There’s also potential for studying Congressional relationships through frequency of communications amongst each other, which could demonstrate alliances.

...
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Muhammad Roshan Riaz (2024). Most Popular Programming Languages 2004-2024 [Dataset]. https://www.kaggle.com/datasets/muhammadroshaanriaz/most-popular-programming-languages-2004-2024/code

Most Popular Programming Languages

Explore at:

zip(3491 bytes)Available download formats

Dataset updated

Sep 15, 2024

Authors

Muhammad Roshan Riaz

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

This dataset contains the following columns:

Month: The date (in year-month format) when the data was recorded. Python Worldwide(%): The percentage of global popularity for Python during that month. JavaScript Worldwide(%): The percentage of global popularity for JavaScript. Java Worldwide(%): The percentage of global popularity for Java. C# Worldwide(%): The percentage of global popularity for C#. PhP Worldwide(%): The percentage of global popularity for PhP. Flutter Worldwide(%): The percentage of global popularity for Flutter. React Worldwide(%): The percentage of global popularity for React. Swift Worldwide(%): The percentage of global popularity for Swift. TypeScript Worldwide(%): The percentage of global popularity for TypeScript. Matlab Worldwide(%): The percentage of global popularity for Matlab.

Each row represents data for a particular month, starting from January 2004, tracking the popularity trends of these programming languages worldwide.

Clear search

Close search

Google apps

Main menu

Most Popular Programming Languages 2004-2024

Developer Expertise Dataset on JavaScript Libraries

Computer language popularity

Programming Language Ecosystem Project TU Wien

About Dataset

About Data collection methodology

github_metadata

PYPL_survey_2004-2023

stack_overflow_developer_survey

Description of the data

License

Acknowledgments

Stack Overflow tags

Context

Content

Acknowledgements

Data from: E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects...

Salaries of developers in Ukraine

Multi-language Open Source Code Identifier Dataset

mnist

oop

Imagery and Map Services

Job Descriptions 2025 – Tech & Non-Tech Roles

Synthetic Job Descriptions Dataset – Tech & Non-Tech Roles (2025)

Roles Covered

Dataset Structure

Key Features & Insights

Potential Use Cases

File Formats

Licensing

Most demanded tech skills worldwide 2023

A Personalized Activity-based Spatiotemporal Risk Mapping Approach to...

@--> Disqusin lataaminen ei onnistu. Jos olet valvoja, katso...

Please enable JavaScript to view the page content.Your support ID is:...

US Congress Members' Tweets

US Congress Members' Tweets

August 2017 Tweets from US Congress Members

About this dataset

In-depth Description of the Dataset

How to use the dataset

Exploratory Data Analysis:

Social Network Analysis:

...

Most Popular Programming Languages 2004-2024

Most Popular Programming Languages