5 datasets found

Twitter Graph Example v2 43
kaggle.com
zip
Updated Jun 29, 2022
Cite
Mathias Weiß (2022). Twitter Graph Example v2 43 [Dataset]. https://www.kaggle.com/datasets/weissmedia/twitter-graph-example-v2-43
Explore at:
Available download formats: zip (17943518 bytes)
Dataset updated
Jun 29, 2022
Authors
Mathias Weiß
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This project is inspired by https://github.com/neo4j-graph-examples/twitter-v2.

Twitter Graph

Show data from your personal Twitter account

The Graph Your Network application inserts your Twitter activity into Neo4j.

https://neo4jsandbox.com/guides/twitter/img/twitter-data-model.svg

Content

~10 MB of graph data (CSV)

43,325 nodes with labels: Hashtag, Link, Me, Source, Tweet, User

57,896 relationships with types: AMPLIFIES, CONTAINS, FOLLOWS, INTERACTS_WITH, MENTIONS, POSTS, REPLY_TO, RETWEETS, RT_MENTIONS, SIMILAR_TO, TAGS, USING
Neo4j open measurment
kaggle.com
zip
Updated Feb 15, 2023
Cite
Tom Nijhof-Verhees (2023). Neo4j open measurment [Dataset]. https://www.kaggle.com/datasets/wagenrace/neo4j-open-measurment
Explore at:
Available download formats: zip (29854808766 bytes)
Dataset updated
Feb 15, 2023
Authors
Tom Nijhof-Verhees
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Kickstart a chemical graph database

I have spent some time scraping and shaping PubChem data into a Neo4j graph database. The process took a lot of time, mainly downloading the data and loading it into Neo4j. The whole process took weeks. If you want to build your own, I will show you how to download mine and set it up in less than an hour (most of the time you’ll just have to wait). The process of how this dataset was created is described in the following blogs:
- https://medium.com/@nijhof.dns/exploring-neodash-for-197m-chemical-full-text-graph-e3baed9615b8
- https://medium.com/neo4j/combining-3-biochemical-datasets-in-a-graph-database-8e9aafbb5788
- https://medium.com/p/d9ee9779dfbe

What do you get?

The full database is a merge of 3 datasets: PubChem (compounds + synonyms), NCI60 (GI50), and ChEMBL (cell lines). It contains 6 nodes of interest:

● Compound: This is related to a compound of PubChem. It has 1 property.
  ○ pubChemCompId: The id within PubChem. So “compound:cid162366967” links to https://pubchem.ncbi.nlm.nih.gov/compound/162366967. This number can be used with both PubChem RDF and PUG.
● Synonym: A name found in the literature. This name can refer to zero, one, or more compounds. This helps find relations between natural language names and the absolute compounds they are related to.
  ○ Name: Natural language name. Can contain letters, spaces, numbers, and any other Unicode character.
  ○ pubChemSynId: PubChem synonym id as used within the RDF.
● CellLine: These are the ChEMBL cell lines. They hold a lot of information.
  ○ Name: The name of the cell line.
  ○ Uri: A unique URI for every element within the ChEMBL RDF.
  ○ cellosaurusId: The id to connect it to the Cellosaurus dataset. This is one of the most extensive cell line datasets out there.
● Measurement: A measurement you can do within a biomedical experiment. Currently, only GI50 (the concentration needed for Growth Inhibition of 50%) is added.
  ○ Name: Name of the measurement.
● Condition: A single condition of an experiment. A condition is part of an experiment. Examples are: an individual of the control group, a sample with drug A, or a sample with more CO2.
● Experiment: A collection of multiple conditions all done at the same time with the same bias, meaning we assume all uncontrolled variables are the same.
  ○ Name: Name of experiment.
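As a small illustration of the pubChemCompId convention described above, the following sketch converts an id such as “compound:cid162366967” into its PubChem page URL. The function name and the strict id pattern are assumptions made for this example; the dataset itself may contain ids in other shapes.

```python
import re

# URL pattern taken from the example in the description above.
PUBCHEM_COMPOUND_URL = "https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"


def compound_url(pub_chem_comp_id: str) -> str:
    """Turn an id like 'compound:cid162366967' into its PubChem page URL.

    Hypothetical helper: the 'compound:cid<digits>' pattern is inferred
    from the single example given in the dataset description.
    """
    match = re.fullmatch(r"compound:cid(\d+)", pub_chem_comp_id.strip())
    if match is None:
        raise ValueError(f"unexpected pubChemCompId format: {pub_chem_comp_id!r}")
    return PUBCHEM_COMPOUND_URL.format(cid=match.group(1))


print(compound_url("compound:cid162366967"))
# prints https://pubchem.ncbi.nlm.nih.gov/compound/162366967
```

The numeric part is the same number the description says can be used with PubChem RDF and PUG.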

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F442733%2F7dd804811e105390dfe20bb5cd1a68c0%2FUntitled%20graph.png?generation=1680113457794452&alt=media

Overview of the graph design

How to download it

Warning: you need 120 GB of free disk space. The compressed file you download is already 30 GB. The uncompressed file is another 30 GB, and the database afterward is 60 GB. 60 GB is only for temporary files; the other 60 GB is for the database. If you do this on an HDD, it will be slow.
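Before starting the download, it may help to check the available space. The sketch below uses the figures quoted in the warning; treating GB as decimal gigabytes and checking the current directory are assumptions made for this example.

```python
import shutil

GB = 10**9  # assumption: decimal gigabytes; the text does not say GB vs GiB

# Figures quoted in the warning above.
ZIP_GB = 30
UNZIPPED_DUMP_GB = 30
DATABASE_GB = 60
RECOMMENDED_FREE_GB = 120

# Peak usage of the download step: zip and unzipped dump side by side.
download_peak_gb = ZIP_GB + UNZIPPED_DUMP_GB


def has_enough_space(path: str = ".", needed_gb: int = RECOMMENDED_FREE_GB) -> bool:
    """Return True if the disk holding `path` has at least `needed_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * GB


print(f"download step peaks at {download_peak_gb} GB")
print("enough free space:", has_enough_space("."))
```

Pass the drive where Neo4j stores its databases as `path` if it differs from the download location.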

If you load this into Neo4j desktop as a local database (like I do) it will scream and yell at you, just ignore this. We are pushing it far further than it is designed for, but it will still work.

Download the file

Go to this Kaggle dataset and download the dump file. Unzip the file, then delete the zipped file. This part needs 60 GB but only takes 30 GB by the end of it.

Create a database

Open the Neo4j Desktop app and click “Reveal files in File Explorer”. Move the .dump you downloaded into this folder.

Click on the ... behind the .dump file and click Create new DBMS from dump. This database is a dump from Neo4j V4, so your database also needs to be V4.x.x!

It will now create the database. This will take a long time; it might even say it has timed out. Do not believe this lie! In the background, it is still running. Every time you start it, it will time out. Just let it run and press start again later. The second time, it will start up directly.

Every time I start it up I get the timed-out error. After waiting 10 minutes and clicking start again, the database, and with it more than 200 million nodes, is ready. And you are done! Good luck, and let me know what you build with it.
Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and...
data.niaid.nih.gov
Updated Aug 3, 2024
Cite
Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V. (2024). Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_400614
Explore at:
Dataset updated
Aug 3, 2024
Dataset provided by
Ryerson University
Authors
Sadat, Mefta; Bener, Ayse Basar; Miranskyy, Andriy V.
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present three defect rediscovery datasets mined from Bugzilla. The datasets capture data for three groups of open source software projects: Apache, Eclipse, and KDE. The datasets contain information about approximately 914 thousand defect reports over a period of 18 years (1999-2017) to capture the inter-relationships among duplicate defects.

File Descriptions

apache.csv - Apache Defect Rediscovery dataset

eclipse.csv - Eclipse Defect Rediscovery dataset

kde.csv - KDE Defect Rediscovery dataset

apache.relations.csv - Inter-relations of rediscovered defects of Apache

eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse

kde.relations.csv - Inter-relations of rediscovered defects of KDE

create_and_populate_neo4j_objects.cypher - Populates the Neo4j graph database by importing all the data from the CSV files. Note that you have to set the dbms.import.csv.legacy_quote_escaping configuration setting to false to load the CSV files, as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping

create_and_populate_mysql_objects.sql - Populates MySQL RDBMS by importing all the data from the CSV files

rediscovery_db_mysql.zip - For your convenience, we also provide full backup of the MySQL database

neo4j_examples.txt - Sample Neo4j queries

mysql_examples.txt - Sample MySQL queries

rediscovery_eclipse_6325.png - Output of Neo4j example #1

distinct_attrs.csv - Distinct values of bug_status, resolution, priority, severity for each project
Dataset used for "A Recommender System of Buggy App Checkers for App Store...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jun 28, 2021
Cite
Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier (2021). Dataset used for "A Recommender System of Buggy App Checkers for App Store Moderators" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034291
Explore at:
Dataset updated
Jun 28, 2021
Dataset provided by
University of Lille / Inria
Authors
Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.

Dataset Collection

We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing and/or updating apps), we updated the dataset twice. The dataset D1 contains the apps available in the Google Play Store in January 2014. Then, we created a new snapshot (D2) of the Google Play Store in March 2014.

The apps belong to the 27 different categories defined by Google (at the time of writing the paper), and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median number of 1,978 apps per category.
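The sampling limits above imply simple upper bounds, which the following sketch works out. It only restates the arithmetic from the figures in the text; the variable names are chosen for this example.

```python
# Figures quoted in the paragraph above.
CATEGORIES = 27
SUBCATEGORIES = ("free", "paid", "new_free", "new_paid")
MAX_SAMPLES_PER_PAIR = 500

# Number of category-subcategory pairs that were sampled.
pairs = CATEGORIES * len(SUBCATEGORIES)

# Upper bound on apps in one category (all four subcategories full).
max_per_category = len(SUBCATEGORIES) * MAX_SAMPLES_PER_PAIR

# Upper bound on the whole snapshot.
max_total = pairs * MAX_SAMPLES_PER_PAIR

print(pairs, max_per_category, max_total)
# prints 108 2000 54000
```

The reported median per category sits just under the 2,000 cap, and both snapshot sizes (38,781 and 46,644 apps) fall below the 54,000 bound.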

For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.

In addition, for each app, we collected up to a maximum of the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, thus several reviews lack some of these details. From all the reviews attached to an app, we only considered the reviews associated with the latest version of the app (i.e., we discarded unversioned and old-versioned reviews). This resulted in a corpus of 1,402,717 reviews (Jan. 2014).
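The review filter described above can be sketched as follows. The field names (`title`, `description`, `device`, `version`), the sample data, and the use of plain string equality to match versions are assumptions for illustration; the source does not describe how versions were compared.

```python
from typing import Optional, TypedDict


class Review(TypedDict):
    # All four fields are optional in the source data, hence Optional.
    title: Optional[str]
    description: Optional[str]
    device: Optional[str]
    version: Optional[str]


def latest_version_reviews(reviews: list[Review], latest_version: str) -> list[Review]:
    """Keep only reviews tied to the app's latest version.

    Unversioned reviews (version is None) and reviews of older versions
    are dropped, as described in the text.
    """
    return [r for r in reviews if r["version"] == latest_version]


# Hypothetical sample data.
sample: list[Review] = [
    {"title": "Great", "description": "Works well", "device": "Nexus 5", "version": "2.1"},
    {"title": None, "description": "Crashes", "device": None, "version": None},
    {"title": "Old", "description": "Was slow", "device": "Galaxy S4", "version": "2.0"},
    {"title": "Fine", "description": None, "device": None, "version": "2.1"},
]

kept = latest_version_reviews(sample, latest_version="2.1")
print(len(kept))  # prints 2
```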

Dataset Stats

Some stats about the datasets:

D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.

D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.

Additional stats about the datasets are available here.

Dataset Description

To store the dataset, we created a graph database with Neo4j. This dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because the graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).

In particular, our dataset graph contains six types of nodes:
- APP nodes containing metadata of each app,
- PERMISSION nodes describing permission types,
- CATEGORY nodes describing app categories,
- SUBCATEGORY nodes describing app subcategories,
- USER_REVIEW nodes storing user reviews,
- TOPIC nodes describing topics mined from user reviews (using LDA).

Furthermore, there are five types of relationships between APP nodes and each of the remaining nodes:

USES_PERMISSION relationships between APP and PERMISSION nodes

HAS_REVIEW between APP and USER_REVIEW nodes

HAS_TOPIC between USER_REVIEW and TOPIC nodes

BELONGS_TO_CATEGORY between APP and CATEGORY nodes

BELONGS_TO_SUBCATEGORY between APP and SUBCATEGORY nodes
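The node and relationship types listed above can be written down as a small schema table, from which a counting query per relationship type is generated. This is a sketch only: the exact label spelling in the database and the relationship direction (first node type to second) are assumptions, and the generated Cypher has not been run against the published databases.

```python
# Relationship type -> (start node type, end node type), as listed above.
# Direction is an assumption; the text only says "between".
SCHEMA = {
    "USES_PERMISSION": ("APP", "PERMISSION"),
    "HAS_REVIEW": ("APP", "USER_REVIEW"),
    "HAS_TOPIC": ("USER_REVIEW", "TOPIC"),
    "BELONGS_TO_CATEGORY": ("APP", "CATEGORY"),
    "BELONGS_TO_SUBCATEGORY": ("APP", "SUBCATEGORY"),
}


def count_query(rel_type: str) -> str:
    """Build a Cypher query string counting relationships of one type."""
    start, end = SCHEMA[rel_type]
    return f"MATCH (:{start})-[r:{rel_type}]->(:{end}) RETURN count(r) AS n"


# All node types mentioned by the schema.
node_types = {label for pair in SCHEMA.values() for label in pair}

print(count_query("USES_PERMISSION"))
# prints MATCH (:APP)-[r:USES_PERMISSION]->(:PERMISSION) RETURN count(r) AS n
```

Note that HAS_TOPIC connects USER_REVIEW and TOPIC nodes, so it is the one relationship type in the list that does not start at an APP node.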

Dataset Files Info

Neo4j 2.0 Databases

googlePlayDB1-Jan2014_neo4j_2_0.rar

googlePlayDB2-Mar2014_neo4j_2_0.rar

We provide two Neo4j databases containing the 2 snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. The databases were created with Neo4j 2.0, specifically with the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available at the time of implementing the paper in 2014).

Neo4j 3.5 Databases

googlePlayDB1-Jan2014_neo4j_3_5_28.rar

googlePlayDB2-Mar2014_neo4j_3_5_28.rar

Currently, Neo4j 2.0 is deprecated and is not available for download in the official Neo4j Download Center. We have migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with the tool version 'Neo4j Community Edition 3.5.28'. The tool can be downloaded from the official Neo4j Download page.

In order to open the databases with more recent versions of Neo4j, the databases must first be migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide.

The first time you connect to the Neo4j database, it may request credentials. The username and password are: neo4j/neo4j
DataSheet1_Threat modelling in Internet of Things (IoT) environments using...
frontiersin.figshare.com
zip
Updated May 30, 2024
Cite
Marwa Salayma (2024). DataSheet1_Threat modelling in Internet of Things (IoT) environments using dynamic attack graphs.ZIP [Dataset]. http://doi.org/10.3389/friot.2024.1306465.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.3389/friot.2024.1306465.s001
Dataset updated
May 30, 2024
Dataset provided by
Frontiers
Authors
Marwa Salayma
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This work presents a threat modelling approach to represent changes to the attack paths through an Internet of Things (IoT) environment when the environment changes dynamically, that is, when new devices are added or removed from the system or when whole sub-systems join or leave. The proposed approach investigates the propagation of threats using attack graphs, a popular attack modelling method. However, traditional attack-graph approaches have been applied in static environments that do not continuously change, such as enterprise networks, leading to static and usually very large attack graphs. In contrast, IoT environments are often characterised by dynamic change and interconnections; different topologies for different systems may interconnect with each other dynamically and outside the operator’s control. Such new interconnections lead to changes in the reachability amongst devices according to which their corresponding attack graphs change. This requires dynamic topology and attack graphs for threat and risk analysis. This article introduces an example scenario based on healthcare systems to motivate the work and illustrate the proposed approach. The proposed approach is implemented using a graph database management tool (GDBM), Neo4j, which is a popular tool for mapping, visualising, and querying the graphs of highly connected data. It is efficient in providing a rapid threat modelling mechanism, making it suitable for capturing security changes in the dynamic IoT environment. Our results show that our developed threat modelling approach copes with dynamic system changes that may occur in IoT environments and enables identifying attack paths, whilst allowing for system dynamics. The developed dynamic topology and attack graphs can cope with the changes in the IoT environment efficiently and rapidly by maintaining their associated graphs.
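To make the idea of reachability changing with the topology concrete, here is a deliberately simplified sketch in plain Python. It is not the authors' Neo4j implementation: it models only directed reachability (no vulnerabilities or privileges), and the device names in the healthcare-style example are hypothetical.

```python
from collections import deque


def reachable(topology: dict[str, set[str]], start: str) -> set[str]:
    """Return every node reachable from `start` via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in topology.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical initial topology: edges mean "can reach / can attack".
topology: dict[str, set[str]] = {
    "attacker": {"gateway"},
    "gateway": {"heart_monitor"},
    "heart_monitor": set(),
}
before = reachable(topology, "attacker")

# A new sub-system joins: the monitor now talks to a ward tablet,
# which in turn can reach a records server.
topology["heart_monitor"].add("ward_tablet")
topology["ward_tablet"] = {"records_server"}
topology["records_server"] = set()
after_join = reachable(topology, "attacker")
newly_exposed = after_join - before

# A device leaves: removing the gateway cuts every attack path.
del topology["gateway"]
topology["attacker"].discard("gateway")
after_leave = reachable(topology, "attacker")

print(sorted(before), sorted(newly_exposed), sorted(after_leave))
```

Recomputing reachability from scratch is the simplest possible choice; the abstract describes maintaining the graphs as the environment changes, which a graph database query would handle differently.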