https://creativecommons.org/publicdomain/zero/1.0/
This project is inspired by https://github.com/neo4j-graph-examples/twitter-v2.
Show data from your personal Twitter account
The Graph Your Network application inserts your Twitter activity into Neo4j.
Data model diagram: https://neo4jsandbox.com/guides/twitter/img/twitter-data-model.svg
~10 MB of graph data (CSV)
43,325 nodes; node labels: Hashtag, Link, Me, Source, Tweet, User
57,896 relationships; relationship types: AMPLIFIES, CONTAINS, FOLLOWS, INTERACTS_WITH, MENTIONS, POSTS, REPLY_TO, RETWEETS, RT_MENTIONS, SIMILAR_TO, TAGS, USING
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. The objectives of our study were to analyze the semantic structure of input definitions, use this information to build triple structures of RDF graph data, upload our lexicon to knowledge graph software, and perform SPARQL queries on the data. Upon completion of this study, SPARQL queries were shown to effectively convey graph triples that displayed semantic significance. These data represent and characterize the lexicon of our input text, which is used to form graph triples. These data were collected in 2024 by passing text through multiple Python programs utilizing spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before the data were processed by the Python programs, input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized and characterized by their part-of-spee ...
Salmonella pangenome graph and variant call data for 539,283 genomes

Description: Salmonella enterica causes human disease and decreases agricultural production. The overall goal of this project is to generate a large database of S. enterica variants with 539,283 samples and 236,069 features for applications in machine learning and genomics. We transformed single nucleotide polymorphism (SNP) data into reduced-dimensional representations that are tolerant of missing data, based on disentangled variational autoencoders. TFRecord files were made with custom Python scripts that parsed the variant call format (VCF) files into sparse tensors and combined them with the Salmonella In Silico Typing Resource (SISTR) serotype data.

The data directory contains:
- The tar file of TFRecords: tfrecords.tar (103 GB). The TFRecords are organized first by how they were genotyped: mpileup records were created with Mpileup, and the gvg records were created with graph variant calling. Each of these directories contains batches of ~10,000 sequence reads named Sra10k_XX.tfrecord.gz (00--54). File Sra10k_99.tfrecord.gz contains incomplete SRAs. Each TFRecord contains the shape of the tensor, the indices of non-zero variants, sample name, serotype, and sparse values. Value 99 was assigned to '.' records.
- The file output.tar (11.4 TB) contains the .vcf files used to create the TFRecords above. The same data is contained more succinctly in the TFRecord format. This data will not normally be used.
- A tar file of metadata files for the samples, metadata (95 MB). Sequence read archive (SRA) accessions were downloaded using edirect/eutilities and saved as SraAccList.txt:
  esearch -db sra -query "txid28901[Organism:exp] AND (cluster_public[prop] AND 'biomol dna'[Properties] AND 'library layout paired'[Properties] AND 'platform illumina'[Properties] AND 'strategy wgs'[Properties] OR 'strategy wga'[Properties] OR 'strategy wcs'[Properties] OR 'strategy clone'[Properties] OR 'strategy finishing'[Properties] OR 'strategy validation'[Properties])" | efetch -format runinfo -mode xml | xtract -pattern Row -element Run > SraAccList.txt
  Google BigQuery was used to download metadata for the SRA accessions from the National Institutes of Health (NIH):
  SELECT * FROM nih-sra-datastore.sra.metadata as metadata INNER JOIN {table_id} as leiacc ON metadata.acc = leiacc.accID;
  Files were processed into batches of ~10,000 and named Sra_completed_XX.csv (00--53).
- A VCF document mapping the TFRecord data to the positions in the graph subjected to the Type strain LT2: mapping/DRR452337.gvg.vcf-with_TFRecord_in_1st_column.txt
- Scripts for creating and reading TFRecord data: code.
  - reading_and_parsing_fns.py defines functions for converting VCFs of variants called using gvg to sparse tensors and makes the TFRecord files.
  - gvg_to_tfrecord.py creates TFRecords from the sparse tensors.
  - Tutorial for using the TFRecords: Example_logistic_regression.md
- Pangenome graph files and references used for variant calling and genotyping: pangenome.
  - refPlus100.fasta.gz contains the genomes of the 101 Salmonella strains without plasmids used for construction of the pangenome graph.
  - salm.100.NC_003197_v2.d2_complete.gfa.gz is the complete 101 Salmonella strain pangenome graph in Graphical Fragment Assembly (GFA2) Format 2.0, including alt nodes used for genotyping.
  - salm.100.NC_003197_v2.full.gfa.gz is the full graph including alt nodes.
  - salm.100.NC_003197_v2.full.vcf.gz is a VCF of the file above.
  - genotyped.gvg.vcf contains the genotype calls in VCF format.
  - paths.txt contains the paths of the graph.

SCINet users: The data folder can be accessed/retrieved with a valid SCINet account at this location: /LTS/ADCdatastorage/NAL/published/node28083194/
See the SCINet File Transfer guide for more information on moving large files: https://scinet.usda.gov/guides/data/datatransfer

Globus users: The files can also be accessed through Globus by following this data link. The user will need to log in to Globus in order to access this data. User accounts are free of charge with several options for signing on. Instructions for creating an account are on the login page.
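The sparse layout described for each TFRecord (tensor shape, indices of non-zero variants, sparse values, with 99 standing in for '.' records) can be illustrated without TensorFlow. The following is a minimal sketch, not the project's actual parsing code: the function names are hypothetical, and it assumes each genotype call is a single allele string such as "0", "1", or ".".

```python
from typing import List, Tuple

MISSING = 99  # the dataset description assigns 99 to '.' records


def genotypes_to_sparse(calls: List[str]) -> Tuple[Tuple[int, ...], List[int], List[int]]:
    """Convert one sample's genotype calls into (shape, indices, values).

    Assumption: each call is a single allele string ("0", "1", "2", ...)
    or "." for a missing call. Real VCF genotype fields can be richer.
    """
    indices: List[int] = []
    values: List[int] = []
    for position, call in enumerate(calls):
        value = MISSING if call == "." else int(call)
        if value != 0:  # store only non-zero (non-reference or missing) entries
            indices.append(position)
            values.append(value)
    return (len(calls),), indices, values


def sparse_to_dense(shape: Tuple[int, ...], indices: List[int], values: List[int]) -> List[int]:
    """Rebuild the dense vector from the sparse triple."""
    dense = [0] * shape[0]
    for index, value in zip(indices, values):
        dense[index] = value
    return dense


# Hypothetical calls for one sample at five variant positions.
calls = ["0", "1", ".", "0", "2"]
shape, indices, values = genotypes_to_sparse(calls)
print(shape, indices, values)
```

Storing only the non-zero positions is what keeps a matrix of 539,283 samples by 236,069 features manageable, since most calls match the reference.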
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A Neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh Neo4j database instance using the following command (see also the Neo4j documentation: https://neo4j.com/docs/):
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
Data Schema
-----------
The graph is a labeled property graph over business process event data. Each graph uses the following concepts:
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")
:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
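To make the :CORR and :DF definitions concrete, the sketch below derives directly-follows pairs from a handful of in-memory events. It is an illustration of the definition only, not the procedure used to build the dataset: the event records, entity identifiers, and the function name `derive_df` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical events: each has an Activity, a timestamp, and the entities
# it is correlated to (the :CORR relationships).
events = [
    {"id": "e1", "Activity": "Submit", "timestamp": 1, "entities": ["app1", "userA"]},
    {"id": "e2", "Activity": "Review", "timestamp": 2, "entities": ["app1", "userB"]},
    {"id": "e3", "Activity": "Approve", "timestamp": 3, "entities": ["app1", "userA"]},
]


def derive_df(events: List[dict]) -> List[Tuple[str, str, str]]:
    """Return (from_event, to_event, entity) triples, one per :DF relationship.

    Two events are linked when they are consecutive in time among the
    events correlated to the same entity.
    """
    by_entity: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        for entity in event["entities"]:
            by_entity[entity].append(event)

    df: List[Tuple[str, str, str]] = []
    for entity, correlated in by_entity.items():
        ordered = sorted(correlated, key=lambda e: e["timestamp"])
        for earlier, later in zip(ordered, ordered[1:]):
            df.append((earlier["id"], later["id"], entity))
    return df


for triple in sorted(derive_df(events)):
    print(triple)
```

Because an event can be correlated to several entities, the same pair of events may or may not be directly connected depending on the entity: here e1 is followed by e2 for app1, but by e3 for userA.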
Data Contents
-------------
neo4j-bpic15-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2015 dataset.
van Dongen, B.F. (Boudewijn) (2015): BPI Challenge 2015. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1
This data is provided by five Dutch municipalities. The data contains all building permit applications over a period of approximately four years. There are many different activities present, denoted by both codes (attribute concept:name) and labels, both in Dutch (attribute taskNameNL) and in English (attribute taskNameEN). The cases in the log contain information on the main application as well as objection procedures in various stages. Furthermore, information is available about the resource that carried out the task and on the cost of the application (attribute SUMleges). The processes in the five municipalities should be identical, but may differ slightly. Especially when changes are made to procedures, rules or regulations the time at which these changes are pushed into the five municipalities may differ. Of course, over the four year period, the underlying processes have changed. The municipalities have a number of questions, namely: What are the roles of the people involved in the various stages of the process and how do these roles differ across municipalities? What are the possible points for improvement on the organizational structure for each of the municipalities? The employees of two of the five municipalities have physically moved into the same location recently. Did this lead to a change in the processes and if so, what is different? Some of the procedures will be outsourced from 2018, i.e. they will be removed from the process and the applicant needs to have these activities performed by an external party before submitting the application. What will be the effect of this on the organizational structures in the five municipalities? Where are differences in throughput times between the municipalities and how can these be explained? What are the differences in control flow between the municipalities? There are five different log files available in this collection. Events are labeled with both a code and a Dutch and English label. 
Each activity code consists of three parts: two digits, a variable number of characters, and then three digits. The first two digits as well as the characters indicate the subprocess the activity belongs to. For instance, ‘01_HOOFD_xxx’ indicates the main process and ‘01_BB_xxx’ indicates the ‘objections and complaints’ (‘Beroep en Bezwaar’ in Dutch) subprocess. The last three digits hint at the order in which activities are executed, where the first digit often indicates a phase within a process. Each trace and each event contain several data attributes that can be used for various checks and predictions. Furthermore, some employees may have performed tasks for different municipalities, i.e., if the employee number is the same, it is safe to assume the same person is being identified.
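The three-part structure of an activity code can be split with a regular expression. This is a minimal sketch that assumes the parts are separated by underscores, as in the examples ‘01_HOOFD_xxx’ and ‘01_BB_xxx’; the class and function names are hypothetical, the sample codes are illustrative, and codes in the logs that deviate from the described pattern are simply rejected.

```python
import re
from typing import NamedTuple, Optional

# Two digits, letters, three digits, separated by underscores (assumed).
CODE_PATTERN = re.compile(r"^(\d{2})_([A-Za-z]+)_(\d{3})$")


class ActivityCode(NamedTuple):
    prefix: str      # first two digits
    subprocess: str  # e.g. HOOFD (main process) or BB (objections and complaints)
    sequence: str    # last three digits, hinting at execution order

    @property
    def phase(self) -> str:
        """First digit of the sequence, which often indicates a phase."""
        return self.sequence[0]


def parse_activity_code(code: str) -> Optional[ActivityCode]:
    """Split a code into its parts, or return None if it does not match."""
    match = CODE_PATTERN.match(code)
    return ActivityCode(*match.groups()) if match else None


parsed = parse_activity_code("01_HOOFD_010")  # illustrative code
print(parsed, parsed.phase if parsed else None)
```

Returning `None` for non-matching codes keeps the caller in control of how to handle irregular entries rather than guessing at their structure.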
The data contains the following entities and their events:
- Application - a building permit application handled in one of five Dutch municipalities
- Case_R - a user or worker involved in handling the application
- Responsible_actor - a user or worker designated as responsible actor for an activity
- monitoringResource - a user or worker designated as monitoring resource for an activity
The data contains 5 event log nodes as the data was integrated from 5 different event logs from 5 different systems.
Data Size
---------
BPIC15, nodes: 268851, relationships: 2620418
Research dissemination and knowledge translation are imperative in social work. Methodological developments in data visualization techniques have improved the ability to convey meaning and reduce erroneous conclusions. The purpose of this project is to examine: (1) How are empirical results presented visually in social work research?; (2) To what extent do top social work journals vary in the publication of data visualization techniques?; (3) What is the predominant type of analysis presented in tables and graphs?; (4) How can current data visualization methods be improved to increase understanding of social work research? Method: A database was built from a systematic literature review of the four most recent issues of Social Work Research and 6 other highly ranked journals in social work based on the 2009 5-year impact factor (Thomson Reuters ISI Web of Knowledge). Overall, 294 articles were reviewed. Articles without any form of data visualization were not included in the final database. The number of articles reviewed by journal includes: Child Abuse & Neglect (38), Child Maltreatment (30), American Journal of Community Psychology (31), Family Relations (36), Social Work (29), Children and Youth Services Review (112), and Social Work Research (18). Articles with any type of data visualization (table, graph, other) were included in the database and coded sequentially by two reviewers based on the type of visualization method and type of analyses presented (descriptive, bivariate, measurement, estimate, predicted value, other). Additional review was required from the entire research team for 68 articles. Codes were discussed until 100% agreement was reached. The final database includes 824 data visualization entries.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data analytics as a field is currently at a crucial point in its development, as a commoditization takes place in the context of increasing amounts of data, more user diversity, and automated analysis solutions, the latter potentially eliminating the need for expert analysts. A central hypothesis of the present paper is that data visualizations should be adapted to both the user and the context. This idea was initially addressed in Study 1, which demonstrated substantial interindividual variability among a group of experts when freely choosing an option to visualize data sets. To lay the theoretical groundwork for a systematic, taxonomic approach, a user model combining user traits, states, strategies, and actions was proposed and further evaluated empirically in Studies 2 and 3. The results implied that for adapting to user traits, statistical expertise is a relevant dimension that should be considered. Additionally, for adapting to user states different user intentions such as monitoring and analysis should be accounted for. These results were used to develop a taxonomy which adapts visualization recommendations to these (and other) factors. A preliminary attempt to validate the taxonomy in Study 4 tested its visualization recommendations with a group of experts. While the corresponding results were somewhat ambiguous overall, some aspects nevertheless supported the claim that a user-adaptive data visualization approach based on the principles outlined in the taxonomy can indeed be useful. While the present approach to user adaptivity is still in its infancy and should be extended (e.g., by testing more participants), the general approach appears to be very promising.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset information
Internet topology graph. From traceroutes run daily in 2005 -
http://www.caida.org/tools/measurement/skitter. From several scattered sources
to million destinations. 1.7 million nodes, 11 million edges.
Dataset statistics
Nodes 1696415
Edges 11095298
Nodes in largest WCC 1694616 (0.999)
Edges in largest WCC 11094209 (1.000)
Nodes in largest SCC 1694616 (0.999)
Edges in largest SCC 11094209 (1.000)
Average clustering coefficient 0.2963
Number of triangles 28769868
Fraction of closed triangles 0.005387
Diameter (longest shortest path) 25
90-percentile effective diameter 5.9
Source (citation)
J. Leskovec, J. Kleinberg and C. Faloutsos. Graphs over Time: Densification
Laws, Shrinking Diameters and Possible Explanations. ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD), 2005.
Files
File: as-skitter.txt.gz
Description: AS from traceroutes run daily in 2005 by skitter
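The node and edge counts listed under the dataset statistics can be recomputed from an edge list. The sketch below assumes the file uses a plain-text layout with `#` comment lines followed by one whitespace-separated node pair per line, and that edges are undirected; neither is stated in the description, so both are assumptions. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Set, Tuple


def edge_list_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Count distinct nodes and distinct undirected edges in an edge list."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments (assumed format)
            continue
        u, v = line.split()[:2]
        nodes.update((u, v))
        # Normalize the pair so (u, v) and (v, u) count as one undirected edge.
        edges.add((u, v) if u <= v else (v, u))
    return len(nodes), len(edges)


def stats_from_gzip(path: str) -> Tuple[int, int]:
    """Apply edge_list_stats to a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return edge_list_stats(handle)


# Small in-memory sample; the last line repeats the first edge in reverse.
sample = ["# FromNodeId\tToNodeId", "0\t1", "1\t2", "2\t0", "1\t0"]
print(edge_list_stats(sample))
```

For the full file one would call `stats_from_gzip("as-skitter.txt.gz")`; holding roughly 11 million edges in a Python set is memory-hungry, so a graph library may be preferable for further analysis.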
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Value Added by Industry: Private Industries (Chain-Type Quantity Index) was 4.70000 % Chg. in April of 2025, according to the United States Federal Reserve. Historically, United States - Value Added by Industry: Private Industries (Chain-Type Quantity Index) reached a record high of 40.10000 in July of 2020 and a record low of -29.90000 in April of 2020. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Value Added by Industry: Private Industries (Chain-Type Quantity Index) - last updated from the United States Federal Reserve in November of 2025.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Producer Price Index by Commodity: Metals and Metal Products: Nonthreaded Metal Fasteners, Except Aircraft Types (WPU10810424) from Jul 1991 to Sep 2025 about aircraft, metals, commodities, PPI, inflation, price index, indexes, price, and USA.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Expenditures: Apparel, Men, 16 and over by Type of Area: Urban (CXUMENSLB1802M) from 1984 to 2020 about apparel, males, expenditures, urban, and USA.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Average Price: Gasoline, All Types (Cost per Gallon/3.785 Liters) in the New England Census Division (APU01107471A) from Jan 2018 to Sep 2025 about energy, gas, retail, price, and USA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Intermediate Inputs by Industry: All Industries (Chain-Type Quantity Index) was 115.23400 Index 2009=100 in April of 2025, according to the United States Federal Reserve. Historically, United States - Intermediate Inputs by Industry: All Industries (Chain-Type Quantity Index) reached a record high of 116.02200 in January of 2025 and a record low of 80.22500 in April of 2009. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Intermediate Inputs by Industry: All Industries (Chain-Type Quantity Index) - last updated from the United States Federal Reserve in November of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical dataset showing Africa immigration statistics by year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Real final sales to domestic purchasers (chain-type quantity index) was 120.47800 Index 2009=100 in January of 2024, according to the United States Federal Reserve. Historically, United States - Real final sales to domestic purchasers (chain-type quantity index) reached a record high of 123.01900 in January of 2021 and a record low of 5.23800 in January of 1933. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Real final sales to domestic purchasers (chain-type quantity index) - last updated from the United States Federal Reserve in November of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Intermediate Inputs by Industry: Construction (Chain-Type Price Index) was 144.26800 Index 2009=100 in April of 2025, according to the United States Federal Reserve. Historically, United States - Intermediate Inputs by Industry: Construction (Chain-Type Price Index) reached a record high of 144.26800 in April of 2025 and a record low of 74.88200 in January of 2005. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Intermediate Inputs by Industry: Construction (Chain-Type Price Index) - last updated from the United States Federal Reserve in November of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Imports of Goods and Services (chain-type price index) was -0.80000 % Chg. from Preceding Period in April of 2025, according to the United States Federal Reserve. Historically, United States - Imports of Goods and Services (chain-type price index) reached a record high of 80.20000 in January of 1974 and a record low of -36.70000 in October of 2008. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Imports of Goods and Services (chain-type price index) - last updated from the United States Federal Reserve in November of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chain-Type Quantity Index for Real GDP: Private Industries in Vermont was 114.14300 Index 2009=100 in April of 2025, according to the United States Federal Reserve. Historically, Chain-Type Quantity Index for Real GDP: Private Industries in Vermont reached a record high of 114.14300 in April of 2025 and a record low of 86.99400 in January of 2007. Trading Economics provides the current actual value, a historical data chart and related indicators for Chain-Type Quantity Index for Real GDP: Private Industries in Vermont - last updated from the United States Federal Reserve in October of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Paraguay Imports of Household or Laundry-type Washing Machines was US$28.15 Million during 2013, according to the United Nations COMTRADE database on international trade. Paraguay Imports of Household or Laundry-type Washing Machines - data, historical chart and statistics - was last updated in November of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Producer Price Index by Commodity: Intermediate Demand by Commodity Type: Components for Construction was 182.45500 Index Nov 2009=100 in March of 2025, according to the United States Federal Reserve. Historically, United States - Producer Price Index by Commodity: Intermediate Demand by Commodity Type: Components for Construction reached a record high of 182.45500 in March of 2025 and a record low of 100.00000 in December of 2009. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Producer Price Index by Commodity: Intermediate Demand by Commodity Type: Components for Construction - last updated from the United States Federal Reserve in November of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Real gross domestic product: Income payments to the rest of the world (chain-type quantity index) was 164.54000 Index 2009=100 in April of 2025, according to the United States Federal Reserve. Historically, United States - Real gross domestic product: Income payments to the rest of the world (chain-type quantity index) reached a record high of 165.59200 in April of 2024 and a record low of 0.56200 in July of 1947. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Real gross domestic product: Income payments to the rest of the world (chain-type quantity index) - last updated from the United States Federal Reserve in November of 2025.