63 datasets found
  1. Street Network Database SND

    • s.cnmilf.com
    • data.seattle.gov
    • +2more
    Updated Jun 29, 2025
    + more versions
    Cite
    City of Seattle ArcGIS Online (2025). Street Network Database SND [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/street-network-database-snd-1712b
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    City of Seattle ArcGIS Online
    Description

    The pathway representation consists of segments and intersection elements. A segment is a linear graphic element that represents a continuous physical travel path terminated by a path end (dead end) or a physical intersection with other travel paths. Segments have one street name, one address range, and one set of segment characteristics. A segment may have no or multiple alias street names. Segment types included are Freeways, Highways, Streets, Alleys (named only), Railroads, Walkways, and Bike lanes. SNDSEG_PV is a linear feature class representing the SND Segment Feature, with attributes for Street name, Address Range, Alias Street name, and segment Characteristics objects. Part of the Address Range and all of the Street name objects are logically shared with the Discrete Address Point-Master Address File layer. Appropriate uses include:

    Cartography - Used to depict the City's transportation network location and connections, typically on smaller-scale maps or images where a single-line representation is appropriate. Also used to depict specific classifications of roadway use, typically at smaller scales, and to label transportation network feature names and associated address ranges, typically on larger-scale maps.
    Geocode reference - Used as a source for derived reference data for address validation and theoretical address location.
    Address range data repository - This data store is the City's address range repository, defining address ranges in association with transportation network features.
    Polygon boundary reference - Used to define various area boundaries in other feature classes where coincident with the transportation network. Does not contain polygon features.
    Address-based extracts - Used to create flat-file extracts, typically indexed by address, with reference to business data associated with transportation network features.
    Thematic linear location reference - By providing unique, stable identifiers for each linear feature, thematic data is associated with specific transportation network features via these identifiers.
    Thematic intersection location reference - By providing unique, stable identifiers for each intersection feature, thematic data is associated with specific transportation network features via these identifiers.
    Network route tracing - Used as a source for derived reference data to determine point-to-point travel paths or optimal stop allocation along a travel path.
    Topological connections with segments - Provides a specific definition of location for each transportation network feature, as well as a specific definition of the connections between features (i.e., defines where the streets are and the relationships between them, e.g., 4th Ave is west of 5th Ave and does intersect Cherry St).
    Event location reference - Used as a source for derived reference data used to locate events and for linear referencing.

    Data source is TRANSPO.SNDSEG_PV. Updated weekly.

  2. Bitcoin Trust Weighted Signed Networks (SNAP)

    • kaggle.com
    Updated Jan 2, 2022
    Cite
    Subhajit Sahu (2022). Bitcoin Trust Weighted Signed Networks (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-soc-sign-bitcoin
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 2, 2022
    Dataset provided by
    Kaggle
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bitcoin Alpha trust weighted signed network

    https://snap.stanford.edu/data/soc-sign-bitcoin-alpha.html

    Dataset information

    This is a who-trusts-whom network of people who trade using Bitcoin on a
    platform called Bitcoin Alpha (http://www.btcalpha.com/). Since Bitcoin
    users are anonymous, there is a need to maintain a record of users'
    reputations to prevent transactions with fraudulent and risky users. Members of Bitcoin Alpha rate other members on a scale of -10 (total distrust) to
    +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.

    Dataset statistics
    Nodes 3,783
    Edges 24,186
    Range of edge weight -10 to +10
    Percentage of positive edges 93%

    A similar network from another Bitcoin platform, Bitcoin OTC, is available at https://snap.stanford.edu/data/soc-sign-bitcoinotc.html (and as
    SNAP/bitcoin-otc in the SuiteSparse Matrix Collection).

    Source (citation) Please cite the following paper if you use this dataset: S. Kumar, F. Spezzano, V.S. Subrahmanian, C. Faloutsos. Edge Weight
    Prediction in Weighted Signed Networks. IEEE International Conference on
    Data Mining (ICDM), 2016.
    http://cs.stanford.edu/~srijan/pubs/wsn-icdm16.pdf

    The following BibTeX citation can be used:
    @inproceedings{kumar2016edge,
    title={Edge weight prediction in weighted signed networks},
    author={Kumar, Srijan and Spezzano, Francesca and
    Subrahmanian, VS and Faloutsos, Christos},
    booktitle={Data Mining (ICDM), 2016 IEEE 16th Intl. Conf. on},
    pages={221--230},
    year={2016},
    organization={IEEE}
    }

    The project webpage for this paper, along with its code to calculate two
    signed network metrics---fairness and goodness---is available at
    http://cs.umd.edu/~srijan/wsn/

    Files
    File Description
    soc-sign-bitcoinalpha.csv.gz
    Weighted Signed Directed Bitcoin Alpha web of trust network

    Data format
    Each line has one rating with the following format:

    SOURCE, TARGET, RATING, TIME                      
    

    where

    SOURCE: node id of source, i.e., rater                 
    TARGET: node id of target, i.e., ratee                 
    RATING: the source's rating for the target,              
        ranging from -10 to +10 in steps of 1             
    TIME: the time of the rating, measured as seconds since Epoch.     
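    The four-column format above can be consumed directly with Python's standard csv module. A minimal sketch, assuming the gzip has already been decompressed; the function name and sample rows are illustrative, not part of the dataset:

```python
import csv
import io

# Parse rating rows in the SOURCE, TARGET, RATING, TIME format and
# compute the share of positive edges (the dataset reports ~93%).
def positive_edge_share(lines):
    pos = total = 0
    for source, target, rating, time in csv.reader(lines):
        total += 1
        if int(rating) > 0:
            pos += 1
    return pos / total

# Invented sample rows in the documented format.
sample = io.StringIO("1,2,4,1289241911\n2,1,-10,1289241941\n3,1,1,1289242000\n")
print(positive_edge_share(sample))  # 2 of 3 sample ratings are positive
```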
    

    Notes on inclusion into the Suite...

  3. Datasets Supporting Paper Titled, “Influence of Network Model Detail on the...

    • datasets.ai
    • catalog.data.gov
    • +3more
    Updated Sep 11, 2024
    + more versions
    Cite
    U.S. Environmental Protection Agency (2024). Datasets Supporting Paper Titled, “Influence of Network Model Detail on the Performance of Designs of Contamination Warning Systems” [Dataset]. https://datasets.ai/datasets/datasets-supporting-paper-titled-influence-of-network-model-detail-on-the-performance-of-d-296cf
    Explore at:
    Available download formats
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Authors
    U.S. Environmental Protection Agency
    Description

    This ZIP file contains the four EPANET network models used for one of the two water distribution system (WDS) network models (N1) analyzed in the paper titled:

    “The effect of a loss of model structural detail due to network skeletonization on contamination warning system design: case studies”.

    The EPANET network models provided here are for the network model named “N1” in this paper.

    This dataset is associated with the following publication: Janke, R., and M. Davis. The effect of a loss of model structural detail due to network skeletonization on contamination warning system design: case studies. Drinking Water Engineering and Science Discussions. Copernicus Gesellschaft mbH, Göttingen, Germany, 1-25, (2018).

  4. Data from: CESNET-QUIC22: A large one-month QUIC network traffic dataset...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Feb 29, 2024
    Cite
    Hynek, Karel (2024). CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7409923
    Explore at:
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    Luxemburk, Jan
    Lukačovič, Andrej
    Šiška, Pavel
    Čejka, Tomáš
    Hynek, Karel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please refer to the original data article for further data description: Jan Luxemburk et al. CESNET-QUIC22: A large one-month QUIC network traffic dataset from backbone lines, Data in Brief, 2023, 108888, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.108888. We recommend using the CESNET DataZoo python library, which facilitates the work with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The QUIC (Quick UDP Internet Connection) protocol has the potential to replace TLS over TCP, which is the standard choice for reliable and secure Internet communication. Due to its design, which makes the inspection of QUIC handshakes challenging, and its usage in HTTP/3, there is an increasing demand for research in QUIC traffic analysis. This dataset contains one month of QUIC traffic collected in an ISP backbone network, which connects 500 large institutions and serves around half a million people. The data are delivered as enriched flows that can be useful for various network monitoring tasks. The provided server names and packet-level information allow research in the encrypted traffic classification area. Moreover, the included QUIC versions and user agents (smartphone, web browser, and operating system identifiers) provide information for large-scale QUIC deployment studies.

    Data capture

    The data was captured in the flow monitoring infrastructure of the CESNET2 network. The capturing was done over four weeks between 31.10.2022 and 27.11.2022. The following list provides the per-week capture period, uncompressed size, and flow count:

    W-2022-44: Capture Period 31.10.2022 - 6.11.2022, Uncompressed Size 19 GB, Number of flows 32.6M

    W-2022-45: Capture Period 7.11.2022 - 13.11.2022, Uncompressed Size 25 GB, Number of flows 42.6M

    W-2022-46: Capture Period 14.11.2022 - 20.11.2022, Uncompressed Size 20 GB, Number of flows 33.7M

    W-2022-47: Capture Period 21.11.2022 - 27.11.2022, Uncompressed Size 25 GB, Number of flows 44.1M

    CESNET-QUIC22 (total): Capture Period 31.10.2022 - 27.11.2022, Uncompressed Size 89 GB, Number of flows 153M

    Data description

    The dataset consists of network flows describing encrypted QUIC communications. Flows were created using the ipfixprobe flow exporter and are extended with packet metadata sequences, packet histograms, and fields extracted from the QUIC Initial Packet, which is the first packet of the QUIC connection handshake. The extracted handshake fields are the Server Name Indication (SNI) domain, the used version of the QUIC protocol, and the user agent string that is available in a subset of QUIC communications.

    Packet sequences

    Flows in the dataset are extended with sequences of packet sizes, directions, and inter-packet times. For the packet sizes, we consider the payload size after transport headers (UDP headers for the QUIC case). Packet directions are encoded as ±1: +1 means a packet sent from client to server, and -1 a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate the response to be sent in the next packet. Packet metadata sequences have a length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction (from packet direction data); in other words, each client request and server response pair counts as one roundtrip.

    Flow statistics

    Flows also include standard flow statistics, which represent aggregated information about the entire bidirectional flow. The fields are: the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. Packet histograms include binned counts of packet sizes and inter-packet times of the entire flow in both directions (more information in the PHISTS plugin documentation). There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes. Moreover, each flow has its end reason: either it was idle, it reached the active timeout, or it ended for other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons. The end of flow detected reason is not considered because it is not relevant for UDP connections.

    Dataset structure

    The dataset flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the list below. For each flow data file, there is a JSON file with the number of saved and seen (before sampling) flows per service, and total counts of all received (observed on the CESNET2 network), service (belonging to one of the dataset's services), and saved (provided in the dataset) flows. There is also the stats-week.json file aggregating flow counts of a whole week and the stats-dataset.json file aggregating flow counts for the entire dataset. Flow counts before sampling can be used to compute sampling ratios of individual services and to resample the dataset back to the original service distribution. Moreover, various dataset statistics, such as feature distributions and value counts of QUIC versions and user agents, are provided in the dataset-statistics folder. The mapping between services and service providers is provided in the servicemap.csv file, which also includes the SNI domains used for ground-truth labeling. The following list describes the flow data fields in the CSV files:
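    The eight logarithmic histogram bins described above can be reproduced with a small helper. This is a sketch only: `phist_bin` is a hypothetical name, and the bin boundaries are taken verbatim from the intervals listed in the text.

```python
# Hypothetical helper reproducing the eight logarithmic PHISTS bins:
# 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024
# (bytes for packet sizes, milliseconds for inter-packet times).
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]

def phist_bin(value):
    """Return the bin index (0-7) for a packet size or inter-packet time."""
    for i, upper in enumerate(BIN_UPPER_BOUNDS):
        if value <= upper:
            return i
    return 7  # > 1024

print([phist_bin(v) for v in (0, 16, 600, 2000)])  # [0, 1, 6, 7]
```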

    ID: Unique identifier
    SRC_IP: Source IP address
    DST_IP: Destination IP address
    DST_ASN: Destination Autonomous System number
    SRC_PORT: Source port
    DST_PORT: Destination port
    PROTOCOL: Transport protocol
    QUIC_VERSION: QUIC protocol version
    QUIC_SNI: Server Name Indication domain
    QUIC_USER_AGENT: User agent string, if available in the QUIC Initial Packet
    TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    DURATION: Duration of the flow in seconds
    BYTES: Number of transmitted bytes from client to server
    BYTES_REV: Number of transmitted bytes from server to client
    PACKETS: Number of packets transmitted from client to server
    PACKETS_REV: Number of packets transmitted from server to client
    PPI: Packet metadata sequence in the format: [[inter-packet times], [packet directions], [packet sizes]]
    PPI_LEN: Number of packets in the PPI sequence
    PPI_DURATION: Duration of the PPI sequence in seconds
    PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
    PHIST_SRC_SIZES: Histogram of packet sizes from client to server
    PHIST_DST_SIZES: Histogram of packet sizes from server to client
    PHIST_SRC_IPT: Histogram of inter-packet times from client to server
    PHIST_DST_IPT: Histogram of inter-packet times from server to client
    APP: Web service label
    CATEGORY: Service category
    FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
    FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
    FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
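    As a sketch of how the PPI column might be consumed: it packs three aligned lists, and the direction list can be used to recount roundtrips. The function names here are illustrative, and the roundtrip convention (each +1 to -1 transition counts as one request/response pair) is one reading of the description above; the exporter's exact rule may differ.

```python
import ast

# Split a PPI string into its three aligned lists:
# [[inter-packet times], [packet directions], [packet sizes]].
def parse_ppi(ppi_string):
    ipt, directions, sizes = ast.literal_eval(ppi_string)
    return ipt, directions, sizes

# One plausible roundtrip count: each client->server (+1) packet
# followed by a server->client (-1) packet is one request/response pair.
def count_roundtrips(directions):
    return sum(1 for a, b in zip(directions, directions[1:]) if a == 1 and b == -1)

# Invented example value in the documented PPI format.
_, dirs, _ = parse_ppi("[[0, 12, 3, 40], [1, -1, 1, -1], [1200, 64, 300, 64]]")
print(count_roundtrips(dirs))  # 2
```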

    Link to other CESNET datasets

    https://www.liberouter.org/technology-v2/tools-services-datasets/datasets/
    https://github.com/CESNET/cesnet-datazoo

    Please cite the original data article:

    @article{CESNETQUIC22,
      author = {Jan Luxemburk and Karel Hynek and Tomáš Čejka and Andrej Lukačovič and Pavel Šiška},
      title = {CESNET-QUIC22: a large one-month QUIC network traffic dataset from backbone lines},
      journal = {Data in Brief},
      pages = {108888},
      year = {2023},
      issn = {2352-3409},
      doi = {https://doi.org/10.1016/j.dib.2023.108888},
      url = {https://www.sciencedirect.com/science/article/pii/S2352340923000069}
    }

  5. UCSD Real-time Network Telescope

    • catalog.caida.org
    Updated May 17, 2018
    Cite
    CAIDA (2018). UCSD Real-time Network Telescope [Dataset]. https://catalog.caida.org/dataset/telescope_live
    Explore at:
    Dataset updated
    May 17, 2018
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/

    Description

    The UCSD Network Telescope consists of a globally routed, but lightly utilized, /9 and /10 network prefix - together roughly 3/1024ths of the whole IPv4 address space. It contains few legitimate hosts; inbound traffic to non-existent machines - so-called Internet Background Radiation (IBR) - is unsolicited and results from a wide range of events, including misconfiguration (e.g. mistyping an IP address), scanning of address space by attackers or malware looking for vulnerable targets, backscatter from randomly spoofed denial-of-service attacks, and the automated spread of malware. CAIDA continuously captures this anomalous traffic, discarding the legitimate traffic packets destined for the few reachable IP addresses in this prefix. We archive and aggregate these data, and provide this valuable resource to network security researchers. This dataset represents raw traffic traces captured by the Telescope instrumentation and made available in near-real time as one-hour-long compressed pcap files. We collect more than 3 TB of uncompressed IBR traffic trace data per day. The most recent 14 days of data are stored locally at CAIDA. Once data slides out of this near-real-time window, the pcap files are off-loaded to tape storage. Historical Telescope data starting from 2008 are available by additional request.

  6. Data from: Hornet 40: Network Dataset of Geographically Placed Honeypots

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    application/gzip, csv +1
    Updated Jul 12, 2024
    Cite
    Veronica Valeros; Veronica Valeros (2024). Hornet 40: Network Dataset of Geographically Placed Honeypots [Dataset]. http://doi.org/10.17632/tcfzkbpw46.3
    Explore at:
    png, application/gzip, csv. Available download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Veronica Valeros; Veronica Valeros
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hornet 40 is a dataset of 40 days of network traffic attacks captured on cloud servers used as honeypots, to help understand how geography may impact the inflow of network attacks. The honeypots are located in eight different cities: Amsterdam, London, Frankfurt, San Francisco, New York, Singapore, Toronto, and Bangalore. The data was captured in April, May, and June 2021.

    The eight cloud servers were created and configured simultaneously following identical instructions. The network capture was performed using the Argus network monitoring tool in each cloud server. The cloud servers had only one service running (SSH on a non-standard port) and were fully dedicated as a honeypot. No honeypot software was used in this dataset.

    The dataset consists of eight scenarios, one for each geographically located cloud server. Each scenario contains bidirectional NetFlow files in the following format:

    • hornet40-biargus.tar.gz: all scenarios with bidirectional NetFlow files in Argus binary format;
    • hornet40-netflow-v5.tar.gz: all scenarios with bidirectional NetFlow v5 files in CSV format;
    • hornet40-netflow-extended.tar.gz: all scenarios with bidirectional NetFlow files in CSV format containing all features provided by Argus;
    • hornet40-full.tar.gz: all of the above (biargus, NetFlow v5, and extended NetFlows) in a single download.

  7. Data from: Learning Naturalistic Temporal Structure in the Posterior Medial...

    • openneuro.org
    Updated Jul 23, 2019
    Cite
    M. Aly; J. Chen; N.B. Turk-Browne; U. Hasson (2019). Learning Naturalistic Temporal Structure in the Posterior Medial Network [Dataset]. http://doi.org/10.18112/openneuro.ds001545.v1.1.1
    Explore at:
    Dataset updated
    Jul 23, 2019
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    M. Aly; J. Chen; N.B. Turk-Browne; U. Hasson
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Participants watched three different clips from the movie "The Grand Budapest Hotel", six times each.

    One clip was "intact" (viewed in its original order); one clip was scrambled but viewed in the same scrambled order for all six repetitions ("scrambled-fixed"). The last clip was also scrambled, but viewed in a different scrambled order for each of the six repetitions ("scrambled-random").

    TIMING The onset times in each '_events.tsv' file are given in seconds after discarding 3 volumes in a preprocessing step (for T1 equilibration). For these timing files to be valid, first discard the 3 volumes at the beginning of each functional run.

    The "waiting for scanner" period at the beginning of the run-specific mp4 files takes longer than 4.5s (3 volumes at 1.5s TR) because the first multiband volume takes a long time to collect. Therefore, although the countdown to the first movie clip does not start until 37 seconds into each run-specific mp4, only 3 volumes were collected during that time. The first clip in each run onsets right after the countdown, which takes 6 seconds.

    For convenience, below is a list of which volumes correspond to which movie clip in each run, after discarding the first 3 volumes of each run:

    Clip 1: volumes 5 to 64
    Clip 2: volumes 71 to 130
    Clip 3: volumes 137 to 196
    Clip 4: volumes 203 to 262
    Clip 5: volumes 269 to 328
    Clip 6: volumes 335 to 394
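    The clip-to-volume table above follows a regular stride: each clip spans 60 volumes, and clip starts are 66 volumes apart beginning at volume 5. A minimal sketch reproducing the ranges; `clip_volumes` is an illustrative helper, not part of the dataset:

```python
# Each clip spans 60 volumes; clips are spaced 66 volumes apart,
# starting at volume 5 (after discarding the first 3 volumes per run).
def clip_volumes(clip):
    """Return the (first, last) volume for clip 1-6."""
    start = 5 + 66 * (clip - 1)
    return start, start + 59

print([clip_volumes(c) for c in range(1, 7)])
# [(5, 64), (71, 130), (137, 196), (203, 262), (269, 328), (335, 394)]
```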

    Timing is the same for all subjects and runs, though the assignment of specific movie clips to each of these 6 positions varies between runs and between counterbalancing conditions 1 and 2.

    COUNTERBALANCING CONDITIONS Subjects 1-15 are in counterbalancing condition #1 (stimuli labeled cond1); they viewed identical stimuli in the same order.

    Subjects 16-30 are in counterbalancing condition #2 (stimuli labeled cond2); they viewed identical stimuli in the same order.

    Subjects in counterbalancing conditions 1 & 2 watched the same 'intact' clip. The 'scrambled-fixed' clip for subjects in counterbalancing condition 1 was the 'scrambled-random' clip for subjects in counterbalancing condition 2, and vice versa.

    FURTHER HELP See the paper in References And Links for more information. The first author (Dr. Mariam Aly) can provide further clarification if needed.

  8. Data from: Citation network data sets for 'Oxytocin – a social peptide?...

    • zenodo.org
    csv
    Updated Jun 6, 2022
    + more versions
    Cite
    Rhodri Ivor Leng; Rhodri Ivor Leng (2022). Citation network data sets for 'Oxytocin – a social peptide? Deconstructing the evidence' [Dataset]. http://doi.org/10.5281/zenodo.6615221
    Explore at:
    csvAvailable download formats
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Rhodri Ivor Leng; Rhodri Ivor Leng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This note describes the data sets used for all analyses contained in the manuscript 'Oxytocin - a social peptide?’[1]

    Data Collection

    The datasets described here were originally retrieved from Web of Science (WoS) Core Collection via the University of Edinburgh’s library subscription [2]. The aim of the original study for which these data were gathered was to survey peer-reviewed primary studies on oxytocin and social behaviour. To capture relevant papers, we used the following query:

    TI = (“oxytocin” OR “pitocin” OR “syntocinon”) AND TS = (“social*” OR “pro$social” OR “anti$social”)

    The final search was performed on 13 September 2021. This returned a total of 2,747 records, of which 2,049 were classified by WoS as ‘articles’. Given our interest in primary studies only – articles reporting original data – we excluded all other document types. We further excluded all articles sub-classified as ‘book chapters’ or as ‘proceeding papers’ in order to limit our analysis to primary studies published in peer-reviewed academic journals. This reduced the set to 1,977 articles. All of these were published in the English language, and no further language refinements were necessary.

    All available metadata on these 1,977 articles was exported as plain-text ‘flat’ format files in four batches, which we later merged via Notepad++. Upon manual examination, we discovered examples of papers classified as ‘articles’ by WoS that were, in fact, reviews. To further filter our results, we searched all available PMIDs in PubMed (1,903 had associated PMIDs, ~96% of the set). We then filtered the results to identify all records classified as ‘review’, ‘systematic review’, or ‘meta-analysis’, identifying 75 records [3] (thus, ~4% of the records classified as articles by WoS were classified as reviews in PubMed). After examining a sample and agreeing with the PubMed classification, these were removed from our dataset, leaving a total of 1,902 articles.

    From these data, we constructed two datasets by parsing out relevant reference data via the Sci2 Tool [4]. First, we constructed a ‘node-attribute-list’ by linking unique reference strings (‘Cite Me As’ column in the WoS data files) to unique identifiers; we then parsed into this dataset information on the identity of each paper, including the title of the article, all authors, journal of publication, year of publication, total citations as recorded by WoS, and WoS accession number. Second, we constructed an ‘edge-list’ that records each citation, with the citing paper in the ‘Source’ column and the cited paper in the ‘Target’ column, using the unique identifiers described previously to link these data to the node-attribute-list.

    We then constructed a network in which papers are nodes and citation links between papers are directed edges. We used Gephi Version 0.9.2 [5] to manually clean these data by merging duplicate references caused by different reference formats or by referencing errors. To do this, we needed to retain all retrieved records (1,902) as well as all of their references to papers, whether these were included in our original search or not. In total, this produced a network of 46,633 nodes (unique reference strings) and 112,520 edges (citation links). Thus, the average reference list size of these articles is ~59 references. The mean indegree (within-network citations) is 2.4 (median 1) for the entire network, reflecting a great diversity in referencing choices among our 1,902 articles.
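    The quoted averages follow directly from these counts; a quick arithmetic check:

```python
# Full-network counts reported above.
edges, nodes, retrieved = 112520, 46633, 1902

print(round(edges / retrieved, 1))  # 59.2 -> average reference list size of ~59
print(round(edges / nodes, 1))      # 2.4  -> mean indegree over all 46,633 nodes
```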

    After merging duplicates, we restricted the network to include only the articles fully retrieved (1,902), and retained only those connected together by citation links in a large interconnected network (i.e. the largest component). In total, 1,892 (99.5%) of our initial set were connected via citation links, meaning a total of ten papers were removed from the following analysis; these were neither connected to the largest component nor to one another (i.e. they were ‘isolates’).

    This left us with a network of 1,892 nodes connected together by 26,019 edges. It is this network that is described by the ‘node-attribute-list’ and ‘edge-list’ provided here. This network has a mean in-degree of 13.76 (median in-degree of 4). By restricting our analysis in this way, we lose 44,741 unique references (96%) and 86,501 citations (77%) from the full network, but retain a set of articles tightly knitted together, all of which have been fully retrieved due to possessing certain terms related to oxytocin AND social behaviour in their title, abstract, or associated keywords.

    Before moving on, we calculated the indegree for all nodes in this network – this counts the number of citations to a given paper from other papers within this network – and have included it in the node-attribute-list. We further clustered this network via modularity maximisation using the Leiden algorithm [6]. We set the algorithm to resolution 1 and allowed it to run over 100 iterations and 100 restarts. This gave Q=0.43 and identified seven clusters, which we describe in detail within the body of the paper. We have included cluster membership as an attribute in the node-attribute-list.

    For additional analysis, we also analysed the full reference list data to examine the most commonly cited references between 2016 and 2021 - the results of this are described in OTSOC_Cited_2016-2021.csv. This takes the reference lists of all retrieved papers within the network and examines their full reference lists (including references to other papers not contained within the network). These data were cleaned by matching DOIs and manual cleansing.

    Data description

    We include here two network datasets: (i) ‘OTSOC-node-attribute-list.csv’ contains the attributes of the 1,892 primary articles retrieved from WoS that include terms indicating a focus on oxytocin and social behaviour; (ii) ‘OTSOC-edge-list.csv’ records the citations between these papers. Together, these can be imported into a range of different software for network analysis; however, we have formatted them for ease of upload into Gephi 0.9.2. Finally, we include (iii) 'OTSOC_Cited_2016-2021.csv', which lists all papers cited by more than 10 papers in the OTSOC network, based on an analysis of the bibliographies of the retrieved papers. Below, we detail their contents:

    1. ‘OTSOC-node-attribute-list.csv’ is a comma-separated values file that contains all node attributes for the citation network (n=1,892) analysed in the paper. The columns refer to:

    Id, the unique identifier

    Label, the reference string of the paper to which the attributes in this row correspond. This is taken from the ‘Cite Me As’ column from the original WoS download. The reference string is in the following format: last name of first author, publication year, journal, volume, start page, and DOI (if available).

    Wos_id, unique Web of Science (WoS) accession number. These can be used to query WoS to find further data on all papers via the ‘UT= ’ field tag.

    Title, paper title.

    Authors, all named authors.

    Journal, journal of publication.

    Pub_year, year of publication.

    Wos_citations, total number of citations recorded by WoS Core Collection to a given paper as of 13 September 2021.

    Indegree, the number of within network citations to a given paper, calculated for the network shown in Figure 1 of the manuscript.

    Cluster, provides the cluster membership number as discussed within the manuscript (Figure 1). This was established via modularity maximisation using the Leiden algorithm (resolution 1; Q=0.43; 7 clusters).

    2. ‘OTSOC-edge-list.csv’ is a comma-separated values file that contains all citation links between the 1,892 articles (n=26,019). The columns refer to:

    Source, the unique identifier of the citing paper.

    Target, the unique identifier of the cited paper.

    Type, edges are ‘Directed’, and this column tells Gephi to regard all edges as such.

    Syr_date, this contains the date of publication of the citing paper.

    Tyr_date, this contains the date of publication of the cited paper.
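    Outside Gephi, the edge list is easy to process directly; a minimal sketch (column names taken from the description above; the sample rows are invented for illustration) that recomputes within-network indegree by counting appearances in the Target column:

    ```python
    import csv
    import io
    from collections import Counter

    def indegree_from_edge_list(csv_text):
        """Recompute within-network indegree: one count per citation received,
        i.e. per appearance of a node Id in the Target column."""
        reader = csv.DictReader(io.StringIO(csv_text))
        return Counter(row["Target"] for row in reader)

    # Hypothetical rows in the OTSOC-edge-list.csv format:
    sample = (
        "Source,Target,Type,Syr_date,Tyr_date\n"
        "1,2,Directed,2018,2015\n"
        "3,2,Directed,2019,2015\n"
        "3,1,Directed,2019,2016\n"
    )
    indegree_from_edge_list(sample)  # → Counter({'2': 2, '1': 1})
    ```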

    3. 'OTSOC_Cited_2016-2021.csv' is a comma-separated values file that contains all references cited by at least 10 of the retrieved papers within the OTSOC network published from 2016 onwards. The columns refer to:

    Reference, the cited reference string extracted from the bibliographies of retrieved papers.

    Publication year, the publication year of the cited reference.

    DOI, the DOI of the cited reference.

    indegree_2016, the total number of citations to a cited reference from papers published in 2016 and contained within the OTSOC network.

    indegree_2017, the total number of citations to a cited reference from papers published in 2017 and contained within the OTSOC network.

    indegree_2018, the total number of citations to a cited reference from papers published in 2018 and contained within the OTSOC network.

    indegree_2019, the total number of citations to a cited reference from papers published in 2019 and contained within the OTSOC network.

  9. AIT Log Data Set V2.0

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +2more
    Updated Jun 28, 2024
    Cite
    Rauber, Andreas (2024). AIT Log Data Set V2.0 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5789063
    Explore at:
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Frank, Maximilian
    Landauer, Max
    Skopik, Florian
    Wurzenberger, Markus
    Rauber, Andreas
    Hotwagner, Wolfgang
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AIT Log Data Sets

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems, federated learning, and alert aggregation. A detailed description of the dataset is available in [1]. The logs were collected from eight testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by [2]. Please cite these papers if the data is used for academic publications.

    In brief, each of the datasets corresponds to a testbed representing a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise over a time span of 4-6 days. At some point, a sequence of attack steps is launched against the network. Log data is collected from all hosts and includes Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs. Separate ground truth files are used to label events that are related to the attacks. Compared to the AIT-LDS v1.1, a more complex network and more diverse user behavior are simulated, and logs are collected from all hosts in the network. If you are only interested in network traffic analysis, we also provide the AIT-NDS containing the labeled netflows of the testbed networks. We also provide the AIT-ADS, an alert data set derived by forensically applying open-source intrusion detection systems on the log data.

    The datasets in this repository have the following structure:

    The gather directory contains all logs collected from the testbed. Logs collected from each host are located in gather/<host>/logs/.

    The labels directory contains the ground truth of the dataset that indicates which events are related to attacks. The directory mirrors the structure of the gather directory so that each label file is located at the same path and has the same name as the corresponding log file. Each line in the label files references the log event corresponding to an attack by the line number counted from the beginning of the file ("line"), the labels assigned to the line that state the respective attack step ("labels"), and the labeling rules that assigned the labels ("rules"). An example is provided below.

    The processing directory contains the source code that was used to generate the labels.

    The rules directory contains the labeling rules.

    The environment directory contains the source code that was used to deploy the testbed and run the simulation using the Kyoushi Testbed Environment.

    The dataset.yml file specifies the start and end time of the simulation.

    The following table summarizes relevant properties of the datasets:

    fox

    Simulation time: 2022-01-15 00:00 - 2022-01-20 00:00

    Attack time: 2022-01-18 11:59 - 2022-01-18 13:15

    Scan volume: High

    Unpacked size: 26 GB

    harrison

    Simulation time: 2022-02-04 00:00 - 2022-02-09 00:00

    Attack time: 2022-02-08 07:07 - 2022-02-08 08:38

    Scan volume: High

    Unpacked size: 27 GB

    russellmitchell

    Simulation time: 2022-01-21 00:00 - 2022-01-25 00:00

    Attack time: 2022-01-24 03:01 - 2022-01-24 04:39

    Scan volume: Low

    Unpacked size: 14 GB

    santos

    Simulation time: 2022-01-14 00:00 - 2022-01-18 00:00

    Attack time: 2022-01-17 11:15 - 2022-01-17 11:59

    Scan volume: Low

    Unpacked size: 17 GB

    shaw

    Simulation time: 2022-01-25 00:00 - 2022-01-31 00:00

    Attack time: 2022-01-29 14:37 - 2022-01-29 15:21

    Scan volume: Low

    Data exfiltration is not visible in DNS logs

    Unpacked size: 27 GB

    wardbeck

    Simulation time: 2022-01-19 00:00 - 2022-01-24 00:00

    Attack time: 2022-01-23 12:10 - 2022-01-23 12:56

    Scan volume: Low

    Unpacked size: 26 GB

    wheeler

    Simulation time: 2022-01-26 00:00 - 2022-01-31 00:00

    Attack time: 2022-01-30 07:35 - 2022-01-30 17:53

    Scan volume: High

    No password cracking in attack chain

    Unpacked size: 30 GB

    wilson

    Simulation time: 2022-02-03 00:00 - 2022-02-09 00:00

    Attack time: 2022-02-07 10:57 - 2022-02-07 11:49

    Scan volume: High

    Unpacked size: 39 GB

    The following attacks are launched in the network:

    Scans (nmap, WPScan, dirb)

    Webshell upload (CVE-2020-24186)

    Password cracking (John the Ripper)

    Privilege escalation

    Remote command execution

    Data exfiltration (DNSteal)

    Note that attack parameters and their execution orders vary in each dataset. Labeled log files are trimmed to the simulation time to ensure that their labels (which reference the related event by the line number in the file) are not misleading. Other log files, however, also contain log events generated before or after the simulation time and may therefore be affected by testbed setup or data collection. It is therefore recommended to only consider logs with timestamps within the simulation time for analysis.

    In the following, the structure of the labels is explained using the audit logs from the intranet server in the russellmitchell dataset as an example. The first four labels in the labels/intranet_server/logs/audit/audit.log file are as follows:

    {"line": 1860, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1861, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1862, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1863, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    Each JSON object in this file assigns labels to one specific log line in the corresponding log file located at gather/intranet_server/logs/audit/audit.log. The field "line" in each JSON object specifies the line number of the respective event in the original log file, while the field "labels" comprises the corresponding labels. For example, the lines in the sample above provide the information that lines 1860-1863 in the gather/intranet_server/logs/audit/audit.log file are labeled with "attacker_change_user" and "escalate", corresponding to the attack step where the attacker obtains escalated privileges. Inspecting these lines shows that they indeed correspond to the user authenticating as root:

    type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:authentication acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_ACCT msg=audit(1642999060.603:2227): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=CRED_ACQ msg=audit(1642999060.615:2228): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_START msg=audit(1642999060.627:2229): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:session_open acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    The same applies to all other labels for this log file and all other log files. There are no labels for logs generated by "normal" (i.e., non-attack) behavior; instead, all log events that have no corresponding JSON object in one of the files from the labels directory, such as lines 1-1859 in the example above, can be considered to be labeled as "normal". This means that to determine the labels for the log data, it is necessary to track line numbers when processing the original logs from the gather directory and check whether these line numbers also appear in the corresponding file in the labels directory.
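    That lookup can be sketched as follows (the function name is ours; the JSON shape matches the sample above, and any line without a label entry defaults to "normal"):

    ```python
    import json

    def label_events(log_lines, label_file_lines):
        """Attach labels to log events. Line numbers are 1-based; any log line
        without an entry in the label file is implicitly labeled 'normal'."""
        by_line = {}
        for raw in label_file_lines:
            obj = json.loads(raw)
            by_line[obj["line"]] = obj["labels"]
        return [
            (num, by_line.get(num, ["normal"]), event)
            for num, event in enumerate(log_lines, start=1)
        ]

    logs = ["event-a", "event-b", "event-c"]
    labels = ['{"line": 2, "labels": ["escalate"], "rules": {}}']
    label_events(logs, labels)[1]  # → (2, ['escalate'], 'event-b')
    ```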

    Besides the attack labels, a general overview of the exact times when specific attack steps were launched is available in gather/attacker_0/logs/attacks.log. An enumeration of all hosts and their IP addresses is given in processing/config/servers.yml. Moreover, configurations of each host are provided in gather/<host>/configs/ and gather/<host>/facts.json.

    Version history:

    AIT-LDS-v1.x: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.

    AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU projects GUARD (833456) and PANDORA (SI2.835928).

    If you use the dataset, please cite the following publications:

    [1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner, M. Wurzenberger, and A. Rauber. "Maintainable Log Datasets for Evaluation of Intrusion Detection Systems". IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482, doi: 10.1109/TDSC.2022.3201582. [PDF]

    [2] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner and A. Rauber, "Have it Your Way: Generating Customized Log Datasets With a Model-Driven Simulation Testbed," in IEEE Transactions on Reliability, vol. 70, no. 1, pp. 402-415, March 2021, doi: 10.1109/TR.2020.3031317. [PDF]

  10. Signed Graphs

    • kaggle.com
    Updated Nov 15, 2021
    Cite
    Subhajit Sahu (2021). Signed Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-signed
    Explore at:
    Croissant, a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 15, 2021
    Dataset provided by
    Kaggle
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    soc-RedditHyperlinks: Social Network: Reddit Hyperlink Network

    The hyperlink network represents the directed connections between two subreddits (a subreddit is a community on Reddit). We also provide subreddit embeddings. The network is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017.

    Subreddit Hyperlink Network: the subreddit-to-subreddit hyperlink network is extracted from the posts that create hyperlinks from one subreddit to another. We say a hyperlink originates from a post in the source community and links to a post in the target community. Each hyperlink is annotated with three properties: the timestamp, the sentiment of the source community post towards the target community post, and the text property vector of the source post. The network is directed, signed, temporal, and attributed.

    Note that each post has a title and a body. The hyperlink can be present in either the title of the post or in the body. Therefore, we provide one network file for each.

    Subreddit Embeddings: We have also provided embedding vectors representing each subreddit. These can be found in this dataset link: subreddit embedding dataset. Please note that some subreddit embeddings could not be generated, so this file has 51,278 embeddings.

    soc-sign-bitcoin-otc: Bitcoin OTC trust weighted signed network

    This is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin OTC. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.

    soc-sign-bitcoin-alpha: Bitcoin Alpha trust weighted signed network

    This is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin Alpha. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Members of Bitcoin Alpha rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.

    soc-sign-epinions: Epinions social network

    This is the who-trusts-whom online social network of Epinions.com, a general consumer review site. Members of the site can decide whether to ''trust'' each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user.

    wiki-Elec: Wikipedia adminship election data

    Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. A small part of Wikipedia contributors are administrators, who are users with access to additional technical features that aid in maintenance. In order for a user to become an administrator a Request for adminship (RfA) is issued and the Wikipedia community via a public discussion or a vote decides who to promote to adminship. Using the latest complete dump of Wikipedia page edit history (from January 3 2008) we extracted all administrator elections and vote history data. This gave us nearly 2,800 elections with around 100,000 total votes and about 7,000 users participating in the elections (either casting a vote or being voted on). Out of these, about 1,200 elections resulted in a successful promotion, while about 1,500 elections did not result in the promotion. About half of the votes in the dataset are by existing admins, while the other half comes from ordinary Wikipedia users.

    Dataset has the following format:

    • E: did the election result in promotion (1) or not (0)
    • T: time election was closed
    • U: user id (and screen name) of editor that is being considered for promotion
    • N: user id (and screen name) of the nominator
    • V: vote(1:support, 0:neutral, -1:oppose) user_id time screen_name
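    A record in this format can be parsed with a small tag dispatcher; a rough sketch under the assumption that each line is a one-letter tag followed by a tab- or space-separated value (the sample record and function name are invented for illustration):

    ```python
    def parse_election(record_lines):
        """Parse one wiki-Elec record: lines tagged E/T/U/N/V as described above."""
        election = {"votes": []}
        fields = {"E": "promoted", "T": "closed", "U": "candidate", "N": "nominator"}
        for line in record_lines:
            tag, _, value = line.strip().partition("\t")
            if not value:  # fall back to space-separated lines
                tag, _, value = line.strip().partition(" ")
            if tag == "V":
                election["votes"].append(value)
            elif tag in fields:
                election[fields[tag]] = value
        return election

    # Hypothetical record:
    record = [
        "E\t1",
        "T\t2008-01-01 12:00:00",
        "U\t100\tSomeEditor",
        "N\t200\tSomeNominator",
        "V\t1\t300\t2007-12-30\tVoterA",
    ]
    parse_election(record)["promoted"]  # → '1'
    ```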

    wiki-RfA: Wikipedia Requests for Adminship (with text)

    For a Wikipedia editor to become an administrator, a request for adminship (RfA) must be submitted, either by the candidate or by another community member. Subsequently, any Wikipedia member may cast a supporting, neutral, or opposing vote.

    We crawled and parsed all votes since the adoption of the RfA process in 2003 through May 2013. The dataset contains 11,381 users (voters and votees) forming 189,004 distinct voter/votee pairs, for a total of 198,275 votes (this is larger than the number of distinct voter/votee pairs because, if the same user ran for election several times, the same voter/votee pair may contribute several votes).

    This induces a directed, signed network in which nodes represent Wikipedia members and edges represent votes. In this sense, the...

  11. Salvador Urban Network Transportation (SUNT)

    • data.mendeley.com
    Updated Apr 28, 2025
    Cite
    Marcos Vinícius (2025). Salvador Urban Network Transportation (SUNT) [Dataset]. http://doi.org/10.17632/85fdtx3kr5.1
    Explore at:
    Dataset updated
    Apr 28, 2025
    Authors
    Marcos Vinícius
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Efficient public transportation management is essential for the development of large urban centers, providing benefits such as comprehensive coverage of population mobility, lower transport costs, better control of traffic congestion, and a significant reduction of environmental impact by limiting gas emissions and pollution. Realizing these benefits requires a deep understanding of population and transit patterns and the adoption of approaches that model multiple relations and characteristics efficiently. This work addresses these challenges by providing a novel dataset that includes various public transportation components from three different systems: regular buses, subway, and BRT (Bus Rapid Transit). Our dataset comprises daily information from about 710,000 passengers in Salvador, one of Brazil's largest cities, and local public transportation data with approximately 2,000 vehicles operating across nearly 400 lines, connecting almost 3,000 stops and stations. With data collected from March 2024 to March 2025 at a frequency lower than one minute, SUNT stands as one of the largest, most comprehensive, and openly available urban datasets in the literature.

  12. LBA Regional Global Historical Climatology Network, V. 1, 1832-1990

    • s.cnmilf.com
    • data.globalchange.gov
    • +7more
    Updated Jun 28, 2025
    + more versions
    Cite
    ORNL_DAAC (2025). LBA Regional Global Historical Climatology Network, V. 1, 1832-1990 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/lba-regional-global-historical-climatology-network-v-1-1832-1990-93988
    Explore at:
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    Oak Ridge National Laboratory Distributed Active Archive Center
    Description

    This data set consists of a subset of the Global Historical Climatology Network (GHCN) Version 1 database for the study area of the Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) in South America (i.e., longitude 85 to 30 degrees W, latitude 25 degrees S to 10 degrees N). There are three files available, one each for precipitation, temperature, and pressure data. Within this subset the oldest data date from 1832 and the most recent from 1990. The GHCN V1 database contains monthly temperature, precipitation, sea-level pressure, and station-pressure data for thousands of meteorological stations worldwide. The database was compiled from pre-existing national, regional, and global collections of data as part of the Global Historical Climatology Network (GHCN) project, the goal of which was to produce, maintain and make available a comprehensive global surface baseline climate data set for monitoring climate and detecting climate change. It contains data from roughly 6000 temperature stations, 7500 precipitation stations, 1800 sea-level pressure stations, and 1800 station-pressure stations. Each station has at least 10 years of data; 40% have more than 50 years of data. Spatial coverage is good over most of the globe, particularly for the United States and Europe. Data gaps are evident over the Amazon rainforest, the Sahara desert, Greenland, and Antarctica. The earliest station data are from 1697; the most recent are from 1990. The database was created from 15 source data sets including: the National Climatic Data Center's (NCDC's) World Weather Records, CAC's Climate Anomaly Monitoring System (CAMS), NCAR's World Monthly Surface Station Climatology, CIRES' (Eischeid/Diaz) global precipitation data set, P. Jones' temperature database for the world, and S. Nicholson's African precipitation database.
    Quality Control of the GHCN V1 database included visual inspection of graphs of all station time series, tests for precipitation digitized 6 months out of phase, tests for different stations having identical data, and other tests. This detailed analysis has revealed that most stations (95% for temperature and precipitation, 75% for pressure) contain high-quality data. However, gross data-processing errors (e.g., keypunch problems) and discontinuous inhomogeneities (e.g., station relocations and instrumentation changes) do characterize a small number of stations. All major data processing problems have been flagged (or corrected, when possible). Similarly, all major inhomogeneities have been flagged, although no homogeneity corrections were applied.

    LBA was designed to create the new knowledge needed to understand the climatological, ecological, biogeochemical, and hydrological functioning of Amazonia; the impact of land use change on these functions; and the interactions between Amazonia and the Earth system. LBA was a cooperative international research initiative led by Brazil and NASA was a lead sponsor for several experiments. More information about LBA and links to other LBA project sites can be found at http://www.daac.ornl.gov/LBA/misc_amazon.html.

  13. Drug Abuse Warning Network (DAWN-2006)

    • catalog.data.gov
    • data.virginia.gov
    • +4more
    Updated Jul 26, 2023
    + more versions
    Cite
    Substance Abuse & Mental Health Services Administration (2023). Drug Abuse Warning Network (DAWN-2006) [Dataset]. https://catalog.data.gov/dataset/drug-abuse-warning-network-dawn-2006
    Explore at:
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    Substance Abuse and Mental Health Services Administrationhttp://www.samhsa.gov/
    Description

    The Drug Abuse Warning Network (DAWN) is a nationally representative public health surveillance system that has monitored drug related emergency department (ED) visits to hospitals since the early 1970s. First administered by the Drug Enforcement Administration (DEA) and the National Institute on Drug Abuse (NIDA), the responsibility for DAWN now rests with the Substance Abuse and Mental Health Services Administration's (SAMHSA) Center for Behavioral Health Statistics and Quality (CBHSQ). Over the years, the exact survey methodology has been adjusted to improve the quality, reliability, and generalizability of the information produced by DAWN. The current approach was first fully implemented in the 2004 data collection year. DAWN relies on a longitudinal probability sample of hospitals located throughout the United States. To be eligible for selection into the DAWN sample, a hospital must be a non-Federal, short-stay, general surgical and medical hospital located in the United States, with at least one 24-hour ED. DAWN cases are identified by the systematic review of ED medical records in participating hospitals. The unit of analysis is any ED visit involving recent drug use. DAWN captures both ED visits that are directly caused by drugs and those in which drugs are a contributing factor but not the direct cause of the ED visit. The reason a patient used a drug is not part of the criteria for considering a visit to be drug related. Therefore, all types of drug-related events are included: drug misuse or abuse, accidental drug ingestion, drug-related suicide attempts, malicious drug poisonings, and adverse reactions. DAWN does not report medications that are unrelated to the visit. 
The DAWN public-use dataset provides information for all types of drugs, including illegal drugs, prescription drugs, over-the-counter medications, dietary supplements, anesthetic gases, substances that have psychoactive effects when inhaled, alcohol when used in combination with other drugs (all ages), and alcohol alone (only for patients aged 20 or younger). Public-use dataset variables describe and categorize up to 16 drugs contributing to the ED visit, including toxicology confirmation and route of administration. Administrative variables specify the type of case, case disposition, categorized episode time of day, and quarter of year. Metropolitan area is included for represented metropolitan areas. Created variables include the number of unique drugs reported and case-level indicators for alcohol, non-alcohol illicit substances, any pharmaceutical, non-medical use of pharmaceuticals, and all misuse and abuse of drugs. Demographic items include age category, sex, and race/ethnicity. Complex sample design and weighting variables are included to calculate various estimates of drug-related ED visits for the Nation as a whole, as well as for specific metropolitan areas, from the ED visits classified as DAWN cases in the selected hospitals.This study has 1 Data Set.

  14. Data from: CESNET-TLS-Year22: A year-spanning TLS network traffic dataset...

    • zenodo.org
    • data.niaid.nih.gov
    csv, zip
    Updated Mar 24, 2025
    Cite
    Karel Hynek; Jan Luxemburk; Jaroslav Pešek; Tomáš Čejka; Šiška Pavel (2025). CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines [Dataset]. http://doi.org/10.5281/zenodo.10608607
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Karel Hynek; Jan Luxemburk; Jaroslav Pešek; Tomáš Čejka; Šiška Pavel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2022
    Description

    We recommend using the CESNET DataZoo Python library, which facilitates working with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.

    The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.

    Data description The dataset consists of network flows describing encrypted TLS communications. Flows are extended with packet sequences, histograms, and fields extracted from the TLS ClientHello message, which is transmitted in the first packet of the TLS connection handshake. The most important extracted handshake field is the SNI domain, which is used for ground-truth labeling.

    Packet Sequences Sequences of packet sizes, directions, and inter-packet times are standard data input for traffic analysis. For packet sizes, we consider the payload size after transport headers (TCP headers for the TLS case). We omit packets with no TCP payload, for example ACKs, because zero-payload packets are related to the transport layer internals rather than services’ behavior. Packet directions are encoded as ±1, where +1 means a packet sent from client to server, and -1 is a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate a response. Packet sequences have a maximum length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction; in other words, each client request and server response pair counts as one roundtrip.
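    The roundtrip count described above (the number of changes in the communication direction, so that one request/response pair such as [+1, -1] yields one roundtrip) can be sketched as:

    ```python
    def count_roundtrips(directions):
        """Roundtrips = number of changes of communication direction in a
        packet-direction sequence encoded as +1 (client->server) / -1 (server->client)."""
        return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)

    # A burst of client packets followed by the server response: one change, one roundtrip.
    count_roundtrips([+1, +1, -1, -1])  # → 1
    ```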

    Flow statistics Each data record also includes standard flow statistics, representing aggregated information about the entire bidirectional connection. The fields are the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. The packet histograms include binned counts (not limited to the first 30 packets) of packet sizes and inter-packet times in both directions. There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes (More information in the PHISTS plugin documentation). Moreover, each flow has its end reason---either it ended with the TCP connection termination (FIN packets), was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons.
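    The logarithmic binning above can be reproduced by mapping a value to its bin index; a minimal sketch (bin edges copied from the interval list, with 1024 falling into the seventh bin as stated; the constant and function names are ours):

    ```python
    import bisect

    # Upper bounds of the first seven bins; anything larger lands in bin 7 (>1024).
    BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]

    def histogram_bin(value):
        """Return the 0-based histogram bin for a packet size [B] or inter-packet time [ms]."""
        return bisect.bisect_left(BIN_UPPER_BOUNDS, value)

    histogram_bin(15)    # → 0 (bin 0-15)
    histogram_bin(16)    # → 1 (bin 16-31)
    histogram_bin(1024)  # → 6 (bin 512-1024)
    histogram_bin(1500)  # → 7 (bin >1024)
    ```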

    Dataset structure The dataset is organized by weeks and individual days. The flows are delivered in compressed CSV files, with one flow per row; the data columns are summarized in the list below. For each flow data file, there is a JSON file with the total number of saved flows and the number of flows per service. There are also files aggregating flow counts for each week (stats-week.json) and for the entire dataset (stats-dataset.json). The following list describes the flow data fields in the CSV files:

    • ID: Unique identifier
    • SRC_IP: Source IP address
    • DST_IP: Destination IP address
    • DST_ASN: Destination Autonomous System number
    • SRC_PORT: Source port
    • DST_PORT: Destination port
    • PROTOCOL: Transport protocol
    • FLAG_CWR: Presence of the CWR flag
    • FLAG_CWR_REV: Presence of the CWR flag in the reverse direction
    • FLAG_ECE: Presence of the ECE flag
    • FLAG_ECE_REV: Presence of the ECE flag in the reverse direction
    • FLAG_URG: Presence of the URG flag
    • FLAG_URG_REV: Presence of the URG flag in the reverse direction
    • FLAG_ACK: Presence of the ACK flag
    • FLAG_ACK_REV: Presence of the ACK flag in the reverse direction
    • FLAG_PSH: Presence of the PSH flag
    • FLAG_PSH_REV: Presence of the PSH flag in the reverse direction
    • FLAG_RST: Presence of the RST flag
    • FLAG_RST_REV: Presence of the RST flag in the reverse direction
    • FLAG_SYN: Presence of the SYN flag
    • FLAG_SYN_REV: Presence of the SYN flag in the reverse direction
    • FLAG_FIN: Presence of the FIN flag
    • FLAG_FIN_REV: Presence of the FIN flag in the reverse direction
    • TLS_SNI: Server Name Indication domain
    • TLS_JA3: JA3 fingerprint of TLS client
    • TIME_FIRST: Timestamp of the first packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    • TIME_LAST: Timestamp of the last packet in format YYYY-MM-DDTHH-MM-SS.ffffff
    • DURATION: Duration of the flow in seconds
    • BYTES: Number of transmitted bytes from client to server
    • BYTES_REV: Number of transmitted bytes from server to client
    • PACKETS: Number of packets transmitted from client to server
    • PACKETS_REV: Number of packets transmitted from server to client
    • PPI: Packet sequence in the format: [[inter-packet times], [packet directions], [packet sizes], [push flags]]
    • PPI_LEN: Number of packets in the PPI sequence
    • PPI_DURATION: Duration of the PPI sequence in seconds
    • PPI_ROUNDTRIPS: Number of roundtrips in the PPI sequence
    • PHIST_SRC_SIZES: Histogram of packet sizes from client to server
    • PHIST_DST_SIZES: Histogram of packet sizes from server to client
    • PHIST_SRC_IPT: Histogram of inter-packet times from client to server
    • PHIST_DST_IPT: Histogram of inter-packet times from server to client
    • APP: Web service label
    • CATEGORY: Service category
    • FLOW_ENDREASON_IDLE: Flow was terminated because it was idle
    • FLOW_ENDREASON_ACTIVE: Flow was terminated because it reached the active timeout
    • FLOW_ENDREASON_END: Flow ended with the TCP connection termination
    • FLOW_ENDREASON_OTHER: Flow was terminated for other reasons
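    For example, the nested PPI column can be unpacked into its four parallel sequences. This sketch assumes the field is serialized as a Python-style list literal, which should be verified against the actual CSV files:

    ```python
    import ast

    def parse_ppi(ppi_field):
        """Split a PPI string into its four parallel per-packet sequences.

        Assumed layout (from the field list above):
        [[inter-packet times], [directions], [sizes], [push flags]].
        """
        ipt, directions, sizes, push = ast.literal_eval(ppi_field)
        return {"ipt": ipt, "directions": directions,
                "sizes": sizes, "push": push}

    # A hypothetical three-packet flow record
    ppi = parse_ppi("[[0, 12, 3], [1, -1, 1], [517, 1400, 36], [1, 1, 0]]")
    print(ppi["directions"])  # -> [1, -1, 1]
    ```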

  15. Data from: CTU Hornet 65 Niner: A Network Dataset of Geographically Distributed Low-Interaction Honeypots


    • data.mendeley.com
    • zenodo.org
    Updated Oct 9, 2024
    Cite
    Veronica Valeros (2024). CTU Hornet 65 Niner: A Network Dataset of Geographically Distributed Low-Interaction Honeypots [Dataset]. http://doi.org/10.17632/nt4p9zsv5k.1
    Explore at:
    Dataset updated
    Oct 9, 2024
    Authors
    Veronica Valeros
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CTU Hornet 65 Niner is a dataset of 65 days of network traffic attacks captured in cloud servers used as honeypots to help understand how geography may impact the inflow of network attacks. The honeypots were placed in nine different geographical locations: Amsterdam, London, Frankfurt, San Francisco, New York, Singapore, Toronto, Bangalore, and Sydney. The data was captured from April 28th to July 1st, 2024.

    The nine cloud servers were created and configured following identical instructions using Ansible [1] on the DigitalOcean [2] cloud provider. The network capture was performed using the Zeek [3] network monitoring tool, installed on each cloud server. The cloud servers had only one service running (SSH on a non-standard port) and were fully dedicated to being used as honeypots. No honeypot software was used in this dataset.

    The dataset is composed of nine scenarios:

    • Honeypot-Cloud-DigitalOcean-Geo-1: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-2: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-3: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-4: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-5: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-6: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-7: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-8: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
    • Honeypot-Cloud-DigitalOcean-Geo-9: has 65 folders (YYYY-MM-DD), each containing 24 Zeek conn.log files and other Zeek files
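    Under the folder layout described above, the conn.log files of all nine scenarios can be enumerated with a short script (a sketch assuming the directory naming shown; the exact per-hour log file names inside each day folder may differ in the published archive):

    ```python
    from pathlib import Path

    def conn_logs(dataset_root):
        """Yield (scenario, day, path) for every Zeek conn.log in the dataset.

        Assumes directories named 'Honeypot-Cloud-DigitalOcean-Geo-N/YYYY-MM-DD/'
        as described above; the hourly conn.log file naming is an assumption.
        """
        root = Path(dataset_root)
        for scenario in sorted(root.glob("Honeypot-Cloud-DigitalOcean-Geo-*")):
            for day in sorted(scenario.glob("????-??-??")):
                for log in sorted(day.glob("conn*.log")):
                    yield scenario.name, day.name, log
    ```

    Iterating the generator yields one tuple per log file, which can then be handed to any Zeek log parser.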

    References: [1] Ansible IT Automation Engine, https://www.ansible.com/. Accessed on 08/28/2024. [2] DigitalOcean, https://www.digitalocean.com/. Accessed on 08/28/2024. [3] Zeek Documentation, https://docs.zeek.org/en/master/index.html. Accessed on 08/28/2024.

  16. Households; Net Worth, Level


    • fred.stlouisfed.org
    json
    Updated Jun 12, 2025
    Cite
    (2025). Households; Net Worth, Level [Dataset]. https://fred.stlouisfed.org/series/BOGZ1FL192090005Q
    Explore at:
    Available download formats: json
    Dataset updated
    Jun 12, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for Households; Net Worth, Level (BOGZ1FL192090005Q) from Q4 1987 to Q1 2025 about net worth, Net, households, and USA.

  17. Dataset of knee joint contact force peaks and corresponding subject characteristics from 4 open datasets

    • zenodo.org
    • explore.openaire.eu
    • +1more
    txt
    Updated Oct 9, 2023
    Cite
    Jere Joonatan Lavikainen; Jere Joonatan Lavikainen; Lauri Stenroth; Lauri Stenroth (2023). Dataset of knee joint contact force peaks and corresponding subject characteristics from 4 open datasets [Dataset]. http://doi.org/10.5281/zenodo.7253458
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 9, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jere Joonatan Lavikainen; Jere Joonatan Lavikainen; Lauri Stenroth; Lauri Stenroth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data from overground walking trials of 166 subjects with several trials per subject (approximately 2900 trials total).

    DATA ORIGINS & LICENSE INFORMATION

    The data comes from four existing open datasets collected by others:

    Schreiber & Moissenet, A multimodal dataset of human gait at different walking speeds established on injury-free adult participants

    Fukuchi et al., A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals

    Horst et al., A public dataset of overground walking kinetics and full-body kinematics in healthy adult individuals

    Camargo et al., A comprehensive, open-source dataset of lower limb biomechanics in multiple conditions of stairs, ramps, and level-ground ambulation and transitions

    In this dataset, those datasets are referred to as the Schreiber, Fukuchi, Horst, and Camargo datasets, respectively.
    The Schreiber, Fukuchi, Horst, and Camargo datasets are licensed under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).

    We have modified the datasets by analyzing the data with musculoskeletal simulations & analysis software (OpenSim).
    In this dataset, we publish modified data as well as some of the original data.


    STRUCTURE OF THE DATASET
    The dataset contains two kinds of text files: those starting with "predictors_" and those starting with "response_".

    Predictors comprise 12 text files, each describing the input (predictor) variables we used to train artificial neural networks to predict knee joint loading peaks.
    Responses similarly comprise 12 text files, each describing the response (outcome) variables that we trained and evaluated the network on.
    The file names are of the form "predictors_X" for predictors and "response_X" for responses, where X describes which response (outcome) variable is predicted with them.
    X can be:
    - loading_response_both: the maximum of the first peak of stance for the sum of the loading of the medial and lateral compartments
    - loading_response_lateral: the maximum of the first peak of stance for the loading of the lateral compartment
    - loading_response_medial: the maximum of the first peak of stance for the loading of the medial compartment
    - terminal_extension_both: the maximum of the second peak of stance for the sum of the loading of the medial and lateral compartments
    - terminal_extension_lateral: the maximum of the second peak of stance for the loading of the lateral compartment
    - terminal_extension_medial: the maximum of the second peak of stance for the loading of the medial compartment
    - max_peak_both: the maximum of the entire stance phase for the sum of the loading of the medial and lateral compartments
    - max_peak_lateral: the maximum of the entire stance phase for the loading of the lateral compartment
    - max_peak_medial: the maximum of the entire stance phase for the loading of the medial compartment
    - MFR_common: the medial force ratio for the entire stance phase
    - MFR_LR: the medial force ratio for the first peak of stance
    - MFR_TE: the medial force ratio for the second peak of stance

    The predictor text files are organized as comma-separated values. Each row corresponds to one walking trial. A single subject typically has several trials.
    The column labels are DATASET_INDEX,SUBJECT_INDEX,KNEE_ADDUCTION,MASS,HEIGHT,BMI,WALKING_SPEED,HEEL_STRIKE_VELOCITY,AGE,GENDER.

    • DATASET_INDEX describes which original dataset the trial is from, where {1=Schreiber, 2=Fukuchi, 3=Horst, 4=Camargo}
    • SUBJECT_INDEX is the index of the subject in the original dataset. If you use this column, you will have to re-index it to avoid duplicates (e.g., several datasets probably have a subject "3").
    • KNEE_ADDUCTION is the knee adduction-abduction angle (positive for adduction, negative for abduction) of the subject in static pose, estimated from motion capture markers.
    • MASS is the mass of the subject in kilograms
    • HEIGHT is the height of the subject in millimeters
    • BMI is the body mass index of the subject
    • WALKING_SPEED is the mean walking speed of the subject during the trial
    • HEEL_STRIKE_VELOCITY is the mean of the velocities of the subject's pelvis markers at the instant of heel strike
    • AGE is the age of the subject in years
    • GENDER is an integer/boolean where {1=male, 0=female}

    The response text files contain one floating-point value per row, describing the knee joint contact force peak for the trial in newtons (or the medial force ratio). Each row corresponds to one walking trial.
    The rows in predictor and response text files match each other (e.g., row 7 describes the same trial in both predictors_max_peak_medial.txt and response_max_peak_medial.txt).
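    Given this row-aligned layout, a predictor file and its response file can be paired as sketched below (whether the predictor files carry a header line is an assumption to check against the actual data):

    ```python
    import csv

    # Column order from the description above
    PREDICTOR_COLUMNS = [
        "DATASET_INDEX", "SUBJECT_INDEX", "KNEE_ADDUCTION", "MASS", "HEIGHT",
        "BMI", "WALKING_SPEED", "HEEL_STRIKE_VELOCITY", "AGE", "GENDER",
    ]

    def load_trials(predictor_path, response_path):
        """Pair each predictor row with its response value by row order.

        Assumes one float per row in the response file and that row i of
        both files describes the same trial, as stated above. A header
        line in the predictor file is skipped if present (an assumption).
        """
        with open(predictor_path) as f:
            rows = [row for row in csv.reader(f) if row]
        if rows and rows[0][0] == "DATASET_INDEX":
            rows = rows[1:]
        with open(response_path) as f:
            responses = [float(line) for line in f if line.strip()]
        if len(rows) != len(responses):
            raise ValueError("predictor and response row counts must match")
        return [(dict(zip(PREDICTOR_COLUMNS, row)), y)
                for row, y in zip(rows, responses)]
    ```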


    See our journal article "Prediction of Knee Joint Compartmental Loading Maxima Utilizing Simple Subject Characteristics and Neural Networks" (https://doi.org/10.1007/s10439-023-03278-y) for more information.

    Questions & other contacts: jere.lavikainen@uef.fi

  18. Predicting the trading behavior of socially connected investors

    • figshare.com
    zip
    Updated Jul 14, 2022
    Cite
    Kęstutis Baltakys; Margarita Baltakiene; Juho Kanniainen; Negar Heidari; Alexandros Iosifidis (2022). Predicting the trading behavior of socially connected investors [Dataset]. http://doi.org/10.6084/m9.figshare.20310240.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 14, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kęstutis Baltakys; Margarita Baltakiene; Juho Kanniainen; Negar Heidari; Alexandros Iosifidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Eight data sets that differ in whether they are used to predict investor trading in the same or the subsequent period, based on observations of the trading of neighbors in the social network. The periods are either daily or weekly windows. Moreover, we investigate investor influence over purchase and sale transactions separately. These three distinctions lead to eight distinct data sets, whose sizes range from just below 2,400 to almost 22,000 observations. The labels are positive (set to 1) if an investor on a given day traded a specific security in the same direction as at least one of his neighbors, and negative (set to 0) otherwise. We use a sliding window whose size corresponds to the prediction window. In each window, for each ego investor, we create observations of instances of social influence in the neighborhood, given that at least one of the neighbors is active. An ego investor can be understood as a tippee and her neighbors as tippers. We record the specific behavior of investors in their neighborhood and, depending on the prediction period, the ego investor's behavior in the same or the subsequent period. Initially, the data sets were highly imbalanced in terms of labels and were therefore re-sampled to achieve a 1:3 label balance ratio.
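    The labeling rule can be sketched as follows (a simplified illustration; the security names are hypothetical, and the real construction also conditions on neighbor activity within the sliding window):

    ```python
    def label_influence(ego_trades, neighbor_trades):
        """Label an ego-investor observation following the rule above.

        Trades are (security, direction) tuples with direction +1 (buy)
        or -1 (sell). Returns 1 if the ego traded some security in the
        same direction as at least one neighbor in the window, else 0.
        """
        return int(any(t in neighbor_trades for t in ego_trades))

    # Hypothetical window: the ego buys NOKIA, one neighbor also bought it
    ego = [("NOKIA", 1)]
    neighbors = {("NOKIA", 1), ("SAMPO", -1)}
    print(label_influence(ego, neighbors))  # -> 1
    ```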

  19. OpenIce Medical Device Network Normal Operation, Data Set 1


    • impactcybertrust.org
    Updated Apr 25, 2019
    + more versions
    Cite
    Massachusetts General Hospital (2019). OpenIce Medical Device Network Normal Operation, Data Set 1 [Dataset]. http://doi.org/10.23721/1504233
    Explore at:
    Dataset updated
    Apr 25, 2019
    Authors
    Massachusetts General Hospital
    Time period covered
    Apr 25, 2019
    Description

    Initial state:

    Data was collected from a network of computers running OpenICE software. One MacOS computer was running an OpenICE supervisor, and four BeagleBone computers running Debian Linux were connected to medical devices: one each to a Puritan-Bennett 840 ventilator, a Philips MP70 patient monitor, a Drager Apollo anesthesia machine, and a Philips MX800 patient monitor. A Windows 10 PC was also connected to the network and was used during the data capture to start a second OpenICE supervisor.

    The Puritan-Bennett 840 ventilator was running connected to a Michigan Lung test lung, the Philips MX800 patient monitor was running in demo mode, in which it transmits a typical set of vital signs, the MP70 was turned on with no patient connected, and the Drager Apollo was turned off but had an active BeagleBone network interface connected.

    Data file timeline:

    • 0:05 Start a second OpenICE supervisor on the Windows 10 PC
    • 0:30 Connect a fifth BeagleBone, attached to an Oridion Capnostream patient monitor running in demo mode, to the network
    • 1:35 Start a simulated multiparameter monitor on the Windows 10 PC supervisor
    • 3:00 Start a serial interface to a GE Solar 8000 patient monitor on the MacOS supervisor
    • 3:50 Use the Windows supervisor to associate the GE Solar 8000 with the simulated patient "Joeseph Baker"
    • 5:00 End recording

  20. Cora


    • ieee-dataport.org
    • huggingface.co
    Updated Mar 11, 2024
    + more versions
    Cite
    Sepideh Neshatfar (2024). Cora [Dataset]. https://ieee-dataport.org/documents/cora
    Explore at:
    Dataset updated
    Mar 11, 2024
    Authors
    Sepideh Neshatfar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

Cite
City of Seattle ArcGIS Online (2025). Street Network Database SND [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/street-network-database-snd-1712b

Street Network Database SND

Explore at:
2 scholarly articles cite this dataset
Dataset updated
Jun 29, 2025
Dataset provided by
City of Seattle ArcGIS Online
Description

The pathway representation consists of segments and intersection elements. A segment is a linear graphic element that represents a continuous physical travel path terminated by a path end (dead end) or a physical intersection with other travel paths. Segments have one street name, one address range, and one set of segment characteristics. A segment may have no alias street names or several. Segment types included are Freeways, Highways, Streets, Alleys (named only), Railroads, Walkways, and Bike lanes. SNDSEG_PV is a linear feature class representing the SND Segment Feature, with attributes for Street name, Address Range, Alias Street name, and segment Characteristics objects. Part of the Address Range and all of the Street name objects are logically shared with the Discrete Address Point-Master Address File layer. Appropriate uses include:

• Cartography - Used to depict the City's transportation network location and connections, typically on smaller-scale maps or images where a single-line representation is appropriate; to depict specific classifications of roadway use, also typically at smaller scales; and to label transportation network feature names and associated address ranges, typically on larger-scale maps.
• Geocode reference - Used as a source for derived reference data for address validation and theoretical address location.
• Address Range data repository - This data store is the City's address range repository, defining address ranges in association with transportation network features.
• Polygon boundary reference - Used to define various area boundaries in other feature classes where coincident with the transportation network. Does not contain polygon features.
• Address-based extracts - Used to create flat-file extracts, typically indexed by address, with reference to business data associated with transportation network features.
• Thematic linear location reference - By providing unique, stable identifiers for each linear feature, thematic data is associated with specific transportation network features via these identifiers.
• Thematic intersection location reference - By providing unique, stable identifiers for each intersection feature, thematic data is associated with specific transportation network features via these identifiers.
• Network route tracing - Used as a source for derived reference data to determine point-to-point travel paths or optimal stop allocation along a travel path.
• Topological connections with segments - Used to provide a specific definition of location for each transportation network feature, as well as a specific definition of the connections between features (defines where the streets are and the relationships between them, i.e., 4th Ave is west of 5th Ave and 4th Ave does intersect Cherry St).
• Event location reference - Used as a source for derived reference data to locate events and for linear referencing.

Data source is TRANSPO.SNDSEG_PV. Updated weekly.
