100+ datasets found
  1. SLTrans Dataset

    • paperswithcode.com
    • huggingface.co
    Updated Mar 7, 2024
    + more versions
    Cite
    Indraneil Paul; Goran Glavaš; Iryna Gurevych (2024). SLTrans Dataset [Dataset]. https://paperswithcode.com/dataset/sltrans
    Explore at:
    Dataset updated
    Mar 7, 2024
    Authors
    Indraneil Paul; Goran Glavaš; Iryna Gurevych
    Description

    The dataset consists of source code and LLVM IR pairs generated from accepted and de-duplicated programming contest solutions. The dataset is divided into language configs and mode splits. The language can be one of C, C++, D, Fortran, Go, Haskell, Nim, Objective-C, Python, Rust and Swift, indicating the source files' language. The mode split indicates the compilation mode, which can be either Size_Optimized or Perf_Optimized.
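    A quick sanity sketch of the config/split combinations described above. Only the language list and the two modes come from the description; the pairing scheme and any concrete config naming on the hub are assumptions for illustration:

```python
# Enumerate the language x compilation-mode combinations described above.
# The pairing is illustrative; actual config names on the hub may differ.
languages = [
    "C", "C++", "D", "Fortran", "Go", "Haskell",
    "Nim", "Objective-C", "Python", "Rust", "Swift",
]
modes = ["Size_Optimized", "Perf_Optimized"]

configs = [(lang, mode) for lang in languages for mode in modes]
print(len(configs))  # 11 languages x 2 modes = 22 combinations
```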

  2. Fuτure - dataset for studies, development, and training of algorithms for...

    • zenodo.org
    Updated Oct 3, 2024
    Cite
    Laurits Tani; Laurits Tani; Joosep Pata; Joosep Pata (2024). Fuτure - dataset for studies, development, and training of algorithms for reconstructing and identifying hadronically decaying tau leptons [Dataset]. http://doi.org/10.5281/zenodo.13881061
    Explore at:
    Available download formats: bin
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laurits Tani; Laurits Tani; Joosep Pata; Joosep Pata
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data description

    MC Simulation


    The Fuτure dataset is intended for studies, development, and training of algorithms for reconstructing and identifying hadronically decaying tau leptons. The dataset is generated with Pythia 8, with the full detector simulation performed by Geant4 using the CLIC-like detector setup CLICdet (CLIC_o3_v14). Events are reconstructed using the Marlin reconstruction framework and interfaced with Key4HEP. Particle candidates in the events are reconstructed using the PandoraPF algorithm.

    In this version of the dataset no γγ -> hadrons background is included.

    Samples


    This dataset contains e+e- samples with Z->ττ, ZH (H->ττ) and Z->qq events, with approximately 2 million events simulated in each category.

    The following e+e- processes were simulated with Pythia 8 at sqrt(s) = 380 GeV:

    • p8_ee_qq_ecm380 [Z -> qq events]
    • p8_ee_ZH_Htautau [ZH, H -> tautau events]
    • p8_ee_Z_Ztautau_ecm380 [Z -> tautau events]

    The .root files from the MC simulation chain are eventually processed by the software found on GitHub in order to create flat ntuples as the final product.


    Features


    The basis of the ntuples is the particle flow (PF) candidates from PandoraPF. Each PF candidate has a four-momentum, charge and particle label (electron / muon / photon / charged hadron / neutral hadron). The PF candidates in a given event are clustered into jets using the generalized kt algorithm for ee collisions, with parameters p=-1 and R=0.4. The minimum pT is set to 0 GeV for both generator-level jets and reconstructed jets. The dataset contains the four-momenta of the jets, along with the PF candidates in the jets with the properties listed above.

    Additionally, a set of variables describing the tau lifetime is calculated using the software on GitHub. As the tau lifetime is very short, these variables are sensitive to true tau decays. In the calculation of these lifetime variables, we use a linear approximation.

    In summary, the features found in the flat ntuples are:

    Name                    Description
    reco_cand_p4s           4-momenta per particle in the reco jet.
    reco_cand_charge        Charge per particle in the jet.
    reco_cand_pdg           PDG id per particle in the jet.
    reco_jet_p4s            RecoJet 4-momenta.
    reco_cand_dz            Longitudinal impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated.
    reco_cand_dz_err        Uncertainty of the longitudinal impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated.
    reco_cand_dxy           Transverse impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated.
    reco_cand_dxy_err       Uncertainty of the transverse impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated.
    gen_jet_p4s             GenJet 4-momenta. Matched with RecoJet within a cone of radius dR < 0.3.
    gen_jet_tau_decaymode   Decay mode of the associated genTau. Jets that have associated leptonically decaying taus are removed, so there are no DM=16 jets. If no GenTau can be matched to GenJet within dR < 0.4, a fill value is used.
    gen_jet_tau_p4s         Visible 4-momenta of the genTau. If no GenTau can be matched to GenJet within dR < 0.4, a fill value is used.

    The ground truth is based on stable particles at the generator level, before detector simulation. These particles are clustered into generator-level jets and are matched to generator-level τ leptons as well as reconstructed jets. In order for a generator-level jet to be matched to generator-level τ lepton, the τ lepton needs to be inside a cone of dR = 0.4. The same applies for the reconstructed jet, with the requirement on dR being set to dR = 0.3. For each reconstructed jet, we define three target values related to τ lepton reconstruction:

    • a binary flag isTau if it was matched to a generator-level hadronically decaying τ lepton. gen_jet_tau_decaymode of value -1 indicates no match to generator-level hadronically decaying τ.
    • the categorical decay mode of the τ gen_jet_tau_decaymode in terms of the number of generator level charged and neutral hadrons. Possible gen_jet_tau_decaymode are {0, 1, . . . , 15}.
    • if matched, the visible (neglecting neutrinos), reconstructable pT of the τ lepton. This is inferred from the gen_jet_tau_p4s.
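    The cone matching described above can be sketched in plain Python. The (eta, phi) tuple layout and the helper names are assumptions for illustration; the dataset itself already stores the results of this matching:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    # Angular distance dR = sqrt(deta^2 + dphi^2), with dphi wrapped into [-pi, pi].
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def match_gen_tau(jet, gen_taus, max_dr=0.4):
    # Return the closest gen tau (eta, phi) within max_dr of the jet, else None
    # (the dataset uses a fill value for unmatched jets).
    best = None
    best_dr = max_dr
    for tau in gen_taus:
        dr = delta_r(jet[0], jet[1], tau[0], tau[1])
        if dr < best_dr:
            best, best_dr = tau, dr
    return best
```

    The same function with max_dr=0.3 covers the reco-jet-to-gen-jet matching.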

    Contents:

    • qq_test.parquet
    • qq_train.parquet
    • zh_test.parquet
    • zh_train.parquet
    • z_test.parquet
    • z_train.parquet
    • data_intro.ipynb

    Dataset characteristics

    File               # Jets       Size
    z_test.parquet     870 843      171 MB
    z_train.parquet    3 483 369    681 MB
    zh_test.parquet    1 068 606    213 MB
    zh_train.parquet   4 274 423    851 MB
    qq_test.parquet    6 366 715    1.4 GB
    qq_train.parquet   25 466 858   5.6 GB

    The dataset consists of 6 files of 8.9 GB in total.

    How can you use these data?

    The .parquet files can be directly loaded with the Awkward Array Python library.
    An example of how one might use the dataset and its features is given in data_intro.ipynb.

  3. Zero Modes and Classification of Combinatorial Metamaterials

    • zenodo.org
    • data.niaid.nih.gov
    Updated Nov 8, 2022
    + more versions
    Cite
    Ryan van Mastrigt; Ryan van Mastrigt; Marjolein Dijkstra; Marjolein Dijkstra; Martin van Hecke; Martin van Hecke; Corentin Coulais; Corentin Coulais (2022). Zero Modes and Classification of Combinatorial Metamaterials [Dataset]. http://doi.org/10.5281/zenodo.7070963
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ryan van Mastrigt; Ryan van Mastrigt; Marjolein Dijkstra; Marjolein Dijkstra; Martin van Hecke; Martin van Hecke; Corentin Coulais; Corentin Coulais
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.

    In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.

    Modescaling_raw_data.zip

    This file contains uniformly sampled unit cell designs for metamaterial M2 and \(M_k(n)\) for \(1\leq n\leq 4\), which was used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or linearly increasing (class C). This file contains the simulation data of size \(3 \leq k \leq 8\) unit cells. The data is organized as follows.

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+5: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4)\}\).

    Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
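    The column layout above can be unpacked as follows. The array here is a synthetic stand-in with the documented shape; real files are loaded with numpy.load on the .npy file names given above:

```python
import numpy as np

k, Nsim = 3, 2
# Synthetic stand-in with the documented shape [Nsim, 1 + k*k + 4].
data = np.zeros((Nsim, 1 + k * k + 4))
data[0, 0] = 42                                  # col 0: label number
data[0, 1:1 + k * k] = np.arange(k * k) % 4      # flattened design, values in {0, 1, 2, 3}
data[0, 1 + k * k:] = [2, 4, 6, 8]               # M_k(1) .. M_k(4)

row = data[0]
label = int(row[0])
design = row[1:1 + k * k].reshape(k, k)          # back to its original k x k form
modes = row[1 + k * k:1 + k * k + 4]             # zero modes in ascending order of n
```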

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+7: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4), M_k(5), M_k(6)\}\).

    Simulation data for \(6 \leq k \leq 8\) unit cells are stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+9: number of zero modes \(M_k(n_x, n_y)\) in order: \(\{M_k(1, 1), M_k(2, 2), M_k(3, 2), M_k(4, 2), M_k(1, 1), M_k(2, 2), M_k(2, 3), M_k(2, 4)\}\).

    Simulation data of metamaterial M1 for \(k_x \times k_y\) metamaterials are stored in compressed numpy array format (.npz) and can be loaded in Python with the Numpy package using the numpy.load command. These files are named "smiley_cube_x_y_\(k_x\)x\(k_y\).npz", which contain all possible metamaterial designs, and "smiley_cube_uniform_sample_x_y_\(k_x\)x\(k_y\).npz", which contain uniformly sampled metamaterial designs. The configurations are accessed with the keyword argument 'configs'. The classification is accessed with the keyword argument 'compatible'. The configurations array is of shape [Nsim, \(k_x\), \(k_y\)], the classification array is of shape [Nsim]. The building blocks in the configuration are denoted by 0 or 1, which correspond to the red/green and white/dashed building blocks respectively. Classification is 0 or 1, which corresponds to I and C respectively.
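    A minimal sketch of the .npz access pattern for the M1 files. The shapes here are invented; only the 'configs' and 'compatible' keywords come from the description:

```python
import io
import numpy as np

# Write a small .npz in memory with the documented keyword arguments.
buf = io.BytesIO()
np.savez(buf,
         configs=np.zeros((5, 2, 3), dtype=int),     # shape [Nsim, k_x, k_y], entries 0 or 1
         compatible=np.ones(5, dtype=int))           # shape [Nsim], 0 = class I, 1 = class C
buf.seek(0)

archive = np.load(buf)
configs = archive["configs"]
labels = archive["compatible"]
```

    Replacing the in-memory buffer with one of the "smiley_cube_..." file names gives the real access pattern.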

    Modescaling_classification_results.zip

    This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells of metamaterial M2 in Modescaling_raw_data.zip. The data is organized as follows.

    The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 4\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)

    The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 6\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)

    The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_x \leq 4\))

    col 2: the class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_y \leq 4\))

    col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)

    col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)

    col 5: the offset_x is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)

    col 6: the offset_y is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)

    col 7: \(M_k(1, 1)\)

  4. Data from: FISBe: A real-world benchmark dataset for instance segmentation...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 2, 2024
    Cite
    Reinke, Annika (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
    Explore at:
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Maier-Hein, Lena
    Rumberger, Josef Lorenz
    Ihrke, Gudrun
    Kainmueller, Dagmar
    Kandarpa, Ramya
    Hirsch, Peter
    Managan, Claire
    Reinke, Annika
    Mais, Lisa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    • A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains
    • 30 completely labeled (segmented) images
    • 71 partly labeled images
    • altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes 30-60 min on average, yet a difficult one can take up to 4 hours)
    • To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
    • A set of metrics and a novel ranking score for meaningful method benchmarking
    • An evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    fisbe_v1.0_{completely,partly}.zip

    contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

    fisbe_v1.0_mips.zip

    maximum intensity projections of all samples, for convenience.

    sample_list_per_split.txt

    a simple list of all samples and the subset they are in, for convenience.

    view_data.py

    a simple python script to visualize samples, see below for more information on how to use it.

    dim_neurons_val_and_test_sets.json

    a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

    Readme.md

    general information

    How to work with the image files

    Each sample consists of a single 3d MCFO image of neurons of the fruit fly. For each image, we provide a pixel-wise instance segmentation for all separable neurons. Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification). The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file. The segmentation mask for each neuron is stored in a separate channel. The order of dimensions is CZYX.

    We recommend working in a virtual environment, e.g., using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9
    conda activate flylight-env

    How to open zarr files

    Install the python zarr package:

    pip install zarr

    Open a zarr file with:

    import zarr
    raw = zarr.open(<path/to/zarr/file>, mode='r', path="volumes/raw")
    seg = zarr.open(<path/to/zarr/file>, mode='r', path="volumes/gt_instances")

    Optionally convert to numpy arrays:

    import numpy as np
    raw_np = np.array(raw)

    Zarr arrays are read lazily on-demand. Many functions that expect numpy arrays also work with zarr arrays. Optionally, the arrays can also explicitly be converted to numpy arrays.

    How to view zarr image files

    We recommend using napari to view the image data.

    Install napari:

    pip install "napari[all]"

    Save the following Python script:

    import zarr, sys, napari

    raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
    gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

    viewer = napari.Viewer(ndisplay=3)
    for idx, gt in enumerate(gts):
        viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
    viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
    viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
    viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
    napari.run()

    Execute:

    python view_data.py /R9F03-20181030_62_B5.zarr

    Metrics

    S: Average of avF1 and C

    avF1: Average F1 Score

    C: Average ground truth coverage

    clDice_TP: Average true positives clDice

    FS: Number of false splits

    FM: Number of false merges

    tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al. For detailed information on the methods and the quantitative results please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe, title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures}, author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller}, year = 2024, eprint = {2404.00130}, archivePrefix ={arXiv}, primaryClass = {cs.CV} }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable discussions. P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program. This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far. All future changes will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

    All contributions are welcome!

  5. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    Updated Jul 7, 2023
    + more versions
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
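    The two-stage design described above can be sketched as follows. The stratum frame and the enumeration-area listing are invented for illustration; the dataset ships the actual R script used to draw the sample:

```python
import random

random.seed(0)
households_target, per_ea = 8000, 25
n_eas = households_target // per_ea              # 320 enumeration areas in total

# Stage 1: allocate EAs to strata proportionally to stratum size
# (hypothetical stratum sizes; real stratification is by geo_1 and urban/rural).
strata = {("prov1", "urban"): 200, ("prov1", "rural"): 120, ("prov2", "urban"): 80}
total = sum(strata.values())
alloc = {s: round(n_eas * size / total) for s, size in strata.items()}

# Stage 2: within each selected EA, draw exactly 25 households at random.
ea_listing = list(range(500))                    # a hypothetical EA of 500 households
sample = random.sample(ea_listing, per_ea)
```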

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  6. A geometric shape regularity effect in the human brain: fMRI dataset

    • openneuro.org
    Updated Mar 14, 2025
    + more versions
    Cite
    Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene (2025). A geometric shape regularity effect in the human brain: fMRI dataset [Dataset]. http://doi.org/10.18112/openneuro.ds006010.v1.0.1
    Explore at:
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A geometric shape regularity effect in the human brain: fMRI dataset

    Authors:

    • Mathias Sablé-Meyer*
    • Lucas Benjamin
    • Cassandra Potier Watkins
    • Chenxi He
    • Maxence Pajot
    • Théo Morfoisse
    • Fosca Al Roumi
    • Stanislas Dehaene

    *Corresponding author: mathias.sable-meyer@ucl.ac.uk

    Abstract

    The perception and production of regular geometric shapes is a characteristic trait of human cultures since prehistory, whose neural mechanisms are unknown. Behavioral studies suggest that humans are attuned to discrete regularities such as symmetries and parallelism, and rely on their combinations to encode regular geometric shapes in a compressed form. To identify the relevant brain systems and their dynamics, we collected functional MRI and magnetoencephalography data in both adults and six-year-olds during the perception of simple shapes such as hexagons, triangles and quadrilaterals. The results revealed that geometric shapes, relative to other visual categories, induce a hypoactivation of ventral visual areas and an overactivation of the intraparietal and inferior temporal regions also involved in mathematical processing, whose activation is modulated by geometric regularity. While convolutional neural networks captured the early visual activity evoked by geometric shapes, they failed to account for subsequent dorsal parietal and prefrontal signals, which could only be captured by discrete geometric features or by more advanced transformer models of vision. We propose that the perception of abstract geometric regularities engages an additional symbolic mode of visual perception.

    Notes about this dataset

    We separately share the MEG dataset at https://openneuro.org/datasets/ds006012. Below are some notes about the fMRI dataset of N=20 adult participants (sub-2xx, numbers between 204 and 223), and N=22 children (sub-3xx, numbers between 301 and 325).

    • The code for the analyses is provided at https://github.com/mathias-sm/AGeometricShapeRegularityEffectHumanBrain
      However, the analyses work from already preprocessed data. Since there is no custom code per se for the preprocessing, I have not included it in the repository. To preprocess the data as was done in the published article, here is the command and software information:
      • fMRIPrep version: 20.0.5
      • fMRIPrep command: /usr/local/miniconda/bin/fmriprep /data /out participant --participant-label <label> --output-spaces MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2
    • Defacing has been performed with bidsonym running the pydeface masking and the nobrainer brain extraction pipeline.
      The published analyses have been performed on the non-defaced data. I have checked for data quality on all participants after defacing. In specific cases, I may be able to request the permission to share the original, non-defaced dataset.
    • sub-325 was acquired by a different experimenter and defaced before being shared with the rest of the research team, hence the slightly different defacing mask. That participant was also preprocessed separately, using a more recent fMRIPrep version: 20.2.6.
    • The data associated with the children has a few missing files. Notably:
      1. sub-313 and sub-316 are missing one run of the localizer each
      2. sub-316 has no data at all for the geometry task
      3. sub-308 has no usable data for the intruder task

      Since all of these participants still have some data to contribute to one task or the other, all available files were kept in this dataset. The analysis code reflects these inconsistencies where required, with specific exceptions.
  7. d

    Traffic Crashes - Vehicles

    • catalog.data.gov
    • data.cityofchicago.org
    Updated Jun 29, 2025
    + more versions
    Cite
    data.cityofchicago.org (2025). Traffic Crashes - Vehicles [Dataset]. https://catalog.data.gov/dataset/traffic-crashes-vehicles
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    data.cityofchicago.org
    Description

    This dataset contains information about vehicles (or units, as they are identified in crash reports) involved in a traffic crash. This dataset should be used in conjunction with the traffic Crash and People datasets available in the portal. “Vehicle” information includes motor vehicle and non-motor vehicle modes of transportation, such as bicycles and pedestrians. Each mode of transportation involved in a crash is a “unit” and gets one entry here. Each vehicle, each pedestrian, each motorcyclist, and each bicyclist is considered an independent unit that can have a trajectory separate from the other units. However, people inside a vehicle, including the driver, do not have a trajectory separate from the vehicle in which they are travelling, and hence only the vehicle they are travelling in gets an entry here. This type of identification of “units” is needed to determine how each movement affected the crash. Data for occupants who do not make up an independent unit, typically drivers and passengers, are available in the People table.

    Many of the fields are coded to denote the type and location of damage on the vehicle. Vehicle information can be linked back to Crash data using the “CRASH_RECORD_ID” field. Since this dataset is a combination of vehicles, pedestrians, and pedal cyclists, not all columns are applicable to each record; look at the Unit Type field to determine what additional data may be available for that record.

    The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in the SR1050 form. The current version of the SR1050 instructions manual, with detailed information on each data element, is available here.

    Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.
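    As an illustration of the linkage described above, here is a minimal sketch of joining unit records back to their crash record. The records and field values are hypothetical; only the CRASH_RECORD_ID field name is taken from the dataset description.

```python
# Sketch: link unit (vehicle) records back to crash records via
# CRASH_RECORD_ID, as described above. All records are illustrative.
crashes = {
    "abc123": {"CRASH_RECORD_ID": "abc123", "CRASH_DATE": "2025-06-01"},
}
vehicles = [
    {"CRASH_RECORD_ID": "abc123", "UNIT_TYPE": "DRIVER"},
    {"CRASH_RECORD_ID": "abc123", "UNIT_TYPE": "BICYCLE"},
]

def join_units_to_crashes(units, crash_index):
    """Attach the parent crash record to each unit record."""
    joined = []
    for unit in units:
        crash = crash_index.get(unit["CRASH_RECORD_ID"])
        if crash is not None:
            joined.append({**unit, **crash})
    return joined

linked = join_units_to_crashes(vehicles, crashes)
```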

  8. D

    Vision Zero High Injury Network

    • data.sfgov.org
    • healthdata.gov
    • +2more
    Updated Aug 21, 2024
    Cite
    (2024). Vision Zero High Injury Network [Dataset]. https://data.sfgov.org/Health-and-Social-Services/Vision-Zero-High-Injury-Network/8vtn-qytr
    Explore at:
    tsv, csv, kml, kmz, application/rdfxml, xml, application/rssxml, application/geo+jsonAvailable download formats
    Dataset updated
    Aug 21, 2024
    Description

    A. SUMMARY This data was created by the San Francisco Department of Public Health (SFDPH) to update the 2017 Vision Zero High Injury Network dataset. It identifies street segments in San Francisco that have a high number of fatalities and severe injuries. This dataset is a simplified representation of the network and only indicates which streets qualified; it does not contain any additional information, such as prioritization by mode or a breakdown of counts of reported/unreported severe/fatal injuries by corridorized segment. SFDPH shares this network with CCSF agencies to help inform where interventions could save lives and reduce injury severity.

    B. HOW THE DATASET IS CREATED The 2022 Vision Zero High Injury Network is derived from 2017-2022 severe and fatal injury data from Zuckerberg San Francisco General (ZSFG), San Francisco Police Department (SFPD), the Office of the Medical Examiner (OME), and Emergency Medical Services agencies. ZSFG patient records and SFPD victim records were probabilistically linked through the Transportation Injury Surveillance System (TISS) using LinkSolv Software. Injury severity for linked SFPD/ZSFG records was reclassified based on injury outcome as determined by ZSFG medical personnel (net 1732 police reported severe injuries) consistent with the Vision Zero Severe Injury Protocol (2017), while unlinked SFPD victim records were not changed (178 police reported severe injuries). Severe injuries captured by ZSFG but not reported to SFPD were also included in this analysis (650 unreported/unlinked geocodable severe injury patient records). Fatality data came from OME records that meet San Francisco’s Vision Zero Fatality Protocol (129 fatalities). Only transportation-related injuries resulting in a severe injury or fatality were used in this analysis. Each street centerline segment block was converted into ~0.25-mile overlapping corridorized sections using ArcPy. These sections were intersected with the severe/fatal injury data. Only severe/fatal injuries with the same primary street as the corridorized section were counted for that section. The count of severe/fatal injuries was then normalized by the section's mileage to derive the number of severe/fatal injuries per mile. A threshold of ≥10 severe/fatal injuries per mile was used to determine whether a corridorized segment qualified for inclusion in the network. A full methodology of the 2022 update to the Vision Zero High Injury Network can be found here: https://www.visionzerosf.org/wp-content/uploads/2023/03/2022_Vision_Zero_Network_Update_Methodology.pdf
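    The per-mile normalization and threshold step described above can be sketched as follows. The counts and section length below are illustrative, not actual network data.

```python
# Sketch of the corridor qualification rule: normalize severe/fatal
# injury counts by section mileage and keep sections with >= 10 per mile.
THRESHOLD = 10.0  # severe/fatal injuries per mile

def injuries_per_mile(injury_count, section_miles):
    return injury_count / section_miles

def qualifies(injury_count, section_miles, threshold=THRESHOLD):
    """True if a corridorized section meets the inclusion threshold."""
    return injuries_per_mile(injury_count, section_miles) >= threshold

# e.g. a ~0.25-mile section with 3 severe/fatal injuries -> 12 per mile
```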

    C. UPDATE PROCESS This dataset will be updated on an as needed basis.

    D. HOW TO USE THIS DATASET The 2022 Vision Zero Network represents a snapshot in time (2017-2021) of where severe and fatal injuries are most concentrated. It may not reflect current conditions or changes to the City’s transportation system. Although prior incidents can be indicative of future incidents, the 2022 Vision Zero High Injury Network is not a prediction (probability) of future risk. The High Injury Network approach contrasts with risk-based analysis, which focuses on locations determined to be more dangerous, with risk often calculated by dividing the number of injuries or collisions by vehicle volumes to estimate the risk of injury per vehicle. The High Injury Network provides information regarding the streets where injuries, particularly severe and fatal ones, are concentrated in San Francisco based on injury counts; it is not an assessment of whether a street or particular location is dangerous. The 2022 Vision Zero Network is derived from the more severe injury outcomes (counts of severe/fatal injuries) and may not cover locations with high numbers of less severe injury collisions. Hospital and emergency medical service records, from which SFPD-unreported injury and reclassified injury collisions are derived, are protected by the Health Insurance Portability and Accountability Act and state medical privacy laws, and thus have strict confidentiality and privacy requirements. As of November 2021, SFDPH is working in conjunction with SFDPH’s Office of Compliance and Privacy Affairs, Zuckerberg San Francisco General Hospital (“ZSFG”) and the SFMTA to determine how SFDPH can share the data in compliance with federal and state privacy laws. Intersection and other small-area-specific counts of severe/fatal injuries have thus been intentionally excluded from this document as data sharing requirements are yet to be determined.

    E. RELATED DATASETS

    • Traffic Crashes Resulting in Fatality
    • Traffic Crashes Resulting in Injury
    • Traffic Crashes Resulting in Injury: Parties Involved
    • Traffic Crashes Resulting in Injury: Victims Involved
    • Vision Zero High Injury Network: 2022, GIS Map

  • Z

    Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment...

    • data.niaid.nih.gov
    Updated Jan 6, 2023
    Cite
    Andrej Hrovat (2023). Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7509279
    Explore at:
    Dataset updated
    Jan 6, 2023
    Dataset provided by
    Miha Mohorčič
    Mihael Mohorčič
    Andrej Hrovat
    Aleš Simončič
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.

    This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.

    It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.

    Related dataset

    The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, with the same data layout and recording equipment.

    Measurement setup

    The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.

    The following information about each received PR is collected:
    - MAC address
    - supported data rates
    - extended supported rates
    - HT capabilities
    - extended capabilities
    - data under the extended tag and vendor specific tag
    - interworking
    - VHT capabilities
    - RSSI
    - SSID
    - timestamp when the PR was received

    The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.

    Data preprocessing

    The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:

    PR_IE_data = {
      'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
      'HT_CAP': DATA_htcap,
      'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
      'VHT_CAP': DATA_vhtcap,
      'INTERWORKING': DATA_inter,
      'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext, ...},
      'VENDOR_SPEC': {
        VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1, ...},
        VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2, ...},
        ...
      }
    }

    Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
    Missing IE fields in the captured PR are not included in PR_IE_DATA.

    When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:

    {'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },

    where PR_data is structured as follows:

    { 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.

    This data structure makes it possible to store only 'TIME' (time of arrival) and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the already stored data of the same MAC in the current scan time interval. If identical PR IE data from the same MAC address is already stored, only the values for the keys 'TIME' and 'RSSI' are appended. If identical PR IE data from the same MAC address has not yet been received, then the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
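    A minimal sketch of this aggregation rule, assuming the JSON structures shown above (the function name and sample values are hypothetical):

```python
# Sketch of the per-scan-interval aggregation: identical IE data from the
# same MAC only appends to 'TIME' and 'RSSI'; new IE data appends a new
# PR_data entry to 'PROBE_REQs'. Structures follow the text above.
def add_probe_request(interval_data, mac, ssid, time, rssi, ie_data):
    """interval_data maps MAC -> {'MAC', 'SSIDs', 'PROBE_REQs'}."""
    entry = interval_data.setdefault(
        mac, {"MAC": mac, "SSIDs": [], "PROBE_REQs": []}
    )
    if ssid and ssid not in entry["SSIDs"]:
        entry["SSIDs"].append(ssid)
    for pr in entry["PROBE_REQs"]:
        if pr["DATA"] == ie_data:  # identical IE data: append TIME/RSSI only
            pr["TIME"].append(time)
            pr["RSSI"].append(rssi)
            return
    entry["PROBE_REQs"].append({"TIME": [time], "RSSI": [rssi], "DATA": ie_data})
```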

    At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.

    Folder structure

    To ease processing of the data, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each containing the samples recorded by that device.

    The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from the 23rd of September 2022 00:00 local time until the 24th of September 2022 00:00 local time.

    Files map to locations as follows:
    - 1.json -> location 1
    - 2.json -> location 2
    - 3.json -> location 3
    - 4.json -> location 4
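    Assuming the folder-naming convention above and a fixed UTC+2 local offset (an assumption for Catania during the recording period), a folder name can be parsed into its local-time interval like this:

```python
from datetime import datetime, timedelta, timezone

# Sketch: parse a dataset folder name (start/end timestamps in UTC) and
# convert to local time. The UTC+2 offset is an assumption (CEST).
FMT = "%Y-%m-%dT%H-%M-%S"
LOCAL = timezone(timedelta(hours=2))

def parse_folder(name):
    start_s, end_s = name.split("_")
    start = datetime.strptime(start_s, FMT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_s, FMT).replace(tzinfo=timezone.utc)
    return start.astimezone(LOCAL), end.astimezone(LOCAL)

start, end = parse_folder("2022-09-22T22-00-00_2022-09-23T22-00-00")
```

    This reproduces the example above: 22:00 UTC on 22 September corresponds to local midnight on 23 September.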

    Environments description

    The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.

    Four Raspberry Pis were used:
    - location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
    - location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo
    - location 3 -> northernmost window in the building of Via Etnea near Piazza Università
    - location 4 -> first window to the right of the entrance of the University of Catania

    Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage areas would cover both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.

    Known dataset shortcomings

    Due to technical and physical limitations, the dataset contains some identified deficiencies.

    PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may be unaccounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.

    Every 20 minutes, the service on the recording device is restarted. This is a workaround for undefined behavior of the USB WiFi dongle, which can stop responding. For this reason, up to 20 seconds of data may be missing in each 20-minute period.

    The devices had a scheduled reboot at 4:00 each day, which appears as missing data of up to a few minutes.

     Location 1 - Piazza del Duomo - Chierici
    

    The gateway device (RPi) is located on the second-floor balcony and is hardwired to an Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location was constant and undisturbed, and the dataset seems to have complete coverage.

     Location 2 - Via Etnea - Piazza del Duomo
    

    The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.

     Location 3 - Via Etnea - Piazza Università
    

    Similar to location 2, the device was placed on the windowsill and moved around by people working in the building, e.g., placed on the windowsill during the day and moved behind a thick wall when no people were present. This device appears to have been collecting data throughout the whole dataset period.

     Location 4 - Piazza Università
    

    This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device had lost power several times during the deployment. The internet connection was also interrupted sporadically.

    Recognitions

    The data was collected within the scope of the Resiloc project, with the help of the City of Catania and project partners.

  • d

    Spaceborne Imaging Radar C-band (SIR-C)

    • catalog.data.gov
    • data.nasa.gov
    • +3more
    Updated Apr 10, 2025
    Cite
    DOI/USGS/EROS (2025). Spaceborne Imaging Radar C-band (SIR-C) [Dataset]. https://catalog.data.gov/dataset/spaceborne-imaging-radar-c-band-sir-c
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    DOI/USGS/EROS
    Description

    Spaceborne Imaging Radar-C (SIR-C) is part of an imaging radar system that was flown on board two Space Shuttle flights (9-20 April 1994 and 30 September - 11 October 1994). The USGS distributes the C-band (5.8 cm) and L-band (23.5 cm) data; all X-band (3 cm) data is distributed by DLR. Several types of products are derived from the SIR-C data:

    Survey Data is intended as a "quick look" browse for viewing the areas that were imaged by the SIR-C system. The data consists of a strip image of an entire data swath. Resolution is approximately 100 meters, processed to a 50-meter pixel spacing. Files are distributed via File Transfer Protocol (FTP) download.

    Precision (Standard) Data consists of a frame image of a data segment, which represents a processed subset of the data swath. It contains high-resolution multifrequency and multipolarization data. All precision data is in CEOS format. The following types of precision data products are available:

    • Single-Look Complex (SLC) consists of one single-look file for each scene, per frequency. Each data segment covers 50 kilometers along the flight track and is broken into four processing runs (two L-band, two C-band). Resolution and polarization depend on the mode in which the data was collected. Available as calibrated or uncalibrated data.
    • Multi-Look Complex (MLC) is based on an averaging of multiple looks and consists of one file for each scene, per frequency. Each data segment covers 100 km along the flight track and is broken into two processing runs (one L-band and one C-band). Polarization depends on the modes in which the looks were collected. The data is available in 12.5- or 25-meter pixel spacing.
    • Reformatted Signal Data (RSD) consists of the raw radar signal data only. Each data segment covers 100 km along the flight track, and the segment is broken into two processing runs (L-band and C-band).

    Interferometry Data consists of experimental multitemporal data that covers the same area. Most data takes were collected during repeat passes within the second flight (days 7, 8, 9, and/or 10). In addition, nine data takes collected during the second flight were repeat passes of the first flight. Most data takes were also single polarization, although dual- and quad-polarization data was also collected on some passes. A Digital Elevation Model (DEM) is not included with any of the SIR-C interferometric data. The following types of interferometry products are available:

    • Interferometric Single-Look Complex (iSLC) consists of two or more uncalibrated SLC images that have been processed with the same Doppler centroid to allow interferometric processing. Each frame image covers 50 kilometers along the flight track. The data is available in CEOS format.
    • Raw Interferogram (RIn) combines two data takes over the same area to produce an interferogram for each frequency (L-band and C-band). The data is available in TAR format.
    • Reformatted Signal Data (RSD) consists of radar signal data that has been processed from two or more data takes over the same area, but the data has not been combined. Although this is not technically an interferometric product, the RSD can be used to generate an interferogram. Each frame covers 100 km along the flight track. The data is available in CEOS format.

  • Mode of travel

    • gov.uk
    Updated Apr 16, 2025
    + more versions
    Cite
    Department for Transport (2025). Mode of travel [Dataset]. https://www.gov.uk/government/statistical-data-sets/nts03-modal-comparisons
    Explore at:
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    GOV.UKhttp://gov.uk/
    Authors
    Department for Transport
    Description

    Accessible Tables and Improved Quality

    As part of the Analysis Function Reproducible Analytical Pipeline Strategy, the processes used to create all National Travel Survey (NTS) statistics tables have been improved to follow the principles of Reproducible Analytical Pipelines (RAP). This has resulted in improved efficiency and quality of NTS tables, and therefore some historical estimates have seen very minor changes, at the fifth decimal place or beyond.

    All NTS tables have also been redesigned in an accessible format so that they can be used by as many people as possible, including people with impaired vision, motor difficulties, cognitive impairments or learning disabilities, and deafness or impaired hearing.

    If you wish to provide feedback on these changes then please email national.travelsurvey@dft.gov.uk.

    Revision to table NTS9919

    On 16 April 2025, the figures in table NTS9919 were revised and recalculated to include only day 1 of the travel diary, where short walks of less than a mile are recorded (from 2017 onwards), whereas previous versions included all days. This is to more accurately capture the proportion of trips that include short walks before a surface rail stage. This revision has resulted in fewer available breakdowns than previously published, due to the smaller sample sizes.

    Trips, stages, distance and time spent travelling

    NTS0303: Average number of trips, stages, miles and time spent travelling by mode: England, 2002 onwards (ODS, 53.9 KB) - https://assets.publishing.service.gov.uk/media/66ce0f118e33f28aae7e1f75/nts0303.ods

    NTS0308: Average number of trips and distance travelled by trip length and main mode; England, 2002 onwards (ODS, 191 KB) - https://assets.publishing.service.gov.uk/media/66ce0f128e33f28aae7e1f76/nts0308.ods

    NTS0312: Walks of 20 minutes or more by age and frequency: England, 2002 onwards (ODS, 35.1 KB) - https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f71/nts0312.ods

    NTS0313: Frequency of use of different transport modes: England, 2003 onwards (ODS, 27.1 KB) - https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f72/nts0313.ods

    NTS0412: Commuter trips and distance by employment status and main mode: England, 2002 onwards (ODS, 53.8 KB) - https://assets.publishing.service.gov.uk/media/66ce0f1325c035a11941f653/nts0412.ods

    NTS0504: Average number of trips by day of the week or month and purpose or main mode: England, 2002 onwards (ODS, 141 KB) - https://assets.publishing.service.gov.uk/media/66ce0f141aaf41b21139cf7d/nts0504.ods


  • Estimated stand-off distance between ADS-B equipped aircraft and obstacles

    • zenodo.org
    • data.niaid.nih.gov
    jpeg, zip
    Updated Jul 12, 2024
    Cite
    Andrew Weinert; Andrew Weinert (2024). Estimated stand-off distance between ADS-B equipped aircraft and obstacles [Dataset]. http://doi.org/10.5281/zenodo.7741273
    Explore at:
    zip, jpegAvailable download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Andrew Weinert; Andrew Weinert
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Summary:

    Estimated stand-off distance between ADS-B equipped aircraft and obstacles. Obstacle information was sourced from the FAA Digital Obstacle File and the FHWA National Bridge Inventory. Aircraft tracks were sourced from processed data curated from the OpenSky Network. Results are presented as histograms organized by aircraft type and distance away from runways.

    Description:

    For many aviation safety studies, aircraft behavior is represented using encounter models, which are statistical models of how aircraft behave during close encounters. They are used to provide a realistic representation of the range of encounter flight dynamics where an aircraft collision avoidance system would be likely to alert. These models currently are, and historically have been, limited to interactions between aircraft; they have not represented the specific interactions between obstacles and transponder-equipped aircraft. In response, we calculated the standoff distance between obstacles and ADS-B equipped manned aircraft.

    For robustness, MIT LL calculated the standoff distance using two different datasets of manned aircraft tracks and two datasets of obstacles. This approach aligned with the foundational research used to support the ASTM F3442/F3442M-20 well clear criteria of 2000 feet laterally and 250 feet AGL vertically.

    The two datasets of aircraft tracks were processed tracks of ADS-B equipped aircraft curated from the OpenSky Network. It is likely that rotorcraft were underrepresented in these datasets. There were also no considerations for aircraft equipped only with Mode C or not equipped with any transponder. The first dataset was used to train the v1.3 uncorrelated encounter models and is referred to as the “Monday” dataset. The second dataset is referred to as the “aerodrome” dataset and was used to train the v2.0 and v3.x terminal encounter models. The Monday dataset consisted of 104 Mondays across North America. The aerodrome dataset was based on observations within 8 nautical miles of Class B, C, and D aerodromes in the United States for the first 14 days of each month from January 2019 through February 2020. Prior to any processing, the datasets required 714 and 847 gigabytes of storage. For more details on these datasets, please refer to "Correlated Bayesian Model of Aircraft Encounters in the Terminal Area Given a Straight Takeoff or Landing" and “Benchmarking the Processing of Aircraft Tracks with Triples Mode and Self-Scheduling.”

    Two different datasets of obstacles were also considered. The first was point obstacles defined by the FAA digital obstacle file (DOF) and consisted of point obstacle structures of antenna, lighthouse, meteorological tower (met), monument, sign, silo, spire (steeple), stack (chimney; industrial smokestack), transmission line tower (t-l tower), tank (water; fuel), tramway, utility pole (telephone pole, or pole of similar height, supporting wires), windmill (wind turbine), and windsock. Each obstacle was represented by a cylinder with the height reported by the DOF and a radius based on the reported horizontal accuracy. We did not consider the actual width and height of the structure itself. Additionally, we only considered obstacles at least 50 feet tall and marked as verified in the DOF.

    The other obstacle dataset, termed “bridges,” was based on the bridges identified in the FAA DOF and additional information provided by the National Bridge Inventory (NBI). Due to the potential size and extent of bridges, it would not be appropriate to model them as point obstacles; however, the FAA DOF only provides a point location and no information about the size of the bridge. In response, we correlated the FAA DOF with the National Bridge Inventory, which provides information about the length of many bridges. Instead of sizing the simulated bridge based on horizontal accuracy, as with the point obstacles, the bridges were represented as circles with a radius of the longest, nearest bridge from the NBI. A circle representation was required because neither the FAA DOF nor the NBI provided sufficient information about orientation to represent bridges as rectangular cuboids. Similar to the point obstacles, the height of the obstacle was based on the height reported by the FAA DOF. Accordingly, the analysis using the bridge dataset should be viewed as risk-averse and conservative. It is possible that a manned aircraft was hundreds of feet away from an obstacle in actuality, while the estimated standoff distance was significantly less. Additionally, since all obstacles are represented with a fixed height, the potentially flat and low-level entrances of a bridge are assumed to have the same height as the tall bridge towers. The attached figure illustrates an example simulated bridge.

    It would have been extremely computationally inefficient to calculate the standoff distance for all possible track points. Instead, we define an encounter between an aircraft and an obstacle as when an aircraft flying at 3069 feet AGL or less comes within 3000 feet laterally of any obstacle in a 60-second time interval. If the criteria were satisfied, then for that 60-second track segment we calculated the standoff distance to all nearby obstacles. Vertical separation was based on the MSL altitude of the track and the maximum MSL height of an obstacle.
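    A minimal sketch of this encounter-screening rule, under simplifying assumptions: hypothetical flat-earth coordinates in feet, point obstacles, and a track segment already limited to one 60-second window (a real implementation would work with geodetic positions and obstacle radii).

```python
import math

# Sketch of the screening rule above: an encounter requires a track point
# at <= 3069 ft AGL within 3000 ft laterally of an obstacle. Coordinates
# are hypothetical flat-earth feet; track_points are (t_sec, x, y, agl_ft)
# from a single 60-second window, obstacles are (x, y) points.
MAX_AGL_FT = 3069.0
MAX_LATERAL_FT = 3000.0

def is_encounter(track_points, obstacles):
    for _, x, y, agl in track_points:
        if agl > MAX_AGL_FT:
            continue
        for ox, oy in obstacles:
            if math.hypot(x - ox, y - oy) <= MAX_LATERAL_FT:
                return True
    return False
```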

    For each combination of aircraft track and obstacle datasets, the results were organized seven different ways. Filtering criteria were based on aircraft type and distance away from runways. Runway data was sourced from the FAA runways of the United States, Puerto Rico, and Virgin Islands open dataset. Aircraft type was identified as part of the em-processing-opensky workflow.

    • All: No filter, all observations that satisfied encounter conditions
    • nearRunway: Aircraft within or at 2 nautical miles of a runway
    • awayRunway: Observations more than 2 nautical miles from a runway
    • glider: Observations when aircraft type is a glider
    • fwme: Observations when aircraft type is a fixed-wing multi-engine
    • fwse: Observations when aircraft type is a fixed-wing single engine
    • rotorcraft: Observations when aircraft type is a rotorcraft

    License

    This dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International(CC BY-NC-ND 4.0).

This license requires that reusers give credit to the creator. It allows reusers to copy and distribute the material in any medium or format, in unadapted form and for noncommercial purposes only. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Exceptions are given for the not-for-profit standards organizations ASTM International and RTCA.

MIT is releasing this dataset in good faith to promote open and transparent research of the low-altitude airspace. Given the limitations of the dataset and the need for more research, a more restrictive license was warranted: the dataset is based only on observations of ADS-B-equipped aircraft, which not all aircraft in the airspace are required to employ, and the observations were sourced from a crowdsourced network whose surveillance coverage has not been robustly characterized.

    As more research is conducted and the low altitude airspace is further characterized or regulated, it is expected that a future version of this dataset may have a more permissive license.

    Distribution Statement

    DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

    © 2021 Massachusetts Institute of Technology.

    Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

    This material is based upon work supported by the Federal Aviation Administration under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration.

    This document is derived from work done for the FAA (and possibly others); it is not the direct product of work done for the FAA. The information provided herein may include content supplied by third parties. Although the data and information contained herein has been produced or processed from sources believed to be reliable, the Federal Aviation Administration makes no warranty, expressed or implied, regarding the accuracy, adequacy, completeness, legality, reliability or usefulness of any information, conclusions or recommendations provided herein. Distribution of the information contained herein does not constitute an endorsement or warranty of the data or information provided herein by the Federal Aviation Administration or the U.S. Department of Transportation. Neither the Federal Aviation Administration nor the U.S. Department of

  • Good Growth Plan 2014-2019 - Japan

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1more
    Updated Jan 27, 2023
    + more versions
    Cite
    Syngenta (2023). Good Growth Plan 2014-2019 - Japan [Dataset]. https://microdata.worldbank.org/index.php/catalog/5634
    Explore at:
    Dataset updated
    Jan 27, 2023
    Dataset authored and provided by
    Syngenta
    Time period covered
    2014 - 2019
    Area covered
    Japan
    Description

    Abstract

Syngenta is committed to increasing crop productivity and to using limited resources such as land, water and inputs more efficiently. Since 2014, Syngenta has been measuring trends in agricultural input efficiency on a global network of real farms. The Good Growth Plan dataset shows aggregated productivity and resource efficiency indicators by harvest year. The data has been collected from more than 4,000 farms and covers more than 20 different crops in 46 countries. The data (except USA data and for Barley in UK, Germany, Poland, Czech Republic, France and Spain) was collected, consolidated and reported by Kynetec (previously Market Probe), an independent market research agency. It can be used as a benchmark for crop yield and input efficiency.

    Geographic coverage

    National coverage

    Analysis unit

    Agricultural holdings

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

A. Sample design Farms are grouped in clusters, which represent a crop grown in an area with homogeneous agro-ecological conditions and include comparable types of farms. The sample includes reference and benchmark farms. The reference farms were selected by Syngenta and the benchmark farms were randomly selected by Kynetec within the same cluster.

    B. Sample size Sample sizes for each cluster are determined with the aim to measure statistically significant increases in crop efficiency over time. This is done by Kynetec based on target productivity increases and assumptions regarding the variability of farm metrics in each cluster. The smaller the expected increase, the larger the sample size needed to measure significant differences over time. Variability within clusters is assumed based on public research and expert opinion. In addition, growers are also grouped in clusters as a means of keeping variances under control, as well as distinguishing between growers in terms of crop size, region and technological level. A minimum sample size of 20 interviews per cluster is needed. The minimum number of reference farms is 5 of 20. The optimal number of reference farms is 10 of 20 (balanced sample).
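As a rough illustration of the trade-off described above (the smaller the expected increase, the larger the sample needed), a normal-approximation power calculation can be sketched as below. The formula, the one-sided test, and the default significance and power levels are assumptions for illustration, not Kynetec's actual method; only the minimum of 20 interviews per cluster comes from the text.

```python
import math
from statistics import NormalDist

def min_sample_size(delta, sigma, alpha=0.05, power=0.8):
    """Normal-approximation sample size to detect a mean increase of
    `delta` given standard deviation `sigma` (one-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) * sigma / delta) ** 2
    return max(20, math.ceil(n))  # text: minimum 20 interviews per cluster

# Smaller expected increases require larger samples:
print(min_sample_size(delta=5, sigma=10))  # 25
print(min_sample_size(delta=2, sigma=10))  # 155
```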

C. Selection procedure The respondents were picked randomly using a "quota based random sampling" procedure. Growers were first randomly selected and then checked for compliance with the quotas for crops, region, farm size, etc. To avoid clustering a high number of interviews at one sampling point, interviewers were instructed to conduct a maximum of 5 interviews per village.

Benchmark farms (BF) screened in Japan were selected based on the following criteria:

• Location: Hokkaido Tokachi (JA Memuro, JA Otofuke, JA Tokachi Shimizu, JA Obihiro Taisho) --> initial focus on Memuro, Otofuke, Tokachi Shimizu, Obihiro Taisho. Locations added in GGP 2015 due to change of RF: Obihiro, Kamikawa, Abashiri.
• BF: no use of in-furrow application (Amigo), no use of Amistar.
• Contract farmers of snacks and other food companies --> screening question: 'Do you have quality contracts in place with snack and food companies for your potato production?' Y/N --> if no, screen out.
• Increase of marketable yield --> screening question: 'Are you interested in growing branded potatoes (premium potatoes for the processing industry)?' Y/N --> if no, screen out.
• Potato growers for processing use.

Background info: no mention of Syngenta. Labor cost is a very serious issue: in general, labor cost in Japan is very high, so growers try to reduce it through mechanization and would like to manage the share of labor cost in production cost. Growers are quality- and yield-driven.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Data collection tool for 2019 covered the following information:

    (A) PRE- HARVEST INFORMATION

    PART I: Screening PART II: Contact Information PART III: Farm Characteristics a. Biodiversity conservation b. Soil conservation c. Soil erosion d. Description of growing area e. Training on crop cultivation and safety measures PART IV: Farming Practices - Before Harvest a. Planting and fruit development - Field crops b. Planting and fruit development - Tree crops c. Planting and fruit development - Sugarcane d. Planting and fruit development - Cauliflower e. Seed treatment

    (B) HARVEST INFORMATION

    PART V: Farming Practices - After Harvest a. Fertilizer usage b. Crop protection products c. Harvest timing & quality per crop - Field crops d. Harvest timing & quality per crop - Tree crops e. Harvest timing & quality per crop - Sugarcane f. Harvest timing & quality per crop - Banana g. After harvest PART VI - Other inputs - After Harvest a. Input costs b. Abiotic stress c. Irrigation

    See all questionnaires in external materials tab

    Cleaning operations

    Data processing:

Kynetec uses SPSS (Statistical Package for the Social Sciences) for data entry, cleaning, analysis, and reporting. After collection, the farm data is entered into a local database, reviewed, and quality-checked by the local Kynetec agency. In the case of missing values or inconsistencies, farmers are re-contacted. In some cases, grower data is verified with local experts (e.g. retailers) to ensure data accuracy and validity. After country-level cleaning, the farm-level data is submitted to the global Kynetec headquarters for processing. In the case of missing values or inconsistencies, the local Kynetec office was re-contacted to clarify and solve issues.

Quality assurance: Various consistency checks and internal controls are implemented throughout the entire data collection and reporting process in order to ensure unbiased, high-quality data.

• Screening: Each grower is screened and selected by Kynetec based on cluster-specific criteria to ensure a comparable group of growers within each cluster. This helps keep variability low.

    • Evaluation of the questionnaire: The questionnaire aligns with the global objective of the project and is adapted to the local context (e.g. interviewers and growers should understand what is asked). Each year the questionnaire is evaluated based on several criteria, and updated where needed.

• Briefing of interviewers: Each year, local interviewers - familiar with the local context of farming - are thoroughly briefed to fully comprehend the questionnaire and obtain unbiased, accurate answers from respondents.

• Cross-validation of the answers: Kynetec captures all growers' responses through a digital data-entry tool, in which various logical and consistency checks are automated (e.g. total crop size in hectares cannot be larger than farm size). Kynetec cross-validates the answers of the growers in three different ways: (1) within the grower (check if growers respond consistently during the interview); (2) across years (check if growers respond consistently throughout the years); (3) within the cluster (compare a grower's responses with those of others in the group). All the above-mentioned inconsistencies are followed up by contacting the growers and asking them to verify their answers. The data is updated after verification, and all updates are tracked.

    • Check and discuss evolutions and patterns: Global evolutions are calculated, discussed and reviewed on a monthly basis jointly by Kynetec and Syngenta.

    • Sensitivity analysis: sensitivity analysis is conducted to evaluate the global results in terms of outliers, retention rates and overall statistical robustness. The results of the sensitivity analysis are discussed jointly by Kynetec and Syngenta.

    • It is recommended that users interested in using the administrative level 1 variable in the location dataset use this variable with care and crosscheck it with the postal code variable.

    Data appraisal

    Due to the above mentioned checks, irregularities in fertilizer usage data were discovered which had to be corrected:

For data collection wave 2014, respondents were asked to give a total estimate of the fertilizer NPK rates that were applied in the fields. From 2015 onwards, the questionnaire was redesigned to be more precise and obtain data by individual fertilizer products. The new method of measuring fertilizer inputs leads to more accurate results, but also makes a year-on-year comparison difficult. After evaluating several solutions to this problem, 2014 fertilizer usage (NPK input) was re-estimated by calculating a weighted average of fertilizer usage in the following years.
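The re-estimation described above amounts to a weighted average over the following years' NPK inputs. The sketch below uses made-up rates and weights, since the text does not specify the weighting scheme.

```python
def weighted_average(values, weights):
    """Weighted mean of `values` with the given `weights`."""
    assert len(values) == len(weights)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical NPK rates (kg/ha) from the redesigned 2015+ surveys:
npk_by_year = {2015: 180.0, 2016: 175.0, 2017: 170.0}
weights = [3, 2, 1]  # assumption: years closer to 2014 weighted more heavily

npk_2014_estimate = weighted_average(list(npk_by_year.values()), weights)
print(round(npk_2014_estimate, 1))  # 176.7
```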

• Monthly Modal Time Series

    • catalog.data.gov
    • data.transportation.gov
    • +3more
    Updated Jul 8, 2025
    + more versions
    Cite
    Federal Transit Administration (2025). Monthly Modal Time Series [Dataset]. https://catalog.data.gov/dataset/monthly-modal-time-series
    Explore at:
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    Federal Transit Administration
    Description

    Modal Service data and Safety & Security (S&S) public transit time series data delineated by transit/agency/mode/year/month. Includes all Full Reporters--transit agencies operating modes with more than 30 vehicles in maximum service--to the National Transit Database (NTD). This dataset will be updated monthly. The monthly ridership data is released one month after the month in which the service is provided. Records with null monthly service data reflect late reporting. The S&S statistics provided include both Major and Non-Major Events where applicable. Events occurring in the past three months are excluded from the corresponding monthly ridership rows in this dataset while they undergo validation. This dataset is the only NTD publication in which all Major and Non-Major S&S data are presented without any adjustment for historical continuity.

• Graphite//LFP synthetic training prognosis dataset

    • data.mendeley.com
    Updated May 6, 2020
    + more versions
    Cite
    Matthieu Dubarry (2020). Graphite//LFP synthetic training prognosis dataset [Dataset]. http://doi.org/10.17632/6s6ph9n8zg.1
    Explore at:
    Dataset updated
    May 6, 2020
    Authors
    Matthieu Dubarry
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

This training dataset was calculated using the mechanistic modeling approach. See the "Benchmark Synthetic Training Data for Artificial Intelligence-based Li-ion Diagnosis and Prognosis" publication for more details. More details will be added when published. The prognosis dataset was harder to define, as there are no limits on how the three degradation modes can evolve. For this proof-of-concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1).

%degradation = a × cycle + (exp(b × cycle) − 1)    (1)

Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added: a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1 and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles, to guarantee at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive and, after 3,000 cycles, the lowest degradation is 3%. The reversibility factor p8 was applied with equation (2) when LAMNE > PT.

%LLI = %LLI + p8 × (LAM_PE − PT)    (2)

    Where PT was calculated with equation (3) from [60].

PT = 100 − ((100 − LAM_PE) / (100 × LR_ini − LAM_PE)) × (100 − OFS_ini − LLI)    (3)
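Equations (1)-(3) can be written out as a small sketch; the parameter values used below are illustrative only, not the scanned values of Table S1.

```python
import math

def pct_degradation(cycle, a, b):
    """Equation (1): %degradation = a*cycle + (exp(b*cycle) - 1)."""
    return a * cycle + (math.exp(b * cycle) - 1)

def plating_threshold(lam_pe, lr_ini, ofs_ini, lli):
    """Equation (3): threshold PT for plating reversibility."""
    return 100 - ((100 - lam_pe) / (100 * lr_ini - lam_pe)) * (100 - ofs_ini - lli)

def updated_lli(lli, p8, lam_pe, pt):
    """Equation (2): %LLI = %LLI + p8*(LAM_PE - PT), applied past the threshold."""
    return lli + p8 * (lam_pe - pt) if lam_pe > pt else lli

# Illustrative evaluation after 1,500 cycles:
print(pct_degradation(1500, a=0.05, b=0.0005))
```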

Varying all those parameters accounted for more than 130,000 individual duty cycles, with one voltage curve for every 100 cycles. Six MATLAB .mat files are included. The GIC-LFP_duty_other.mat file contains 12 variables:

Qnorm: normalized capacity scale for all voltage curves

p1 to p8: values used to generate the duty cycles

Key: index indicating which parameter values were used for each degradation path (1 - p1, …, 8 - p8)

    QL: capacity loss, one line per path, one column per 100 cycles.

The file GIC-LFP_duty_LLI-LAMsvalues.mat contains the values of LLI, LAMPE and LAMNE for all cycles (one line per 100 cycles) and duty cycles (columns).

The files GIC-LFP_duty_1 to _4 contain the voltage data split into 1 GB chunks (40,000 simulations). Each cell corresponds to one line in the Key variable. Inside each cell, there is one column per 100 cycles.
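A minimal sketch for inspecting these files in Python. The variable names follow the description above; the helper simply drops MATLAB's metadata keys, and the file path in the usage comment assumes the data has been downloaded.

```python
def user_variables(matdict):
    """Filter out MATLAB metadata keys (__header__, __version__, __globals__)
    from a dict returned by scipy.io.loadmat."""
    return {k: v for k, v in matdict.items() if not k.startswith("__")}

# Usage with SciPy (path is an assumption):
#   from scipy.io import loadmat
#   data = user_variables(loadmat("GIC-LFP_duty_other.mat"))
#   QL = data["QL"]        # capacity loss: one row per path, one column per 100 cycles
#   Qnorm = data["Qnorm"]  # normalized capacity scale for the voltage curves
```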

  • Brazil Visitor Arrivals: Marine: North America: Canada

    • ceicdata.com
    Updated Jul 20, 2018
    Cite
    CEICdata.com (2018). Brazil Visitor Arrivals: Marine: North America: Canada [Dataset]. https://www.ceicdata.com/en/brazil/no-of-visitors-arrivals-by-mode-of-transport
    Explore at:
    Dataset updated
    Jul 20, 2018
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2018 - Dec 1, 2018
    Area covered
    Brazil
    Variables measured
    Tourism Statistics
    Description

Visitor Arrivals: Marine: North America: Canada data was reported at 0.503 Person th in Dec 2018. This records an increase from the previous number of 0.275 Person th for Nov 2018. Visitor Arrivals: Marine: North America: Canada data is updated monthly, averaging 0.038 Person th from Jan 1989 (Median) to Dec 2018, with 360 observations. The data reached an all-time high of 1.024 Person th in Feb 2009 and a record low of 0.000 Person th in Oct 2018. Visitor Arrivals: Marine: North America: Canada data remains active status in CEIC and is reported by Ministry of Tourism. The data is categorized under Brazil Premium Database’s Tourism Sector – Table BR.QB003: No of Visitors Arrivals: by Mode of Transport. According to the Ministry of Tourism, the monthly Visitor Arrivals figures are released on an annual basis because the Ministry receives some input data from the Federal Police once a year; based on these data, it estimates the values for all months of the year.

• Parameters and predicted probabilities of mode choice and ride pass subscription for microtransit in Arlington, TX

    • zenodo.org
    csv
    Updated Oct 1, 2024
    Cite
    Xiyuan Ren; Xiyuan Ren; Y.J. Joseph Chow; Y.J. Joseph Chow; Venktesh Pandey; Venktesh Pandey (2024). Parameters and predicted probabilities of mode choice and ride pass subscription for microtransit in Arlington, TX [Dataset]. http://doi.org/10.5281/zenodo.13379435
    Explore at:
    csvAvailable download formats
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Xiyuan Ren; Xiyuan Ren; Y.J. Joseph Chow; Y.J. Joseph Chow; Venktesh Pandey; Venktesh Pandey
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Arlington, Texas
    Description

    We provide parameters and predicted probabilities of mode choice (at trip level) and ride pass subscription (at individual level) for microtransit in Arlington, TX. The parameters are estimated by an agent-based, nested behavioral model that is developed in the C2SMARTER project “Multi-modal Tripchain Planner for Disadvantaged Travelers to Incentivize Transit Usage” (Award #69A3551747124). We separate the choice related to microtransit into two parts: travel mode choice and ride pass subscription choice. Synthetic population data from Replica Inc. and microtransit service data from City of Arlington are used for estimation.

In the lower-branch travel mode choice, individuals decide on the mode to use by considering factors such as travel time, cost, trip purpose, tour type, and mode-specific preferences. The mode choice set includes driving, walking, biking, carpool, and microtransit. Unlike traditional mode choice models, we allow time and cost parameters to vary across individuals, and we allow mode-specific constants to vary across trip OD pairs. This makes sense when we do not have sufficient data on socioeconomic attributes and built environment variables. The assumption made here is that the impacts of these unobserved variables are captured in the nonparametric distribution of individual- and OD pair-level parameters.

    In the upper-branch ride pass subscription choice, individuals decide whether to purchase a weekly ride pass, a monthly ride pass, or no ride pass at all. By subscribing to a ride pass, travelers pay an amount of money in advance and enjoy free microtransit trips until the ride pass expires. The utility of purchasing a ride pass consists of four components: (1) the utility related to the prices of ride passes, (2) the utility related to the change in consumer surplus (or compensating variation) brought by free microtransit trips with a ride pass, (3) the utility specific to microtransit users, and (4) the alternative specific constant of the ride pass. Given the data availability, we consider the ride pass model as a simple MNL model with six parameters to calibrate. These ride pass parameters are calibrated using the Nelder-Mead Simplex Method. The cost function to minimize is the squared distance between the predicted ride pass market share and the observed one.
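The lower-branch mode choice described above is a multinomial logit; a minimal sketch of the probability calculation follows. The utility coefficients below are made up for illustration (the actual individual- and OD pair-level parameters are in the CSV files).

```python
import math

def mnl_probabilities(utilities):
    """Multinomial-logit (softmax) choice probabilities from mode utilities."""
    m = max(utilities.values())  # subtract the max to stabilize the exponentials
    exp_u = {mode: math.exp(u - m) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}

# One illustrative trip: U = ASC + B_TT * time_min + B_COST * cost_usd
utilities = {
    "driving":      0.5 - 0.05 * 15 - 0.10 * 3.0,
    "microtransit": 0.2 - 0.08 * 25 - 0.10 * 1.0,
    "walking":      0.0 - 0.06 * 45,
}
probs = mnl_probabilities(utilities)
```

The upper-branch ride pass parameters would then be calibrated by minimizing the squared market-share error, e.g. with `scipy.optimize.minimize(fun, x0, method="Nelder-Mead")`, consistent with the Nelder-Mead Simplex Method named in the text.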

    Accordingly, this dataset consists of six .csv files:

    • Mode_choice_parameters_weekday.csv
    • Mode_choice_parameters_weekend.csv
    • RidePass_subscription_parameters.csv
    • Mode_choice_probability_weekday.csv
    • Mode_choice_probability_weekend.csv
    • Ridepass_subscription_probability.csv

    Field definition in "Mode_choice_parameters_weekday.csv" and "Mode_choice_parameters_weekend.csv"

    • iid: IDs of each synthetic individual
    • trip_id: IDs of each synthetic trip
    • origin_bgrp: FIPS code of the block group where the trip starts
    • destination_bgrp: FIPS code of the block group where the trip ends
    • B_AUTO_TT: parameter of auto travel time (varies across individuals)
    • B_MICRO_TT: parameter of microtransit waiting time (varies across individuals)
    • B_NON_AUTO_TT: parameter of non-auto travel time (varies across individuals)
    • B_COST: parameter of travel cost (varies across individuals)
    • ASC_MIRCO: alternative specific constant of microtransit (varies across trip OD pairs)
    • ASC_DRIVING: alternative specific constant of driving (varies across trip OD pairs)
    • ASC_BIKING: alternative specific constant of biking (varies across trip OD pairs)
    • ASC_WALKING: alternative specific constant of walking (varies across trip OD pairs)
    • MICRO_P_SHOPPING: interaction effect between microtransit and shopping trip purpose (generic)
    • MICRO_P_SCHOOL: interaction effect between microtransit and school trip purpose (generic)
    • MICRO_P_OTHER: interaction effect between microtransit and other trip purpose (generic)
    • MICRO_T_COMMUTE: interaction effect between microtransit and commute tour type (generic)
    • MICRO_T_HOME_BASED: interaction effect between microtransit and home-based tour type (generic)

    Field definition in "RidePass_subscription_parameters.csv"

    • B_COST_RP: transfer factor from trip fare to ride pass price
    • B_CS_WEEKDAY: parameter of increased consumer surplus (due to the ride pass) on weekdays
    • B_CS_WEEKEND: parameter of increased consumer surplus (due to the ride pass) on weekends
    • B_MICRO_USER: parameter of a binary variable indicating former microtransit users
    • ASC_WRP: alternative specific constant of subscribing to the weekly ride pass
    • ASC_MRP: alternative specific constant of subscribing to the monthly ride pass

    Field definition in "Mode_choice_probability_weekday.csv" and "Mode_choice_probability_weekend.csv"

    • iid: IDs of each synthetic individual
    • trip_id: IDs of each synthetic trip
    • origin_bgrp: FIPS code of the block group where the trip starts
    • destination_bgrp: FIPS code of the block group where the trip ends
    • P_biking: predicted probability of choosing biking
    • P_carpool: predicted probability of choosing carpool
    • P_microtransit: predicted probability of choosing microtransit
    • P_driving: predicted probability of choosing driving
    • P_walking: predicted probability of choosing walking

    Field definition in "Ridepass_subscription_probability.csv"

    • iid: IDs of each synthetic individual
    • Micro_user: binary variable indicating whether the individual used microtransit before
    • Population_seg: segment ID of the synthetic individual
    • BLOCKGROUP: FIPS code of the home block group
    • P_weekly_pass: predicted probability of subscribing to the weekly ride pass
    • P_monthly_pass: predicted probability of subscribing to the monthly ride pass
    • P_None: predicted probability of subscribing to no ride pass
  • 2023 NTD Annual Data - General Transit Feed Specification (GTFS) Weblinks

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Dec 17, 2024
    + more versions
    Cite
    Federal Transit Administration (2024). 2023 NTD Annual Data - General Transit Feed Specification (GTFS) Weblinks [Dataset]. https://catalog.data.gov/dataset/2023-ntd-annual-data-general-transit-feed-specification-gtfs-weblinks
    Explore at:
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    Federal Transit Administration
    Description

As of Report Year (RY) 2023, FTA requires that reporters with fixed route modes create and maintain a public domain general transit feed specification (GTFS) dataset that reflects their fixed route service. This specification allows for the mapping and other geospatial data visualization and analyses of key transit elements such as stops, routes, and trips. At least one GTFS weblink is provided by the transit agency for each fixed route mode and type of service. These include all rail modes as well as Bus, Bus Rapid Transit, Commuter Bus, Ferryboat and Trolleybus. GTFS requires that an overarching compressed file contain, at a minimum, seven underlying text files: (a) Agency; (b) Stops; (c) Routes; (d) Trips; (e) Stop Times; (f) Calendar or Calendar Dates.txt; and (g) Feed Info.txt. An eighth file, Shapes.txt, is an optional file. FTA collects and publishes these links for further analysis using related GTFS files. FTA is not responsible for managing the websites that host these files, and users with questions regarding the GTFS data are encouraged to contact the transit agency. In many cases, publicly hosted weblinks could not be provided (i.e., due to constraints within the transit agency), but the agency was able to produce a zip file of the required GTFS data. Demand Response, Vanpool, and other non-fixed route modes are excluded. The column "Alternate Format" indicates that the agency provided FTA a weblink in an alternate format with some justification for doing so. The column "Waived" indicates that no GTFS files were produced and FTA granted the agency a waiver from the requirement in Report Year 2023. NTD Data Tables organize and summarize data from the 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2023 General Transit Feed Specification database file. If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
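As a sketch, the minimum file set described above can be checked against a downloaded GTFS archive. The file names follow the GTFS specification; the archive name in the usage comment is hypothetical.

```python
# The seven minimum files named in the text (Calendar and Calendar Dates
# may substitute for each other).
REQUIRED = {"agency.txt", "stops.txt", "routes.txt", "trips.txt",
            "stop_times.txt", "feed_info.txt"}

def missing_gtfs_files(names):
    """Return the required files absent from an archive's name list."""
    names = set(names)
    missing = REQUIRED - names
    if not ({"calendar.txt", "calendar_dates.txt"} & names):
        missing.add("calendar.txt or calendar_dates.txt")
    return missing

# Usage against a real feed (archive name is an assumption):
#   import zipfile
#   with zipfile.ZipFile("agency_gtfs.zip") as z:
#       print(missing_gtfs_files(z.namelist()))
```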

  • University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    + more versions
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Explore at:
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. 
For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j = John Smith, k = Calculus, n = 2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached files section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

Two attachments:
- Word file with variable descriptions
- Rdata file with the data set (for the R language)

Appendix 1. The SET questionnaire used for this paper.

Evaluation survey of the teaching staff of [university name]. Please complete the following evaluation form, which aims to assess the lecturer's performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don't agree; 1 - I strongly don't agree.

Questions (each answered on the 1-5 scale above):
1. I learnt a lot during the course.
2. I think that the knowledge acquired during the course is very useful.
3. The professor used activities to make the class more engaging.
4. If it was possible, I would enroll for the course conducted by this lecturer again.
5. The classes started on time.
6. The lecturer always used time efficiently.
7. The lecturer delivered the class content in an understandable and efficient way.
8. The lecturer was available when we had doubts.
9. The lecturer treated all students equally regardless of their race, background and ethnicity.
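A minimal sketch of how the dependent variable SET_score_avg(j,k,n) is formed from raw Likert answers; the records below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (teacher_id j, course_id k, question n, Likert answer 1-5) - made-up records
answers = [
    ("T1", "C101", 2, 5),
    ("T1", "C101", 2, 4),
    ("T1", "C101", 2, 4),
    ("T1", "C101", 1, 3),
]

# Group all answers by the (j, k, n) triplet, then average each group:
grouped = defaultdict(list)
for j, k, n, score in answers:
    grouped[(j, k, n)].append(score)

# One row per (j, k, n), as in the data set:
SET_score_avg = {key: mean(vals) for key, vals in grouped.items()}
print(SET_score_avg[("T1", "C101", 2)])  # mean of 5, 4, 4
```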

  • Brazil Visitor Arrivals: Air: Central America & Caribbean: Costa Rica

    • ceicdata.com
    Updated Jul 20, 2018
    CEICdata.com (2018). Brazil Visitor Arrivals: Air: Central America & Caribbean: Costa Rica [Dataset]. https://www.ceicdata.com/en/brazil/no-of-visitors-arrivals-by-mode-of-transport
    Explore at:
    Dataset updated
    Jul 20, 2018
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2018 - Dec 1, 2018
    Area covered
    Brazil
    Variables measured
    Tourism Statistics
    Description

    Visitor Arrivals: Air: Central America & Caribbean: Costa Rica data was reported at 0.920 Person th in Dec 2018. This records a decrease from the previous figure of 0.925 Person th for Nov 2018. The data is updated monthly, with a median of 0.433 Person th over the period from Jan 1989 to Dec 2018, covering 360 observations. The series reached an all-time high of 5.177 Person th in Jun 2014 and a record low of 0.033 Person th in Apr 1993. It remains active in CEIC and is reported by the Ministry of Tourism. The data is categorized under the Brazil Premium Database's Tourism Sector – Table BR.QB003: No of Visitors Arrivals: by Mode of Transport. According to the Ministry of Tourism, the monthly Visitor Arrivals figures are released on an annual basis, because the Ministry receives certain input data from the Federal Police once a year and uses it to estimate the values for all months of that year.
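    The summary statistics quoted above (latest value, record high and low, series median) are all derived from the underlying monthly series. The sketch below shows how they would be computed; the values in it are hypothetical placeholders, not the actual 360-observation CEIC series.

    ```python
    from statistics import median

    # Hypothetical monthly observations in thousands of persons ("Person th");
    # the real series runs monthly from Jan 1989 to Dec 2018.
    series = {
        "1993-04": 0.033,
        "2014-06": 5.177,
        "2018-10": 0.910,
        "2018-11": 0.925,
        "2018-12": 0.920,
    }

    latest = series["2018-12"]                                  # most recent value
    record_high = max(series.items(), key=lambda kv: kv[1])     # (month, value)
    record_low = min(series.items(), key=lambda kv: kv[1])      # (month, value)
    series_median = median(series.values())                     # "(Median)" figure
    ```

    Note that CEIC's "averaging ... (Median)" phrasing refers to the median of the series, not the arithmetic mean, which is why `median` rather than `mean` is used here.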
