Accessible Tables and Improved Quality
As part of the Analysis Function Reproducible Analytical Pipeline Strategy, the processes used to create all National Travel Survey (NTS) statistics tables have been improved to follow the principles of Reproducible Analytical Pipelines (RAP). This has improved the efficiency and quality of NTS tables, and as a result some historical estimates have seen very minor changes, at the fifth decimal place or beyond.
All NTS tables have also been redesigned in an accessible format so that they can be used by as many people as possible, including people with impaired vision, motor difficulties, cognitive impairments or learning disabilities, and deafness or impaired hearing.
If you wish to provide feedback on these changes then please email national.travelsurvey@dft.gov.uk.
Revision to table NTS9919
On 16 April 2025, the figures in table NTS9919 were revised and recalculated to include only day 1 of the travel diary, where short walks of less than a mile are recorded (from 2017 onwards); previous versions included all days. This is to more accurately capture the proportion of trips which include short walks before a surface rail stage. The revision has resulted in fewer available breakdowns than previously published due to the smaller sample sizes.
NTS0303: Average number of trips, stages, miles and time spent travelling by mode: England, 2002 onwards (ODS, 53.9 KB) - https://assets.publishing.service.gov.uk/media/66ce0f118e33f28aae7e1f75/nts0303.ods
NTS0308: Average number of trips and distance travelled by trip length and main mode: England, 2002 onwards (ODS, 191 KB) - https://assets.publishing.service.gov.uk/media/66ce0f128e33f28aae7e1f76/nts0308.ods
NTS0312: Walks of 20 minutes or more by age and frequency: England, 2002 onwards (ODS, 35.1 KB) - https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f71/nts0312.ods
NTS0313: Frequency of use of different transport modes: England, 2003 onwards (ODS, 27.1 KB) - https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f72/nts0313.ods
NTS0412: Commuter trips and distance by employment status and main mode: England, 2002 onwards (ODS, 53.8 KB) - https://assets.publishing.service.gov.uk/media/66ce0f1325c035a11941f653/nts0412.ods
NTS0504: Average number of trips by day of the week or month and purpose or main mode: England, 2002 onwards (ODS, 141 KB) - https://assets.publishing.service.gov.uk/media/66ce0f141aaf41b21139cf7d/nts0504.ods
https://www.usa.gov/government-works
This dataset contains data on transit agency employees as reported to the National Transit Database in the 2022 and 2023 report years. It is organized by agency, mode, type of service, and Employee Type (Full Time or Part Time Employee).
The NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis.
This dataset is based on the 2022 and 2023 Employees database files, which are published to the NTD at https://transit.dot.gov/ntd/ntd-data.
Only Full Reporters report data on employees, and only for Directly Operated modes. Other reporter types, and Purchased Transportation service, do not appear in this file.
Access to up-to-date socio-economic data is a widespread challenge in Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
In Fiji, monthly HFPS data collection commenced in February 2024 on topics including employment, income, food security, health, food prices, assets and well-being. Fieldwork took place in rounds roughly one month in length using a panel method, where each household was only recontacted at least thirty days after the previous interview. Each month has approximately 700 households in the sample and is representative of urban and rural areas and divisions. This dataset contains combined monthly survey data between February and October 2024. There is one data file for household-level data with a unique household ID, and a separate file for individual-level data within each household. The individual file can be matched to the household file using the household ID, and it also contains a unique individual ID within each household, which can be used to track individuals over time within households.
Urban and rural areas of Fiji.
Household, individual.
Sample survey data [ssd]
The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each division month to month. It had a probability-based weighted design, with a proportionate stratification to achieve geographical representation. A panel was established from the outset, where in each subsequent round after February 2024, the survey firm would first attempt to contact all households from the previous month and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households. This dataset includes 4,120 completed interviews with 1,360 unique households.
Computer Assisted Telephone Interview [cati]
The questionnaire, which can be found in the External Resources of this documentation, is available in English, with an iTaukei translation available. There were few changes to the questionnaire across the survey months, with some sections only asked in some rounds, such as the digital governance module in rounds 3 and 4. The survey instrument consists of the following modules, with notes in parentheses on dates of collection for questions which were not collected consistently across the whole survey period:
- Basic information
- Household roster
- Access to services and shocks (additional questions on water disruption were asked from April 2024)
- Subjective well-being
- Food insecurity experience scale (FIES)
- Views on the economy and government (some questions were added from May 2024)
- Household income
- Labor
- Agriculture
- Medical service utilization
- Climate migration (April 2024)
- Digital government services (May and June 2024)
The raw data were cleaned by the World Bank team using STATA. This included formatting and correcting errors identified through the survey's monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, food security, household income, agriculture activities, social protection, subjective well-being, access to services, shocks, and perceptions. The household identifier (panel_hid) is available in both the household dataset and the individual dataset. The individual identifier (panel_indid) can be found in the individual dataset.
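For illustration, a minimal sketch of matching the two files in Python; the file names and the .dta format are assumptions, while the identifiers panel_hid and panel_indid are as documented above:

import pandas as pd

# Hypothetical file names; the cleaned files were produced in Stata, so .dta exports are assumed.
households = pd.read_stata("household.dta")
individuals = pd.read_stata("individual.dta")

# panel_hid links each individual record to its household;
# panel_indid identifies the individual within that household.
merged = individuals.merge(households, on="panel_hid", how="left")

# Individuals can be tracked over time within households by grouping on both identifiers.
tracked = merged.groupby(["panel_hid", "panel_indid"])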
UNIDO maintains a variety of databases comprising statistics of overall industrial growth, detailed data on business structure and statistics on major indicators of industrial performance by country in historical time series. These include the UNIDO Industrial Statistics Database at the 3- and 4-digit levels of ISIC Revision 3 (INDSTAT4 - Rev.3).
INDSTAT4 contains highly disaggregated data on the manufacturing sector for the period 1985 onwards. Comparability of data over time and across countries has been the main priority in developing and updating this database. INDSTAT4 offers a unique possibility for in-depth analysis of the structural transformation of economies over time. The database contains seven principal indicators of industrial statistics. The data are arranged at the 3- and 4-digit levels of the International Standard Industrial Classification of All Economic Activities (ISIC) Revision 3 pertaining to manufacturing, which comprises more than 150 manufacturing sectors and sub-sectors. The time series can be used either to compare a given branch or sector across countries or – where present in the data set – to compare sectors within one country.
For more information, please visit: http://www.unido.org/resources/statistics/statistical-databases.html
Sectors
Aggregate data [agg]
Other [oth]
The Failure Mode Classification dataset was released in the paper "MWO2KG and Echidna: Constructing and exploring knowledge graphs from maintenance data" by Stewart et al. The goal is to label a given observation (made by a maintainer) with the corresponding Failure Mode Code.
Each row contains an observation made by a maintainer, followed by a comma, followed by the Failure Mode, for example:
falure,Breakdown
Because the observations are written in technical language, they often contain spelling, grammatical, and tokenisation errors; these are typical of maintenance work orders.
The dataset comprises 502 (observation, label) pairs (for training), 62 pairs (for validation) and 62 pairs (for testing). The labels are taken from a set of 22 failure mode codes from ISO 14224. In order to pull a list of observations in which to label, we ran MWO2KG over the data once and exported a list of all entities labelled as ‘observation’ (such as ‘leaking’, ‘not working’) by the Named Entity Recognition model. We then removed all results that were incorrectly predicted as observations by the NER model and proceeded to label each observation with the most appropriate failure mode code using a text editor.
The source code of the above paper (which also includes this dataset) is located on GitHub.
The direct link to the data (train.txt, dev.txt, and test.txt) is available here.
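As a minimal loading sketch, assuming the comma-separated layout shown above and the file names train.txt, dev.txt, and test.txt:

# Each line is "<observation>,<Failure Mode>", e.g. "falure,Breakdown".
def load_pairs(path):
    with open(path, encoding="utf-8") as f:
        # Split on the last comma only, in case an observation itself contains commas.
        return [tuple(line.rstrip("\n").rsplit(",", 1)) for line in f if line.strip()]

train_pairs = load_pairs("train.txt")   # 502 (observation, label) pairs
dev_pairs = load_pairs("dev.txt")       # 62 pairs
test_pairs = load_pairs("test.txt")     # 62 pairs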
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the code and datasets used in the data analysis for "Fracture toughness of mixed-mode anticracks in highly porous materials". The analysis is implemented in Python, using Jupyter Notebooks.
Contents:
- main.ipynb: Jupyter notebook with the main data analysis workflow.
- energy.py: Methods for the calculation of energy release rates.
- regression.py: Methods for the regression analyses.
- visualization.py: Methods for generating visualizations.
- df_mmft.pkl: Pickled DataFrame with experimental data gathered in the present work.
- df_legacy.pkl: Pickled DataFrame with literature data.
Dependencies: pandas, matplotlib, numpy, scipy, tqdm, uncertainties, weac. Install them with pip install -r requirements.txt, then open the main.ipynb notebook in Jupyter Notebook or JupyterLab.
The experimental measurements and corresponding parameters are provided in df_mmft.pkl and df_legacy.pkl. Below are the descriptions for each column in these DataFrames.
df_mmft.pkl:
- exp_id: Unique identifier for each experiment.
- datestring: Date of the experiment as a string.
- datetime: Timestamp of the experiment.
- bunker: Field site of the experiment. Bunker IDs 1 and 2 correspond to field sites A and B, respectively.
- slope_incl: Inclination of the slope in degrees.
- h_sledge_top: Distance from sample top surface to the sled in mm.
- h_wl_top: Distance from sample top surface to weak layer in mm.
- h_wl_notch: Distance from the notch root to the weak layer in mm.
- rc_right: Critical cut length in mm, measured on the front side of the sample.
- rc_left: Critical cut length in mm, measured on the back side of the sample.
- rc: Mean of rc_right and rc_left.
- densities: List of density measurements in kg/m^3 for each distinct slab layer of each sample.
- densities_mean: Daily mean of densities.
- layers: 2D array with layer density (kg/m^3) and layer thickness (mm) pairs for each distinct slab layer.
- layers_mean: Daily mean of layers.
- surface_lineload: Surface line load of added surface weights in N/mm.
- wl_thickness: Weak-layer thickness in mm.
- notes: Additional notes regarding the experiment or observations.
- L: Length of the slab–weak-layer assembly in mm.
df_legacy.pkl:
- #: Record number.
- rc: Critical cut length in mm.
- slope_incl: Inclination of the slope in degrees.
- h: Slab height in mm.
- density: Mean slab density in kg/m^3.
- L: Length of the slab–weak-layer assembly in mm.
- collapse_height: Weak-layer height reduction through collapse.
- layers_mean: 2D array with layer density (kg/m^3) and layer thickness (mm) pairs for each distinct slab layer.
- wl_thickness: Weak-layer thickness in mm.
- surface_lineload: Surface line load from added weights in N/mm.
For more detailed information on the datasets, refer to the paper or the documentation provided within the Jupyter notebook.
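For instance, a minimal sketch for loading the two DataFrames with pandas (column names as documented above):

import pandas as pd

# Load the pickled DataFrames shipped with the repository.
df_mmft = pd.read_pickle("df_mmft.pkl")
df_legacy = pd.read_pickle("df_legacy.pkl")

# Example: mean critical cut length (rc, in mm) per slope inclination in the new experiments.
print(df_mmft.groupby("slope_incl")["rc"].mean())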
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Phase 1: Signboard Detection Dataset
This phase focuses on detecting signboards in street images.
- Total Images: 8,366
- Image Format: JPG (8,366 images)
- Resolution: Minimum (720, 443); Maximum (9,280, 8,285); Mean (4,202, 3,138); Median (4,032, 3,024)
- Aspect Ratio: Minimum 0.5625; Maximum 5.7043; Mean 1.3691; Most Frequent 1.3333; Standard Deviation 0.2329
- File Size (KB): Minimum 88.19 KB; Maximum 41,266.50 KB; Mean 5,796.19 KB; Total Dataset Size 48,490,924.91 KB
- Color Statistics: Color Mode RGB (8,366 images); Mean Color (RGB) (110.32, 112.77, 118.16); Standard Deviation (RGB) (65.71, 65.36, 65.82); Average Brightness 114.10
Phase 2: Region of Text Interest (RTI) Detection Dataset
This phase focuses on detecting specific text regions (names and addresses) within signboards.
- Total Images: 8,036
- Image Format: JPG (8,036 images)
- Resolution: Minimum (552, 156); Maximum (9,228, 4,682); Mean (2,753, 808); Median (2,741, 781)
- Aspect Ratio: Minimum 0.9615; Maximum 11.3835; Mean 3.6058; Most Frequent 4.0; Standard Deviation 1.2475
- File Size (KB): Minimum 40.54 KB; Maximum 7,968.94 KB; Mean 653.67 KB; Total Dataset Size 5,252,868.26 KB
- Color Statistics: Color Mode RGB (8,036 images); Mean Color (RGB) (137.58, 136.29, 144.00); Standard Deviation (RGB) (47.26, 49.73, 50.89); Average Brightness 138.74
Named Entity Recognition (NER) Dataset
This dataset is used for categorizing extracted text from signboards.
- Total Entries: 42,547
- Unique Categories: 10
- Category Distribution: Religious Sites 10,641; Retail Outlets 8,275; Educational Institutions 6,826; Healthcare Institutions 4,708; Restaurants 3,868; Pharmacies 3,637; Parks 1,547; Banks 1,121; Stations 1,094; Hotels 830
- Overall Word Count: Maximum 18; Minimum 1; Mean 3.82
- Category-Wise Word Count: Banks (Mean 4.65, Max 11, Min 1); Educational Institutions (Mean 4.60, Max 18, Min 1); Healthcare Institutions (Mean 4.02, Max 16, Min 1); Religious Sites (Mean 4.36, Max 17, Min 1); Retail Outlets (Mean 3.08, Max 15, Min 1); Restaurants (Mean 3.36, Max 13, Min 1); Pharmacies (Mean 2.91, Max 13, Min 1); Parks (Mean 3.10, Max 11, Min 1); Stations (Mean 3.72, Max 17, Min 1); Hotels (Mean 3.12, Max 12, Min 1)
This dataset is structured for a two-phase object detection pipeline with an additional text classification task to categorize extracted text from detected regions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Drone onboard multi-modal sensor dataset:
This dataset contains timeseries data from numerous drone flights. Each flight record has a unique identifier (uid) and a timestamp indicating when the flight occurred. The drone's position is represented by the coordinates (position_x, position_y, position_z) and altitude. The orientation of the drone is represented by the quaternion (orientation_x, orientation_y, orientation_z, orientation_w). The drone's velocity and angular velocity are represented by (velocity_x, velocity_y, velocity_z) and (angular_x, angular_y, angular_z) respectively. The linear acceleration of the drone is represented by (linear_acceleration_x, linear_acceleration_y, linear_acceleration_z).
In addition to the above, the dataset also contains information about the battery voltage (battery_voltage) and current (battery_current) and the payload attached. The payload information indicates whether the drone operated with an embedded device attached (NVIDIA Jetson), various sensors, and a solid-state weather station (Trisonica).
The dataset also includes annotations for the current state of the drone, including IDLE_HOVER, ASCEND, TURN, HMSL and DESCEND. These states can be used for classification to identify the current state of the drone. Furthermore, the labeled dataset can be used for predicting the trajectory of the drone using multi-task learning.
For the annotation, we look at the change in position_x, position_y, position_z and yaw. Specifically, if position_x or position_y changes, the drone is moving in a horizontal straight line; if position_z changes, the drone is ascending or descending (depending on whether it increases or decreases); if yaw changes, the drone is performing a turn; and if none of these features change, the drone is in idle or hover mode.
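A rough sketch of this labelling rule in Python is given below; the tolerance eps is an illustrative assumption, yaw is assumed to have been derived from the orientation quaternion beforehand, and the priority order between overlapping conditions is a simplification:

import pandas as pd

def annotate_state(df, eps=1e-3):
    # Frame-to-frame changes in position and yaw.
    dx = df["position_x"].diff().abs() > eps
    dy = df["position_y"].diff().abs() > eps
    dz = df["position_z"].diff()
    dyaw = df["yaw"].diff().abs() > eps

    state = pd.Series("IDLE_HOVER", index=df.index)
    state[dx | dy] = "HMSL"          # horizontal movement in a straight line
    state[dz > eps] = "ASCEND"
    state[dz < -eps] = "DESCEND"
    state[dyaw] = "TURN"
    return state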
In addition to the features already mentioned, this dataset also includes data from various sensors, including a weather station and an Inertial Measurement Unit (IMU). The weather station provides information about the weather conditions during the flight, including wind speed and wind angle. These weather variables could be important factors influencing the flight of the drone and battery consumption. The IMU is a sensor that measures the drone's acceleration, angular velocity, and magnetic field. The accelerometer provides information about the drone's linear acceleration, while the gyroscope provides information about the drone's angular velocity. The magnetometer measures the Earth's magnetic field, which can be used to determine the drone's orientation.
Field deployments were performed to collect empirical data using a DJI Matrice 300 (M300) drone. The M300 is equipped with advanced sensors and flight control systems, which can provide high-precision flight data. The flights were designed to cover a range of flight patterns, including triangular, square, polygonal, and random flight patterns. These flight patterns were chosen to represent a variety of different flight scenarios that could be encountered in real-world applications. The triangular flight pattern consists of the drone flying in a triangular path at a fixed altitude. The square flight pattern involves the drone flying in a square path at a fixed altitude. The polygonal flight pattern consists of the drone flying in a polygonal path at a fixed altitude, and the random flight pattern involves the drone flying in a random path at a fixed altitude. Overall, this dataset contains a rich set of flight data that can be used for various research purposes, including developing and testing algorithms for drone control, trajectory planning, and machine learning.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032
This is the original XAI Drilling dataset, optimized for XAI purposes; it can be used to evaluate explanations produced by such algorithms. The dataset comprises 20,000 data points, i.e., drilling operations, stored as rows, with 10 features, one binary main failure label, and 4 binary subgroup failure modes stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:
Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.
Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.
Subgroup failures:
- Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence.
- Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from high feed rate, inadequate cooling and use of an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips.
- Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation.
- Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.
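As the description of the main failure label implies, the main failure flag is the logical OR of the four subgroup failure modes. A minimal consistency check follows; the file name and column names are assumptions based on the feature descriptions above:

import pandas as pd

df = pd.read_csv("xai_drilling.csv")  # hypothetical file name

# Assumed column names for the four subgroup failure modes.
subgroups = ["build_up_edge_failure", "compression_chips_failure",
             "flank_wear_failure", "wrong_drill_bit_failure"]
reconstructed = df[subgroups].any(axis=1).astype(int)

# Main failure should be 1 whenever any subgroup failure mode is 1.
print((reconstructed == df["main_failure"]).all())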
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One of the sectors that felt the impact of the Corona Virus Disease 2019 (COVID-19) pandemic was the educational sector. The outbreak led to the immediate closure of schools at all levels, sending billions of students away from their various institutions of learning. However, the shutdown of academic institutions was not total, as institutions that were solely running online programmes were not affected. Those running both face-to-face and online modes quickly switched over to the online mode. Unfortunately, institutions that had not fully embraced the online mode of study were greatly affected. 85% of academic institutions in Nigeria operate a face-to-face mode of study; therefore, the majority of Nigerian students at all levels were affected by the COVID-19 lockdown. Social media platforms and emerging technologies were the major backbones of institutions running an online mode of study, so this survey uses the unified theory of acceptance and use of technology (UTAUT) model to capture selected face-to-face Nigerian university students' accessibility, usage, intention and willingness to use these social media platforms and emerging technologies for learning. The challenges that could mar the usage of these technologies were also revealed. Eight hundred and fifty undergraduate students participated in the survey.
The dataset includes the questionnaire used to retrieve the data, the responses obtained in spreadsheet format, the charts generated from the responses received, the Statistical Package for the Social Sciences (SPSS) file and the descriptive statistics for all the variables captured. This second version contains the reliability statistics of the UTAUT variables using Cronbach's alpha, which measured the reliability as well as the internal consistency of the UTAUT variables, reported as reliability statistics, an inter-item correlation matrix and item-total statistics. The authors believe that the dataset will enhance understanding of how face-to-face students use social media platforms and how these platforms could be used to engage the students outside their classroom activities. The dataset also exposes how familiar face-to-face university students are with these emerging teaching and learning technologies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NewsMediaBias-Plus dataset is designed for the analysis of media bias and disinformation by combining textual and visual data from news articles. It aims to support research in detecting, categorizing, and understanding biased reporting in media outlets.
NewsMediaBias-Plus pairs news articles with relevant images and annotations indicating perceived biases and the reliability of the content. It adds a multimodal dimension for bias detection in news media.
- unique_id: Unique identifier for each news item. Each unique_id matches an image for the same article.
- outlet: The publisher of the article.
- headline: The headline of the article.
- article_text: The full content of the news article.
- image_description: Description of the paired image.
- image: The file path of the associated image.
- date_published: The date the article was published.
- source_url: The original URL of the article.
- canonical_link: The canonical URL of the article.
- new_categories: Categories assigned to the article.
- news_categories_confidence_scores: Confidence scores for each category.
- text_label: Indicates the likelihood of the article being disinformation: Likely (likely to be disinformation) or Unlikely (unlikely to be disinformation).
- multimodal_label: Indicates the likelihood of disinformation from the combination of the text snippet and image content: Likely (likely to be disinformation) or Unlikely (unlikely to be disinformation).
Load the dataset into Python:
from datasets import load_dataset
ds = load_dataset("vector-institute/newsmediabias-plus")
print(ds) # View structure and splits
print(ds['train'][0]) # Access the first record of the train split
print(ds['train'][:5]) # Access the first five records
from datasets import load_dataset
# Load the dataset in streaming mode
streamed_dataset = load_dataset("vector-institute/newsmediabias-plus", streaming=True)
# Get an iterable dataset
dataset_iterable = streamed_dataset['train'].take(5)
# Print the records
for record in dataset_iterable:
print(record)
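For example, to keep only records whose text label marks them as likely disinformation (a sketch using the fields documented above):

# Filter the streamed split by the documented text_label field.
likely_items = streamed_dataset['train'].filter(lambda r: r['text_label'] == 'Likely')
for record in likely_items.take(3):
    print(record['outlet'], '-', record['headline'])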
Contributions are welcome! To contribute, fork the repository and create a pull request with your changes.
This dataset is released under a non-commercial license. See the LICENSE file for more details.
Please cite the dataset using this BibTeX entry:
@misc{vector_institute_2024_newsmediabias_plus,
title={NewsMediaBias-Plus: A Multimodal Dataset for Analyzing Media Bias},
author={Vector Institute Research Team},
year={2024},
url={https://huggingface.co/datasets/vector-institute/newsmediabias-plus}
}
For questions or support, contact Shaina Raza at: shaina.raza@vectorinstitute.ai
Disclaimer: The labels Likely and Unlikely are based on LLM annotations and expert assessments, intended for informational use only. They should not be considered final judgments.
Guidance: This dataset is for research purposes. Cross-reference findings with other reliable sources before drawing conclusions. The dataset aims to encourage critical thinking, not provide definitive classifications.
This dataset contains in-situ measurements of temperature, salinity, and velocity from the Sub-Mesoscale Ocean Dynamics Experiment (S-MODE) conducted approximately 300 km offshore of San Francisco, during an intensive observation period in the fall of 2022. The data are available in netCDF format with a dimension of time. S-MODE aims to understand how ocean dynamics acting on short spatial scales influence the vertical exchange of physical and biological variables in the ocean. The target in-situ quantities were measured by Lagrangian floats, which were deployed from research vessels and retrieved 3-5 days later. The floats follow the 3D motion of water parcels at depths within or just below the mixed layer and carried a CTD instrument to measure temperature, salinity, and pressure, in addition to an ADCP instrument to measure velocity.
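A minimal sketch for reading one of the float files (the file name is hypothetical; the single time dimension and the variable types follow the description above):

import xarray as xr

# Hypothetical file name for one Lagrangian float record.
ds = xr.open_dataset("smode_float_example.nc")
print(ds.data_vars)   # e.g. temperature, salinity, pressure, velocity
print(ds.sizes)       # a single time dimension, per the description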
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This deposit contains a raw dataset and a processed dataset derived from it, together with a document containing the analytical code for statistical analysis of the processed dataset in .Rmd and .html formats. The study examined some aspects of the mechanical performance of solid wood composites. We were interested in certain properties of solid wood composites made using different adhesives with different grain orientations at the bondline, then treated at different temperatures prior to testing. Performance was tested by assessing fracture energy and critical fracture energy, lap shear strength, and compression strength of the composites. This document concerns only the fracture properties, which are the focus of the related paper. Notes:
* The raw data is provided in this upload, but the processing is not addressed here.
* The authors of this document are a subset of the authors of the related paper.
* This document and the related data files were uploaded at the time of submission for review. An update providing the DOI of the related paper will be provided when it is available.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This data set gives the best available values for ion densities, temperatures, and velocities near Neptune derived from data obtained by the Voyager 2 plasma experiment. All parameters are obtained by fitting the observed spectra (current as a function of energy) with Maxwellian plasma distributions, using a non-linear least squares fitting routine to find the plasma parameters which, when coupled with the full instrument response, best simulate the data.
VITAL SIGNS INDICATOR Transit Cost-Effectiveness (T13)
FULL MEASURE NAME Net cost per transit boarding (cost per boarding minus fare per boarding)
LAST UPDATED May 2017
DESCRIPTION Transit cost-effectiveness refers to both the total and net costs per transit boarding, both of which are adjusted to reflect inflation over time. Net costs reflect total operating costs minus farebox revenue (i.e. operating costs that are not directly funded by system users). The dataset includes metropolitan area, regional, mode, and system tables for net cost per boarding, total cost per boarding, and farebox recovery ratio.
DATA SOURCE Federal Transit Administration: National Transit Database http://www.ntdprogram.gov/ntdprogram/data.htm
Bureau of Labor Statistics: Consumer Price Index http://www.bls.gov/data/
CONTACT INFORMATION vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator) Simple modes were aggregated to combine the various bus modes (e.g. rapid bus, express bus, local bus) into a single mode to avoid incorrect conclusions resulting from mode recoding over the lifespan of NTD. For other metro areas, operators were identified by developing a list of all urbanized areas within a current MSA boundary and then using that UZA list to flag relevant operators; this means that all operators (both large and small) were included in the metro comparison data. Financial data was inflation-adjusted to match 2015 dollar values using metro-specific Consumer Price Indices.
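A small illustrative calculation of the measures defined above (the figures are made up for illustration, not values from the dataset):

# Illustrative figures only.
operating_cost = 5_000_000.0   # annual operating cost, inflation-adjusted dollars
fare_revenue = 1_500_000.0     # annual farebox revenue
boardings = 2_000_000          # annual boardings

total_cost_per_boarding = operating_cost / boardings                  # 2.50
net_cost_per_boarding = (operating_cost - fare_revenue) / boardings   # 1.75
farebox_recovery_ratio = fare_revenue / operating_cost                # 0.30
print(total_cost_per_boarding, net_cost_per_boarding, farebox_recovery_ratio)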
This is the first live data stream on Kaggle providing a simple yet rich source of all soccer matches around the world 24/7 in real-time.
What makes it unique compared to other datasets?
Simply train your algorithm on the first version of training dataset of approximately 11.5k matches and predict the data provided in the following data feed.
The CSV file is updated every 30 minutes, at minutes 20’ and 50’ of every hour. I kindly request that you not download it more than twice per hour, as it incurs additional cost.
You may download the CSV data file from the following Amazon S3 link by changing the FOLDER_NAME as below:
https://s3.amazonaws.com/FOLDER_NAME/amasters.csv
*. Substitute the FOLDER_NAME with "**analyst-masters**"
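For example, after substituting the folder name, the feed can be read directly with pandas (a sketch; please respect the twice-per-hour limit noted above):

import pandas as pd

url = "https://s3.amazonaws.com/analyst-masters/amasters.csv"
matches = pd.read_csv(url)   # live feed, refreshed every 30 minutes
print(matches.shape)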
Our goal is to identify the outcome of a match as Home, Draw or Away. The variety of sources and nature of information provided in this data stream makes it a unique database. Currently, FIVE servers are collecting data from soccer matches around the world, communicating with each other and finally aggregating the data based on the dominant features learned from 400,000 matches over 7 years. I describe every column and the data collection below in two categories, Category I – Current situation and Category II – Head-to-Head History. Hence, we divide the type of data we have for each team into four modes.
Below you can find a full illustration of each category.
I. Current situation
Col 1 to 3:
Votes_for_Home Votes_for_Draw Votes_for_Away
The most distinctive parts of the database are these 3 columns. We are releasing the opinions of over 100 professional soccer analysts predicting the outcome of a match. Their votes are the result of every piece of information they receive on players, team line-ups, injuries and the urge of a team to win a match to stay in the league. They are spread around the world in various time zones and are experts on soccer teams from various regions. Our servers aggregate their opinions to update the CSV file until kickoff. Therefore, if 40 users predict that Real Madrid will beat Real Sociedad at the Santiago Bernabeu on January 6th, 2019, but 5 users predict that Real Sociedad (the away team) will win, you should doubt the home win. Here, the “majority of votes” works in conjunction with other features.
Col 4 to 9:
Weekday Day Month Year Hour Minute
There are over 60,000 matches during a year, and approximately 400 are usually held per day on weekends. More critical and exciting matches, which are usually less predictable, are held toward the evening in Europe. We are currently providing time in Central European Time (CET), equivalent to GMT+01:00.
*. Please note that the 2nd row of the CSV file records the time at which data values from all servers were saved to the file.
Col 10 to 13:
Total_Bettors Bet_Perc_on_Home Bet_Perc_on_Draw Bet_Perc_on_Away
This data is recorded a few hours before the match as people place bets emotionally when kickoff approaches. The percentage of the overall number of people denoted as “Total_Bettors” is indicated in each column for “Home,” “Draw” and “Away” outcomes.
Col 14 to 15:
Team_1 Team_2
The team playing “Home” is “Team_1” and the opponent playing “Away” is “Team_2”.
Col 16 to 36:
League_Rank_1 League_Rank_2 Total_teams Points_1 Points_2 Max_points Min_points Won_1 Draw_1 Lost_1 Won_2 Draw_2 Lost_2 Goals_Scored_1 Goals_Scored_2 Goals_Rec_1 Goal_Rec_2 Goals_Diff_1 Goals_Diff_2
If the match is betw...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes tower-based Ka-band ocean surface backscatter measurements (cross section, incidence angle, radial velocity from radar, pulse-pair correlation) located offshore of Martha’s Vineyard (41°19.5′N, 70°34′W), Massachusetts (USA) over a period of three months, from October 2019 to January 2020. Data from the Ka-band radar are collected at multiple distances from the tower (up to ~32 m) at several incidence angles and at sub-second resolution. The measurements are provided as hourly files in netCDF format.
Ka-band backscatter data are often utilized to derive ocean surface vector winds. The instrument used for this dataset was a Ka-Band Ocean continuous wave Doppler Scatterometer (KaBODS) built by the University of Massachusetts, Amherst, which was installed on the Woods Hole Oceanographic Institution Air-Sea Interaction Tower (ASIT). The tower is located in 15 m deep water and extends 76 feet into the marine atmosphere. Data were collected as part of a pre-pilot campaign for the S-MODE (Submesoscale Ocean Dynamics Experiment) project. The measurements provided the opportunity to develop Ka-band backscatter models as well as study backscattering mechanisms under different wind, wave, and weather conditions in order to support operation of the airborne Ka-band Doppler scatterometer used during the main S-MODE intensive observation periods.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains PRISM data from the Sub-Mesoscale Ocean Dynamics Experiment (S-MODE) during the IOP1 campaign conducted approximately 300 km offshore of San Francisco during Fall 2022. S-MODE aims to understand how ocean dynamics acting on short spatial scales influence the vertical exchange of physical and biological variables in the ocean. The Portable Remote Imaging Spectrometer (PRISM) is an airborne instrument package that is mounted on the GIII aircraft which flies long duration detailed surveys of the field domain during deployments. PRISM contains a pushbroom imaging spectrometer operating at near-UV to near-IR wavelengths (350-1050 nm), which will produce high temporal resolution and resolve spatial features as small as 30 cm. PRISM also has a two-channel spot radiometer at short-wave infrared (SWIR) band (1240 nm and 1640 nm), that is co-aligned with the spectrometer and will be used to provide accurate atmospheric correction of the ocean color measurements. Level 1 data is available in netCDF format.
CoRoT was a space astronomy mission devoted to the study of the variability with time of stars' brightness, with an extremely high accuracy (100 times better than from the ground), for very long durations (up to 150 days) and with a very high duty cycle (more than 90%). The mission was led by CNES in association with four French laboratories, and 7 participating countries and agencies (Austria, Belgium, Brazil, Germany, Spain, and the ESA Science Programme). The satellite is composed of a PROTEUS platform (the 3rd in the series), and a unique instrument: a stellar photometer. It was launched on December 27th, 2006 on a Soyuz Rocket, from Baikonour. The mission lasted almost 6 years (the nominal 3-year duration and a 3-year extension) and observed more than 160,000 stars. It suddenly stopped sending data on November 2nd, 2012. CoRoT performed Ultra High Precision Photometry of Stars to detect and characterize the variability of their luminosity with two main objectives: (i) the variability of the object itself: oscillations, rotation, magnetic activity, etc.; (ii) variability due to external causes such as bodies in orbit around the star: planets and companion stars. The original scientific objectives were focused on the study of stellar pulsations (asteroseismology) to probe the internal structure of stars, and the detection of small exoplanets through their transit in front of their host star, and the measurement of their sizes. This led to the introduction of two modes of observation, working simultaneously: - The bright star mode dedicated to very precise seismology of a small sample (171) of bright and nearby stars (presented in the file named "Bright_star.dat" in the CDS version at https://cdsarc.cds.unistra.fr/ftp/cats/B/corot/); note that these data are not included in this HEASARC table; - The faint star mode, observing a very large number of stars at the same time, to detect transits, which are rare events, as they imply the alignment of the star, the planet and the observer (these data are presented in the file named "Faint_star.dat" in the CDS version at https://cdsarc.cds.unistra.fr/ftp/cats/B/corot/); this HEASARC table is based on this sample. The large amount of data gathered in this mode turned out to be extremely fruitful for many topics of stellar physics. Due to project constraints, two regions of the sky were accessible (circles of 10 degrees centered on the equator around Right Ascensions of 06h 50m and 18h 50m). They are called the CoRoT 'eyes': the first one is called the "anticenter" eye, whereas the second one is called the "center" eye. Each pointing covers 1.4 x 2.8 square degrees. The CoRoT project is still processing the data, aiming at removing instrumental artifacts and defects. Therefore the format and content of the catalog is still somewhat evolving. More details on the data can be found in the file http://idoc-corotn2-public.ias.u-psud.fr/jsp/doc/CoRoT_N2_versions_30sept2014.pdf. More details on the CoRoT N2 data may be found in the documentation file http://idoc-corotn2-public.ias.u-psud.fr/jsp/doc/DescriptionN2v1.5.pdf. This HEASARC table contains information on stars observed by CoRoT in its exoplanet detection program. A few percent of these stars have 2 entries since they were observed in different windows (as specified by the corot_window_id parameter) in a subsequent observing run to the initial run in which they were observed.
Each entry in this table corresponds to the unique specification of target and corot_window_id, each with a link to its associated N2 data products. The original names of the parameters in this table, as given in the CoRoT mission documentation, are given in square brackets at the end of the parameter descriptions listed below. This table was created by the HEASARC in May 2012 based on CDS Catalog B/corot file Faint_star.dat. The HEASARC routinely updates this table after updates are made to the CDS version of this catalog. This is a service provided by NASA HEASARC .
This data set gives the best available values for ion densities, temperatures, and velocities near Neptune derived from data obtained by the Voyager 2 plasma experiment. All parameters are obtained by fitting the observed spectra (current as a function of energy) with Maxwellian plasma distributions, using a non-linear least squares fitting routine to find the plasma parameters which, when coupled with the full instrument response, best simulate the data. The PLS instrument measures energy/charge, so composition is not uniquely determined but can be deduced in some cases by the separation of the observed current peaks in energy (assuming the plasma is co-moving). In the upstream solar wind, protons are fit to the M-long data since high energy resolution is needed to obtain accurate plasma parameters. In the magnetosheath the ion flux is so low that several L-long spectra (3-5) had to be averaged to increase the signal-to-noise ratio to a level at which the data could be reliably fit. These averaged spectra were fit using 2 proton Maxwellians with the same velocity. The values given in the upstream magnetosheath are the total density and the density-weighted temperature. In both the upstream solar wind and magnetosheath, full vector velocities, densities and temperatures are derived for each fit component. In the magnetosphere, spectra do not contain enough information to obtain full velocity vectors, so flow is assumed to be purely azimuthal. In some cases the azimuthal velocity is a fit parameter, and in some cases rigid corotation is assumed. In the 'outer' magnetosphere (L>5), two distinct current peaks, attributed to H+ and N+, appear in the spectra. In the inner magnetosphere the plasma is hot and the composition is ambiguous, although two superimposed Maxwellians are still required to fit the data. These spectra are fit using two compositions, one with H+ and N+ and the second with two H+ components. The N+ composition is preferred by the data provider. All fit values in the magnetosphere come with one-sigma errors. It should be noted that no attempt has been made to account for the spacecraft potential, which is probably about -10 V in this region and will affect the density and velocity values. In the outbound magnetosheath and solar wind, both moment and fit values are given for velocity, density, and thermal speed. The signal-to-noise ratio in the M-longs is very low, especially near the magnetopause, which can result in the analysis giving incorrect values. The L-long spectra have too low an energy resolution to permit accurate determination of parameters; in many regions the temperature and non-radial velocity components may be inaccurate.