100+ datasets found
  1. YouTube Videos and Channels Metadata

    • kaggle.com
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). YouTube Videos and Channels Metadata [Dataset]. https://www.kaggle.com/datasets/thedevastator/revealing-insights-from-youtube-video-and-channe
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Area covered
    YouTube
    Description

    YouTube Videos and Channels Metadata

    Analyze the statistical relation between videos and form a topic tree

    By VISHWANATH SESHAGIRI [source]

    About this dataset

    This dataset contains YouTube video and channel metadata for analyzing the statistical relationships between videos and building a topic tree. With 9 direct features and 13 derived features, it provides what you need to build a deep understanding of how videos are related, including measures such as total views per unit time, channel views, likes/subscribers, comments/views, and dislikes/subscribers. The data makes it possible to study subscriber-count trends over time, estimate the impact of trends on subscriber engagement, and build models of how different types of content drive viewership, identifying the most popular styles or topics within YouTube's vast catalogue. It also offers a look into consumer behaviour: by examining ratios such as likes per subscriber and dislikes per view, you can explore what drives people to watch specific videos at certain times or to favour certain channels over others. Finally, the dataset is completely open source, with an easy-to-understand GitHub repo, making it a useful resource for anyone looking to understand how an audience interacts with content and how that content might be improved.

    How to Use This Dataset

    In general, it is important to understand each parameter in the dataset before proceeding with analysis. The parameters included are totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCount, dislikes/views, comments/subscriber, channelCommentCount, likes/dislikes, comments/views, dislikes/subscribers, totviews/totsubs, and views/elapsedtime.

    To use this dataset for your own analysis: 1) review each parameter's meaning and purpose in the dataset; 2) get familiar with basic descriptive statistics such as the mean, median, mode, and range; 3) create visualizations or tables based on subsets of the data; 4) examine correlations between different sets of variables or parameters; 5) draw conclusions about specific channels or topics based on organized graph hierarchies or tables; 6) analyze trends over time for individual parameters, as well as the aggregate reaction from all users when videos are released.
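
    As a minimal sketch of steps 1)-4), the snippet below loads the CSV with pandas and computes descriptive statistics and pairwise correlations for a few of the ratio parameters. The file name and column names are taken from the column listing further down and may differ slightly in the actual download.

        # Minimal sketch, assuming the file and column names listed under "Columns" below.
        import pandas as pd

        df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

        # Step 2: basic descriptive statistics for a few ratio parameters
        cols = ["totalviews/channelelapsedtime", "channelViewCount", "likes/subscriber"]
        cols = [c for c in cols if c in df.columns]  # keep only the columns that exist
        print(df[cols].describe())

        # Step 4: pairwise correlations between the selected parameters
        print(df[cols].corr())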

    Research Ideas

    • Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that can predict the relative popularity of videos based on various factors such as total views, channel viewers, likes/dislikes ratio, and comments/views ratio. This model could then be used to make recommendations and predict which videos are likely to become popular or go viral.

    • Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.

    • Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis by examining factors such as subscriber count, average time spent watching a video per user (elapsed time), and comments made per view, so as to gain insights into how engaged viewers are with specific content or channels on YouTube. From this information it would be possible to optimize content strategy accordingly, in order to improve overall engagement rates across various types of video content and channel types.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: YouTubeDataset_withChannelElapsed.csv

    • totalviews/channelelapsedtime: Ratio of total views to channel elapsed time. (Ratio)
    • channelViewCount: Total number of views for the channel. (Integer)
    • likes/subscriber ...

  2. 2022 - 2023 NTD Annual Data - Service (by Mode and Time Period)

    • catalog.data.gov
    • data.transportation.gov
    • +2more
    Updated Jan 23, 2025
    + more versions
    Cite
    Federal Transit Administration (2025). 2022 - 2023 NTD Annual Data - Service (by Mode and Time Period) [Dataset]. https://catalog.data.gov/dataset/service-flat-file
    Explore at:
    Dataset updated
    Jan 23, 2025
    Dataset provided by
    Federal Transit Administration
    Description

    This dataset represents the Service data reported to the NTD by transit agencies. In versions of the data tables from before 2014, you can find data on service in the file called "Transit Operating Statistics: Service Supplied and Consumed." If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.

  3. MMS 3 Search Coil Magnetometer (SCM) AC Magnetic Field Level 2 (L2), Survey...

    • data.nasa.gov
    • heliophysicsdata.gsfc.nasa.gov
    • +3more
    Updated Aug 21, 2025
    + more versions
    Cite
    nasa.gov (2025). MMS 3 Search Coil Magnetometer (SCM) AC Magnetic Field Level 2 (L2), Survey Mode, 32 Sample/s Data [Dataset]. https://data.nasa.gov/dataset/mms-3-search-coil-magnetometer-scm-ac-magnetic-field-level-2-l2-survey-mode-32-sample-s-da
    Explore at:
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Search Coil Magnetometer (SCM) AC Magnetic Field (32 samples/s), Level 2, Survey Mode Data. The tri-axial Search-Coil Magnetometer with its associated preamplifier measures three-dimensional magnetic field fluctuations. The analog magnetic waveforms measured by the SCM are digitized and processed inside the Digital Signal Processor (DSP), collected and stored by the Central Instrument Data Processor (CIDP) via the Fields Central Electronics Box (CEB). Prior to launch, all SCM Flight models were calibrated by LPP team members at the National Magnetic Observatory, Chambon-la-Foret (Orleans). Once per orbit, each SCM transfer function is checked thanks to the onboard calibration signal provided by the DSP. The SCM is operated for the entire MMS orbit in survey mode. Within scientific Regions Of Interest (ROI), burst mode data are also acquired as well as high speed burst mode data. This SCM data set corresponds to the AC magnetic field waveforms in nanoTesla and in the GSE frame. The SCM instrument paper can be found at http://link.springer.com/article/10.1007/s11214-014-0096-9 and the SCM data product guide at https://lasp.colorado.edu/mms/sdc/public/datasets/fields/.

  4. MMS 2 Search Coil Magnetometer (SCM) AC Magnetic Field Level 2 (L2), Burst...

    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • data.nasa.gov
    • +1more
    Updated Aug 4, 2025
    + more versions
    Cite
    nasa.gov (2025). MMS 2 Search Coil Magnetometer (SCM) AC Magnetic Field Level 2 (L2), Burst Mode, 8192 Sample/s Data [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/mms-2-search-coil-magnetometer-scm-ac-magnetic-field-level-2-l2-burst-mode-8192-sample-s-d
    Explore at:
    Dataset updated
    Aug 4, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Search Coil Magnetometer (SCM) AC Magnetic Field (8192 samples/s), Level 2, Burst Mode Data. The tri-axial Search-Coil Magnetometer with its associated preamplifier measures three-dimensional magnetic field fluctuations. The analog magnetic waveforms measured by the SCM are digitized and processed inside the Digital Signal Processor (DSP), collected and stored by the Central Instrument Data Processor (CIDP) via the Fields Central Electronics Box (CEB). Prior to launch, all SCM Flight models were calibrated by LPP team members at the National Magnetic Observatory, Chambon-la-Foret (Orleans). Once per orbit, each SCM transfer function is checked thanks to the onboard calibration signal provided by the DSP. The SCM is operated for the entire MMS orbit in survey mode. Within scientific Regions Of Interest (ROI), burst mode data are also acquired as well as high speed burst mode data. This SCM data set corresponds to the AC magnetic field waveforms in nanoTesla and in the GSE frame. The SCM instrument paper can be found at http://link.springer.com/article/10.1007/s11214-014-0096-9 and the SCM data product guide at https://lasp.colorado.edu/mms/sdc/public/datasets/fields/.

  5. NTD Annual Data View - Track & Roadway (by Agency)

    • data.transportation.gov
    application/rdfxml +5
    Updated Dec 16, 2024
    Cite
    Federal Transit Administration (2024). NTD Annual Data View - Track & Roadway (by Agency) [Dataset]. https://data.transportation.gov/w/pvgq-a73e/m7rw-edbr?cur=u9IHJAJqQp9
    Explore at:
    application/rssxml, csv, json, application/rdfxml, xml, tsvAvailable download formats
    Dataset updated
    Dec 16, 2024
    Dataset authored and provided by
    Federal Transit Administration
    License

    https://www.usa.gov/government-works

    Description

    Provides agency-wide totals for track and roadway components. Data is from the National Transit Database in the 2022 and 2023 report years. These data include the types of track/roadway elements employed in transit operation, as well as the length and/or count of certain elements. This view is based on the "2022 - 2023 NTD Annual Data - Track & Roadway (by Mode)" dataset, which displays the same data at a lower level of aggregation. This view displays the data at a higher level (by agency).

    NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. The dataset that this view references is based on the 2022 and 2023 Transit Way Mileage database files.

    In years 2015-2021, you can find this data in the "Track and Roadway" data table on the NTD Program website, at https://transit.dot.gov/ntd/ntd-data.

    In versions of the data tables from before 2015, you can find the corresponding data in the files called "Transit Way Mileage - Rail Modes" and "Transit Way Mileage - Non-Rail Modes."

    If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.

  6. Figure data sets for the paper "Non-classical correlations over 1250 modes...

    • explore.openaire.eu
    Updated Aug 18, 2022
    Cite
    M. Businger; L. Nicolas; T. Sanchez Mejia; A. Ferrier; P. Goldner; M. Afzelius (2022). Figure data sets for the paper "Non-classical correlations over 1250 modes between telecom photons and 979-nm photons stored 171Yb3+:Y2SiO5" [Dataset]. http://doi.org/10.5281/zenodo.7006773
    Explore at:
    Dataset updated
    Aug 18, 2022
    Authors
    M. Businger; L. Nicolas; T. Sanchez Mejia; A. Ferrier; P. Goldner; M. Afzelius
    Description

    {"references": ["M. Businger et al., "Remote distribution of non-classical correlations over 1250 modes between telecom photons and 978 nm photons stored in 171Yb3+:Y2SiO5 crystal",\u00a0arXiv:2205.01481"]} Processed datasets corresponding to the Figures published in the article.

  7. Chililab Surveillance Dataset 2004 -2011 - Vietnam

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Pham Viet Cuong (2019). Chililab Surveillance Dataset 2004 -2011 - Vietnam [Dataset]. https://datacatalog.ihsn.org/catalog/5539
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Le Thi Vui
    Bui Tu Quyen
    Bui Thi Thu Ha
    Le Minh Thi
    Truong Quang Tien
    Pham Viet Cuong
    Duong Kim Tuan
    Tran Huu Bich
    La Ngoc Quang
    Nguyen Thanh Nghi
    Time period covered
    2004 - 2012
    Area covered
    Vietnam
    Description

    Abstract

    CHILILAB is designed as a site for training and research for the Hanoi School of Public Health. It is built on an active and sustainable surveillance system monitoring basic morbidity and mortality as well as other demographic, socio-economic and cultural characteristics, with a focus on issues related to adolescent health and injury. The entire population of 3 selected townships and 4 communes was invited to participate in the study, comprising 57,161 participants from 17,993 households. CHILILAB has 6 objectives:

    1. Establish a demographic and epidemiologic surveillance system on cyclic morbidity and mortality in Chi Linh district

    2. Identify patterns and trends of morbidity and mortality in the district population periodically and longitudinally

    3. Provide data on adolescent health, especially risk and protective factors affecting adolescent health issues

    4. Pilot and evaluate community health intervention strategies periodically in order to form a basis for developing health policies in the locality and elsewhere

    5. Improve the procedures for data collection, analysis and application at the community level in a reliable and effective manner

    6. Strengthen the capacity of public health professionals in the Hanoi School of Public Health.

    Geographic coverage

    Chi Linh is a mountainous town of Hai Duong Province in northern Vietnam. Chi Linh town covers an area of 300.54 square kilometers, contains 17 communes and 3 towns, and has a population of 142,278 (50.3% female, 49.7% male). A number of different ethnic minority groups live in the district, including San Diu, Chinese, H'Mong, Tay, Nung and some other ethnic minorities. The population density varies across the district, with higher density in lowland communes and places with access to transportation routes. Approximately 25% of the total population lives in urban areas. The number of adolescents between 15-24 years of age accounts for 19.3% of the total district population according to 1999 population data. Residents receive information, including health information, from several different channels: 100% of the communes/towns are covered with the loud-speaker system; 100% of households have their own radio receivers; about 60% of households have TVs.

    Several national roads as well as inter-communal roads run through the town. While the national roads are high quality and therefore can handle a high volume of high speed traffic, the inter-communal roads are ill-maintained due to insufficient funding. As a result, some communes are not very easily accessible. Much of the land in Chi Linh is used for agriculture, including rice and subsidiary dry crop cultivation. In addition, forest resources in the district are currently being exploited. The town also contains two thermal power plants as well as a medical glass plant, a refractory soil mine, and a leather shoe making plant providing employment in the district.

    Analysis unit

    Individual

    Universe

    Resident household members of households residing within the demographic surveillance area.

    Kind of data

    Event history data

    Frequency of data collection

    From 2004 to the end of 2007: 4 rounds per year. From 2008 to the end of 2012: 2 rounds per year. From 2013: 1 round per year.

    Sampling procedure

    The entire population of 7 communes and townships within Chi Linh town was selected as the study sample, comprising approximately 70,000 individuals from 18,000 households. Data was collected at both the household and individual levels. A list of the households in the district was made with the involvement of relevant authorities and the use of existing local lists of households at the time the baseline survey was conducted. Study subjects for specific studies are selected according to the objectives and aims of those studies.

    Sampling deviation

    None

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    CHILILAB uses 7 kinds of forms to collect data:

    1. Form Q (General information of household): Basic information about the household, such as name of household, address, household ID, and events in the household during the previous 3 months (or 4 or 6 months).

    2. Form H (Household information): Socio-economic information about the household, such as a list of properties and the type of house wall, roof or foundation.

    3. Form C (Individual information): Date of birth, sex, age, ethnicity, religion, health insurance, number of children.

    4. Form DC (Migration information): Date of in-migration or out-migration, reason for migration, and the place moved to/from.

    5. Form TH (Pregnancy information): Pregnancy status, start date and end date of pregnancy.

    6. Form CH (Mortality information): Expectation of life, cause of death.

    7. Form CU (Household change information): Date of change, reason for change, and the name of the new household.

    Cleaning operations

    Data collected from the sample population is stored in the CHILILAB database at HSPH. Periodical updates of demographic information and socio-economic, cultural and health conditions are extracted from the database for analysis. Findings are then disseminated to the community and policy makers in annual workshops with local authorities, through manuscripts, and in factsheets in Vietnamese as well as in English. Individual and household data in CHILILAB can be accessed if a request for data use is sent to the CHILILAB coordinator for consideration.

    Response rate

    100% of households in the field joined CHILILAB.

  8. NTD Annual Data View - Employees (By Mode)

    • data.transportation.gov
    application/rdfxml +5
    Updated Dec 16, 2024
    + more versions
    Cite
    Federal Transit Administration (2024). NTD Annual Data View - Employees (By Mode) [Dataset]. https://data.transportation.gov/w/wsxw-2rpq/m7rw-edbr?cur=gXKkrSEk6Ux&from=xwPw7zR3Ahe
    Explore at:
    json, csv, application/rssxml, xml, tsv, application/rdfxmlAvailable download formats
    Dataset updated
    Dec 16, 2024
    Dataset authored and provided by
    Federal Transit Administration
    License

    https://www.usa.gov/government-works

    Description

    This dataset details data on hours worked by public transportation employees and the head counts of employees for each applicable agency reporting to the National Transit Database in the 2022 and 2023 report years at the mode and type of service level.

    NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 and 2023 Transit Agency Employees database files.

    In years 2015-2021, you can find this data in the "Employees" data table on the NTD Program website, at https://transit.dot.gov/ntd/ntd-data.

    If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.

  9. Dataset for "Highly multi-mode hollow core fibres"

    • researchdata.bath.ac.uk
    7z
    Updated Jun 9, 2025
    Cite
    Robbie Mears; Kerrianne Harrington; William Wadsworth; James Stone; Tim Birks (2025). Dataset for "Highly multi-mode hollow core fibres" [Dataset]. http://doi.org/10.15125/BATH-01499
    Explore at:
    7zAvailable download formats
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    University of Bath
    Authors
    Robbie Mears; Kerrianne Harrington; William Wadsworth; James Stone; Tim Birks
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    Engineering and Physical Sciences Research Council
    Description

    This repository contains all the raw data and raw images used in the paper titled 'Highly multi-mode hollow core fibres'. It is grouped into two folders, raw data and raw images. The raw data folder contains a number of .dat files with alternating columns of wavelength and signal for the different measurements of transmission, cutback and bend loss for the different fibres. The raw images folder contains simple .tif files of the different fibres, along with the near-field and far-field images used in Figure 2.
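
    As a rough sketch of how one of these .dat files could be read, assuming whitespace-delimited text with alternating wavelength/signal columns as described above (the file name here is hypothetical):

        # Sketch: read a .dat file of alternating wavelength/signal columns.
        # The file name is hypothetical; check the actual delimiter and layout.
        import numpy as np

        data = np.loadtxt("transmission_fibre1.dat")   # shape: (n_points, 2 * n_measurements)
        wavelengths = data[:, 0::2]                    # even-numbered columns: wavelength
        signals = data[:, 1::2]                        # odd-numbered columns: signal

        print(wavelengths.shape, signals.shape)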

  10. Employment and Unemployment Survey, EUS 2016 - Jordan

    • erfdataportal.com
    Updated Oct 22, 2017
    Cite
    Economic Research Forum (2017). Employment and Unemployment Survey, EUS 2016 - Jordan [Dataset]. http://www.erfdataportal.com/index.php/catalog/133
    Explore at:
    Dataset updated
    Oct 22, 2017
    Dataset provided by
    Economic Research Forum
    Department of Statistics
    Time period covered
    2016
    Area covered
    Jordan
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

    The Department of Statistics (DOS) carried out four rounds of the 2016 Employment and Unemployment Survey (EUS). The survey rounds covered a sample of about forty-nine thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design.

    It is worth mentioning that the DOS employed new technology in data collection and data processing. Data was collected using an electronic questionnaire on a handheld device (PDA) instead of a hard copy.

    The survey's main objectives are:

    • To identify the demographic, social and economic characteristics of the population and manpower.
    • To identify the occupational structure and economic activity of the employed persons, as well as their employment status.
    • To identify the reasons behind the desire of the employed persons to search for a new or additional job.
    • To measure the economic activity participation rates (the number of economically active population divided by the population of 15+ years old).
    • To identify the different characteristics of the unemployed persons.
    • To measure unemployment rates (the number of unemployed persons divided by the number of economically active population of 15+ years old) according to the various characteristics of the unemployed, and the changes that might take place in this regard.
    • To identify the most important ways and means used by the unemployed persons to get a job, in addition to measuring durations of unemployment for such persons.
    • To identify the changes over time that might take place regarding the above-mentioned variables.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009. During which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

    Geographic coverage

    The sample is representative at the national level (Kingdom), the governorate level, and the three regions (Central, North and South).

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Cleaning operations

    Raw Data

    A tabulation results plan has been set based on the previous Employment and Unemployment Surveys while the required programs were prepared and tested. When all prior data processing steps were completed, the actual survey results were tabulated using an ORACLE package. The tabulations were then thoroughly checked for consistency of data. The final report was then prepared, containing detailed tabulations as well as the methodology of the survey.

    Harmonized Data

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
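
    As a hedged sketch of how the disseminated files might be combined downstream, assuming harmonized household-level and individual-level STATA files that share a household identifier (the file and variable names below are hypothetical):

        # Sketch: merge hypothetical harmonized household- and individual-level
        # STATA files on an assumed shared household identifier.
        import pandas as pd

        households = pd.read_stata("eus2016_household_harmonized.dta")
        individuals = pd.read_stata("eus2016_individual_harmonized.dta")

        merged = individuals.merge(households, on="hhid", how="left")
        print(merged.shape)
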
  11. Modal comparisons (TSGB01)

    • gov.uk
    • tnaqa.mirrorweb.com
    Updated Dec 19, 2024
    Cite
    Department for Transport (2024). Modal comparisons (TSGB01) [Dataset]. https://www.gov.uk/government/statistical-data-sets/tsgb01-modal-comparisons
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Transport
    Description

    Accessibility of tables

    The department is currently working to make our tables accessible for our users. The data tables for these statistics are now accessible.

    We would welcome any feedback on the accessibility of our tables; please email us.

    Passenger transport

    TSGB0101: Passenger transport by mode from 1952 (ODS, 24.2 KB) - https://assets.publishing.service.gov.uk/media/6762e055cdb5e64b69e307ab/tsgb0101.ods

    TSGB0102: Passenger journeys on public transport vehicles from 1950 (ODS, 13.9 KB) - https://assets.publishing.service.gov.uk/media/6762e05eff2c870561bde7ef/tsgb0102.ods

    Mode share

    TSGB0103 (NTS0303): Average number of trips, stages, miles and time spent travelling by main mode (ODS, 55KB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821414/nts0303.ods

    TSGB0104 (NTS0409a): Average number of trips by purpose and main mode (ODS, 122KB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821479/nts0409.ods

    TSGB0105 (NTS0409b): Average distance travelled by purpose and main mode (ODS, 122KB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821479/nts0409.ods

    Table TSGB0106 - people entering central London during the morning peak, since 1996

    The data source for this table has been discontinued since it was last updated in December 2019.

    Passenger casualty rates

    TSGB0107 (RAS0203): Passenger casualty rates by mode (ODS, 21KB) - https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods

    Travel to work

    TSGB0108: Usual method of travel to work by region of residence (ODS, 50.1 KB) - https://assets.publishing.service.gov.uk/media/675968b1403b5cf848a292b2/tsgb0108.ods

    TSGB0109: Usual method of travel to work by region of workplace (ODS, 51.9 KB) - https://assets.publishing.service.gov.uk/media/6751b8c60191590a5f351191/tsgb0109.ods

    TSGB0110: Time taken to travel to work by region of workplace (ODS, 40 KB) - https://assets.publishing.service.gov.uk/media/6751b8cf19e0c816d18d1e13/tsgb0110.ods

    TSGB0111: Average time taken to travel to work by region of workplace and usual method of travel (ODS, 42.5 KB) - https://assets.publishing.service.gov.uk/media/6751b8e72086e98fae35119d/tsgb0111.ods

    TSGB0112: How workers usually travel to work by car by region of workplace (ODS, 24.7 KB) - https://assets.publishing.service.gov.uk/media/6751b8f26da7a3435fecbd60/tsgb0112.ods


  12. Data from: Preclinical PET data

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 22, 2021
    Cite
    Ville-Veikko Wettenhovi; Kimmo Jokivarsi (2021). Preclinical PET data [Dataset]. http://doi.org/10.5281/zenodo.3528056
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 22, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ville-Veikko Wettenhovi; Kimmo Jokivarsi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An open preclinical PET dataset. This dataset has been measured with the preclinical Siemens Inveon PET machine. The measured target is a (naive) rat with an injected dose of 21.4 MBq of FDG. The injection was done intravenously (IV) to the tail vein. No specific organ was investigated, but rather the glucose metabolism as a whole. The examination is a 60 minute dynamic acquisition. The measurement was conducted according to the ethical standards set by the University of Eastern Finland.

    The dataset contains the original list-mode data, the (dynamic) sinogram created by the Siemens Inveon Acquisition Workplace (IAW) software (28 frames), the (dynamic) scatter sinogram created by the IAW software (28 frames), the attenuation sinogram created by the IAW software and the normalization coefficients created by the IAW software. Header files are included for all the different data files.

    For documentation on reading the list-mode binary data, please ask Siemens.

    This dataset can be used in the OMEGA software, including the list-mode data, to import the data to MATLAB/Octave, create sinograms from the list-mode data and reconstruct the imported data. For help on using the dataset with OMEGA, see the wiki.

  13. NTD Annual Data View - Stations (by Agency and Age)

    • data.transportation.gov
    application/rdfxml +5
    Updated Dec 16, 2024
    Cite
    Federal Transit Administration (2024). NTD Annual Data View - Stations (by Agency and Age) [Dataset]. https://data.transportation.gov/Public-Transit/NTD-Annual-Data-View-Stations-by-Agency-and-Age-/puti-h4fa
    Explore at:
    application/rssxml, xml, application/rdfxml, tsv, csv, jsonAvailable download formats
    Dataset updated
    Dec 16, 2024
    Dataset authored and provided by
    Federal Transit Administration
    License

    https://www.usa.gov/government-works

    Description

    Provides agency-wide totals for data pertaining to the age of transit stations reported to the National Transit Database in the 2022 and 2023 report years. This view is based on the "2022 - 2023 NTD Annual Data - Stations (by Mode and Age)" dataset, which displays the same data at a lower level of aggregation. This view displays the data at a higher level (by agency).

    In many cases, stations are reported by each mode and type of service that uses them. For example, a single station used by bus - directly operated, bus - purchased transportation, and commuter bus - directly operated would be reported three times. For more detail, please see the NTD Policy Manual.

    Rural reporters do not report passenger stations and are not included in this file. Modes Demand Response, Demand Response - Taxi, Vanpool, and Publico also do not report stations and are also excluded.

    NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 and 2023 Facility Inventory database files.

    In years 2015-2021, you can find this data in the "Stations" data table on the NTD Program website, at https://transit.dot.gov/ntd/ntd-data.

    If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.

  14. Commuter Mode Share

    • data.ccrpc.org
    csv
    Updated Oct 2, 2024
    Cite
    Champaign County Regional Planning Commission (2024). Commuter Mode Share [Dataset]. https://data.ccrpc.org/dataset/commuter-mode-share
    Explore at:
    csvAvailable download formats
    Dataset updated
    Oct 2, 2024
    Dataset authored and provided by
    Champaign County Regional Planning Commission
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This commuter mode share data shows the estimated percentages of commuters in Champaign County who traveled to work using each of the following modes: drove alone in an automobile; carpooled; took public transportation; walked; biked; went by motorcycle, taxi, or other means; and worked at home. Commuter mode share data can illustrate the use of and demand for transit services and active transportation facilities, as well as for automobile-focused transportation projects.

    Driving alone in an automobile is by far the most prevalent means of getting to work in Champaign County, accounting for over 69 percent of all work trips in 2023. This is the same rate as 2019, and the first increase since 2017, both years being before the COVID-19 pandemic began.

    The percentage of workers who commuted by all other means to a workplace outside the home also decreased from 2019 to 2021, most of these modes reaching a record low since this data first started being tracked in 2005. The percentage of people carpooling to work in 2023 was lower than every year except 2016 since this data first started being tracked in 2005. The percentage of people walking to work increased from 2022 to 2023, but this increase is not statistically significant.

    Meanwhile, the percentage of people in Champaign County who worked at home more than quadrupled from 2019 to 2021, reaching a record high over 18 percent. It is a safe assumption that this can be attributed to the increase of employers allowing employees to work at home when the COVID-19 pandemic began in 2020.

    The work-from-home figure decreased to 11.2 percent in 2023, which is the first statistically significant decrease since the pandemic began. However, this figure is still about 2.5 times higher than in 2019, even with the COVID-19 emergency ending in 2023.

    Commuter mode share data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.

    As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
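
    For example, a common way to check whether the difference between two ACS estimates is statistically significant at the 90 percent confidence level (the level used for published ACS margins of error) is sketched below; the estimate and MOE values shown are placeholders, not figures from this dataset:

        # Sketch: significance test for the difference between two ACS estimates,
        # using the standard margin-of-error approach at the 90% confidence level.
        # The estimate/MOE values below are placeholders, not data from this table.
        import math

        def is_significant(est1, moe1, est2, moe2, z_crit=1.645):
            se1 = moe1 / 1.645          # convert a 90% MOE to a standard error
            se2 = moe2 / 1.645
            z = abs(est1 - est2) / math.sqrt(se1**2 + se2**2)
            return z > z_crit

        print(is_significant(11.2, 1.5, 18.0, 1.8))  # placeholder percentages and MOEs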

    Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

    For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Means of Transportation to Work.

    Sources:

    • U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (18 September 2024).
    • U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (10 October 2023).
    • U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (14 October 2022).
    • U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).
    • U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).
    • U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (13 September 2018).
    • U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (14 September 2017).
    • U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (19 September 2016).
    • U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).

  15. NTD Annual Data View- Metrics (by Agency)

    • data.transportation.gov
    application/rdfxml +5
    Updated Dec 16, 2024
    Cite
    Federal Transit Administration (2024). NTD Annual Data View- Metrics (by Agency) [Dataset]. https://data.transportation.gov/w/g27i-aq2u/m7rw-edbr?cur=15tAXhzdS-r&from=E-kP7_kNZI-
    Explore at:
    application/rssxml, xml, csv, json, application/rdfxml, tsvAvailable download formats
    Dataset updated
    Dec 16, 2024
    Dataset authored and provided by
    Federal Transit Administration
    License

    https://www.usa.gov/government-works

    Description

    Provides agency-wide totals for service and cost efficiency metrics for data reported to the National Transit Database in the 2022 and 2023 report years. This view is based on the "2022 - 2023 NTD Annual Data - Metrics" dataset, which displays the same data at a lower level of aggregation (by mode). This view displays the data at a higher level (by agency).

    Only Full Reporters report data on Passenger Miles. The columns containing ratios have been calculated as the average across all reporting modes, not as the ratio of summed data. Thus, each transit agency received equal weight, regardless of that agency's total ridership.

    NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This view and its parent dataset are based on the 2022 and 2023 Federal Funding Allocation, Operating Expenses, and Service database files.

    In years 2015-2021, you can find this data in the "Metrics" data table on the NTD Program website, at https://transit.dot.gov/ntd/ntd-data.

    In versions of the NTD data tables from before 2014, you can find data on metrics in the files called "Fare per Passenger and Recovery Ratio" and "Service Supplied and Consumed Ratios."

    If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.

  16. Periodicities in KIC 3527751 from 3yr Kepler data - Dataset - B2FIND

    • b2find.eudat.eu
    Updated May 9, 2023
    Cite
    (2023). Periodicities in KIC 3527751 from 3yr Kepler data - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/36b938f0-daa7-5086-8f55-581076a73d55
    Explore at:
    Dataset updated
    May 9, 2023
    Description

    We analyze 3yr of nearly continuous Kepler spacecraft short cadence observations of the pulsating subdwarf B (sdB) star KIC 3527751. We detect a total of 251 periodicities, most in the g-mode domain, but some where p-modes occur, confirming that KIC3527751 is a hybrid pulsator. We apply seismic tools to the periodicities to characterize the properties of KIC3527751. Techniques to identify modes include asymptotic period spacing relationships, frequency multiplets, and the separation of multiplet splittings. These techniques allow for 189 (75%) of the 251 periods to be associated with pulsation modes. Included in these are three sets of l=4 multiplets and possibly an l=9 multiplet. Period spacing sequences indicate l=1 and 2 overtone spacings of 266.4+/-0.2 and 153.2+/-0.2s, respectively. We also calculate reduced periods, from which we find evidence of trapped pulsations. Such mode trappings can be used to constrain the core/atmosphere transition layers. Interestingly, frequency multiplets in the g-mode region, which sample deep into the star, indicate a rotation period of 42.6+/-3.4days while p-mode multiplets, which sample the outer envelope, indicate a rotation period of 15.3+/-0.7days. We interpret this as differential rotation in the radial direction with the core rotating more slowly. This is the first example of differential rotation for a sdB star.

  17. Zero Modes and Classification of Combinatorial Metamaterials

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Nov 8, 2022
    + more versions
    Cite
    Ryan van Mastrigt; Marjolein Dijkstra; Martin van Hecke; Corentin Coulais (2022). Zero Modes and Classification of Combinatorial Metamaterials [Dataset]. http://doi.org/10.5281/zenodo.7070963
    Explore at:
    zipAvailable download formats
    Dataset updated
    Nov 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ryan van Mastrigt; Marjolein Dijkstra; Martin van Hecke; Corentin Coulais
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.

    In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.

    Modescaling_raw_data.zip

    This file contains uniformly sampled unit cell designs for metamaterial M2 and \(M_k(n)\) for \(1\leq n\leq 4\), which was used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or linearly increasing (class C). This file contains the simulation data of size \(3 \leq k \leq 8\) unit cells. The data is organized as follows.

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+5: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4)\}\).

    Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
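
    A minimal sketch of loading one of these raw-data files, assuming k = 3 and a concrete (hypothetical) file name following the naming pattern above:

        # Sketch: load a k = 3 raw-data file and unpack one row.
        # The concrete file name is hypothetical; see the naming pattern above.
        import numpy as np

        k = 3
        data = np.load("data_new_rrQR_0_n_M_3x3_fixn4.npy")   # shape: (Nsim, 1 + k*k + 4)

        row = data[0]
        label = row[0]                                    # col 0: label number
        unit_cell = row[1:1 + k*k].reshape(k, k)          # cols 1..k*k: flattened design
        mode_counts = row[1 + k*k:1 + k*k + 4]            # M_k(1) ... M_k(4)

        print(label, unit_cell, mode_counts, sep="\n")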

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+7: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4), M_k(5), M_k(6)\}\).

    Simulation data for \(6 \leq k \leq 8\) unit cells are stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+9: number of zero modes \(M_k(n_x, n_y)\) in order: \(\{M_k(1, 1), M_k(2, 2), M_k(3, 2), M_k(4, 2), M_k(1, 1), M_k(2, 2), M_k(2, 3), M_k(2, 4)\}\).

    Simulation data of metamaterial M1 for \(k_x \times k_y\) metamaterials are stored in compressed numpy array format (.npz) and can be loaded in Python with the Numpy package using the numpy.load command. These files are named "smiley_cube_x_y_\(k_x\)x\(k_y\).npz", which contain all possible metamaterial designs, and "smiley_cube_uniform_sample_x_y_\(k_x\)x\(k_y\).npz", which contain uniformly sampled metamaterial designs. The configurations are accessed with the keyword argument 'configs'. The classification is accessed with the keyword argument 'compatible'. The configurations array is of shape [Nsim, \(k_x\), \(k_y\)], the classification array is of shape [Nsim]. The building blocks in the configuration are denoted by 0 or 1, which correspond to the red/green and white/dashed building blocks respectively. Classification is 0 or 1, which corresponds to I and C respectively.

    Modescaling_classification_results.zip

    This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells of metamaterial M2 in Modescaling_raw_data.zip. The data is organized as follows.

    The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n \leq 4\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)
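
    A short sketch of reading one of these comma-delimited results files, again with a hypothetical concrete file name substituted into the pattern above:

        # Sketch: load a comma-delimited classification results file.
        # The concrete file name is hypothetical; column meanings are as listed above.
        import numpy as np

        results = np.loadtxt(
            "results_analysis_new_rrQR_0_Scen_slope_offset_M1k_3x3_fixn4.txt",
            delimiter=",",
        )

        classes = results[:, 1]                    # col 1: 0 = class I, 1 = class C, 2 = class X
        slopes = results[:, 2]                     # col 2: slope from n >= 2 onward
        offsets = results[:, 3]                    # col 3: offset M_k(2) - 2 * slope

        print(np.bincount(classes.astype(int)))    # number of unit cells in each class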

    The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n \leq 6\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)

    The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n_x \leq 4\))

    col 2: the class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n_y \leq 4\))

    col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)

    col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)

    col 5: the offset_x is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)

    col 6: the offset_y is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)

    col 7: (M_k(1,

  18. Data from: New Variable Selection Method Using Interval Segmentation Purity...

    • acs.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Li-Juan Tang; Wen Du; Hai-Yan Fu; Jian-Hui Jiang; Hai-Long Wu; Guo-Li Shen; Ru-Qin Yu (2023). New Variable Selection Method Using Interval Segmentation Purity with Application to Blockwise Kernel Transform Support Vector Machine Classification of High-Dimensional Microarray Data [Dataset]. http://doi.org/10.1021/ci900032q.s001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Li-Juan Tang; Wen Du; Hai-Yan Fu; Jian-Hui Jiang; Hai-Long Wu; Guo-Li Shen; Ru-Qin Yu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    One problem with discriminant analysis of microarray data is representation of each sample by a large number of genes that are possibly irrelevant, insignificant, or redundant. Methods of variable selection are, therefore, of great significance in microarray data analysis. A new method for key gene selection has been proposed on the basis of interval segmentation purity that is defined as the purity of samples belonging to a certain class in intervals segmented by a mode search algorithm. This method identifies key variables most discriminative for each class, which offers possibility of unraveling the biological implication of selected genes. A salient advantage of the new strategy over existing methods is the capability of selecting genes that, though possibly exhibit a multimodal distribution, are the most discriminative for the classes of interest, considering that the expression levels of some genes may reflect systematic difference in within-class samples derived from different pathogenic mechanisms. On the basis of the key genes selected for individual classes, a support vector machine with block-wise kernel transform is developed for the classification of different classes. The combination of the proposed gene mining approach with support vector machine is demonstrated in cancer classification using two public data sets. The results reveal that significant genes have been identified for each class, and the classification model shows satisfactory performance in training and prediction for both data sets.

  19. 2022 NTD Annual Data - Track & Roadway (by Mode)

    • splitgraph.com
    Updated Sep 10, 2024
    Cite
    datahub-transportation-gov (2024). 2022 NTD Annual Data - Track & Roadway (by Mode) [Dataset]. https://www.splitgraph.com/datahub-transportation-gov/2022-ntd-annual-data-track-roadway-by-mode-fzbb-f6kc
    Explore at:
    application/vnd.splitgraph.image, application/openapi+json, json (available download formats)
    Dataset updated
    Sep 10, 2024
    Authors
    datahub-transportation-gov
    Description

    This dataset details track and roadway mileage/characteristics for each agency, mode, and type of service, as reported to the National Transit Database in Report Year 2022. These data include the types of track/roadway elements employed in transit operation, as well as the length and/or count of certain elements.

    NTD Data Tables organize and summarize data from the 2022 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 Transit Way Mileage database file.

    In years 2015-2021, you can find this data in the "Track and Roadway" data table on the NTD Program website, at https://transit.dot.gov/ntd/ntd-data.

    In versions of the data tables from before 2015, you can find the corresponding data in the files called "Transit Way Mileage - Rail Modes" and "Transit Way Mileage - Non-Rail Modes."

    If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
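
    A minimal Python sketch is given below; the endpoint URL and table name are assumptions for illustration only, so consult the Splitgraph documentation for the exact HTTP API and schema.

    python
    import requests

    # Both the endpoint URL and the table name below are illustrative assumptions;
    # see the Splitgraph documentation for the actual SQL-over-HTTP interface.
    DDN_URL = "https://data.splitgraph.com/sql/query/ddn"
    sql = (
        'SELECT * FROM '
        '"datahub-transportation-gov/2022-ntd-annual-data-track-roadway-by-mode-fzbb-f6kc"'
        '.track_and_roadway '  # hypothetical table name
        'LIMIT 5'
    )
    response = requests.post(DDN_URL, json={"sql": sql})
    response.raise_for_status()
    print(response.json())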

    See the Splitgraph documentation for more information.

  20. NewsMediaBias-Plus Dataset

    • zenodo.org
    • huggingface.co
    bin, zip
    Updated Nov 29, 2024
    Cite
    Shaina Raza; Shaina Raza (2024). NewsMediaBias-Plus Dataset [Dataset]. http://doi.org/10.5281/zenodo.13961155
    Explore at:
    bin, zip (available download formats)
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shaina Raza; Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NewsMediaBias-Plus Dataset

    Overview

    The NewsMediaBias-Plus dataset is designed for the analysis of media bias and disinformation by combining textual and visual data from news articles. It aims to support research in detecting, categorizing, and understanding biased reporting in media outlets.

    Dataset Description

    NewsMediaBias-Plus pairs news articles with relevant images and annotations indicating perceived biases and the reliability of the content. It adds a multimodal dimension for bias detection in news media.

    Contents

    • unique_id: Unique identifier for each news item. Each unique_id matches an image for the same article.
    • outlet: The publisher of the article.
    • headline: The headline of the article.
    • article_text: The full content of the news article.
    • image_description: Description of the paired image.
    • image: The file path of the associated image.
    • date_published: The date the article was published.
    • source_url: The original URL of the article.
    • canonical_link: The canonical URL of the article.
    • new_categories: Categories assigned to the article.
    • news_categories_confidence_scores: Confidence scores for each category.

    Annotation Labels

    • text_label: Indicates the likelihood of the article being disinformation:

      • Likely: Likely to be disinformation.
      • Unlikely: Unlikely to be disinformation.
    • multimodal_label: Indicates the likelihood of disinformation from the combination of the text snippet and image content:

      • Likely: Likely to be disinformation.
      • Unlikely: Unlikely to be disinformation.

    Getting Started

    Prerequisites

    • Python 3.6+
    • Pandas
    • Hugging Face Datasets
    • Hugging Face Hub

    Installation

    Load the dataset into Python:

    python
    from datasets import load_dataset

    ds = load_dataset("vector-institute/newsmediabias-plus")
    print(ds)                # View structure and splits
    print(ds['train'][0])    # Access the first record of the train split
    print(ds['train'][:5])   # Access the first five records

    Load a Few Records

    python
    from datasets import load_dataset

    # Load the dataset in streaming mode
    streamed_dataset = load_dataset("vector-institute/newsmediabias-plus", streaming=True)

    # Get an iterable over the first five records
    dataset_iterable = streamed_dataset['train'].take(5)

    # Print the records
    for record in dataset_iterable:
        print(record)
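
    A further usage sketch, filtering the streamed records on the annotation fields listed above (the field names and the 'Likely'/'Unlikely' label strings are taken from the Contents and Annotation Labels sections):

    python
    import itertools
    from datasets import load_dataset

    # Stream the dataset and keep only articles whose text annotation is 'Likely'.
    streamed = load_dataset("vector-institute/newsmediabias-plus", streaming=True)
    likely = (rec for rec in streamed['train'] if rec.get('text_label') == 'Likely')

    # Inspect a few matches: outlet, headline, and both annotation labels.
    for rec in itertools.islice(likely, 3):
        print(rec['outlet'], '|', rec['headline'])
        print('  text_label:', rec['text_label'], '| multimodal_label:', rec['multimodal_label'])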

    Contributions

    Contributions are welcome! You can:

    • Add Data: Contribute more data points.
    • Refine Annotations: Improve annotation accuracy.
    • Share Usage Examples: Help others use the dataset effectively.

    To contribute, fork the repository and create a pull request with your changes.

    License

    This dataset is released under a non-commercial license. See the LICENSE file for more details.

    Citation

    Please cite the dataset using this BibTeX entry:

    bibtex
    @misc{vector_institute_2024_newsmediabias_plus,
      title  = {NewsMediaBias-Plus: A Multimodal Dataset for Analyzing Media Bias},
      author = {Vector Institute Research Team},
      year   = {2024},
      url    = {https://huggingface.co/datasets/vector-institute/newsmediabias-plus}
    }

    Contact

    For questions or support, contact Shaina Raza at: shaina.raza@vectorinstitute.ai

    Disclaimer and User Guidance

    Disclaimer: The labels Likely and Unlikely are based on LLM annotations and expert assessments, intended for informational use only. They should not be considered final judgments.

    Guidance: This dataset is for research purposes. Cross-reference findings with other reliable sources before drawing conclusions. The dataset aims to encourage critical thinking, not provide definitive classifications.
