By VISHWANATH SESHAGIRI [source]
This dataset contains YouTube video and channel metadata for analyzing the statistical relations between videos and forming a topic tree. With 9 direct features and 13 more indirect features, it has what you need to build a deep understanding of how videos are related, including information such as total views per unit time, channel views, likes/subscribers ratio, comments/views ratio, and dislikes/subscribers ratio. The data provides a unique opportunity to gain insights on topics such as subscriber count trends over time, or to calculate the impact of trends on subscriber engagement. We can develop models that show how different types of content drive viewership and identify the most popular styles or topics within YouTube's vast catalogue. Additionally, the data offers an intriguing look into consumer behaviour: we can explore what drives people to watch specific videos at certain times, or to appreciate certain channels more than others, by analyzing metrics such as likes per subscriber and dislikes per view. Finally, this dataset is completely open source, with an easy-to-understand GitHub repo, making it a valuable resource for anyone looking to understand how their audience interacts with their content and how they might improve it in the future.
How to Use This Dataset
In general, it is important to understand each parameter in the data set before proceeding with analysis. The parameters include totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCount, dislikes/views, comments/subscriber, channelCommentCount, likes/dislikes, comments/views, dislikes/subscribers, totviews/totsubs, and views/elapsedtime.
To use this dataset for your own analysis (a short loading sketch follows the list):
1) Review each parameter's meaning and purpose in the dataset;
2) Get familiar with basic descriptive statistics such as mean, median, mode, and range;
3) Create visualizations or tables based on subsets of the data;
4) Understand correlations between different sets of variables or parameters;
5) Generate meaningful conclusions about specific channels or topics based on organized graph hierarchies or tables;
6) Analyze trends over time for individual parameters, as well as the aggregate reaction from all users when videos are released.
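A minimal sketch of steps 1–3, assuming a standard pandas/matplotlib environment; the file name comes from the file listing below, and the column names follow the parameter descriptions above, so they may need adjusting to the actual CSV header:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (file name taken from the file listing below).
df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

# Steps 1-2: list the parameters and compute basic descriptive statistics.
print(df.columns.tolist())
print(df.describe())  # count, mean, std, min, quartiles, max for each numeric column

# Step 3: a simple visualization of one ratio column (assumed name).
df["views/subscribers"].plot(kind="hist", bins=50, title="Views per subscriber")
plt.show()
```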
Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that can predict the relative popularity of videos based on various factors such as total views, channel viewers, likes/dislikes ratio, and comments/views ratio. This model could then be used to make recommendations and predict which videos are likely to become popular or go viral.
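As a hedged sketch of such a model (not the original authors' approach), one could fit an off-the-shelf regressor on a few of the ratio columns; the target and feature names below are assumptions based on the column descriptions and may need adjusting:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

# Assumed proxy for "relative popularity": total views per unit of channel elapsed time.
target = "totalviews/channelelapsedtime"
features = ["channelViewCount", "likes/dislikes", "comments/views", "views/subscribers"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2:", r2_score(y_test, model.predict(X_test)))
```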
Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.
Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis by examining factors such as subscriber count, average time spent watching a video per user (elapsed time), comments made per view, etc., to gain insights into how engaged viewers are with specific content or channels on YouTube. From this information it would be possible to optimize content strategy accordingly, in order to improve overall engagement rates across various types of video content and channel types.
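For illustration, a rough engagement score could be built from the ratio columns described above; this is a sketch under the assumption that the listed column names appear verbatim in the CSV:

```python
import pandas as pd

df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

# Hypothetical engagement score: reward comments and likes, penalize dislikes.
df["engagement_score"] = (
    df["comments/views"] + df["likes/subscriber"] - df["dislikes/views"]
)
print(df.nlargest(10, "engagement_score")[["channelViewCount", "engagement_score"]])
```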
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: YouTubeDataset_withChannelElapsed.csv

| Column name | Description |
|:----------------------------------|:-------------------------------------------------------|
| totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
| channelViewCount | Total number of views for the channel. (Integer) |
| likes/subscriber ...
This represents the Service data reported to the NTD by transit agencies. In versions of the data tables from before 2014, you can find data on service in the file called "Transit Operating Statistics: Service Supplied and Consumed." If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Search Coil Magnetometer (SCM) AC Magnetic Field (32 samples/s), Level 2, Survey Mode Data. The tri-axial Search-Coil Magnetometer with its associated preamplifier measures three-dimensional magnetic field fluctuations. The analog magnetic waveforms measured by the SCM are digitized and processed inside the Digital Signal Processor (DSP), collected and stored by the Central Instrument Data Processor (CIDP) via the Fields Central Electronics Box (CEB). Prior to launch, all SCM Flight models were calibrated by LPP team members at the National Magnetic Observatory, Chambon-la-Foret (Orleans). Once per orbit, each SCM transfer function is checked thanks to the onboard calibration signal provided by the DSP. The SCM is operated for the entire MMS orbit in survey mode. Within scientific Regions Of Interest (ROI), burst mode data are also acquired as well as high speed burst mode data. This SCM data set corresponds to the AC magnetic field waveforms in nanoTesla and in the GSE frame. The SCM instrument paper can be found at http://link.springer.com/article/10.1007/s11214-014-0096-9 and the SCM data product guide at https://lasp.colorado.edu/mms/sdc/public/datasets/fields/.
Search Coil Magnetometer (SCM) AC Magnetic Field (8192 samples/s), Level 2, Burst Mode Data. The tri-axial Search-Coil Magnetometer with its associated preamplifier measures three-dimensional magnetic field fluctuations. The analog magnetic waveforms measured by the SCM are digitized and processed inside the Digital Signal Processor (DSP), collected and stored by the Central Instrument Data Processor (CIDP) via the Fields Central Electronics Box (CEB). Prior to launch, all SCM Flight models were calibrated by LPP team members at the National Magnetic Observatory, Chambon-la-Foret (Orleans). Once per orbit, each SCM transfer function is checked thanks to the onboard calibration signal provided by the DSP. The SCM is operated for the entire MMS orbit in survey mode. Within scientific Regions Of Interest (ROI), burst mode data are also acquired as well as high speed burst mode data. This SCM data set corresponds to the AC magnetic field waveforms in nanoTesla and in the GSE frame. The SCM instrument paper can be found at http://link.springer.com/article/10.1007/s11214-014-0096-9 and the SCM data product guide at https://lasp.colorado.edu/mms/sdc/public/datasets/fields/.
https://www.usa.gov/government-works
Provides agency-wide totals for track and roadway components. Data is from the National Transit Database in the 2022 and 2023 report years. These data include the types of track/roadway elements employed in transit operation, as well as the length and/or count of certain elements. This view is based off of the "2022 - 2023 NTD Annual Data - Track & Roadway (by Mode)" dataset, which displays the same data at a lower level of aggregation. This view displays the data at a higher level (by agency).
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. The dataset that this view references is based on the 2022 and 2023 Transit Way Mileage database files.
In years 2015-2021, you can find this data in the "Track and Roadway" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
In versions of the data tables from before 2015, you can find corresponding data in the file called "Transit Way Mileage - Rail Modes" and "Transit Way Mileage - Non-Rail Modes."
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
{"references": ["M. Businger et al., "Remote distribution of non-classical correlations over 1250 modes between telecom photons and 978 nm photons stored in 171Yb3+:Y2SiO5 crystal",\u00a0arXiv:2205.01481"]} Processed datasets corresponding to the Figures published in the article.
CHILILAB is designed as a site for training and research for the Hanoi School of Public Health. CHILILAB is based on the foundation of an active and sustainable surveillance system monitoring basic morbidity and mortality as well as other demographic, socio-economic and cultural characteristics. CHILILAB was designed to focus on issues related to adolescent health and injury. The entire population of 3 selected townships and 4 communes was invited to participate in the study, including 57,161 participants from 17,993 households. CHILILAB has 6 objectives:
1. Establish a demographic and epidemiologic surveillance system on cyclic morbidity and mortality in Chi Linh district
2. Identify patterns and trends of morbidity and mortality in the district population periodically and longitudinally
3. Provide data on adolescent health, especially risk and protective factors affecting adolescent health issues
4. Pilot and evaluate community health intervention strategies periodically in order to form a basis for developing health policies in the locality and elsewhere
5. Improve the procedures for data collection, analysis and application at the community level in a reliable and effective manner
6. Strengthen the capacity of public health professionals at the Hanoi School of Public Health.
Chi Linh is a mountainous town of Hai Duong Province in northern Vietnam. Chi Linh town covers an area of 300.54 square kilometers, contains 17 communes and 3 towns, and has a population of 142,278 (50.3% female, 49.7% male). A number of different ethnic minority groups live in the district, including San Diu, Chinese, H'Mong, Tay, Nung and some other ethnic minorities. The population density varies across the district, with higher density in lowland communes and places with access to transportation routes. Approximately 25% of the total population lives in urban areas. Adolescents between 15-24 years of age account for 19.3% of the total district population according to 1999 population data. Residents receive information, including health information, from several different channels: 100% of the communes/towns are covered by the loud-speaker system; 100% of households have their own radio receivers; about 60% of households have TVs.
Several national roads as well as inter-communal roads run through the town. While the national roads are high quality and therefore can handle a high volume of high speed traffic, the inter-communal roads are ill-maintained due to insufficient funding. As a result, some communes are not very easily accessible. Much of the land in Chi Linh is used for agriculture, including rice and subsidiary dry crop cultivation. In addition, forest resources in the district are currently being exploited. The town also contains two thermal power plants as well as a medical glass plant, a refractory soil mine, and a leather shoe making plant providing employment in the district.
Individual
Resident household members of households residing within the demographic surveillance area.
Event history data
From 2004 to the end of 2007: 4 rounds per year. From 2008 to the end of 2012: 2 rounds per year. From 2013: 1 round per year.
The entire population of 7 communes and townships within Chi Linh town was selected as the study sample. This includes approximately 70,000 individuals from 18,000 households. Data were collected at both the household and individual levels. A list of the households in the district was made with the involvement of relevant authorities and the use of existing local lists of households at the time the baseline survey was conducted. Study subjects for specific studies are selected according to the objectives and aims of those studies.
None
Face-to-face [f2f]
CHILILAB uses 7 kinds of forms to collect data:
Form Q (General household information): Basic information about the household, such as household name, address and household ID, and records of events in the household during the preceding 3 months (or 4 or 6 months).
Form H (Household information): Socio-economic information about the household, such as a list of assets and the type of house walls, roof and foundation.
Form C (Individual information): Date of birth, sex, age, ethnicity, religion, health insurance, and number of children.
Form DC (Migration information): Date of in-migration or out-migration, reason for migration, and place of origin or destination.
Form TH (Pregnancy information): Pregnancy status, and start and end dates of pregnancy.
Form CH (Mortality information): Expectation of life and cause of death.
Form CU (Household change information): Date of the change, reason for the change, and the name of the new household.
Data collected from the sample population is stored in the CHILILAB database at HSPH. Periodic updates of demographic, socio-economic, cultural and health information are extracted from the database for analysis. Findings are then disseminated to the community and policy makers in annual workshops with local authorities, through manuscripts, and through factsheets in Vietnamese as well as in English. Individual and household data in CHILILAB can be accessed if a request for data use is sent to the CHILILAB coordinator for consideration.
100% of households in the field site have joined CHILILAB.
https://www.usa.gov/government-works
This dataset details data on hours worked by public transportation employees and the head counts of employees for each applicable agency reporting to the National Transit Database in the 2022 and 2023 report years at the mode and type of service level.
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 and 2023 Transit Agency Employees database files.
In years 2015-2021, you can find this data in the "Employees" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains all the raw data and raw images used in the paper titled 'Highly multi-mode hollow core fibres'. It is grouped into two folders of raw data and raw images. In the raw data there are a number of .dat files which contain alternating columns of wavelength and signal for the different measurements of transmission, cutback and bend loss for the different fibres. In the raw images, simple .tif files of the different fibres are given and different near field and far field images used in Figure 2.
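A minimal sketch of reading one of the raw .dat files, assuming whitespace-delimited alternating columns of wavelength and signal as described above; the file name is a placeholder:

```python
import numpy as np

# Placeholder file name; the raw data folder contains several such .dat files.
data = np.loadtxt("raw_data/example_transmission.dat")

# Split the alternating columns into (wavelength, signal) pairs:
# columns 0, 2, 4, ... hold wavelengths and columns 1, 3, 5, ... hold signals.
wavelengths = data[:, 0::2]
signals = data[:, 1::2]
print(wavelengths.shape, signals.shape)
```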
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN
The Department of Statistics (DOS) carried out four rounds of the 2016 Employment and Unemployment Survey (EUS). The survey rounds covered a sample of about forty-nine thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design.
It is worth mentioning that the DOS employed new technology in data collection and data processing: data were collected using an electronic questionnaire on a handheld device (PDA) instead of a hard copy.
The survey's main objectives are:
- To identify the demographic, social and economic characteristics of the population and manpower.
- To identify the occupational structure and economic activity of the employed persons, as well as their employment status.
- To identify the reasons behind the desire of employed persons to search for a new or additional job.
- To measure the economic activity participation rate (the number of economically active persons divided by the population aged 15+); a short worked example follows this list.
- To identify the different characteristics of the unemployed persons.
- To measure unemployment rates (the number of unemployed persons divided by the number of economically active persons aged 15+) according to the various characteristics of the unemployed, and the changes that might take place in this regard.
- To identify the most important ways and means used by the unemployed persons to get a job, in addition to measuring durations of unemployment for such persons.
- To identify the changes over time that might take place regarding the above-mentioned variables.
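The two headline rates defined above can be written out explicitly; the counts below are made up for illustration and are not figures from the survey:

```python
# Illustrative counts only (not survey results).
population_15_plus = 4_000_000
economically_active = 1_600_000   # employed + unemployed persons aged 15+
unemployed = 240_000

participation_rate = economically_active / population_15_plus * 100
unemployment_rate = unemployed / economically_active * 100

print(f"Participation rate: {participation_rate:.1f}%")  # 40.0%
print(f"Unemployment rate: {unemployment_rate:.1f}%")    # 15.0%
```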
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
The sample is representative at the national level (Kingdom), the governorate level, and the three regions (Central, North and South).
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
Computer Assisted Personal Interview [capi]
----> Raw Data
A tabulation results plan has been set based on the previous Employment and Unemployment Surveys while the required programs were prepared and tested. When all prior data processing steps were completed, the actual survey results were tabulated using an ORACLE package. The tabulations were then thoroughly checked for consistency of data. The final report was then prepared, containing detailed tabulations as well as the methodology of the survey.
----> Harmonized Data
Accessibility of tables
The department has been working to make our tables accessible for our users; the data tables for these statistics are now accessible.
We would welcome any feedback on the accessibility of our tables; please email us.
TSGB0101: Passenger transport by mode from 1952 (ODS, 24.2 KB), https://assets.publishing.service.gov.uk/media/6762e055cdb5e64b69e307ab/tsgb0101.ods
TSGB0102: Passenger journeys on public transport vehicles from 1950 (ODS, 13.9 KB), https://assets.publishing.service.gov.uk/media/6762e05eff2c870561bde7ef/tsgb0102.ods
TSGB0103 (NTS0303): Average number of trips, stages, miles and time spent travelling by main mode (ODS, 55 KB), https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821414/nts0303.ods
TSGB0104 (NTS0409a): Average number of trips by purpose and main mode (ODS, 122 KB), https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821479/nts0409.ods
TSGB0105 (NTS0409b): Average distance travelled by purpose and main mode (ODS, 122 KB), https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821479/nts0409.ods
Table TSGB0106 - people entering central London during the morning peak, since 1996
The data source for this table has been discontinued since it was last updated in December 2019.
TSGB0107 (RAS0203): Passenger casualty rates by mode (ODS, 21 KB), https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods
TSGB0108: Usual method of travel to work by region of residence (ODS, 50.1 KB), https://assets.publishing.service.gov.uk/media/675968b1403b5cf848a292b2/tsgb0108.ods
TSGB0109: Usual method of travel to work by region of workplace (ODS, 51.9 KB), https://assets.publishing.service.gov.uk/media/6751b8c60191590a5f351191/tsgb0109.ods
TSGB0110: Time taken to travel to work by region of workplace (ODS, 40 KB), https://assets.publishing.service.gov.uk/media/6751b8cf19e0c816d18d1e13/tsgb0110.ods
TSGB0111: Average time taken to travel to work by region of workplace and usual method of travel (ODS, 42.5 KB), https://assets.publishing.service.gov.uk/media/6751b8e72086e98fae35119d/tsgb0111.ods
TSGB0112: How workers usually travel to work by car by region of workplace (ODS, 24.7 KB), https://assets.publishing.service.gov.uk/media/6751b8f26da7a3435fecbd60/tsgb0112.ods
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An open preclinical PET dataset. This dataset has been measured with the preclinical Siemens Inveon PET machine. The measured target is a (naive) rat with an injected dose of 21.4 MBq of FDG. The injection was done intravenously (IV) to the tail vein. No specific organ was investigated, but rather the glucose metabolism as a whole. The examination is a 60 minute dynamic acquisition. The measurement was conducted according to the ethical standards set by the University of Eastern Finland.
The dataset contains the original list-mode data, the (dynamic) sinogram created by the Siemens Inveon Acquisition Workplace (IAW) software (28 frames), the (dynamic) scatter sinogram created by the IAW software (28 frames), the attenuation sinogram created by the IAW software and the normalization coefficients created by the IAW software. Header files are included for all the different data files.
For documentation on reading the list-mode binary data, please ask Siemens.
This dataset can be used in the OMEGA software, including the list-mode data, to import the data to MATLAB/Octave, create sinograms from the list-mode data and reconstruct the imported data. For help on using the dataset with OMEGA, see the wiki.
https://www.usa.gov/government-works
Provides agency-wide totals for data pertaining to the age of transit stations reported to the National Transit Database in the 2022 and 2023 report years. This view is based off of the "2022 - 2023 NTD Annual Data - Stations (by Mode and Age)" dataset, which displays the same data at a lower level of aggregation. This view displays the data at a higher level (by agency).
In many cases, stations are reported by each mode and type of service that uses them. For example, a single station used by bus - directly operated, bus - purchased transportation, and commuter bus - directly operated would be reported three times. For more detail, please see the NTD Policy Manual.
Rural reporters do not report passenger stations and are not included in this file. Modes Demand Response, Demand Response - Taxi, Vanpool, and Publico also do not report stations and are also excluded.
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 and 2023 Facility Inventory database files.
In years 2015-2021, you can find this data in the "Stations" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This commuter mode share data shows the estimated percentages of commuters in Champaign County who traveled to work using each of the following modes: drove alone in an automobile; carpooled; took public transportation; walked; biked; went by motorcycle, taxi, or other means; and worked at home. Commuter mode share data can illustrate the use of and demand for transit services and active transportation facilities, as well as for automobile-focused transportation projects.
Driving alone in an automobile is by far the most prevalent means of getting to work in Champaign County, accounting for over 69 percent of all work trips in 2023. This matches the 2019 rate and is the first increase since 2017; both comparison years predate the COVID-19 pandemic.
The percentage of workers who commuted by all other means to a workplace outside the home also decreased from 2019 to 2021, with most of these modes reaching a record low since this data was first tracked in 2005. The percentage of people carpooling to work in 2023 was lower than in every year except 2016 since tracking began. The percentage of people walking to work increased from 2022 to 2023, but this increase is not statistically significant.
Meanwhile, the percentage of people in Champaign County who worked at home more than quadrupled from 2019 to 2021, reaching a record high of over 18 percent. It is a safe assumption that this can be attributed to more employers allowing employees to work at home when the COVID-19 pandemic began in 2020.
The work-from-home figure decreased to 11.2 percent in 2023, the first statistically significant decrease since the pandemic began. However, this figure is still about 2.5 times higher than in 2019, even with the COVID-19 emergency ending in 2023.
Commuter mode share data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.
As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
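As a rough illustration of working with margins of error, the sketch below follows the Census Bureau's published approach of converting 90-percent margins of error to standard errors before testing whether two estimates differ significantly; treat it as an approximation (it ignores any correlation between the estimates), and the percentages used are made up:

```python
import math

def acs_difference_is_significant(est1, moe1, est2, moe2, z=1.645):
    """Return True if two ACS estimates differ significantly at the 90% level."""
    se1, se2 = moe1 / z, moe2 / z  # convert margins of error to standard errors
    return abs(est1 - est2) > z * math.sqrt(se1 ** 2 + se2 ** 2)

# Made-up example: 3.1% (+/-0.9) walking share one year vs 3.8% (+/-1.0) the next.
print(acs_difference_is_significant(3.1, 0.9, 3.8, 1.0))  # False -> not significant
```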
Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.
For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Means of Transportation to Work.
Sources: U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (18 September 2024).; U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (10 October 2023).; U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (14 October 2022).; U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).; U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).; U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (13 September 2018).; U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (14 September 2017).; U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (19 September 2016).; U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
https://www.usa.gov/government-works
Provides agency-wide totals for service and cost efficiency metrics for data reported to the National Transit Database in the 2022 and 2023 report years. This view is based off of the "2022 - 2023 NTD Annual Data - Metrics" dataset, which displays the same data at a lower level of aggregation (by mode). This view displays the data at a higher level (by agency).
Only Full Reporters report data on Passenger Miles. The columns containing ratios have been calculated as the average across all reporting modes, not as the ratio of summed data. Thus, each transit agency received equal weight, regardless of that agency's total ridership.
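The distinction noted above, averaging per-mode ratios versus dividing summed totals, can be seen with a tiny made-up example (these are not NTD figures):

```python
import pandas as pd

# Two hypothetical modes of very different size.
modes = pd.DataFrame({
    "mode": ["Bus", "Heavy Rail"],
    "operating_expenses": [1_000_000, 50_000_000],
    "passenger_miles": [500_000, 100_000_000],
})
modes["cost_per_passenger_mile"] = modes["operating_expenses"] / modes["passenger_miles"]

mean_of_ratios = modes["cost_per_passenger_mile"].mean()                            # 1.25: each mode weighted equally
ratio_of_sums = modes["operating_expenses"].sum() / modes["passenger_miles"].sum()  # ~0.51: weighted by size
print(mean_of_ratios, ratio_of_sums)
```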
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This view and its parent dataset are based on the 2022 and 2023 Federal Funding Allocation, Operating Expenses, and Service database files.
In years 2015-2021, you can find this data in the "Metrics" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
In versions of the NTD data tables from before 2014, you can find data on metrics in the files called "Fare per Passenger and Recovery Ratio" and "Service Supplied and Consumed Ratios."
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
We analyze 3yr of nearly continuous Kepler spacecraft short cadence observations of the pulsating subdwarf B (sdB) star KIC 3527751. We detect a total of 251 periodicities, most in the g-mode domain, but some where p-modes occur, confirming that KIC3527751 is a hybrid pulsator. We apply seismic tools to the periodicities to characterize the properties of KIC3527751. Techniques to identify modes include asymptotic period spacing relationships, frequency multiplets, and the separation of multiplet splittings. These techniques allow for 189 (75%) of the 251 periods to be associated with pulsation modes. Included in these are three sets of l=4 multiplets and possibly an l=9 multiplet. Period spacing sequences indicate l=1 and 2 overtone spacings of 266.4+/-0.2 and 153.2+/-0.2s, respectively. We also calculate reduced periods, from which we find evidence of trapped pulsations. Such mode trappings can be used to constrain the core/atmosphere transition layers. Interestingly, frequency multiplets in the g-mode region, which sample deep into the star, indicate a rotation period of 42.6+/-3.4days while p-mode multiplets, which sample the outer envelope, indicate a rotation period of 15.3+/-0.7days. We interpret this as differential rotation in the radial direction with the core rotating more slowly. This is the first example of differential rotation for a sdB star.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.
In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.
Modescaling_raw_data.zip
This file contains uniformly sampled unit cell designs for metamaterial M2 and \(M_k(n)\) for \(1\leq n\leq 4\), which was used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or increasing linearly (class C). The file covers unit cells of size \(3 \leq k \leq 8\). The data is organized as follows.
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Simulation data for \(6 \leq k \leq 8\) unit cells are stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Simulation data of metamaterial M1 for \(k_x \times k_y\) metamaterials are stored in compressed numpy array format (.npz) and can be loaded in Python with the Numpy package using the numpy.load command. These files are named "smiley_cube_x_y_\(k_x\)x\(k_y\).npz", which contain all possible metamaterial designs, and "smiley_cube_uniform_sample_x_y_\(k_x\)x\(k_y\).npz", which contain uniformly sampled metamaterial designs. The configurations are accessed with the keyword argument 'configs'. The classification is accessed with the keyword argument 'compatible'. The configurations array is of shape [Nsim, \(k_x\), \(k_y\)], the classification array is of shape [Nsim]. The building blocks in the configuration are denoted by 0 or 1, which correspond to the red/green and white/dashed building blocks respectively. Classification is 0 or 1, which corresponds to I and C respectively.
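A minimal loading sketch for the arrays described above, assuming the stated shapes; the file names are illustrative instances of the naming patterns given in the text, and the column split follows the [Nsim, 1 + k*k + 4] layout with the label assumed to sit in the first column:

```python
import numpy as np

k = 3
raw = np.load("data_new_rrQR_i_n_M_3x3_fixn4.npy")   # illustrative name, shape [Nsim, 1 + k*k + 4]
labels = raw[:, 0]                                    # assumed label column
designs = raw[:, 1:1 + k * k]                         # k*k building-block orientations per unit cell
mode_counts = raw[:, 1 + k * k:]                      # M_k(n) for n = 1..4

# The metamaterial M1 files are compressed .npz archives with named arrays.
m1 = np.load("smiley_cube_x_y_3x3.npz")               # illustrative file name
configs = m1["configs"]                               # shape [Nsim, k_x, k_y], entries 0 or 1
compatible = m1["compatible"]                         # shape [Nsim], 0 = class I, 1 = class C
print(designs.shape, mode_counts.shape, configs.shape)
```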
Modescaling_classification_results.zip
This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells of metamaterial M2 in Modescaling_raw_data.zip. The data is organized as follows.
The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number to keep track
col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 4\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows (a short loading sketch follows this list):
col 0: label number to keep track
col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 6\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
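A short sketch of reading one of these result files and of how the slope and offset relate to the raw mode counts; the file name is an illustrative instance of the stated pattern, and the linear fit over \(n \geq 2\) is an assumption consistent with the offset definition given above:

```python
import numpy as np

# Read a classification result file (comma-delimited, columns 0-4 as listed above).
results = np.loadtxt(
    "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_3x3_fixn4.txt", delimiter=",")
labels, classes, slopes, offsets, Mk1 = results.T

# Relation between slope/offset and the raw mode counts M_k(n), n = 1..4:
Mk = np.array([2, 4, 6, 8])                # made-up counts for a class C example
n = np.arange(1, 5)
slope, _ = np.polyfit(n[1:], Mk[1:], 1)    # linear fit from n >= 2 onward
offset = Mk[1] - 2 * slope                 # offset definition above: M_k(2) - 2*slope
print(slope, offset)                       # 2.0, 0.0 for this linear example
```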
The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number to keep track
col 1: the class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_x \leq 4\))
col 2: the class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_y \leq 4\))
col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)
col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)
col 5: the offset_x is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)
col 6: the offset_y is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)
col 7: (M_k(1,
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
One problem with discriminant analysis of microarray data is representation of each sample by a large number of genes that are possibly irrelevant, insignificant, or redundant. Methods of variable selection are, therefore, of great significance in microarray data analysis. A new method for key gene selection has been proposed on the basis of interval segmentation purity that is defined as the purity of samples belonging to a certain class in intervals segmented by a mode search algorithm. This method identifies key variables most discriminative for each class, which offers possibility of unraveling the biological implication of selected genes. A salient advantage of the new strategy over existing methods is the capability of selecting genes that, though possibly exhibit a multimodal distribution, are the most discriminative for the classes of interest, considering that the expression levels of some genes may reflect systematic difference in within-class samples derived from different pathogenic mechanisms. On the basis of the key genes selected for individual classes, a support vector machine with block-wise kernel transform is developed for the classification of different classes. The combination of the proposed gene mining approach with support vector machine is demonstrated in cancer classification using two public data sets. The results reveal that significant genes have been identified for each class, and the classification model shows satisfactory performance in training and prediction for both data sets.
This dataset details track and roadway mileage/characteristics for each agency, mode, and type of service, as reported to the National Transit Database in Report Year 2022. These data include the types of track/roadway elements employed in transit operation, as well as the length and/or count of certain elements.
NTD Data Tables organize and summarize data from the 2022 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 Transit Way Mileage database file.
In years 2015-2021, you can find this data in the "Track and Roadway" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
In versions of the data tables from before 2015, you can find corresponding data in the file called "Transit Way Mileage - Rail Modes" and "Transit Way Mileage - Non-Rail Modes."
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.
See the Splitgraph documentation for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NewsMediaBias-Plus dataset is designed for the analysis of media bias and disinformation by combining textual and visual data from news articles. It aims to support research in detecting, categorizing, and understanding biased reporting in media outlets.
NewsMediaBias-Plus pairs news articles with relevant images and annotations indicating perceived biases and the reliability of the content. It adds a multimodal dimension for bias detection in news media.
- unique_id: Unique identifier for each news item. Each unique_id matches an image for the same article.
- outlet: The publisher of the article.
- headline: The headline of the article.
- article_text: The full content of the news article.
- image_description: Description of the paired image.
- image: The file path of the associated image.
- date_published: The date the article was published.
- source_url: The original URL of the article.
- canonical_link: The canonical URL of the article.
- new_categories: Categories assigned to the article.
- news_categories_confidence_scores: Confidence scores for each category.
- text_label: Indicates the likelihood of the article being disinformation:
  - Likely: Likely to be disinformation.
  - Unlikely: Unlikely to be disinformation.
- multimodal_label: Indicates the likelihood of disinformation from the combination of the text snippet and image content:
  - Likely: Likely to be disinformation.
  - Unlikely: Unlikely to be disinformation.
Load the dataset into Python:
from datasets import load_dataset
ds = load_dataset("vector-institute/newsmediabias-plus")
print(ds) # View structure and splits
print(ds['train'][0]) # Access the first record of the train split
print(ds['train'][:5]) # Access the first five records
from datasets import load_dataset
# Load the dataset in streaming mode
streamed_dataset = load_dataset("vector-institute/newsmediabias-plus", streaming=True)
# Get an iterable dataset
dataset_iterable = streamed_dataset['train'].take(5)
# Print the records
for record in dataset_iterable:
print(record)
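A small follow-up sketch: filtering streamed records on the annotation fields listed in the schema above (field names and label values as described there):

```python
from datasets import load_dataset

streamed = load_dataset("vector-institute/newsmediabias-plus", streaming=True)

# Keep only records flagged as likely disinformation (label value per the schema above).
likely_items = (r for r in streamed["train"] if r["text_label"] == "Likely")
first = next(likely_items)
print(first["outlet"], "|", first["headline"])
```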
Contributions are welcome! To contribute, fork the repository and create a pull request with your changes.
This dataset is released under a non-commercial license. See the LICENSE file for more details.
Please cite the dataset using this BibTeX entry:
@misc{vector_institute_2024_newsmediabias_plus,
title={NewsMediaBias-Plus: A Multimodal Dataset for Analyzing Media Bias},
author={Vector Institute Research Team},
year={2024},
url={https://huggingface.co/datasets/vector-institute/newsmediabias-plus}
}
For questions or support, contact Shaina Raza at: shaina.raza@vectorinstitute.ai
Disclaimer: The labels Likely and Unlikely are based on LLM annotations and expert assessments, intended for informational use only. They should not be considered final judgments.
Guidance: This dataset is for research purposes. Cross-reference findings with other reliable sources before drawing conclusions. The dataset aims to encourage critical thinking, not provide definitive classifications.