By VISHWANATH SESHAGIRI [source]
This dataset contains YouTube video and channel metadata for analyzing the statistical relations between videos and forming a topic tree. With 9 direct features and 13 more indirect features, it has what you need to build a deep understanding of how videos are related, including information such as total views per unit time, channel views, likes/subscribers ratio, comments/views ratio, and dislikes/subscribers ratio. The data provides a unique opportunity to gain insights on topics such as subscriber count trends over time, or to calculate the impact of trends on subscriber engagement. We can develop models that show how different types of content drive viewership and identify the most popular styles or topics within YouTube's vast catalogue. Additionally, the data offers an intriguing look into consumer behaviour: we can explore what drives people to watch specific videos at certain times, or to appreciate certain channels more than others, by analyzing metrics such as likes per subscriber and dislikes per view. Finally, this dataset is completely open source, with an easy-to-understand GitHub repo, making it a valuable resource for anyone looking to understand how their audience interacts with their content and how they might improve it in the future.
How to Use This Dataset
In general, it is important to understand each parameter in the data set before proceeding with analysis. The parameters include totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCount, dislikes/views, comments/subscriber, channelCommentCount, likes/dislikes, comments/views, dislikes/subscribers, totviews/totsubs, and views/elapsedtime.
To use this dataset for your own analysis (a short loading sketch follows the list):
1) Review each parameter's meaning and purpose in the dataset;
2) Get familiar with basic descriptive statistics such as mean, median, mode, and range;
3) Create visualizations or tables based on subsets of the data;
4) Understand correlations between different sets of variables or parameters;
5) Generate meaningful conclusions about specific channels or topics based on organized graph hierarchies or tables;
6) Analyze trends over time for individual parameters, as well as the aggregate reaction from all users when videos are released.
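A minimal sketch of steps 1–3, assuming a standard pandas/matplotlib environment; the file name comes from the file listing below, and the column names follow the parameter descriptions above, so they may need adjusting to the actual CSV header:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (file name taken from the file listing below).
df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

# Steps 1-2: list the parameters and compute basic descriptive statistics.
print(df.columns.tolist())
print(df.describe())  # count, mean, std, min, quartiles, max for each numeric column

# Step 3: a simple visualization of one ratio column (assumed name).
df["views/subscribers"].plot(kind="hist", bins=50, title="Views per subscriber")
plt.show()
```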
Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that can predict the relative popularity of videos based on various factors such as total views, channel viewers, likes/dislikes ratio, and comments/views ratio. This model could then be used to make recommendations and predict which videos are likely to become popular or go viral.
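As a hedged sketch of such a model (not the original authors' approach), one could fit an off-the-shelf regressor on a few of the ratio columns; the target and feature names below are assumptions based on the column descriptions and may need adjusting:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

# Assumed proxy for "relative popularity": total views per unit of channel elapsed time.
target = "totalviews/channelelapsedtime"
features = ["channelViewCount", "likes/dislikes", "comments/views", "views/subscribers"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2:", r2_score(y_test, model.predict(X_test)))
```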
Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.
Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis by examining factors such as subscriber count, average time spent watching a video per user (elapsed time), comments made per view, etc., to gain insights into how engaged viewers are with specific content or channels on YouTube. From this information it would be possible to optimize content strategy accordingly, in order to improve overall engagement rates across various types of video content and channel types.
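For illustration, a rough engagement score could be built from the ratio columns described above; this is a sketch under the assumption that the listed column names appear verbatim in the CSV:

```python
import pandas as pd

df = pd.read_csv("YouTubeDataset_withChannelElapsed.csv")

# Hypothetical engagement score: reward comments and likes, penalize dislikes.
df["engagement_score"] = (
    df["comments/views"] + df["likes/subscriber"] - df["dislikes/views"]
)
print(df.nlargest(10, "engagement_score")[["channelViewCount", "engagement_score"]])
```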
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: YouTubeDataset_withChannelElapsed.csv

| Column name | Description |
|:----------------------------------|:-------------------------------------------------------|
| totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
| channelViewCount | Total number of views for the channel. (Integer) |
| likes/subscriber ...
This represents the Service data reported to the NTD by transit agencies. In versions of the data tables from before 2014, you can find data on service in the file called "Transit Operating Statistics: Service Supplied and Consumed." If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Search Coil Magnetometer (SCM) AC Magnetic Field (32 samples/s), Level 2, Survey Mode Data. The tri-axial Search-Coil Magnetometer with its associated preamplifier measures three-dimensional magnetic field fluctuations. The analog magnetic waveforms measured by the SCM are digitized and processed inside the Digital Signal Processor (DSP), collected and stored by the Central Instrument Data Processor (CIDP) via the Fields Central Electronics Box (CEB). Prior to launch, all SCM Flight models were calibrated by LPP team members at the National Magnetic Observatory, Chambon-la-Foret (Orleans). Once per orbit, each SCM transfer function is checked thanks to the onboard calibration signal provided by the DSP. The SCM is operated for the entire MMS orbit in survey mode. Within scientific Regions Of Interest (ROI), burst mode data are also acquired as well as high speed burst mode data. This SCM data set corresponds to the AC magnetic field waveforms in nanoTesla and in the GSE frame. The SCM instrument paper can be found at http://link.springer.com/article/10.1007/s11214-014-0096-9 and the SCM data product guide at https://lasp.colorado.edu/mms/sdc/public/datasets/fields/.
Search Coil Magnetometer (SCM) AC Magnetic Field (8192 samples/s), Level 2, Burst Mode Data. The tri-axial Search-Coil Magnetometer with its associated preamplifier measures three-dimensional magnetic field fluctuations. The analog magnetic waveforms measured by the SCM are digitized and processed inside the Digital Signal Processor (DSP), collected and stored by the Central Instrument Data Processor (CIDP) via the Fields Central Electronics Box (CEB). Prior to launch, all SCM Flight models were calibrated by LPP team members at the National Magnetic Observatory, Chambon-la-Foret (Orleans). Once per orbit, each SCM transfer function is checked thanks to the onboard calibration signal provided by the DSP. The SCM is operated for the entire MMS orbit in survey mode. Within scientific Regions Of Interest (ROI), burst mode data are also acquired as well as high speed burst mode data. This SCM data set corresponds to the AC magnetic field waveforms in nanoTesla and in the GSE frame. The SCM instrument paper can be found at http://link.springer.com/article/10.1007/s11214-014-0096-9 and the SCM data product guide at https://lasp.colorado.edu/mms/sdc/public/datasets/fields/.
https://www.usa.gov/government-works
Provides agency-wide totals for track and roadway components. Data is from the National Transit Database in the 2022 and 2023 report years. These data include the types of track/roadway elements employed in transit operation, as well as the length and/or count of certain elements. This view is based off of the "2022 - 2023 NTD Annual Data - Track & Roadway (by Mode)" dataset, which displays the same data at a lower level of aggregation. This view displays the data at a higher level (by agency).
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. The dataset that this view references is based on the 2022 and 2023 Transit Way Mileage database files.
In years 2015-2021, you can find this data in the "Track and Roadway" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
In versions of the data tables from before 2015, you can find corresponding data in the file called "Transit Way Mileage - Rail Modes" and "Transit Way Mileage - Non-Rail Modes."
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
{"references": ["M. Businger et al., "Remote distribution of non-classical correlations over 1250 modes between telecom photons and 978 nm photons stored in 171Yb3+:Y2SiO5 crystal",\u00a0arXiv:2205.01481"]} Processed datasets corresponding to the Figures published in the article.
CHILILAB is designed as a site for training and research for the Hanoi School of Public Health. CHILILAB is based on the foundation of an active and sustainable surveillance system monitoring basic morbidity and mortality as well as other demographic, socio-economic and cultural characteristics. CHILILAB was designed to focus on issues related to adolescent health and injury. The entire population of 3 selected townships and 4 communes was invited to participate in the study, including 57,161 participants from 17,993 households. CHILILAB has 6 objectives:
1. Establish a demographic and epidemiologic surveillance system on cyclic morbidity and mortality in Chi Linh district
2. Identify patterns and trends of morbidity and mortality in the district population periodically and longitudinally
3. Provide data on adolescent health, especially risk and protective factors affecting adolescent health issues
4. Pilot and evaluate community health intervention strategies periodically in order to form a basis for developing health policies in the locality and elsewhere
5. Improve the procedures for data collection, analysis and application at the community level in a reliable and effective manner
6. Strengthen the capacity of public health professionals at the Hanoi School of Public Health.
Chi Linh is a mountainous town of Hai Duong Province in northern Vietnam. Chi Linh town covers an area of 300.54 square kilometers, contains 17 communes and 3 towns, and has a population of 142,278 (50.3% female, 49.7% male). A number of different ethnic minority groups live in the district, including San Diu, Chinese, H'Mong, Tay, Nung and some other ethnic minorities. The population density varies across the district, with higher density in lowland communes and places with access to transportation routes. Approximately 25% of the total population lives in urban areas. Adolescents between 15-24 years of age account for 19.3% of the total district population according to 1999 population data. Residents receive information, including health information, from several different channels: 100% of the communes/towns are covered by the loud-speaker system; 100% of households have their own radio receivers; about 60% of households have TVs.
Several national roads as well as inter-communal roads run through the town. While the national roads are high quality and therefore can handle a high volume of high speed traffic, the inter-communal roads are ill-maintained due to insufficient funding. As a result, some communes are not very easily accessible. Much of the land in Chi Linh is used for agriculture, including rice and subsidiary dry crop cultivation. In addition, forest resources in the district are currently being exploited. The town also contains two thermal power plants as well as a medical glass plant, a refractory soil mine, and a leather shoe making plant providing employment in the district.
Individual
Resident household members of households residing within the demographic surveillance area.
Event history data
From 2004 to the end of 2007: 4 rounds per year. From 2008 to the end of 2012: 2 rounds per year. From 2013: 1 round per year.
The entire population of 7 communes and townships within Chi Linh town was selected as the study sample. This includes approximately 70,000 individuals from 18,000 households. Data were collected at both the household and individual levels. A list of the households in the district was made with the involvement of relevant authorities and the use of existing local lists of households at the time the baseline survey was conducted. Study subjects for specific studies are selected according to the objectives and aims of those studies.
None
Face-to-face [f2f]
CHILILAB uses 7 kinds of forms to collect data:
Form Q (General household information): Basic information about the household, such as household name, address and household ID, and records of events in the household during the preceding 3 months (or 4 or 6 months).
Form H (Household information): Socio-economic information about the household, such as a list of assets and the type of house walls, roof and foundation.
Form C (Individual information): Date of birth, sex, age, ethnicity, religion, health insurance, and number of children.
Form DC (Migration information): Date of in-migration or out-migration, reason for migration, and place of origin or destination.
Form TH (Pregnancy information): Pregnancy status, and start and end dates of pregnancy.
Form CH (Mortality information): Expectation of life and cause of death.
Form CU (Household change information): Date of the change, reason for the change, and the name of the new household.
Data collected from the sample population is stored in the CHILILAB database at HSPH. Periodic updates of demographic, socio-economic, cultural and health information are extracted from the database for analysis. Findings are then disseminated to the community and policy makers in annual workshops with local authorities, through manuscripts, and through factsheets in Vietnamese as well as in English. Individual and household data in CHILILAB can be accessed if a request for data use is sent to the CHILILAB coordinator for consideration.
100% of households in the field site have joined CHILILAB.
https://www.usa.gov/government-works
This dataset details data on hours worked by public transportation employees and the head counts of employees for each applicable agency reporting to the National Transit Database in the 2022 and 2023 report years at the mode and type of service level.
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 and 2023 Transit Agency Employees database files.
In years 2015-2021, you can find this data in the "Employees" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains all the raw data and raw images used in the paper titled 'Highly multi-mode hollow core fibres'. It is grouped into two folders of raw data and raw images. In the raw data there are a number of .dat files which contain alternating columns of wavelength and signal for the different measurements of transmission, cutback and bend loss for the different fibres. In the raw images, simple .tif files of the different fibres are given and different near field and far field images used in Figure 2.
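A minimal sketch of reading one of the raw .dat files, assuming whitespace-delimited alternating columns of wavelength and signal as described above; the file name is a placeholder:

```python
import numpy as np

# Placeholder file name; the raw data folder contains several such .dat files.
data = np.loadtxt("raw_data/example_transmission.dat")

# Split the alternating columns into (wavelength, signal) pairs:
# columns 0, 2, 4, ... hold wavelengths and columns 1, 3, 5, ... hold signals.
wavelengths = data[:, 0::2]
signals = data[:, 1::2]
print(wavelengths.shape, signals.shape)
```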
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN
The Department of Statistics (DOS) carried out four rounds of the 2016 Employment and Unemployment Survey (EUS). The survey rounds covered a sample of about forty-nine thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design.
It is worth mentioning that the DOS employed new technology in data collection and data processing: data were collected using an electronic questionnaire on a handheld device (PDA) instead of a hard copy.
The survey's main objectives are:
- To identify the demographic, social and economic characteristics of the population and manpower.
- To identify the occupational structure and economic activity of the employed persons, as well as their employment status.
- To identify the reasons behind the desire of employed persons to search for a new or additional job.
- To measure the economic activity participation rate (the number of economically active persons divided by the population aged 15+); a short worked example follows this list.
- To identify the different characteristics of the unemployed persons.
- To measure unemployment rates (the number of unemployed persons divided by the number of economically active persons aged 15+) according to the various characteristics of the unemployed, and the changes that might take place in this regard.
- To identify the most important ways and means used by the unemployed persons to get a job, in addition to measuring durations of unemployment for such persons.
- To identify the changes over time that might take place regarding the above-mentioned variables.
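The two headline rates defined above can be written out explicitly; the counts below are made up for illustration and are not figures from the survey:

```python
# Illustrative counts only (not survey results).
population_15_plus = 4_000_000
economically_active = 1_600_000   # employed + unemployed persons aged 15+
unemployed = 240_000

participation_rate = economically_active / population_15_plus * 100
unemployment_rate = unemployed / economically_active * 100

print(f"Participation rate: {participation_rate:.1f}%")  # 40.0%
print(f"Unemployment rate: {unemployment_rate:.1f}%")    # 15.0%
```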
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
The sample is representative at the national level (Kingdom), the governorate level, and the three regions (Central, North and South).
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
Computer Assisted Personal Interview [capi]
----> Raw Data
A tabulation results plan has been set based on the previous Employment and Unemployment Surveys while the required programs were prepared and tested. When all prior data processing steps were completed, the actual survey results were tabulated using an ORACLE package. The tabulations were then thoroughly checked for consistency of data. The final report was then prepared, containing detailed tabulations as well as the methodology of the survey.
----> Harmonized Data
Accessibility of tables
The department has been working to make our tables accessible for our users; the data tables for these statistics are now accessible.
We would welcome any feedback on the accessibility of our tables; please email us.
TSGB0101: Passenger transport by mode from 1952 (ODS, 24.2 KB), https://assets.publishing.service.gov.uk/media/6762e055cdb5e64b69e307ab/tsgb0101.ods
TSGB0102: Passenger journeys on public transport vehicles from 1950 (ODS, 13.9 KB), https://assets.publishing.service.gov.uk/media/6762e05eff2c870561bde7ef/tsgb0102.ods
TSGB0103 (NTS0303): Average number of trips, stages, miles and time spent travelling by main mode (ODS, 55 KB), https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821414/nts0303.ods
TSGB0104 (NTS0409a): Average number of trips by purpose and main mode (ODS, 122 KB), https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821479/nts0409.ods
TSGB0105 (NTS0409b): Average distance travelled by purpose and main mode (ODS, 122 KB), https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/821479/nts0409.ods
Table TSGB0106 - people entering central London during the morning peak, since 1996
The data source for this table has been discontinued since it was last updated in December 2019.
TSGB0107 (RAS0203): Passenger casualty rates by mode (ODS, 21 KB), https://assets.publishing.service.gov.uk/media/67600227b745d5f7a053ef74/ras0203.ods
TSGB0108: Usual method of travel to work by region of residence (ODS, 50.1 KB), https://assets.publishing.service.gov.uk/media/675968b1403b5cf848a292b2/tsgb0108.ods
TSGB0109: Usual method of travel to work by region of workplace (ODS, 51.9 KB), https://assets.publishing.service.gov.uk/media/6751b8c60191590a5f351191/tsgb0109.ods
TSGB0110: Time taken to travel to work by region of workplace (ODS, 40 KB), https://assets.publishing.service.gov.uk/media/6751b8cf19e0c816d18d1e13/tsgb0110.ods
TSGB0111: Average time taken to travel to work by region of workplace and usual method of travel (ODS, 42.5 KB), https://assets.publishing.service.gov.uk/media/6751b8e72086e98fae35119d/tsgb0111.ods
TSGB0112: How workers usually travel to work by car by region of workplace (ODS, 24.7 KB), https://assets.publishing.service.gov.uk/media/6751b8f26da7a3435fecbd60/tsgb0112.ods
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An open preclinical PET dataset. This dataset has been measured with the preclinical Siemens Inveon PET machine. The measured target is a (naive) rat with an injected dose of 21.4 MBq of FDG. The injection was done intravenously (IV) to the tail vein. No specific organ was investigated, but rather the glucose metabolism as a whole. The examination is a 60 minute dynamic acquisition. The measurement was conducted according to the ethical standards set by the University of Eastern Finland.
The dataset contains the original list-mode data, the (dynamic) sinogram created by the Siemens Inveon Acquisition Workplace (IAW) software (28 frames), the (dynamic) scatter sinogram created by the IAW software (28 frames), the attenuation sinogram created by the IAW software and the normalization coefficients created by the IAW software. Header files are included for all the different data files.
For documentation on reading the list-mode binary data, please ask Siemens.
This dataset can be used in the OMEGA software, including the list-mode data, to import the data to MATLAB/Octave, create sinograms from the list-mode data and reconstruct the imported data. For help on using the dataset with OMEGA, see the wiki.
https://www.usa.gov/government-works
Provides agency-wide totals for data pertaining to the age of transit stations reported to the National Transit Database in the 2022 and 2023 report years. This view is based off of the "2022 - 2023 NTD Annual Data - Stations (by Mode and Age)" dataset, which displays the same data at a lower level of aggregation. This view displays the data at a higher level (by agency).
In many cases, stations are reported by each mode and type of service that uses them. For example, a single station used by bus - directly operated, bus - purchased transportation, and commuter bus - directly operated would be reported three times. For more detail, please see the NTD Policy Manual.
Rural reporters do not report passenger stations and are not included in this file. Modes Demand Response, Demand Response - Taxi, Vanpool, and Publico also do not report stations and are also excluded.
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 and 2023 Facility Inventory database files.
In years 2015-2021, you can find this data in the "Stations" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This commuter mode share data shows the estimated percentages of commuters in Champaign County who traveled to work using each of the following modes: drove alone in an automobile; carpooled; took public transportation; walked; biked; went by motorcycle, taxi, or other means; and worked at home. Commuter mode share data can illustrate the use of and demand for transit services and active transportation facilities, as well as for automobile-focused transportation projects.
Driving alone in an automobile is by far the most prevalent means of getting to work in Champaign County, accounting for over 69 percent of all work trips in 2023. This matches the 2019 rate and is the first increase since 2017; both comparison years predate the COVID-19 pandemic.
The percentage of workers who commuted by all other means to a workplace outside the home also decreased from 2019 to 2021, with most of these modes reaching a record low since this data was first tracked in 2005. The percentage of people carpooling to work in 2023 was lower than in every year except 2016 since tracking began. The percentage of people walking to work increased from 2022 to 2023, but this increase is not statistically significant.
Meanwhile, the percentage of people in Champaign County who worked at home more than quadrupled from 2019 to 2021, reaching a record high of over 18 percent. It is a safe assumption that this can be attributed to more employers allowing employees to work at home when the COVID-19 pandemic began in 2020.
The work-from-home figure decreased to 11.2 percent in 2023, the first statistically significant decrease since the pandemic began. However, this figure is still about 2.5 times higher than in 2019, even with the COVID-19 emergency ending in 2023.
Commuter mode share data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.
As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
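As a rough illustration of working with margins of error, the sketch below follows the Census Bureau's published approach of converting 90-percent margins of error to standard errors before testing whether two estimates differ significantly; treat it as an approximation (it ignores any correlation between the estimates), and the percentages used are made up:

```python
import math

def acs_difference_is_significant(est1, moe1, est2, moe2, z=1.645):
    """Return True if two ACS estimates differ significantly at the 90% level."""
    se1, se2 = moe1 / z, moe2 / z  # convert margins of error to standard errors
    return abs(est1 - est2) > z * math.sqrt(se1 ** 2 + se2 ** 2)

# Made-up example: 3.1% (+/-0.9) walking share one year vs 3.8% (+/-1.0) the next.
print(acs_difference_is_significant(3.1, 0.9, 3.8, 1.0))  # False -> not significant
```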
Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.
For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Means of Transportation to Work.
Sources: U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (18 September 2024).; U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (10 October 2023).; U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (14 October 2022).; U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).; U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).; U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (13 September 2018).; U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (14 September 2017).; U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (19 September 2016).; U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
https://www.usa.gov/government-works
Provides agency-wide totals for service and cost efficiency metrics for data reported to the National Transit Database in the 2022 and 2023 report years. This view is based off of the "2022 - 2023 NTD Annual Data - Metrics" dataset, which displays the same data at a lower level of aggregation (by mode). This view displays the data at a higher level (by agency).
Only Full Reporters report data on Passenger Miles. The columns containing ratios have been calculated as the average across all reporting modes, not as the ratio of summed data. Thus, each transit agency received equal weight, regardless of that agency's total ridership.
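The distinction noted above, averaging per-mode ratios versus dividing summed totals, can be seen with a tiny made-up example (these are not NTD figures):

```python
import pandas as pd

# Two hypothetical modes of very different size.
modes = pd.DataFrame({
    "mode": ["Bus", "Heavy Rail"],
    "operating_expenses": [1_000_000, 50_000_000],
    "passenger_miles": [500_000, 100_000_000],
})
modes["cost_per_passenger_mile"] = modes["operating_expenses"] / modes["passenger_miles"]

mean_of_ratios = modes["cost_per_passenger_mile"].mean()                            # 1.25: each mode weighted equally
ratio_of_sums = modes["operating_expenses"].sum() / modes["passenger_miles"].sum()  # ~0.51: weighted by size
print(mean_of_ratios, ratio_of_sums)
```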
NTD Data Tables organize and summarize data from the 2022 and 2023 National Transit Database in a manner that is more useful for quick reference and summary analysis. This view and its parent dataset are based on the 2022 and 2023 Federal Funding Allocation, Operating Expenses, and Service database files.
In years 2015-2021, you can find this data in the "Metrics" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
In versions of the NTD data tables from before 2014, you can find data on metrics in the files called "Fare per Passenger and Recovery Ratio" and "Service Supplied and Consumed Ratios."
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
We analyze 3yr of nearly continuous Kepler spacecraft short cadence observations of the pulsating subdwarf B (sdB) star KIC 3527751. We detect a total of 251 periodicities, most in the g-mode domain, but some where p-modes occur, confirming that KIC3527751 is a hybrid pulsator. We apply seismic tools to the periodicities to characterize the properties of KIC3527751. Techniques to identify modes include asymptotic period spacing relationships, frequency multiplets, and the separation of multiplet splittings. These techniques allow for 189 (75%) of the 251 periods to be associated with pulsation modes. Included in these are three sets of l=4 multiplets and possibly an l=9 multiplet. Period spacing sequences indicate l=1 and 2 overtone spacings of 266.4+/-0.2 and 153.2+/-0.2s, respectively. We also calculate reduced periods, from which we find evidence of trapped pulsations. Such mode trappings can be used to constrain the core/atmosphere transition layers. Interestingly, frequency multiplets in the g-mode region, which sample deep into the star, indicate a rotation period of 42.6+/-3.4days while p-mode multiplets, which sample the outer envelope, indicate a rotation period of 15.3+/-0.7days. We interpret this as differential rotation in the radial direction with the core rotating more slowly. This is the first example of differential rotation for a sdB star.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.
In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.
Modescaling_raw_data.zip
This file contains uniformly sampled unit cell designs for metamaterial M2 and \(M_k(n)\) for \(1\leq n\leq 4\), which was used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or increasing linearly (class C). The file covers unit cells of size \(3 \leq k \leq 8\). The data is organized as follows.
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Simulation data for \(6 \leq k \leq 8\) unit cells are stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Simulation data of metamaterial M1 for \(k_x \times k_y\) metamaterials are stored in compressed numpy array format (.npz) and can be loaded in Python with the Numpy package using the numpy.load command. These files are named "smiley_cube_x_y_\(k_x\)x\(k_y\).npz", which contain all possible metamaterial designs, and "smiley_cube_uniform_sample_x_y_\(k_x\)x\(k_y\).npz", which contain uniformly sampled metamaterial designs. The configurations are accessed with the keyword argument 'configs'. The classification is accessed with the keyword argument 'compatible'. The configurations array is of shape [Nsim, \(k_x\), \(k_y\)], the classification array is of shape [Nsim]. The building blocks in the configuration are denoted by 0 or 1, which correspond to the red/green and white/dashed building blocks respectively. Classification is 0 or 1, which corresponds to I and C respectively.
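A minimal loading sketch for the arrays described above, assuming the stated shapes; the file names are illustrative instances of the naming patterns given in the text, and the column split follows the [Nsim, 1 + k*k + 4] layout with the label assumed to sit in the first column:

```python
import numpy as np

k = 3
raw = np.load("data_new_rrQR_i_n_M_3x3_fixn4.npy")   # illustrative name, shape [Nsim, 1 + k*k + 4]
labels = raw[:, 0]                                    # assumed label column
designs = raw[:, 1:1 + k * k]                         # k*k building-block orientations per unit cell
mode_counts = raw[:, 1 + k * k:]                      # M_k(n) for n = 1..4

# The metamaterial M1 files are compressed .npz archives with named arrays.
m1 = np.load("smiley_cube_x_y_3x3.npz")               # illustrative file name
configs = m1["configs"]                               # shape [Nsim, k_x, k_y], entries 0 or 1
compatible = m1["compatible"]                         # shape [Nsim], 0 = class I, 1 = class C
print(designs.shape, mode_counts.shape, configs.shape)
```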
Modescaling_classification_results.zip
This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells of metamaterial M2 in Modescaling_raw_data.zip. The data is organized as follows.
The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number to keep track
col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 4\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows (a short loading sketch follows this list):
col 0: label number to keep track
col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 6\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
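A short sketch of reading one of these result files and of how the slope and offset relate to the raw mode counts; the file name is an illustrative instance of the stated pattern, and the linear fit over \(n \geq 2\) is an assumption consistent with the offset definition given above:

```python
import numpy as np

# Read a classification result file (comma-delimited, columns 0-4 as listed above).
results = np.loadtxt(
    "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_3x3_fixn4.txt", delimiter=",")
labels, classes, slopes, offsets, Mk1 = results.T

# Relation between slope/offset and the raw mode counts M_k(n), n = 1..4:
Mk = np.array([2, 4, 6, 8])                # made-up counts for a class C example
n = np.arange(1, 5)
slope, _ = np.polyfit(n[1:], Mk[1:], 1)    # linear fit from n >= 2 onward
offset = Mk[1] - 2 * slope                 # offset definition above: M_k(2) - 2*slope
print(slope, offset)                       # 2.0, 0.0 for this linear example
```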
The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number to keep track
col 1: the class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_x \leq 4\))
col 2: the class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_y \leq 4\))
col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)
col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)
col 5: the offset_x is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)
col 6: the offset_y is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)
col 7: (M_k(1,
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
One problem with discriminant analysis of microarray data is representation of each sample by a large number of genes that are possibly irrelevant, insignificant, or redundant. Methods of variable selection are, therefore, of great significance in microarray data analysis. A new method for key gene selection has been proposed on the basis of interval segmentation purity that is defined as the purity of samples belonging to a certain class in intervals segmented by a mode search algorithm. This method identifies key variables most discriminative for each class, which offers possibility of unraveling the biological implication of selected genes. A salient advantage of the new strategy over existing methods is the capability of selecting genes that, though possibly exhibit a multimodal distribution, are the most discriminative for the classes of interest, considering that the expression levels of some genes may reflect systematic difference in within-class samples derived from different pathogenic mechanisms. On the basis of the key genes selected for individual classes, a support vector machine with block-wise kernel transform is developed for the classification of different classes. The combination of the proposed gene mining approach with support vector machine is demonstrated in cancer classification using two public data sets. The results reveal that significant genes have been identified for each class, and the classification model shows satisfactory performance in training and prediction for both data sets.
This dataset details track and roadway mileage/characteristics for each agency, mode, and type of service, as reported to the National Transit Database in Report Year 2022. These data include the types of track/roadway elements employed in transit operation, as well as the length and/or count of certain elements.
NTD Data Tables organize and summarize data from the 2022 National Transit Database in a manner that is more useful for quick reference and summary analysis. This dataset is based on the 2022 Transit Way Mileage database file.
In years 2015-2021, you can find this data in the "Track and Roadway" data table on NTD Program website, at https://transit.dot.gov/ntd/ntd-data.
In versions of the data tables from before 2015, you can find corresponding data in the file called "Transit Way Mileage - Rail Modes" and "Transit Way Mileage - Non-Rail Modes."
If you have any other questions about this table, please contact the NTD Help Desk at NTDHelp@dot.gov.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.
See the Splitgraph documentation for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NewsMediaBias-Plus dataset is designed for the analysis of media bias and disinformation by combining textual and visual data from news articles. It aims to support research in detecting, categorizing, and understanding biased reporting in media outlets.
NewsMediaBias-Plus pairs news articles with relevant images and annotations indicating perceived biases and the reliability of the content. It adds a multimodal dimension for bias detection in news media.
- unique_id: Unique identifier for each news item. Each unique_id matches an image for the same article.
- outlet: The publisher of the article.
- headline: The headline of the article.
- article_text: The full content of the news article.
- image_description: Description of the paired image.
- image: The file path of the associated image.
- date_published: The date the article was published.
- source_url: The original URL of the article.
- canonical_link: The canonical URL of the article.
- new_categories: Categories assigned to the article.
- news_categories_confidence_scores: Confidence scores for each category.
- text_label: Indicates the likelihood of the article being disinformation:
  - Likely: Likely to be disinformation.
  - Unlikely: Unlikely to be disinformation.
- multimodal_label: Indicates the likelihood of disinformation from the combination of the text snippet and image content:
  - Likely: Likely to be disinformation.
  - Unlikely: Unlikely to be disinformation.
Load the dataset into Python:
from datasets import load_dataset
ds = load_dataset("vector-institute/newsmediabias-plus")
print(ds) # View structure and splits
print(ds['train'][0]) # Access the first record of the train split
print(ds['train'][:5]) # Access the first five records
from datasets import load_dataset
# Load the dataset in streaming mode
streamed_dataset = load_dataset("vector-institute/newsmediabias-plus", streaming=True)
# Get an iterable dataset
dataset_iterable = streamed_dataset['train'].take(5)
# Print the records
for record in dataset_iterable:
print(record)
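A small follow-up sketch: filtering streamed records on the annotation fields listed in the schema above (field names and label values as described there):

```python
from datasets import load_dataset

streamed = load_dataset("vector-institute/newsmediabias-plus", streaming=True)

# Keep only records flagged as likely disinformation (label value per the schema above).
likely_items = (r for r in streamed["train"] if r["text_label"] == "Likely")
first = next(likely_items)
print(first["outlet"], "|", first["headline"])
```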
Contributions are welcome! To contribute, fork the repository and create a pull request with your changes.
This dataset is released under a non-commercial license. See the LICENSE file for more details.
Please cite the dataset using this BibTeX entry:
@misc{vector_institute_2024_newsmediabias_plus,
title={NewsMediaBias-Plus: A Multimodal Dataset for Analyzing Media Bias},
author={Vector Institute Research Team},
year={2024},
url={https://huggingface.co/datasets/vector-institute/newsmediabias-plus}
}
For questions or support, contact Shaina Raza at: shaina.raza@vectorinstitute.ai
Disclaimer: The labels Likely and Unlikely are based on LLM annotations and expert assessments, intended for informational use only. They should not be considered final judgments.
Guidance: This dataset is for research purposes. Cross-reference findings with other reliable sources before drawing conclusions. The dataset aims to encourage critical thinking, not provide definitive classifications.