12 datasets found
  1. Deep Video Understanding Annotations Dataset

    • catalog.data.gov
    • data.nist.gov
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). Deep Video Understanding Annotations Dataset [Dataset]. https://catalog.data.gov/dataset/deep-video-understanding-annotations-dataset
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The BBC Land Girls TV series has three seasons of five episodes each, with each episode running about 45 minutes. The TRECVID group at NIST worked with the BBC Corp. to release the dataset to the research community for work on video understanding tasks. Unfortunately, the hosting arrangement for the dataset was not successful and the video data could not be released. We are releasing the annotations conducted by NIST, without any video data, so that researchers interested in knowledge graph understanding and natural language analysis can take advantage of them.

  2. Data from: BBC Desert Island Discs Dataset v 1.0

    • works.hcommons.org
    xlsx
    Updated Nov 18, 2024
    Cite
    Paige Morgan (2024). BBC Desert Island Discs Dataset v 1.0 [Dataset]. http://doi.org/10.17613/m6rj48t8x
    Available download formats: xlsx
    Dataset updated
    Nov 18, 2024
    Dataset provided by
    unknown
    Authors
    Paige Morgan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Aug 2018
    Description

    This is the first version of a larger project that I've been working on in my spare time to create a dataset of the guests, songs, books, and luxuries on the long-running BBC radio program Desert Island Discs (https://www.bbc.co.uk/programmes/b006qnmr, https://en.wikipedia.org/wiki/Desert_Island_Discs). Originally, I began with data gathered by webscraping with Python for the Guardian's Datablog in November 2011 (https://www.theguardian.com/tv-and-radio/datablog/2010/nov/11/desert-island-discs-radio4).

    However, I knew that that version of the dataset was incomplete and had errors. In addition to wanting to develop a more complete version of the dataset (encompassing as much info as possible from the eight decades that Desert Island Discs has been on the air), I wanted to add information that would make it possible to ask other sorts of questions, for example, about the gender balance of the program, and the different roles of people who have been invited onto the show.

    To that end, I have modified the original Guardian spreadsheet by adding columns for gender, role, and composer, as well as for performer. To create the information within these columns, I drew on info from the BBC Desert Island Discs website. In particular, for "role", I drew on the short description used in the archives of the program by date (https://www.bbc.co.uk/programmes/b006qnmr/broadcasts/2001/12), chiefly because the description ("Sue Lawley's castaway is TV chef Jamie Oliver") is what I have heard used on BBC 4 to announce and promote the program, so that label is an indication of what the BBC has seen as a legible and meaningful identity for each guest. It should be said that I see this as a dataset about the BBC program and the choices its creators have made. It is certainly a better representation of the BBC and its choices than it is a complete representation of any of the guests themselves.

  3. Annotating speaker stance in discourse: the Brexit Blog Corpus (BBC) -...

    • b2find.eudat.eu
    Updated Oct 10, 2024
    Cite
    (2024). Annotating speaker stance in discourse: the Brexit Blog Corpus (BBC) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/6ffbf023-b3df-5d79-bc4c-74aedea6930b
    Dataset updated
    Oct 10, 2024
    Description

    In this study, we explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diverge. In order to perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts, where speakers take stance and position themselves, was compiled, the Brexit Blog Corpus (BBC). An analytical interface for the annotations was set up and the data were annotated independently by two annotators. The annotation procedure, the annotation agreement and the co-occurrence of more than one stance category in the utterances are described and discussed. The careful, analytical annotation process has by and large returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC.

    Purpose: The aim of this study is to explore the possibility of identifying speaker stance in discourse, provide an analytical resource for it and an evaluation of the level of agreement across speakers in the area of stance-taking in discourse.

    The BBC is a collection of texts from blog sources. The corpus texts are thematically related to the 2016 UK referendum concerning whether the UK should remain members of the European Union or not. The texts were extracted from the Internet from June to August 2015. With the Gavagai API (https://developer.gavagai.se), the texts were detected using seed words, such as Brexit, EU referendum, pro-Europe, europhiles, eurosceptics, United States of Europe, David Cameron, or Downing Street. The retrieved URLs were filtered so that only entries described as blogs in English were selected. Each downloaded document was split into sentential utterances, from which 2,200 utterances were randomly selected as the analysis data set. The final size of the corpus is 1,682 utterances, 35,492 words (169,762 characters without spaces). Each utterance contains from 3 to 40 words with a mean length of 21 words.

    For the data annotation process the Active Learning and Visual Analytics (ALVA) system (https://doi.org/10.1145/3132169 and https://doi.org/10.2312/eurp.20161139) was used. Two annotators, one who is a professional translator with a Licentiate degree in English Linguistics and the other one with a PhD in Computational Linguistics, carried out the annotations independently of one another.

    The data set can be downloaded in two different formats: a standard Microsoft Excel format and a raw data format (ZIP archive) which can be useful for analytical and machine learning purposes, for example, with the Python library scikit-learn. The Excel file includes one additional variable (utterance word length). The ZIP archive contains a set of directories (e.g., "contrariety" and "prediction") corresponding to the stance categories. Inside of each such directory, there are two nested directories corresponding to annotations which assign or not assign the respective category to utterances (e.g., inside the top-level category "prediction" there are two directories, "prediction" with utterances which were labeled with this category, and "no" with the rest of the utterances). Inside of the nested directories, there are textual files containing individual utterances.

    When using data from this study, the primary researcher wishes citation also to be made to the publication: Vasiliki Simaki, Carita Paradis, Maria Skeppstedt, Magnus Sahlgren, Kostiantyn Kucher, and Andreas Kerren. Annotating speaker stance in discourse: the Brexit Blog Corpus. In Corpus Linguistics and Linguistic Theory, 2017. De Gruyter, published electronically before print. https://doi.org/10.1515/cllt-2016-0060
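
    The directory layout of the ZIP archive described above could be read with a few lines of standard-library Python. This is only a minimal sketch: the function name load_stance_category is hypothetical, and it assumes the archive has already been unpacked, that each utterance file is UTF-8 text, and that the nested directories are named exactly as in the description (the category name and "no").

```python
from pathlib import Path


def load_stance_category(root, category):
    """Return (utterance, label) pairs for one stance category.

    Assumed layout (taken from the description, not verified against the archive):
        <root>/<category>/<category>/  files labeled with the category
        <root>/<category>/no/          the remaining utterances
    """
    samples = []
    base = Path(root) / category
    for label_dir, label in ((category, True), ("no", False)):
        # Sort for a deterministic order; one file holds one utterance.
        for path in sorted((base / label_dir).glob("*")):
            if path.is_file():
                # UTF-8 is an assumption; the description does not state the encoding.
                text = path.read_text(encoding="utf-8").strip()
                samples.append((text, label))
    return samples
```

    Calling load_stance_category("unpacked_corpus", "prediction") would then give a list of labeled utterances for a binary classifier; the path "unpacked_corpus" is a placeholder. Because an utterance can carry more than one stance category, loading each category separately as a yes/no task matches the way the archive is organized.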

  4. Audio Un-mixing Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Hummersone, Chris (2020). Audio Un-mixing Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_19035
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Brookes, Tim
    Hummersone, Chris
    Stokes, Toby
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Data generated as part of the PhD research project "Improving the perceptual quality of single-channel blind audio source separation" by Toby Stokes.

    Data relate to sound source separation algorithm development and testing and comprise audio source files and mixtures, separation quality measurements, and MATLAB code.

    Further project details can be found at http://iosr.uk/unmixing

    ThesisData/Publication Cross-Reference

    The data archive is structured according to the thesis chapter in which it was generated/used. This document provides a cross-reference to/from other publications in which it has been used.

    ThesisData-->Publications

    --- Chapter 6 --- Data also published at: BBC ARP (2012), AES134 (2013), Textbook chapter (2014)

    --- Chapter 7 --- Data also used for: AES137 (2014), BBC SNN (2015)

    Publications-->ThesisData

    --- BBC ARP (2012) --- Uses data in: Chapter 6

    --- AES134 (2013) --- Uses data in: Chapter 6

    --- AES137 (2014) --- Uses data in: Chapter 7

    --- Textbook chapter (2014) --- Uses data in: Chapter 6

    --- BBC SNN (2015) --- Uses data in: Chapter 7

    References

    BBC ARP (2012): Stokes T, Brookes T, Hummersone C. (2012) 'Improving the Quality of Separated Audio: What Works?', BBC ARP Showcase

    AES134 (2013): Stokes T, Hummersone C, Brookes TS. (2013) 'Reducing Binary Masking Artefacts in Blind Audio Source Separation'. Rome, Italy: AES 134th Convention paper 8853

    AES137 (2014): Stokes T, Hummersone C, Brookes TS, Mason A. (2014) 'Perceptual Quality of Audio Separated Using Sigmoidal Masks'. Los Angeles, USA: AES 137th Convention paper 9182

    Textbook chapter (2014): Hummersone C, Stokes T, Brookes T. (2014) 'On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis', in Naik GR, Wang W (eds.) Blind Source Separation: Advances in Theory, Algorithms and Applications. Berlin/Heidelberg: Springer, Article number 12, pp. 349-368.

    BBC SNN (2015): Stokes T, Hummersone C, Brookes T. (2015) 'Improving the Quality of Un-Mixed Audio'. London, UK: BBC Sound Now & Next, 19-20 May

    Thesis (2015): Stokes T, "Improving the perceptual quality of single-channel blind audio source separation", PhD thesis, University of Surrey, UK, 2015

  5. BBC Maida Vale Impulse Response Dataset

    • zenodo.org
    zip
    Updated Jul 11, 2024
    Cite
    Gavin Kearney; Helena Daffern; Patrick Cairns; Anthony Hunt; Ben Lee; Jacob Cooper; Panos Tsagkarakis; Tomasz Rudzki; Daniel Johnston (2024). BBC Maida Vale Impulse Response Dataset [Dataset]. http://doi.org/10.5281/zenodo.10020866
    Available download formats: zip
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gavin Kearney; Helena Daffern; Patrick Cairns; Anthony Hunt; Ben Lee; Jacob Cooper; Panos Tsagkarakis; Tomasz Rudzki; Daniel Johnston
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This repository presents a dataset of spatial impulse responses measured from the BBC Maida Vale Studios.

    The measurements were undertaken in Summer/Autumn 2021 by researchers from the University of York, led by Prof Gavin Kearney and Prof Helena Daffern and team members of BBC R&D.

    The measured studio live rooms presented here are studios MV4 and MV5.

    The dataset for each room includes:

    - Higher Order Ambisonic (3rd Order) spatial impulse responses for 3DOF/6DOF rendering.

    - Reference KEMAR binaural measurements

    - ISO-3382 measurements.

    - Readme files for each room.

    Important notes:

    - If you use this dataset, please cite the following paper in your work:

    Kearney, G., Daffern, H., Cairns, P., Hunt, A., Lee, B., Cooper, J., Tsagkarakis, P., Rudzki, T. and Johnston, D., 2022, September. Measuring the Acoustical Properties of the BBC Maida Vale Recording Studios for Virtual Reality. In Acoustics (Vol. 4, No. 3, pp. 783-799). MDPI.

    - Source orientations have been measured in the four cardinal directions (N, E, S, W) and impulse responses can be combined to simulate sources with first order directivity patterns. See paper above for further details.

    - For 3DOF/6DOF measurements, Ambix config files are also included for quick audition of the IRs using an appropriate convolver (e.g. MCFX convolver http://www.matthiaskronlachner.com/?p=1910).

    - The ISO measurements should not be used for auralisation.

    - Measurements taken from the same source-receiver position should ideally not be used directly. If you wish to simulate the same source/receiver position for natural reverberation foldback, then the direct sound portion should be removed, as the frequency response of this component will be imbalanced, and should be replaced with direct monitoring of a close-miked source via your soundcard.

    If you have any questions about the dataset, please contact gavin.kearney@york.ac.uk

  6. BBC TV's weekly reach in the United Kingdom (UK) 2015-2024, by channel

    • statista.com
    Updated Jul 24, 2024
    Cite
    Statista (2024). BBC TV's weekly reach in the United Kingdom (UK) 2015-2024, by channel [Dataset]. https://www.statista.com/statistics/284752/bbc-tv-reach-by-channel-in-the-uk/
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United Kingdom
    Description

    BBC One had the greatest weekly reach among the BBC's television services in the fiscal year ending March 31, 2024. All channels in the ranking experienced a decline in audience reach over the past years.

    The BBC celebrates its 100th birthday and enters a period of uncertainty

    The British Broadcasting Corporation (BBC) is the public broadcasting service of the United Kingdom and the oldest broadcasting organization worldwide. It was established in London on October 18, 1922, but after an entire century on air, the BBC now enters an uncertain future in terms of funding. Around 70 percent of the BBC's annual income is generated via license fees, but as the government announced the termination of this tax by 2027, the BBC will have to explore new avenues to stay afloat. So what will the future hold for the BBC and its programming now that consumers will no longer be required to pay the annual fee? Sources speculate that funding alternatives could include a subscription service, part-privatization, or direct government funding.

    The portfolio goes with the (digital) flow

    The BBC operates several television channels in the UK and worldwide, including the news-only channel BBC News and the politics-centered BBC Parliament. As of 2024, BBC One remained the company's flagship service, and data showed that the lion's share of the BBC's total television programming spending was allocated towards BBC One content that year. In addition to offering cross-genre television and radio broadcasts, the corporation keeps expanding its range of online services to resonate with audiences in the UK and abroad. The BBC runs a designated news platform alongside the on-demand video streaming platform BBC iPlayer.

  7. OpenDevelopment

    • data.opendevelopmentmekong.net
    Updated Sep 5, 2023
    Cite
    (2023). OpenDevelopment [Dataset]. https://data.opendevelopmentmekong.net/dataset/webpage-capture-on-the-news-article-of-probe-finds-bbc-video-on-human-trafficking-baseless
    Dataset updated
    Sep 5, 2023
    Description

    This webpage capture concerns a video posted on YouTube on March 8 by BBC Eye, an investigative section of BBC World Service. According to the video, human trafficking victims are locked up, beaten, starved and forced to work for criminal gangs at several illegal gambling places in Preah Sihanouk province. In the video, Preah Sihanouk province is portrayed as a hotbed for human trafficking and forced confinement. The story features the notorious Huang Le, a site owned by a powerful person who is alleged to be close to authorities and government leaders.

  8. BioPropaPhenKG Towards Monkeypox and COVID-19 Case Tracing and Analysing

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 17, 2024
    Cite
    H. A. Medeiros, Gabriel (2024). BioPropaPhenKG Towards Monkeypox and COVID-19 Case Tracing and Analysing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10987742
    Dataset updated
    Apr 17, 2024
    Dataset authored and provided by
    H. A. Medeiros, Gabriel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains:

    The BioPropaPhen ontology created from PropaPhen, being specialized with UMLS and World Knowledge Graph ontologies;

    A neo4j 4.4.3 dump file of the BioPropaPhenKG knowledge graph with WHO ground truth data about COVID-19 and Monkeypox, and enhanced presence edges between UMLS entities to World KG entities for evaluating the Description-Detection-Prediction Framework

    The datasets used for enhancing the KG are:

    Phenomenon  Dataset            Period     Documents  Source            Link
    COVID-19    Aylien             Nov-2019   8          Online News       https://aylien.com/resources/datasets/coronavirus-dataset
    COVID-19    CORD-19            Dec-2019   720        Medical Articles  https://allenai.org/data/cord-19
    COVID-19    RedditCOVID        Feb-2020   4,980      Social Media      https://paperswithcode.com/dataset/the-reddit-covid-dataset
    Monkeypox   Mined from BBC     May-2022   27         Online News
    Monkeypox   Mined from Pubmed  June-2022  36         Medical Articles
    Monkeypox   MonkeyPox2022      May-2022   33,826     Social Media      https://doi.org/10.3390/idr14060087

  9. International Relations (February 1960) - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Sep 27, 2023
    Cite
    (2023). International Relations (February 1960) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/092f08f2-f360-5466-b830-a3e99de80675
    Dataset updated
    Sep 27, 2023
    Description

    Judgement on American and Soviet foreign policy as well as the competition between the great powers.

    Topics: Most important domestic and foreign policy problems; perceived changes in the relations between the great powers; attitude to selected countries and politicians; preferred East-West orientation of one's own country; the peace efforts of China; danger of war; assessment of the credibility of the foreign policy of the USSR and the western powers as well as the seriousness of the disarmament efforts of the great powers; agreement in principle of one's own country with the interests of the USA, the USSR, Great Britain, France and China; expected development of agreement between the USSR and China; expected development of the economic and military competition between the great powers; contribution of NATO to European security; NATO contribution of one's own country; trust in NATO; judgement on the result of the Paris summit conference and assessment of the readiness of the participants to make concessions; attitude to concessions by the western powers in the Berlin question; comparison of current status and future development of science, the military, the standard of living, industrial and agricultural production, welfare, technology, medicine and space flight in the USA and the USSR; assessment of the steadfastness of the American as well as Soviet population in the respective basic ideas and assessment of the readiness of the peoples to make an effort for this conviction; judgement on the prospects for the future of the two economic systems; frequency of watching television in the evening hours; TV possession; number of adults watching television in the afternoon as well as in the evening; going to the movies; assessment of the influence of foreign films on one's own country; impression of Americans (tourists, students, business people, musicians, politicians) who have been in one's country; assessment of the influence of American magazines, books, films, television programs, the Voice of America and Jazz on one's own country; attitude to stationing of American troops in the country and judgement on their conduct; most important sources of information about the USA; perceived differences between American and British broadcast of news and information; most trustworthy source of news; attitude to construction of nuclear weapons by France and the atomic bomb test in the Sahara; the significance of Khrushchev's visit to France for world peace.

    The following questions were posed except in Great Britain: media usage in form of a detailed recording of the frequency of listening to foreign radio stations (BBC, BFN, AFN) as well as the Voice of America; self-assessment of knowledge of English and judgement on the understandability of radio announcers; union membership; length of interview.

    The following questions were posed in France: possession of a motor vehicle; possession of a radio; house ownership.

    The following questions were posed in Germany: number of contact attempts; willingness of respondent to cooperate.

    The following questions were posed in Italy: place of interview; day of interview.

  10. Media content in a multi-platform context, 2013-2015 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Apr 29, 2023
    Cite
    (2023). Media content in a multi-platform context, 2013-2015 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/3041041e-d409-576a-82f6-ae0d9a071e15
    Dataset updated
    Apr 29, 2023
    Description

    Dataset resulting from media content analysis on how the adoption of a multi-platform outlook is affecting the diversity of content output in the UK. A comparison was made of how the composition of media output changed over the period 2013-2015, and how strategies have moved towards multi-platform delivery. Data for content analysis was collected from eleven case study media organisations across the newspaper and magazine publishing and broadcasting sectors. Media platforms include print, online and mobile. Data were gathered from publicly available online websites, broadcasting schedules, newspaper and magazine printed and digital editions. Coding and analysis was carried out of a sample of content outputs for selected case studies in the broadcasting, newspaper and magazine sectors in spring 2013, 2014 and 2015. The resulting dataset provides a basis for preliminary comparative analysis, across organisations, sectors of the media and time, of how the composition of media content outputs has changed while suppliers of media have migrated towards multi-platform delivery.

    This project considers aspects of transformations taking place in the media industry as a result of digital convergence and growth of the internet. The study ran from July 2012 to 2015 and was led by PI Professor Gillian Doyle with a team comprising Co-I Professor Philip Schlesinger and RA Dr Katherine Champion at CCPR, University of Glasgow. It set out to analyse the recent migration of media businesses towards diversified digital distribution and multi-platform growth strategies and the impact this has had on economic efficiency, the organisation of production, and on the nature and diversity of content. What challenges are faced by public policy?

    Using a multiple case study approach, the investigation covered the following:

    - economic opportunities and advantages created by multi-platform expansion

    - the role of convergent digital technologies and the internet in encouraging such strategies

    - the impact of multi-platform on production of media content and on diversity and pluralism

    - implications for public policy and regulation.

    Data was collected from eight case study media organisations drawn from newspaper and magazine publishing and broadcasting. Selected content bundles (a newspaper/magazine title or broadcasting channel) from case study organisations were explored, which included two newspaper titles, The Financial Times and The Telegraph; three broadcast channels, BBC One, MTV, STV; and three magazine titles, Elle UK, T3 and NME. Three further case studies, Total Film (Future Publishing), BBC Three and ITV, were added in phase two and phase three (conducted in spring 2014 and 2015) for reasons relating to the wider project, namely issues around access for the collection of interview data.

    The analysis focused on the programme/story-type diversity or the range of different shows or stories available. For the purposes of manageability, selected categories or genres of content were primarily focused on, for example ‘celebrity’ from Elle UK and ‘UK companies’ from The Financial Times. All content published, during the chosen periods/parameters, across selected platforms (print, online and mobile) was recorded in an Excel spreadsheet. In order to categorise content within the print case studies, story type codes were ascribed. In print, a story value was ascribed linked to the length of the article and the presence or absence of particular features (for example photos or video). In relation to broadcasting, a programme value was attributed based on the length of the programme in the linear or on-demand transmission or in relation to the presence or absence of particular features on the ancillary website.

  11. Football Events

    • kaggle.com
    zip
    Updated Jan 25, 2017
    Cite
    Alin Secareanu (2017). Football Events [Dataset]. http://www.kaggle.com/secareanualin/football-events/home
    Available download formats: zip (22142158 bytes)
    Dataset updated
    Jan 25, 2017
    Authors
    Alin Secareanu
    Description

    Context

    Most publicly available football (soccer) statistics are limited to aggregated data such as Goals, Shots, Fouls, Cards. When assessing performance or building predictive models, this simple aggregation, without any context, can be misleading. For example, a team that produced 10 shots on target from long range has a lower chance of scoring than a club that produced the same number of shots from inside the box. However, metrics derived from this simple count of shots will assess the two teams similarly.

    A football game generates many more events, and it is very important and interesting to take into account the context in which those events were generated. This dataset should keep sports analytics enthusiasts awake for long hours as the number of questions that can be asked is huge.

    Content

    This dataset is a result of a very tiresome effort of webscraping and integrating different data sources. The central element is the text commentary. All the events were derived by reverse engineering the text commentary, using regex. Using this, I was able to derive 11 types of events, as well as the main player and secondary player involved in those events and many other statistics. In case I've missed extracting some useful information, you are gladly invited to do so and share your findings. The dataset provides a granular view of 9,074 games, totaling 941,009 events from the biggest 5 European football (soccer) leagues: England, Spain, Germany, Italy, France from 2011/2012 season to 2016/2017 season as of 25.01.2017. There are games that have been played during these seasons for which I could not collect detailed data. Overall, over 90% of the played games during these seasons have event data.

    The dataset is organized in 3 files:

    • events.csv contains event data about each game. Text commentary was scraped from bbc.com, espn.com and onefootball.com
    • ginf.csv contains metadata and market odds about each game. Odds were collected from oddsportal.com
    • dictionary.txt contains a dictionary with the textual description of each categorical variable coded with integers

    Past Research

    I have used this data to:

    • create predictive models for football games in order to bet on football outcomes.
    • make visualizations about upcoming games
    • build expected goals models and compare players

    Inspiration

    There are tons of interesting questions a sports enthusiast can answer with this dataset. For example:

    • What is the value of a shot? That is, what is the probability of a shot being a goal given its location, shooter, league, assist method, game state, number of players on the pitch, and time? Models that estimate this are known as expected goals (xG) models.
    • When are teams more likely to score?
    • Which teams are the best or sloppiest at holding the lead?
    • Which teams or players make the best use of set pieces?
    • In which leagues is the referee more likely to give a card?
    • How do players compare when they shoot with their weak foot versus their strong foot? Or which players are ambidextrous?
    • Identify different styles of plays (shooting from long range vs shooting from the box, crossing the ball vs passing the ball, use of headers)
    • Which teams have a bias for attacking on a particular flank?

    And many, many more...
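
    The first question above (the value of a shot) can be sketched in its simplest form as an empirical conversion rate per shot location, summed over a team's shots. This is only a minimal sketch: the shot records, the location labels (`inside_box`, `long_range`) and the helper names are invented for illustration and are not taken from events.csv, and a real xG model would also condition on the other factors listed.

```python
from collections import defaultdict

# Hypothetical shot records as (location, is_goal); values are made up.
SHOTS = [
    ("inside_box", 1), ("inside_box", 0), ("inside_box", 0), ("inside_box", 1),
    ("long_range", 0), ("long_range", 0), ("long_range", 0), ("long_range", 1),
    ("long_range", 0),
]


def conversion_rates(shots):
    """Estimate P(goal | location) as goals divided by attempts."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for location, is_goal in shots:
        attempts[location] += 1
        goals[location] += is_goal
    return {loc: goals[loc] / attempts[loc] for loc in attempts}


def expected_goals(shot_locations, rates):
    """Sum the per-shot goal probabilities for a list of shot locations."""
    return sum(rates[loc] for loc in shot_locations)


rates = conversion_rates(SHOTS)

# Two teams with the same shot count but different shot locations.
xg_box_team = expected_goals(["inside_box"] * 10, rates)
xg_long_team = expected_goals(["long_range"] * 10, rates)
```

    With these made-up numbers, the team shooting from inside the box ends up with the higher total even though both teams took 10 shots, which is the distinction a plain shot count hides.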

  12. Dataset of books called Doctor Who and the auton invasion : based on the BBC...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Doctor Who and the auton invasion : based on the BBC television serial Doctor Who and the spearhead from space by Robert Holmes ... [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Doctor+Who+and+the+auton+invasion+%3A+based+on+the+BBC+television+serial+Doctor+Who+and+the+spearhead+from+space+by+Robert+Holmes+...
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Doctor Who and the auton invasion : based on the BBC television serial Doctor Who and the spearhead from space by Robert Holmes .... It features 7 columns including author, publication date, language, and book publisher.
