MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Please request access to receive a free sample of 1000 reviews as well as information about larger datasets or custom datasets. Contact us at insights@fantastic.app for more details and questions. Visit https://fantastic.app to learn more about how our dataset is created.
What problem is Fantastic solving? Personalization is no longer a nice-to-have for today's consumers; it is now a necessity. According to a recent study by McKinsey, "Seventy-one percent of consumers expect companies to… See the full description on the dataset page: https://huggingface.co/datasets/fantasticInsights/fantastic_united_states_consumer_interest_dataset_fall_2017_to_winter_2024_sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual Asian student percentage from 1992 to 2023 for Read School vs. Connecticut and Bridgeport School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual white student percentage from 1991 to 2023 for Read School vs. Connecticut and Bridgeport School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”
A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org
Please cite this when using the dataset.
Detailed description of the dataset:
1 Film Dataset: Festival Programs
The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.
The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.
The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
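The relationship between the long and the wide table can be sketched as follows. This is a minimal illustration in Python, not the authors' procedure: the column names (film_id, fest, fest_date) and the sample rows are hypothetical, and the real variable names should be taken from the codebook.

```python
# Hypothetical long-format rows: one row per film-festival appearance.
# Column names and values are illustrative, not taken from the codebook.
long_rows = [
    {"film_id": "F001", "title": "Example A", "fest": "Frameline", "fest_date": "2013-06-20"},
    {"film_id": "F001", "title": "Example A", "fest": "Berlinale", "fest_date": "2013-02-07"},
    {"film_id": "F002", "title": "Example B", "fest": "Berlinale", "fest_date": "2013-02-07"},
]


def long_to_wide(rows):
    """Keep one row per film: its earliest sample-festival appearance."""
    wide = {}
    # ISO-formatted dates sort chronologically as plain strings.
    for row in sorted(rows, key=lambda r: r["fest_date"]):
        # setdefault keeps the first (earliest) row seen for each film ID.
        wide.setdefault(row["film_id"], row)
    return list(wide.values())


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row["film_id"], row["fest"])
```

In this sketch the film shown at both Berlinale and Frameline keeps only its Berlinale row, mirroring the rule described above for the fest variable.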
2 Survey Dataset
The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.
The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.
The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.
3 IMDb & Scripts
The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.
The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.
The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.
The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.
The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.
The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to a crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.
The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.
The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.
The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.
The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.
The dataset includes 8 text files containing the scripts for webscraping. They were written using R version 3.6.3 for Windows.
The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.
The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create an IMDb search URL for each film in the core dataset. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if no matches are found in the advanced search, potentially using an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records on the IMDb website. The script then defines a loop that matches each film in the core dataset with the suggested films on the IMDb search page and records matching scores. Matching was done using data on directors, production year (+/- one year), and title, with a fuzzy matching approach based on two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
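The two title-similarity measures named above can be illustrated with a short sketch. The original scripts are written in R; the Python below is only a stand-in written for illustration, and several details are assumptions rather than facts from the source: the cosine measure is computed here on character bigram counts, titles are lower-cased before comparison, and the function and field names are hypothetical.

```python
from collections import Counter
from math import sqrt


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: edits plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def qgrams(s: str, q: int = 2) -> Counter:
    """Count the overlapping character q-grams of a string."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine similarity between the q-gram count vectors of two strings."""
    ca, cb = qgrams(a, q), qgrams(b, q)
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def title_scores(core_title: str, imdb_title: str) -> dict:
    """Return both scores for a pair of titles (normalisation is an assumption)."""
    a, b = core_title.lower().strip(), imdb_title.lower().strip()
    return {"cosine": cosine_similarity(a, b), "osa": osa_distance(a, b)}


def year_within_one(core_year: int, imdb_year: int) -> bool:
    """Production year tolerance of +/- one year, as described in the text."""
    return abs(core_year - imdb_year) <= 1


scores = title_scores("Amelie", "Ameile")  # a single transposed pair of letters
```

A low OSA distance flags titles that differ by a typo or a swapped pair of letters, while a cosine score close to 1 flags titles that share most of their character sequences even when words are reordered or extended.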
The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.
The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.
The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does that for the first 100 films only, to check whether everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.
The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.
The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
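The retry step for skipped films might look roughly like the sketch below. It is written in Python rather than R, and everything in it is hypothetical: the function names, the result structure, and the simulated scraper are assumptions made for illustration and do not reproduce the original script.

```python
def retry_skipped(film_ids, results, scrape, max_attempts=1):
    """Re-run `scrape` for films without data; return the IDs still missing."""
    still_missing = []
    for film_id in film_ids:
        if results.get(film_id) is not None:
            continue  # data already scraped, nothing to retry
        for _ in range(max_attempts):
            try:
                results[film_id] = scrape(film_id)
            except Exception:
                # Treat any failure (e.g. a dropped connection) as "no data".
                results[film_id] = None
            if results[film_id] is not None:
                break
        else:
            # No attempt produced data for this film.
            still_missing.append(film_id)
    return still_missing


# Simulated scraper: F002 failed earlier because of a transient error,
# F003 fails every time (hypothetical IDs).
def fake_scrape(film_id):
    if film_id == "F003":
        raise ConnectionError("page could not be retrieved")
    return {"id": film_id}


results = {"F001": {"id": "F001"}, "F002": None, "F003": None}
missing = retry_skipped(list(results), results, fake_scrape)
```

Films that remain in the returned list after the retry are the ones worth inspecting by hand, since a second failure suggests the problem is not a transient connection issue.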
The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.
4 Festival Library Dataset
The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.
The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definitions of variables (such as location, festival name, and festival categories), units of measurement, data sources, coding, and missing data.
The csv file “4_festival-library_dataset_imdb-and-survey” contains data on all unique festivals collected from both IMDb and survey sources. This dataset appears in wide format, i.e. all information for each festival is listed in one row. This
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual black student percentage from 1991 to 2023 for Read School vs. Connecticut and Bridgeport School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual Hispanic student percentage from 1991 to 2023 for Read School vs. Connecticut and Bridgeport School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Design Review Equity Areas’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/089df353-bbe6-4256-ad9a-03263e375631 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
Design Review Equity Areas are areas of Seattle where applicants for development projects going through the City’s Design Review program are required to work with staff from the Department of Neighborhoods (DON) to customize their community outreach plan to the needs of historically underrepresented communities.
Equity Areas are identified based on local demographic and socioeconomic characteristics from the US Census Bureau. Equity Areas are census tracts having a census-tract average greater than the city-as-a-whole average for at least two of the following characteristics:
1. Limited English proficiency, identified as percentage of households that are linguistically isolated households;
2. People of Color, identified as percentage of the population that is not non-Hispanic white; and
3. Income, identified as percentage of population with income below 200% of the federal poverty level.
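The "at least two of three" rule described above can be expressed as a small check. The sketch below is illustrative only: the field names and all percentages are made up, and it is not the City's actual computation.

```python
# Hypothetical field names for the three characteristics listed above.
CHARACTERISTICS = ("limited_english_pct", "people_of_color_pct", "low_income_pct")


def is_equity_area(tract, city_avg, min_exceeded=2):
    """True if the tract exceeds the citywide average on at least two characteristics."""
    exceeded = sum(tract[c] > city_avg[c] for c in CHARACTERISTICS)
    return exceeded >= min_exceeded


# Made-up percentages, for illustration only.
city_avg = {"limited_english_pct": 5.0, "people_of_color_pct": 35.0, "low_income_pct": 25.0}
tracts = {
    "tract_A": {"limited_english_pct": 8.0, "people_of_color_pct": 30.0, "low_income_pct": 40.0},
    "tract_B": {"limited_english_pct": 3.0, "people_of_color_pct": 50.0, "low_income_pct": 20.0},
}

equity = {name: is_equity_area(values, city_avg) for name, values in tracts.items()}
```

Here tract_A is above the citywide average on two characteristics and qualifies, while tract_B is above it on only one and does not.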
For more information please see the Director’s Rule for Early Community Outreach for Design Review (http://www.seattle.gov/dpd/codes/dr/DR2018-4.pdf). Additional resources and FAQs are available on DON’s Early Community Outreach webpage (https://www.seattle.gov/neighborhoods/outreach-and-engagement/design-review-for-early-outreach/dr_faq_don).
Data Source: US Census Bureau’s American Community Survey 2016 Five-Year Estimates (https://www.census.gov/programs-surveys/acs/).
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual two or more races student percentage from 2019 to 2023 for Read School vs. Connecticut and Bridgeport School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
dataPOEM.csv
The dataPOEM.csv data set contains data on the level of each poem.
scoresAes = factor scores of moving, beauty, and melodious ratings.
participant = participant number
poemVersion = Version of poem presented: (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
poemIdentity = poem number
avgWFreq = average word frequency of poem
totalGazeSlopeLineLength
totalGazeWordMeanNAByWordLen
totalGazeWordMeanNADiff
order = order of presentation (1 = from A to D, 2 = from D to A; between participant factor)
firstFixDurMS_MINFIX_AVG = first fixation duration
totalGazeMS_MINFIX_AVG = total gaze durations
fixDurMS_MINFIX_NUM = number of fixations
sacLenMS_MINFIX_AVG = average saccade length
percRegMS_MINFIX_AVG = percentage of regressive eye movements
pupilDial_AVG = average pupil dilation
blink_NUM_TotalRT = number of blinks relative to total reading time
totalReadingTime = total reading time of the poem
areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
moving = rating of how moving the poem was
beauty = rating of how beautiful the poem was
melodious = rating of how melodious the poem was
dataROI.csv
The dataROI.csv data set contains data on the level of each line within a poem.
order = order of presentation (1 = from A to D, 2 = from D to A; between participant factor)
participant = participant number
poemIdentity = poem number
lineNr = line number within poem
poemVersion = Version of poem presented: (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
BeginCloseRhyme = whether a particular line’s final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
totalGazeByWordNA = total gaze duration of final word of a line relative to word length
gazeByLineLengthNA = total gaze duration of a line relative to line length
dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical Dataset of Read Elementary School is provided by PublicSchoolReview and contains statistics on the following metrics: Total Students Trends Over Years (1987-2023), Total Classroom Teachers Trends Over Years (1990-2023), Distribution of Students By Grade Trends, Student-Teacher Ratio Comparison Over Years (1990-2023), American Indian Student Percentage Comparison Over Years (1988-2021), Asian Student Percentage Comparison Over Years (1991-2023), Hispanic Student Percentage Comparison Over Years (1988-2023), Black Student Percentage Comparison Over Years (1988-2023), White Student Percentage Comparison Over Years (1991-2023), Two or More Races Student Percentage Comparison Over Years (2013-2023), Diversity Score Comparison Over Years (1993-2023), Free Lunch Eligibility Comparison Over Years (1988-2023), Reduced-Price Lunch Eligibility Comparison Over Years (2000-2023), Reading and Language Arts Proficiency Comparison Over Years (2011-2022), Math Proficiency Comparison Over Years (2011-2022), Science Proficiency Comparison Over Years (2021-2022), Overall School Rank Trends Over Years (2011-2022)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual black student percentage from 1988 to 2023 for Read Elementary School vs. Wisconsin and Oshkosh Area School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual white student percentage from 1991 to 2023 for Read Elementary School vs. Wisconsin and Oshkosh Area School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual Asian student percentage from 1991 to 2023 for Read Elementary School vs. Wisconsin and Oshkosh Area School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual Hispanic student percentage from 1988 to 2023 for Read Elementary School vs. Wisconsin and Oshkosh Area School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual two or more races student percentage from 2013 to 2023 for Read Elementary School vs. Wisconsin and Oshkosh Area School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual American Indian student percentage from 1988 to 2021 for Read Elementary School vs. Wisconsin and Oshkosh Area School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual black student percentage from 1991 to 2023 for Reading School District vs. Massachusetts
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual white student percentage from 1991 to 2023 for Reading Sr. High School vs. Pennsylvania and Reading School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual white student percentage from 1992 to 2023 for Reading-Fleming Intermediate School vs. New Jersey and Flemington-Raritan Regional School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual white student percentage from 1992 to 2023 for Port Reading Avenue Elementary School vs. New Jersey and Woodbridge Township School District