21 datasets found
  1. Average daily time spent on social media worldwide 2012-2025

    • statista.com
    Updated Jun 19, 2025
    Cite
    Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Dataset updated
    Jun 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, where online users spend an average of 3 hours and 49 minutes on social media each day. In comparison, daily time spent on social media in the U.S. was just 2 hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to keep up with current events and stay in touch with friends.

    Global impact of social media

    Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. In a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased political polarization, and heightened everyday distractions.

  2. Healthcare Call Center Speech Data: Spanish (USA)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Healthcare Call Center Speech Data: Spanish (USA) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-spanish-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US Spanish Call Center Speech Dataset for the Healthcare domain, designed to enhance the development of call center speech recognition models specifically for the healthcare industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data

    This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Healthcare domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native US Spanish speakers from the FutureBeeAI Community.
    Regions: Different states of the USA, ensuring a balanced representation of US accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.
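    As a rough sizing sketch: uncompressed PCM audio occupies duration × sample rate × bytes per sample × channels. The figures below assume plain PCM WAV, ignore the few dozen header bytes per file, and compute each sample rate separately, because the description lists both 8 kHz and 16 kHz without saying how the 30 hours are split between them. `pcm_size_bytes` is a hypothetical helper, not part of any FutureBeeAI tooling.

```python
def pcm_size_bytes(hours: float, sample_rate_hz: int, bit_depth: int = 16, channels: int = 2) -> int:
    """Raw PCM payload size in bytes (the WAV header is ignored)."""
    seconds = hours * 3600
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# Stated format: stereo, 16-bit, 8 kHz and 16 kHz; 30 hours in total
for rate in (8_000, 16_000):
    size = pcm_size_bytes(30, rate)
    print(f"30 h at {rate // 1000} kHz: {size / 1e9:.2f} GB (decimal)")
```

    Under these assumptions, 30 hours comes to roughly 3.5 GB if stored entirely at 8 kHz and about 6.9 GB if stored entirely at 16 kHz.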

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgery Consultation
    Consultation regarding Diet, and many more
    Outbound Calls:
    Appointment Reminder
    Health and Wellness Subscription Programs
    Lab Test Results
    Health Risk Assessments
    Preventive Care Reminders, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.
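    The description does not say how the sub-5% word error rate was measured. For reference, WER is conventionally the minimum number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. A minimal sketch of that standard definition follows; splitting on whitespace is a simplifying assumption, and practical evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (starts as the empty-reference row)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("quiero programar una cita", "quiero programa una cita"))  # 0.25
```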

    These ready-to-use transcriptions accelerate the development of the Healthcare domain call center conversational AI and ASR models for the US Spanish language.

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of US Spanish call center speech recognition models.

    Usage and Applications

    This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Healthcare domain. Potential use cases include:

  3. Telecom Call Center Speech Data: Spanish (USA)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Telecom Call Center Speech Data: Spanish (USA) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-spanish-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US Spanish Call Center Speech Dataset for the Telecom domain, designed to enhance the development of call center speech recognition models specifically for the telecom industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data

    This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Telecom domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native US Spanish speakers from the FutureBeeAI Community.
    Regions: Different states of the USA, ensuring a balanced representation of US accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refunds and Billing Adjustments
    Emergency Service Access, and many more
    Outbound Calls:
    Welcome Calls / Onboarding Process
    Payment Reminders
    Customer Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Call, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.
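    Speaker-wise, time-coded segments make simple conversation statistics straightforward to compute. The JSON layout below (a `segments` list with `speaker`, `start`, `end`, and `text` fields, times in seconds) is purely hypothetical, since the description does not document the actual schema; adapt the field names to the delivered files.

```python
import json
from collections import defaultdict

# Hypothetical transcript layout; the real JSON schema is not documented here.
SAMPLE = """
{"segments": [
  {"speaker": "agent",    "start": 0.0,  "end": 4.5,  "text": "..."},
  {"speaker": "customer", "start": 4.8,  "end": 10.3, "text": "..."},
  {"speaker": "agent",    "start": 10.6, "end": 12.1, "text": "..."}
]}
"""


def talk_time_by_speaker(transcript_json: str) -> dict[str, float]:
    """Sum segment durations (end - start, in seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in json.loads(transcript_json)["segments"]:
        if seg["end"] < seg["start"]:
            raise ValueError(f"segment ends before it starts: {seg}")
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


print(talk_time_by_speaker(SAMPLE))
```

    Gaps between segments (here 0.3 s each) count toward neither speaker, which is one reason per-speaker totals rarely add up to the call length.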

    These ready-to-use transcriptions accelerate the development of the Telecom domain call center conversational AI and ASR models for the US Spanish language.

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.

  4. Data from: Average Well Color Development (AWCD) data based on Community...

    • catalog.data.gov
    • data.usgs.gov
    • +4 more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Average Well Color Development (AWCD) data based on Community Level Physiological Profiling (CLPP) of soil samples from 120 point locations within limestone cedar glades at Stones River National Battlefield near Murfreesboro, Tennessee [Dataset]. https://catalog.data.gov/dataset/average-well-color-development-awcd-data-based-on-community-level-physiological-profiling-
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Murfreesboro, Tennessee
    Description

    This dataset contains data collected within limestone cedar glades at Stones River National Battlefield (STRI) near Murfreesboro, Tennessee. It contains information on soil microbial metabolic response for soil samples obtained from selected quadrat locations (points) within 12 selected cedar glades. This information derives from substrate utilization profiles based on Biolog EcoPlates (Biolog, Inc., Hayward, CA, USA), which were inoculated with soil slurries containing the entire microbial community present in each soil sample. EcoPlates contain 31 sole-carbon substrates and one blank (control), each present in triplicate on every plate. Once the microbial community from a soil sample is inoculated onto the plates, the plates are incubated and absorbance readings are taken at intervals.

    For each quadrat location (point), one soil sample was obtained under sterile conditions using a trowel wiped with methanol and rinsed with distilled water, then placed into an autoclaved jar with a tight-fitting lid and put on ice. Soil samples were transported to lab facilities on ice and immediately refrigerated. Within 24 hours of removal from the field, soil samples were processed for community level physiological profiling (CLPP) using Biolog EcoPlates. First, three measurements of gravimetric soil water content (SWC) were taken for each soil sample using a Mettler Toledo HB43 halogen moisture analyzer (Mettler Toledo, Columbus, OH, USA), and the mean of these three SWC measurements was used to calculate the 10-gram dry weight equivalent (DWE) for each soil sample. For each soil sample, a 10-gram DWE of fresh soil was added to 90 milliliters of sterile buffer solution in a 125-milliliter plastic bottle to make the first dilution.
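    The mass and dilution arithmetic in this step can be sketched as follows. The sketch assumes the averaged SWC is expressed on a dry-mass basis (grams of water per gram of oven-dry soil), so the fresh mass holding 10 g of dry soil is 10 × (1 + SWC); if the analyzer reported moisture on a wet-mass basis, convert first with SWC_dry = SWC_wet / (1 − SWC_wet). Treating 10 g of soil in 90 mL of buffer as a 1:10 dilution is likewise a common convention rather than something the description states. Function names and readings are hypothetical.

```python
from statistics import mean


def fresh_mass_for_dry_equivalent(dry_mass_g: float, water_content: float) -> float:
    """Fresh (moist) soil mass containing `dry_mass_g` grams of dry soil.

    Assumes `water_content` is gravimetric on a dry-mass basis
    (g water per g dry soil), so fresh = dry * (1 + water_content).
    """
    return dry_mass_g * (1.0 + water_content)


def serial_dilution_factor(steps: list[tuple[float, float]]) -> float:
    """Overall dilution of a chain of (sample amount, diluent volume) steps.

    Each step contributes sample / (sample + diluent). Treating 10 g of soil
    as if it were 10 mL is a simplifying convention, not a measured volume.
    """
    factor = 1.0
    for sample, diluent in steps:
        factor *= sample / (sample + diluent)
    return factor


# Hypothetical triplicate SWC readings for one soil sample
swc = mean([0.231, 0.240, 0.249])
print(f"Fresh soil to weigh out: {fresh_mass_for_dry_equivalent(10.0, swc):.2f} g")
print(f"First dilution (10 g DWE in 90 mL): {serial_dilution_factor([(10, 90)]):.2f}")
```

    Each further 10-in-90 step multiplies the factor by another 0.1, so passing two such steps to `serial_dilution_factor` gives 0.01.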
    Bottles were agitated on a wrist-action shaker for 20 minutes, and a 10-milliliter aliquot was taken from each sample using sterilized pipette tips and added to 90 milliliters of sterile buffer solution to make the second dilution. The bottle containing the second dilution for each sample was agitated by hand for 10 seconds and poured into a sterile tray, and the second dilution was inoculated directly onto Biolog EcoPlates using a sterilized pipette set to deliver 150 microliters into each well. Each plate was immediately covered, placed in a covered box, and incubated in the dark at 25 degrees Celsius. Catabolism of each carbon substrate produced a proportional color change (from the color of the inoculant to dark purple) due to the activity of the redox dye tetrazolium violet (present in all wells, including blanks). Plates were read at 24, 48, 72, 96, and 120 hours after inoculation using a Biolog MicroStation plate reader (Biolog, Inc., Hayward, CA, USA) measuring absorbance at 590 nanometers.

    For each soil sample and at each incubation time point, average well color development (AWCD) was calculated according to the equation

    AWCD = [Σ(C − R)] / n

    where C is the mean absorbance of the response wells (3 wells per carbon substrate), R is the absorbance of the control wells (mean of 3 controls), and n is the number of carbon substrates (31 for EcoPlates). For each soil sample, an incubation curve was constructed using AWCD values from 48 hours to 120 hours, and the area under this incubation curve was calculated. The numeric values in the fields of this dataset represent areas under these AWCD incubation curves from 48 hours to 120 hours. Detailed descriptions of experimental design, field data collection procedures, laboratory procedures, and data analysis are presented in Cartwright (2014).

    References

    Cartwright, J. (2014). Soil ecology of a rock outcrop ecosystem: abiotic stresses, soil respiration, and microbial community profiles in limestone cedar glades. Ph.D. dissertation, Tennessee State University.
    Cofer, M., Walck, J., & Hidayati, S. (2008). Species richness and exotic species invasion in Middle Tennessee cedar glades in relation to abiotic and biotic factors. The Journal of the Torrey Botanical Society, 135(4), 540–553.
    Garland, J., & Mills, A. (1991). Classification and characterization of heterotrophic microbial communities on the basis of patterns of community-level sole-carbon-source utilization. Applied and Environmental Microbiology, 57(8), 2351–2359.
    Garland, J. (1997). Analysis and interpretation of community-level physiological profiles in microbial ecology. FEMS Microbiology Ecology, 24, 289–300.
    Hackett, C. A., & Griffiths, B. S. (1997). Statistical analysis of the time-course of Biolog substrate utilization. Journal of Microbiological Methods, 30(1), 63–69.
    Insam, H. (1997). A new set of substrates proposed for community characterization in environmental samples. In H. Insam & A. Rangger (Eds.), Microbial Communities: Functional versus Structural Approaches (pp. 259–260). New York: Springer.
    Preston-Mafham, J., Boddy, L., & Randerson, P. F. (2002). Analysis of microbial community functional diversity using sole-carbon-source utilisation profiles: a critique. FEMS Microbiology Ecology, 42(1), 1–14. doi:10.1111/j.1574-6941.2002.tb00990.x
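    The AWCD calculation and the incubation-curve area can be sketched as below. The sketch follows the convention of Garland and Mills (1991), with C as the mean absorbance of a substrate's three response wells and R as the mean of the three control wells. Two points are assumptions not stated in the description: the area is computed with the trapezoidal rule over the 48, 72, 96, and 120 hour reads, and negative (C − R) values are kept as-is (some studies set them to zero). The plate readings are made up for illustration.

```python
from statistics import mean


def awcd(substrate_wells: dict[str, list[float]], control_wells: list[float]) -> float:
    """Average well color development at one read time.

    AWCD = sum(C - R) / n, where C is the mean absorbance of a substrate's
    replicate wells, R is the mean absorbance of the control (blank) wells,
    and n is the number of substrates (31 on an EcoPlate).
    """
    r = mean(control_wells)
    return sum(mean(wells) - r for wells in substrate_wells.values()) / len(substrate_wells)


def area_under_curve(times_h: list[float], values: list[float]) -> float:
    """Trapezoidal area under an AWCD-versus-time curve (absorbance x hours)."""
    if len(times_h) != len(values) or len(times_h) < 2:
        raise ValueError("need matching time/value lists with at least two points")
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times_h, times_h[1:], values, values[1:])
    )


# Hypothetical EcoPlate read: 31 substrates in triplicate plus triplicate blanks
plate = {f"substrate_{i:02d}": [0.52, 0.50, 0.48] for i in range(1, 32)}
blanks = [0.10, 0.11, 0.09]
print(round(awcd(plate, blanks), 3))  # 0.4

# Hypothetical AWCD values at the 48, 72, 96 and 120 h reads
print(area_under_curve([48, 72, 96, 120], [0.2, 0.4, 0.6, 0.8]))
```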

  5. MeetingBank Dataset

    • paperswithcode.com
    Updated Oct 28, 2024
    Cite
    Yebowen Hu; Tim Ganter; Hanieh Deilamsalehy; Franck Dernoncourt; Hassan Foroosh; Fei Liu (2024). MeetingBank Dataset [Dataset]. https://paperswithcode.com/dataset/meetingbank
    Dataset updated
    Oct 28, 2024
    Authors
    Yebowen Hu; Tim Ganter; Hanieh Deilamsalehy; Franck Dernoncourt; Hassan Foroosh; Fei Liu
    Description

    MeetingBank is a benchmark dataset created from the city councils of 6 major U.S. cities to supplement existing datasets.

    It contains 1,366 meetings with over 3,579 hours of video, as well as transcripts, PDF documents of meeting minutes, agendas, and other metadata. On average, a council meeting is 2.6 hours long and its transcript contains over 28k tokens, making it a valuable testbed for meeting summarizers and for extracting structure from meeting videos. The dataset contains 6,892 segment-level summarization instances for training and evaluating performance.

  6. wayuu_CO_test

    • huggingface.co
    Updated Mar 5, 2024
    Cite
    Juan Pablo Correa (2024). wayuu_CO_test [Dataset]. https://huggingface.co/datasets/orkidea/wayuu_CO_test
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 5, 2024
    Authors
    Juan Pablo Correa
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Audio Duration

    The dataset consists of 810 audio recordings, each accompanied by its respective transcription. The lexical corpus encompasses approximately 1,000 unique words.

    Total Audio Duration: 2801 seconds (approximately 47 minutes)
    Average Audio Duration: 3.41 seconds
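    Summary figures like these can be recomputed from per-clip durations; for a PCM WAV clip, duration is the frame count divided by the frame rate. The sketch assumes the clips are available as WAV files, which the listing does not state, and `wav_duration_seconds`, `summarize`, and `silent_clip` are hypothetical helpers. As a quick arithmetic check on the quoted totals, 2801 s / 60 ≈ 46.7 min, and 2801 s / 810 clips ≈ 3.46 s per clip, slightly above the stated 3.41 s average.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV clip (path or file object): frames / frame rate."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize(durations_s: list[float]) -> tuple[float, float, float]:
    """Return (total seconds, total minutes, mean seconds per clip)."""
    if not durations_s:
        raise ValueError("no clips")
    total = sum(durations_s)
    return total, total / 60, total / len(durations_s)


def silent_clip(seconds: float, rate: int = 16_000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of the given length (demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


clips = [silent_clip(s) for s in (2.5, 3.0, 4.5)]
total_s, total_min, mean_s = summarize([wav_duration_seconds(c) for c in clips])
print(f"{total_s:.1f} s total, {total_min:.2f} min, {mean_s:.2f} s per clip")
```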

    The dataset offers valuable insights into the Wayuunaiki language's phonetic and linguistic characteristics. It's important to note that the dataset originates from recordings and transcriptions of the… See the full description on the dataset page: https://huggingface.co/datasets/orkidea/wayuu_CO_test.

  7. US Spanish Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). US Spanish Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-spanish-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US Spanish Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, making it ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native US Spanish speakers from our verified contributor community.
    Regions: Representing different states across the USA to ensure accent and dialect variation.
    Participant Profile: Gender mix of 60% male and 40% female; ages range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8 kHz and 16 kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.
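    Because the recordings are dual-channel, one plausible preprocessing step is to separate the two channels before per-speaker ASR. The sketch below uses only Python's standard `wave` and `array` modules and assumes 16-bit PCM, matching the stated format. The description does not say which channel carries the agent and which the customer, so the function simply returns channel 0 and channel 1; `split_stereo_wav` and `make_demo_wav` are hypothetical names.

```python
import io
import sys
import wave
from array import array


def split_stereo_wav(source) -> tuple[array, array, int]:
    """Return (channel_0, channel_1, sample_rate) for a 16-bit stereo PCM WAV.

    `source` may be a path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM samples are stored little-endian
    # Frames are interleaved: [ch0, ch1, ch0, ch1, ...]
    return samples[0::2], samples[1::2], rate


def make_demo_wav() -> io.BytesIO:
    """Build a tiny in-memory stereo WAV for demonstration."""
    interleaved = array("h", [100, -100, 200, -200, 300, -300])
    if sys.byteorder == "big":
        interleaved.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(interleaved.tobytes())
    buf.seek(0)
    return buf


left, right, rate = split_stereo_wav(make_demo_wav())
print(list(left), list(right), rate)  # [100, 200, 300] [-100, -200, -300] 8000
```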

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.
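    As an illustration of the kind of filtering this metadata enables, the sketch below selects calls by dialect, call type, and sample rate. The `CallRecord` fields and the dialect labels are hypothetical stand-ins for the metadata categories listed above; the delivered metadata files may use different names and formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical fields mirroring the metadata categories described above
    call_id: str
    dialect: str
    call_type: str       # "inbound" or "outbound"
    sentiment: str       # "positive", "neutral" or "negative"
    sample_rate_hz: int


def select(records, *, dialect=None, call_type=None, sample_rate_hz=None):
    """Keep records that match every criterion that is not None."""
    return [
        r for r in records
        if (dialect is None or r.dialect == dialect)
        and (call_type is None or r.call_type == call_type)
        and (sample_rate_hz is None or r.sample_rate_hz == sample_rate_hz)
    ]


records = [
    CallRecord("c1", "Mexican", "inbound", "positive", 16_000),
    CallRecord("c2", "Caribbean", "outbound", "neutral", 8_000),
    CallRecord("c3", "Mexican", "outbound", "negative", 8_000),
]
print([r.call_id for r in select(records, dialect="Mexican", sample_rate_hz=8_000)])  # ['c3']
```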

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:

  8. MeetingBank-transcript-de

    • huggingface.co
    Updated Apr 20, 2025
    Cite
    Alio Leuchtmann (2025). MeetingBank-transcript-de [Dataset]. https://huggingface.co/datasets/AlioLeuchtmann/MeetingBank-transcript-de
    Dataset updated
    Apr 20, 2025
    Authors
    Alio Leuchtmann
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset consists of transcripts from the MeetingBank dataset.

    Overview: MeetingBank is a benchmark dataset created from the city councils of 6 major U.S. cities to supplement existing datasets. It contains 1,366 meetings with over 3,579 hours of video, as well as transcripts, PDF documents of meeting minutes, agendas, and other metadata. On average, a council meeting is 2.6 hours long and its transcript contains over 28k tokens, making it a valuable testbed for meeting summarizers and for… See the full description on the dataset page: https://huggingface.co/datasets/AlioLeuchtmann/MeetingBank-transcript-de.

  9. Travel Time to Work

    • data.ccrpc.org
    csv
    Updated Oct 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Champaign County Regional Planning Commission (2024). Travel Time to Work [Dataset]. https://data.ccrpc.org/dataset/travel-time-to-work
    Available download formats: csv (677)
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Champaign County Regional Planning Commission
    Description

    The Travel Time to Work indicator compares the mean, or average, commute time for Champaign County residents to the mean commute time for residents of Illinois and the United States as a whole. On its own, mean travel time of all commuters on all mode types could be reflective of a number of different conditions. Congestion, mode choice, changes in residential patterns, changes in the location of major employment centers, and changes in the transit network can all impact travel time in different and often conflicting ways. Since the onset of the COVID-19 pandemic in 2020, the workplace location (office vs. home) is another factor that can impact the mean travel time of an area. We don’t recommend trying to draw any conclusions about conditions in Champaign County, or anywhere else, based on mean travel time alone.

    However, when combined with other indicators in the Mobility category (and other categories), mean travel time to work is a valuable measure of transportation behaviors in Champaign County.

    Champaign County’s mean travel time to work is lower than the mean travel time to work in Illinois and the United States. Based on this figure, the state of Illinois has the longest commutes of the three analyzed areas.

    Year-to-year fluctuations in mean travel time have been statistically significant in the United States since 2014, and in Illinois in 2021 and 2022. Champaign County’s change in mean travel time from 2021 to 2022 was statistically significant, the first such year-to-year change since tracking began in 2005.

    Mean travel time data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.

    As with any dataset of estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
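    One standard way to check whether two ACS estimates (for example, two years of mean travel time) differ significantly follows the Census Bureau's guidance on comparing ACS estimates: convert each 90 percent margin of error to a standard error by dividing by 1.645, combine the two standard errors, and compare the resulting Z statistic with 1.645. The description does not say which test CCRPC applied, and this version ignores any correlation between the two estimates, so treat it as an approximation. The function name and the commute figures are hypothetical.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90 percent confidence level


def acs_difference_is_significant(est1: float, moe1: float, est2: float, moe2: float,
                                  z_crit: float = Z_90) -> tuple[float, bool]:
    """Approximate significance test for the difference between two ACS estimates."""
    se1 = moe1 / Z_90  # convert each 90% MOE to a standard error
    se2 = moe2 / Z_90
    combined_se = math.sqrt(se1 ** 2 + se2 ** 2)  # assumes independent estimates
    if combined_se == 0:
        raise ValueError("at least one margin of error must be positive")
    z = (est1 - est2) / combined_se
    return z, abs(z) > z_crit


# Hypothetical mean commute times (minutes) with their MOEs for two years
z, significant = acs_difference_is_significant(19.2, 0.6, 20.4, 0.7)
print(f"Z = {z:.2f}; significant at the 90% level: {significant}")
```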

    Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

    For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Travel Time to Work.

    Sources:
    U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (16 October 2024).
    U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (10 October 2023).
    U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (17 October 2022).
    U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (29 March 2021).
    U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (29 March 2021).
    U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (13 September 2018).
    U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (14 September 2017).
    U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (19 September 2016).
    U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).

  10. National Solar Radiation Database (NSRDB)

    • catalog.data.gov
    • data.openei.org
    • +1 more
    Updated Sep 6, 2024
    Cite
    National Renewable Energy Laboratory (2024). National Solar Radiation Database (NSRDB) [Dataset]. https://catalog.data.gov/dataset/national-solar-radiation-database-nsrdb
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    The National Solar Radiation Database (NSRDB) is a serially complete collection of meteorological and solar irradiance data sets for the United States and a growing list of international locations for 1998-2023. The NSRDB is updated annually and provides foundational information to support U.S. Department of Energy programs, research, industry and the general public. The NSRDB provides time-series data at 30-minute resolution, with the resource averaged over surface cells of 0.038 degrees in both latitude and longitude, or nominally 4 km in size. Additionally, time-series data at 5-minute resolution for the US and 10-minute resolution for North, Central and South America, on a 2 km grid, are produced from the next generation of GOES satellites and are available from 2019 onward. The solar radiation values represent the resource available to solar energy systems. The data was created using cloud properties generated by the AVHRR Pathfinder Atmospheres-Extended (PATMOS-x) algorithms developed by the University of Wisconsin. The Fast All-sky Radiation Model for Solar applications (FARMS), in conjunction with the cloud properties and with aerosol optical depth (AOD) and precipitable water vapor (PWV) from ancillary sources, is used to estimate solar irradiance (GHI, DNI, and DHI). The Global Horizontal Irradiance (GHI) is computed for clear skies using the REST2 model. For cloud scenes identified by the cloud mask, FARMS is used to compute GHI, and FARMS-DNI is used to compute the Direct Normal Irradiance (DNI). The PATMOS-x model uses radiance images in visible and infrared channels from the Geostationary Operational Environmental Satellite (GOES) series of geostationary weather satellites. Ancillary variables needed to run REST2 and FARMS (e.g., aerosol optical depth, precipitable water vapor, and albedo) are derived from NASA's Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) dataset. Temperature and wind speed data are also derived from MERRA-2 and provided for use in NREL's System Advisor Model (SAM) to compute PV generation.
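    The "nominally 4 km" cell size follows from converting the 0.038-degree spacing to distance. The sketch below assumes a spherical Earth with about 111.32 km per degree of latitude (an approximation, not a constant from the NSRDB documentation); east-west spacing shrinks with the cosine of latitude.

```python
import math

# Approximate length of one degree of latitude on a spherical Earth (an assumption,
# not a constant taken from the NSRDB documentation).
KM_PER_DEGREE = 111.32


def cell_size_km(spacing_deg: float, latitude_deg: float) -> tuple[float, float]:
    """Return the (north-south, east-west) extent in km of a cell spanning spacing_deg in both axes."""
    north_south = spacing_deg * KM_PER_DEGREE
    # Meridians converge toward the poles, so east-west distance scales with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    for lat in (0, 25, 40, 49):
        ns, ew = cell_size_km(0.038, lat)
        print(f"lat {lat:>2} deg: {ns:.2f} km N-S x {ew:.2f} km E-W")
```

    At the equator this gives about 4.23 km in both directions, while at 40 degrees north the east-west side shrinks to roughly 3.24 km, so the 4 km figure is best read as an approximate size.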

  11.

    Average Commute Time by County

    • documentation-resources.opendatasoft.com
    • data.wu.ac.at
    csv, excel, geojson, json
    Updated Mar 26, 2017
    (2017). Average Commute Time by County [Dataset]. https://documentation-resources.opendatasoft.com/explore/dataset/average-commute-time-by-county/
    Available download formats: geojson, excel, csv, json
    Dataset updated
    Mar 26, 2017
    Description

    Average commute time in each U.S. county, in minutes. This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.

  12.

    Vital Signs: Commute Time (by Place of Residence) – by county (2022)

    • data.bayareametro.gov
    application/rdfxml, application/rssxml, csv, json, xml, tsv
    Updated Jul 1, 2022
    (2022). Vital Signs: Commute Time (by Place of Residence) – by county (2022) [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Commute-Time-by-Place-of-Residence-by-/5bqp-dsj6
    Available download formats: application/rssxml, csv, json, xml, application/rdfxml, tsv
    Dataset updated
    Jul 1, 2022
    Description

    VITAL SIGNS INDICATOR
    Commute Time (T3)

    FULL MEASURE NAME
    Commute time by residential location

    LAST UPDATED
    January 2023

    DESCRIPTION
    Commute time refers to the average number of minutes a commuter spends traveling to work on a typical day. The dataset includes metropolitan area, county, city, and census tract tables by place of residence.

    DATA SOURCE
    U.S. Census Bureau: Decennial Census (1980-2000) - via MTC/ABAG Bay Area Census - http://www.bayareacensus.ca.gov/transportation.htm

    U.S. Census Bureau: American Community Survey - https://data.census.gov/
    2006-2021
    Form C08136
    Form C08536
    Form B08301

    CONTACT INFORMATION
    vitalsigns.info@bayareametro.gov

    METHODOLOGY NOTES (across all datasets for this indicator)
    For the decennial Census datasets, breakdown of commute times was unavailable by mode; only overall data could be provided on a historical basis.

    For the American Community Survey (ACS) datasets, 1-year rolling average data was used for the metro, region and county geographic levels, while 5-year rolling average data was used for cities and tracts. This is because the 1-year dataset does not include more localized data for all Bay Area cities. Similarly, modal data is not available for every Bay Area city or census tract, even when the 5-year data is used for those localized geographies.

    Regional commute times were calculated by summing aggregate county travel times and dividing by the relevant population; similarly, modal commute times were calculated by summing aggregate travel times and dividing by the number of commuters choosing that mode for the given geography.
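    The regional method described above is a weighted average: summing aggregate minutes before dividing lets large counties count in proportion to their commuters, which a plain mean of county averages would not. A minimal sketch; the class, function and county figures are hypothetical and not taken from Vital Signs data.

```python
from dataclasses import dataclass


@dataclass
class CountyCommute:
    name: str
    aggregate_minutes: float  # total travel minutes summed over all commuters in the county
    commuters: int            # number of commuters that aggregate covers


def regional_mean_commute(counties: list[CountyCommute]) -> float:
    """Average commute for a region: total aggregate minutes divided by total commuters."""
    total_minutes = sum(c.aggregate_minutes for c in counties)
    total_commuters = sum(c.commuters for c in counties)
    if total_commuters == 0:
        raise ValueError("no commuters in the selected geography")
    return total_minutes / total_commuters


if __name__ == "__main__":
    # Hypothetical figures: county means of 30 and 40 minutes.
    sample = [
        CountyCommute("County A", aggregate_minutes=30_000_000, commuters=1_000_000),
        CountyCommute("County B", aggregate_minutes=4_000_000, commuters=100_000),
    ]
    print(round(regional_mean_commute(sample), 2))
```

    Here the weighted regional figure is about 30.9 minutes, not the 35 minutes a simple mean of the two county averages would give. The same function applies to modal commute times if each entry holds only the minutes and commuters for one mode.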

    Census tract data is not available for tracts with insufficient numbers of residents. The metropolitan area comparison was performed for the nine-county San Francisco Bay Area in addition to the primary metropolitan statistical areas (MSAs) for the nine other major metropolitan areas.

  13.

    American English Call Center Data for Delivery & Logistics AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    FutureBee AI (2022). American English Call Center Data for Delivery & Logistics AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-english-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US English Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

    Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native US English speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed-delivery resolutions, offering a rich, real-world training base for AI models.

    Participant Diversity:
    Speakers: 60 native US English speakers from our verified contributor pool.
    Regions: Multiple states across the United States, for accent and dialect diversity.
    Participant Profile: Gender distribution of 60% male and 40% female, with ages ranging from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.
    Call Duration: 5 to 15 minutes on average.
    Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in clean, noise-free, echo-free conditions.
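    To confirm that downloaded files match the stated audio format (stereo, 16-bit, 8 kHz or 16 kHz), the header can be read with Python's standard-library `wave` module. A minimal sketch; the accepted values are taken from the description above, and the local folder layout is a hypothetical assumption, not FutureBeeAI's delivery structure.

```python
import wave
from pathlib import Path

# Format stated in the dataset description, treated here as acceptance criteria.
EXPECTED_CHANNELS = 2            # stereo / dual-channel
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_RATES_HZ = {8000, 16000}


def check_wav(path: Path) -> list[str]:
    """Return human-readable mismatches between a WAV header and the stated format."""
    problems: list[str] = []
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav.getnchannels()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width={wav.getsampwidth() * 8} bits")
        if wav.getframerate() not in EXPECTED_RATES_HZ:
            problems.append(f"rate={wav.getframerate()} Hz")
    return problems


def check_directory(root: Path) -> dict[str, list[str]]:
    """Check every .wav file under a (hypothetical) local download folder; only failures are returned."""
    report: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[str(path)] = problems
    return report
```

    An empty result from `check_directory` means every file header matched; it does not inspect audio content such as noise or echo.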

    Topic Diversity

    This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

    Inbound Calls:
    Order Tracking
    Delivery Complaints
    Undeliverable Addresses
    Return Process Enquiries
    Delivery Method Selection
    Order Modifications, and more
    Outbound Calls:
    Delivery Confirmations
    Subscription Offer Calls
    Incorrect Address Follow-ups
    Missed Delivery Notifications
    Delivery Feedback Surveys
    Out-of-Stock Alerts, and others

    This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

    Transcription

    All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, noise)
    High transcription accuracy with word error rate under 5% via dual-layer quality checks.

    These transcriptions support fast, reliable model development for English voice AI applications in the delivery sector.

    Metadata

    Detailed metadata is included for each participant and conversation:

    Participant Metadata: ID, age, gender, region, accent, dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

    This metadata aids in training specialized models, filtering demographics, and running advanced analytics.

    Usage and Applications


  14.

    United States Retail Sales YoY

    • tradingeconomics.com
    • pt.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    TRADING ECONOMICS, United States Retail Sales YoY [Dataset]. https://tradingeconomics.com/united-states/retail-sales-annual
    Available download formats: json, xml, csv, excel
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1993 - May 31, 2025
    Area covered
    United States
    Description

    Retail sales in the United States increased 3.30 percent in May 2025 over the same month of the previous year. This dataset provides United States Retail Sales YoY actual values, historical data, forecasts, charts, statistics, an economic calendar and news.
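    A year-over-year (YoY) figure compares one month with the same month a year earlier. A minimal sketch of that calculation; the function names and the sample values are illustrative assumptions, not Trading Economics data or methodology.

```python
from typing import Optional


def yoy_percent(current: float, year_ago: float) -> float:
    """Percent change versus the same period one year earlier."""
    if year_ago == 0:
        raise ZeroDivisionError("year-ago value is zero; YoY change is undefined")
    return (current / year_ago - 1.0) * 100.0


def yoy_series(monthly: list[float]) -> list[Optional[float]]:
    """YoY change for each month of a monthly series; the first 12 months have no comparison point."""
    return [
        None if i < 12 else yoy_percent(value, monthly[i - 12])
        for i, value in enumerate(monthly)
    ]
```

    For example, a sales level of 103.3 against 100.0 twelve months earlier gives a YoY change of about 3.3 percent (up to floating-point rounding).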

  15.

    sf20k

    • huggingface.co
    Updated Jun 28, 2024
    Ridouane Ghermi (2024). sf20k [Dataset]. https://huggingface.co/datasets/rghermi/sf20k
    Dataset updated
    Jun 28, 2024
    Authors
    Ridouane Ghermi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Long Story Short: Story-level Video Understanding from 20K Short Films


      Dataset Summary
    

    SF20K is the largest publicly available movie dataset. It contains 20,143 amateur films, totaling 3,582 hours of video content, with each video lasting 11 minutes on average.

      Subsets
    

    SF20K-Train: The train set, containing synthetic questions.
    SF20K-Test: The test benchmark, containing manually curated questions generated from movie synopses.… See the full description on the dataset page: https://huggingface.co/datasets/rghermi/sf20k.

  16.

    Johns Hopkins COVID-19 Case Tracker

    • data.world
    csv, zip
    Updated Jul 2, 2025
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Available download formats: zip, csv
    Dataset updated
    Jul 2, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that it will stop its daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      • Johns Hopkins has reconciled Ohio's historical deaths data with the state.
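    The Ohio handling in the February 12, 2021 update amounts to a rolling average that holds its last valid value while anomalous days are excluded, with the raw counts left untouched. A minimal sketch; the 7-day window and the per-day `anomalous` flags are assumptions for illustration, since the update does not state AP's window length or how days were flagged.

```python
from collections import deque
from typing import Optional


def frozen_rolling_average(
    daily: list[float], anomalous: list[bool], window: int = 7
) -> list[Optional[float]]:
    """Trailing mean of daily counts that holds its last valid value on anomalous days.

    The default 7-day window is an assumption; flagged days never enter the window.
    """
    if len(daily) != len(anomalous):
        raise ValueError("daily and anomalous must have the same length")
    recent = deque(maxlen=window)  # most recent valid daily values
    last_valid: Optional[float] = None
    out: list[Optional[float]] = []
    for value, is_anomalous in zip(daily, anomalous):
        if is_anomalous:
            # Keep the backlog dump out of the average; repeat the last valid value.
            out.append(last_valid)
            continue
        recent.append(value)
        if len(recent) == window:
            last_valid = sum(recent) / window
            out.append(last_valid)
        else:
            out.append(None)  # not enough valid history yet
    return out
```

    The input list `daily` is never modified, mirroring how the raw backlogged counts remained in the new_deaths column while only the average was frozen.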

      Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
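    The per-100,000 rates are a normalization of each count by the county's population estimate. A minimal sketch, assuming the count and the ACS population figure are already joined for a county; the function name and the sample county are hypothetical.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Cases or deaths per 100,000 residents of a county."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round inputs stay exact in floating point.
    return count * 100_000 / population


if __name__ == "__main__":
    # Hypothetical county: 250 cases among 50,000 residents.
    print(rate_per_100k(250, 50_000))  # 500.0
```

    As the caveats note, a higher rate may reflect more testing rather than more disease spread.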

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join it to other datasets we've made available to help cover the coronavirus pandemic.

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    Interactive map: https://datawrapper.dwcdn.net/nRyaf/15/

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county; for those cases, the county name is "unassigned to a single county".
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    • Johns Hopkins timeseries data

      • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates its historical data for accuracy, either increasing or decreasing the latest cumulative count.
      • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
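    Because the timeseries stores cumulative snapshots, daily new counts can be derived by differencing consecutive days, and a downward revision by a source then appears as a negative daily value. A minimal sketch that surfaces such days; flagging them instead of silently clipping to zero is a design choice of this example, not something Johns Hopkins or AP prescribes.

```python
def daily_new_counts(cumulative: list[int]) -> tuple[list[int], list[int]]:
    """Difference cumulative snapshots into daily counts.

    Returns (daily_counts, revision_days); revision_days holds the indexes where the
    cumulative total fell below the previous day's, which signals a downward revision.
    """
    daily: list[int] = []
    revision_days: list[int] = []
    previous = 0  # assumes the series starts from zero before the first snapshot
    for i, total in enumerate(cumulative):
        change = total - previous
        if change < 0:
            revision_days.append(i)
        daily.append(change)
        previous = total
    return daily, revision_days


if __name__ == "__main__":
    counts, revised = daily_new_counts([5, 12, 20, 18, 25])
    print(counts)   # [5, 7, 8, -2, 7]
    print(revised)  # [3]
```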

    Attribution

    This data should be credited to Johns Hopkins University COVID-19 tracking project.

  17.

    2015-2016 Physical Education - PE Instruction - District Level

    • catalog.data.gov
    Updated Nov 29, 2024
    data.cityofnewyork.us (2024). 2015-2016 Physical Education - PE Instruction - District Level [Dataset]. https://catalog.data.gov/dataset/2015-2016-physical-education-pe-instruction-district-level
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    Background and methodology: Local Law 102, enacted in 2015, requires the Department of Education of the New York City School District to submit to the Council an annual report concerning physical education for the prior school year. This report provides information about the average frequency and average total minutes per week of physical education, as defined in Local Law 102, as reported through the 2015-2016 STARS database. Note that schools self-report their scheduling information in STARS. The report also includes information on the number and ratio of certified physical education instructors and on designated physical education instructional space.

    This report consists of six tabs: PE Instruction Borough-Level, PE Instruction District-Level, PE Instruction School-Level, Certified PE Teachers, PE Space, and Supplemental Programs.

    PE Instruction Borough-Level: This tab includes the average frequency and average total minutes per week of physical education by borough, disaggregated by grade, race and ethnicity, gender, special education status and English language learner status. It only includes students who were enrolled in the same school across all academic terms in the 2015-16 school year. Data on students with disabilities and English language learners are as of the end of the 2015-16 school year. Data on adaptive PE is based on individualized education programs (IEPs) finalized on or before 05/31/2016.

    PE Instruction District-Level: This tab includes the average frequency and average total minutes per week of physical education by district, disaggregated by grade, race and ethnicity, gender, special education status and English language learner status. It only includes students who were enrolled in the same school across all academic terms in the 2015-16 school year. Data on students with disabilities and English language learners are as of the end of the 2015-16 school year. Data on adaptive PE is based on individualized education programs (IEPs) finalized on or before 05/31/2016.

    PE Instruction School-Level: This tab includes the average frequency and average total minutes per week of physical education by school, disaggregated by grade, race and ethnicity, gender, special education status and English language learner status. It only includes students who were enrolled in the same school across all academic terms in the 2015-16 school year. Data on students with disabilities and English language learners are as of the end of the 2015-16 school year. Data on adaptive PE is based on individualized education programs (IEPs) finalized on or before 05/31/2016.

    Certified PE Teachers: This tab provides the number of designated full-time and part-time certified physical education instructors. It does not include elementary, early childhood and K-8 physical education teachers who provide physical education instruction under a common branches license. It also includes the ratio of full-time instructors teaching under a physical education license to students, by school. Data reported is for the 2015-2016 school year as of 10/31/2015.

    PE Space: This tab provides information on all designated indoor, outdoor and off-site spaces used by the school for physical education, as reported through the Principal Annual Space Survey and the Outdoor Yard Report. Note that information on each room category is self-reported by principals, who determine how each room is classified. The data captures whether the PE space is co-located, used by another school or used for another purpose. Designated PE space includes gyms, athletic fields, auxiliary exercise spaces, dance rooms, field houses, multipurpose spaces, outdoor yards, off-site locations, playrooms, swimming pools and weight rooms.

    Supplemental Programs: This tab provides information on the department's supplemental physical education

  18.

    Employer pay equity statement (DEMES) - Catalogue - Canadian Urban Data...

    • beta.data.urbandatacentre.ca
    • data.urbandatacentre.ca
    Updated Sep 13, 2024
    (2024). Employer pay equity statement (DEMES) - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://beta.data.urbandatacentre.ca/dataset/gov-canada-e4858612-edf7-408f-b54c-efb18b43d101
    Dataset updated
    Sep 13, 2024
    Area covered
    Canada
    Description

    The CNESST is listening to your data needs! Answer our questionnaire.

    Do you use, or have specific needs related to, CNESST data (https://www.cnesst.gouv.qc.ca/fr)? Take a few minutes to answer the following questionnaire: Consultation on your CNESST data needs. To contact us:
    - about the questionnaire, write to donnees.ouvertes@cnesst.gouv.qc.ca;
    - about a technical problem, write to consultation@cnesst.gouv.qc.ca or call 1 866 216-7918, specifying the title of the consultation.
    Thank you for your participation!

    The dataset shows the information that employers subject to the Pay Equity Act (LES) have entered in their most recent Employer Pay Equity Statement (DEMES). Through the DEMES, employers report their progress in applying the LES to the CNESST (Commission on Standards, Equity, Health and Safety at Work), Pay Equity. The LES applies to businesses that reach an average of ten or more workers during a calendar year. An employer has several obligations under the LES:
    * First, carry out an initial pay equity exercise and post the results (DATE_AFFICH_EXERC) in the company within four years of becoming subject to the LES (DATE_PREV_EXERC).
    * Then, assess the maintenance of pay equity every five years and post the results (DATE_AFFICH_MAINT) in the company. These evaluations must be carried out on a fixed date (DATE_PREV_MAINT).
    An empty posting date means that the employer has not indicated that it has fulfilled this obligation. To ensure privacy and the protection of personal information, businesses are listed using an anonymous numerical identifier (ID) created for the dataset.
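    Since an empty posting date signals that an obligation has not been reported as fulfilled, a CSV export can be screened for such rows. A minimal sketch; the column names DATE_AFFICH_EXERC and DATE_AFFICH_MAINT come from the description above, while the literal `ID` header, the CSV layout and the file path are assumptions.

```python
import csv
from typing import Iterable

# Posting-date columns named in the dataset description.
POSTING_COLUMNS = {
    "initial exercise": "DATE_AFFICH_EXERC",
    "maintenance": "DATE_AFFICH_MAINT",
}


def unreported_obligations(rows: Iterable[dict[str, str]]) -> dict[str, list[str]]:
    """Map each business ID (assumed column header "ID") to obligations with an empty posting date."""
    result: dict[str, list[str]] = {}
    for row in rows:
        missing = [
            label
            for label, column in POSTING_COLUMNS.items()
            if not (row.get(column) or "").strip()
        ]
        if missing:
            result[row["ID"]] = missing
    return result


def load_unreported(path: str) -> dict[str, list[str]]:
    """Apply the check to a local CSV export (hypothetical file path)."""
    with open(path, newline="", encoding="utf-8") as f:
        return unreported_obligations(csv.DictReader(f))
```

    This only reports empty dates. It does not check whether an obligation is already due, which would require comparing against DATE_PREV_EXERC and DATE_PREV_MAINT in whatever date format the export uses.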

  19.

    American English Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    FutureBee AI (2022). American English Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-english-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US English Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking real estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native US English speakers from our verified contributor community.
    Regions: Different states across the United States, to ensure accent and dialect variation.
    Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for English real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:


  20.

    TimeChat-Online-139K

    • huggingface.co
    Updated May 9, 2025
    wyc (2025). TimeChat-Online-139K [Dataset]. https://huggingface.co/datasets/wyccccc/TimeChat-Online-139K
    Dataset updated
    May 9, 2025
    Authors
    wyc
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset: TimeChat-Online-139K

    For flexible real-time interaction, we introduce a comprehensive streaming video dataset with backward-tracing, real-time visual perception, and future-responding scenarios.

    • 11,043 visually informative videos (average duration: 11.1 minutes)
    • 139K question-answer pairs covering backward tracing, real-time visual perception, and forward active responding
    • An average of 87.8 scene-oriented key frames per video (~7.14 seconds between consecutive frames)
