100+ datasets found
  1. Australian English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Australian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-australia
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Australia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Australian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Australian English communication.

    Curated by FutureBeeAI, this 40-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Australian accents and dialects.

    Speech Data

    The dataset comprises 40 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Australian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 80 verified native Australian English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various states and territories of Australia to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
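The recording details above can be checked against a downloaded file. The sketch below is a minimal illustration using only Python's standard `wave` module; the function names `describe_wav` and `matches_listing` are hypothetical, and it assumes the files are uncompressed PCM WAV, which the listing does not explicitly state.

```python
import wave


def describe_wav(path):
    """Return (channels, bit_depth, sample_rate_hz, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        bit_depth = wf.getsampwidth() * 8  # sample width is reported in bytes
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    return channels, bit_depth, rate, duration


def matches_listing(path, expected_rate=16000):
    """True if the file is stereo, 16-bit, at the expected sample rate (assumed spec)."""
    channels, bit_depth, rate, _ = describe_wav(path)
    return channels == 2 and bit_depth == 16 and rate == expected_rate
```

Calling `matches_listing("some_recording.wav")` on a sample file (the file name is a placeholder) is a quick sanity check before a training run.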

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
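For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of how word error rate is commonly computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative, whitespace tokenisation is an assumption, and the listing does not say how FutureBeeAI measured its own figure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A WER below 5% means fewer than one word-level error per twenty reference words on average.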

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Australian English.
    Voice Assistants: Build smart assistants capable of understanding natural Australian conversations.

  2. Australian English Call Center Data for Retail & E-Commerce AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Australian English Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-english-australia
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Australia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Australian English Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English speakers. Featuring over 40 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.

    Speech Data

    The dataset contains 40 hours of dual-channel call center recordings between native Australian English speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.

    Participant Diversity:
    Speakers: 80 native Australian English speakers from our verified contributor pool.
    Regions: Representing multiple states and territories across Australia to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.

    Inbound Calls:
    Product Inquiries
    Order Cancellations
    Refund & Exchange Requests
    Subscription Queries, and more
    Outbound Calls:
    Order Confirmations
    Upselling & Promotions
    Account Updates
    Loyalty Program Offers
    Customer Verifications, and others

    Such variety enhances your model’s ability to generalize across retail-specific voice interactions.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, cough)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.

    Usage and Applications

    This dataset is ideal for a range of voice AI and NLP applications:

    Automatic Speech Recognition (ASR): Fine-tune English speech-to-text systems.

  3. SSCN SEGMENT 2-B Great Australian Bight Marine Park Bathymetry Acquisition...

    • researchdata.edu.au
    Updated 2024
    Cite
    EGS Survey Pty Ltd (2024). SSCN SEGMENT 2-B Great Australian Bight Marine Park Bathymetry Acquisition (20240013S) [Dataset]. http://doi.org/10.26186/150168
    Dataset updated
    2024
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Authors
    EGS Survey Pty Ltd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 17, 2024 - May 4, 2024
    Description
    This dataset is part of a broader survey carried out by EGS onboard the survey vessel RV Bold Explorer (RV BE) during the SUBCO Subsea Cable Network (SSCN) cable route survey.
    The area is confined to the data within the Great Australian Bight Marine Park. The Survey team and client representative joined the RV Bold Explorer on 17th April 2024 at Fremantle Port, Australia. Following the embarkation of the survey team, equipment testing and commissioning was carried out. Calibration and verification of the DGPS positioning system and survey gyros onboard RV BE were performed while the vessel was alongside at Fremantle Port, Australia between 17th April and 18th April 2024.
    Prior to departure, tow sensors and hull-mounted sensors were wet tested. All systems were found to be operational. The RV Bold Explorer departed Fremantle Port, Australia and transited to the first site on 19th April 2024. The offshore calibration of shallow water multibeam echosounder (MBES) and USBL were performed on 19th April 2024 while the calibration of deep water MBES was carried out on 20th April 2024.
    The SSCN Segment 2-B survey comprised an investigation of bathymetry, seabed features and shallow geology along the proposed route. A subsequent geotechnical sampling programme was also executed.
    This dataset is not to be used for navigational purposes.
  4. Australian National Spectral Database

    • ecat.ga.gov.au
    • researchdata.edu.au
    Updated Nov 5, 2021
    Cite
    Commonwealth of Australia (Geoscience Australia) (2021). Australian National Spectral Database [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/daf4b7d5-d962-492d-82bf-5382b4345dbe
    Dataset updated
    Nov 5, 2021
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Description

    The National Spectral Database (NSD) houses data taken by Australian remote sensing scientists. The database includes spectra covering targets as diverse as mineralogy, soils, plants, water bodies and various land surfaces.
    Currently the database holds spectral information from multiple locations across the country and as the collection grows in spatial / temporal coverage, the NSD will service continental scale validation requirements of the Earth observation community for satellite-based measurements of surface reflectance. The NSD is accessed with information provided at the NSD Geoscience Australia Content Management Interface (CMI) web page: https://cmi.ga.gov.au/data-products/dea/643/australian-national-spectral-database

    Value: Curated spectral data provides a wealth of knowledge to remote sensing scientists. For other parties interested in calibration and validation (Cal/Val) of surface reflectance products, the Geoscience Australia (GA) Cal/Val dataset provides a useful resource of ground-truth data to compare to reflectance captured by Landsat 8 and Sentinel 2 satellites. The Aquatic Library is a robust collection of Australian datasets from 1994 to present time, primarily of end-member and substratum measurements. The University of Wollongong collection represents immense value in end-member studies, both terrestrial and aquatic.

    Scope: The NSD covers Australian data including historical datasets as old as 1994. Physical study sites encompass locations around Australia, with spectra captured in every state.

    Data types:
    - Spectral data: raw digital numbers (DN), radiance and reflectance.
    - From spectral bands VIS-NIR, SWIR1 & SWIR2: wavelengths 350 nm - 2500 nm, collected with instruments in the field or lab setting.

    Contact for further information: NSDB_manager@ga.gov.au

    To view the entire collection click on the keyword "HVC 144490" in the below Keyword listing

  5. Australian English Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Australian English Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-english-australia
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Australia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Australian English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 40 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 40 hours of dual-channel audio recordings between native Australian English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 80 native Australian English contributors from our verified pool.
    Regions: Covering multiple Australian states and territories to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    High transcription accuracy: a dual-layered review keeps the word error rate under 5%.

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train English speech-to-text engines for travel platforms.

  6. Growing Up in Australia: Longitudinal Study of Australian Children (LSAC)...

    • dataverse.ada.edu.au
    docx, pdf, xlsx, zip
    Updated Aug 22, 2022
    Cite
    ADA Dataverse (2022). Growing Up in Australia: Longitudinal Study of Australian Children (LSAC) Release 6 (Waves 1-6) [Dataset]. http://doi.org/10.26193/JOZW2U
    Available download formats: pdf(19138), zip(119449731), pdf(22802), zip(182288285), zip(2163770), pdf(22350), zip(254472558), docx(1890743), pdf(6012563), pdf(19413), zip(362227261), zip(209591094), xlsx(10003332), zip(134128365), zip(164175135)
    Dataset updated
    Aug 22, 2022
    Dataset provided by
    ADA Dataverse
    License

    https://dataverse.ada.edu.au/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.26193/JOZW2U

    Time period covered
    Mar 2004 - Dec 2014
    Area covered
    Australia
    Dataset funded by
    Department of Social Services
    Description

    Growing Up in Australia: the Longitudinal Study of Australian Children (LSAC) is a major study following the development of 10,090 children and families from all parts of Australia. LSAC explores family and social issues while addressing a range of research questions about children’s development and wellbeing. The Wave 1 data collection was undertaken for AIFS by private social research companies Colmar Brunton Social Research and I-view/NCS Pearson. Data collection for Waves 2-6 was undertaken by the ABS. From 2004, participating families have been interviewed every two years, and between-wave mail-out questionnaires were sent to families in 2005 (Wave 1.5), 2007 (Wave 2.5) and 2009 (Wave 3.5). Additional between-wave questionnaires (Waves 4.5 and 5.5) were undertaken via online web forms from 2009 for the purposes of updating the contact details of study participants. The sampling unit of interest is the study child and there were two cohorts of children selected from children born within two 12-month periods: (1) B cohort ("Baby" cohort) - children born March 2003–February 2004, and (2) K cohort ("Kinder" cohort) - children born March 1999–February 2000. Please note that this release of LSAC is now superseded, and is available by request for approved training courses only. For the current release, please visit https://ada.edu.au/lsac_current

  7. Australian National Seismograph Network Data Collection

    • researchdata.edu.au
    Updated Jan 6, 2021
    Cite
    Commonwealth of Australia (Geoscience Australia) (2021). Australian National Seismograph Network Data Collection [Dataset]. http://doi.org/10.26186/144675
    Dataset updated
    Jan 6, 2021
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Authors
    Commonwealth of Australia (Geoscience Australia)
    License

    http://creativecommons.org/licenses/

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Time series seismograph data recorded from Australian National Seismograph Network (ANSN) observatories in Australia, islands in the Pacific, Southern and Indian Oceans and the Australian Antarctic Territory.

    Value: This data is used for earthquake monitoring, measurement, detection and location of earthquakes, which is valuable for emergency response, hazard modelling and mitigation. The dataset is also used to meet a subset of Australia's obligations to the Comprehensive Nuclear-Test-Ban Treaty Organisation (CTBTO) to fulfil Australia's commitment to nuclear explosion monitoring.

    Scope: Observatories in Australia, islands in the Pacific, Southern and Indian Oceans and the Australian Antarctic Territory

  8. Table 3 in New giant genus of Parabathynellidae (Crustacea: Bathynellacea):...

    • zenodo.org
    html
    Updated Apr 5, 2025
    Cite
    Ana I. Camacho; Kym M. Abrams; Tim Moulds (2025). Table 3 in New giant genus of Parabathynellidae (Crustacea: Bathynellacea): first record of Bathynellacea in an Australian cave [Dataset]. http://doi.org/10.5281/zenodo.15135663
    Available download formats: html
    Dataset updated
    Apr 5, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ana I. Camacho; Kym M. Abrams; Tim Moulds
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    Table 3. Big and giant species of the world (2.5–4 mm ‘large’ or > 4 mm ‘giant’): species list and maximum size. Afrobathynella Schminke, 1976; Allobathynella Morimoto & Miura, 1957; Arkaroolabathynella Abrams & King, 2013; Atopobathynella Schminke, 1973; Billibathynella Cho, 2005; Brevisomabathynella Cho, Park & Ranga Reddy, 2006; Chilibathynella Noodt, 1963; Kampucheabathynella Cho, Kry & Chhenh, 2015; Kimberleybathynella Cho, Park & Humphreys, 2005; Iberobathynella Schminke, 1973; Megabathynella Camacho & Abrams gen. nov.; Montanabathynella Camacho, Stanford & Newell, 2009; Lockyerenella Camacho & Little, 2017; Nipponbathynella Schminke, 1973; Notobathynella Schminke, 1973; Onychobathynella Camacho & Hancock, 2011; Paraeobathynella Camacho, 2005; Parabathynella Chappuis, 1926; Paraiberobathynella Camacho & Serban, 1998; Sinobathynella Camacho, Trontelj & Zagmajster, 2006. In bold the ‘giant’ species.

    | Genera | Species | Size (mm) | Country |
    | Afrobathynella | A. trimera | 2.7 | South Africa |
    | Allobathynella | A. donggangensis | 2.54 | South Korea |
    |  | A. gigantea “pluto” | 3.3 | Japan |
    |  | A. maseongensis | 2.55 | South Korea |
    |  | A. munsui | 3.41 | South Korea |
    |  | A. okcheonensis | 2.73 | South Korea |
    | Arkaroolabathynella | A. remkoi | 2.2–3.3 | Australia (South Australia) |
    | Atopobathynella | A. wattsi | 3.0 | Australia (Western Australia) |
    | Billibathynella | B. humphreysi | 5.45 6.30 | Australia (Western Australia) |
    |  | B. ilgarariensis | 3.0–3.17 | Australia (Western Australia) |
    |  | B. wolframnoodti | 4.56–5.12 | Australia (Western Australia) |
    | Brevisomabathynella | B. changjini | 4.24 | Australia (Western Australia) |
    |  | B. clayi | 3.52 | Australia (Western Australia) |
    |  | B. jundeeensis | 3.42 | Australia (Western Australia) |
    |  | B. magna | 4.62 | Australia (Western Australia) |
    |  | B. uramurdahensis | 3.62 | Australia (Western Australia) |
    | Chilibathynella | C. joshuai | 2.8 | Australia (Queensland) |
    | Kampucheabathynella | K. khaeiptouka | 4.52–4.72 | Cambodia |
    | Kimberleybathynella | K. gigantea | 3.91 | Australia (Western Australia) |
    | Iberobathynella | I. barcelensis | 3.4 | Portugal |
    |  | I. gracilipes | 4.0 | Portugal |
    |  | I. lusitanica | 3.0 | Portugal |
    |  | I. paragracilipes | 3.2 | Spain |
    | Megabathynella gen. nov. | M. totemensis sp. nov. | 4.1–5.9 | Australia (Northern Territory) |
    | Montanabathynella | M. salish | 3.0 | USA (Montana) |
    | Nipponbathynella | N. pectina | 2.57 | South Korea |
    | Notobathynella | N. lemurum | 2.5 | Madagascar |
    |  | N. octocamura | 2.7 | Australia (Queensland) |
    | Onychobathynella | O. bifurcata | 2.54 | Australia (Queensland) |
    | Paraeobathynella | Pe. siamensis | 2.60 | Thailand |
    | Parabathynella | P. badenwuerttembergensis | 2.5 | Germany |
    | Paraiberobathynella | Pi. fagei | 2.8 | Spain, France |
    |  | Pi. maghrebensis | 2.8 | Morocco |
    | Sinobathynella | S. decamera | 3.70 | China |

  9. Growing Up in Australia: Longitudinal Study of Australian Children (LSAC)...

    • dataverse.ada.edu.au
    bin, pdf, xlsx, zip
    Updated Jun 1, 2022
    Cite
    ADA Dataverse (2022). Growing Up in Australia: Longitudinal Study of Australian Children (LSAC) Release 7.2 (Waves 1-7) [Dataset]. http://doi.org/10.26193/F2YRL5
    Available download formats: bin(1890743), zip(13296016), pdf(19581), zip(130581964), xlsx(9854301), zip(37878024), pdf(19283), zip(775550138), pdf(3347323), zip(49743), zip(37747084), pdf(74688), zip(2257209), zip(436820999), xlsx(489593), zip(1679110), zip(4770832), zip(561595823), zip(1139915), pdf(16340), pdf(16434)
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    ADA Dataverse
    License

    https://dataverse.ada.edu.au/api/datasets/:persistentId/versions/8.4/customlicense?persistentId=doi:10.26193/F2YRL5

    Time period covered
    Mar 2004 - Dec 2016
    Area covered
    Australia
    Dataset funded by
    Department of Social Services
    Description

    Growing Up in Australia: The Longitudinal Study of Australian Children (LSAC) is a major study following the development of approximately 10,000 young people and their families from all parts of Australia. It is conducted in partnership between the Department of Social Services, the Australian Institute of Family Studies and the Australian Bureau of Statistics with advice provided by a consortium of leading researchers. The study began in 2003 with a representative sample of children (who are now teens and young adults) from urban and rural areas of all states and territories in Australia. The study has a multi-disciplinary base, and examines a broad range of research questions about development and wellbeing over the life course in relation to topics such as parenting, family, peers, education, child care and health. It will continue to follow participants into adulthood. The study informs social policy and is used to identify opportunities for early intervention and prevention strategies. Participating families have been interviewed every two years from 2004, and between-wave mail-out questionnaires were sent to families in 2005 (Wave 1.5), 2007 (Wave 2.5) and 2009 (Wave 3.5). The B cohort (“Baby” cohort) of around 5,000 children was aged 0–1 years in 2003–04, and the K cohort (“Kinder” cohort) of around 5,000 children was aged 4–5 years in 2003–04. Study informants include the young person, their parents (both resident and non-resident), carers and teachers. Please note that this release of LSAC is now superseded, and is available by request for approved training courses only. For the current release, please visit https://ada.edu.au/lsac_current

  10. Table 2 in Taxonomic revision of Australian Erythrophleum (Fabaceae:...

    • zenodo.org
    html
    Updated Apr 5, 2025
    Cite
    Russell L. Barrett; Matthew D. Barrett (2025). Table 2 in Taxonomic revision of Australian Erythrophleum (Fabaceae: Caesalpinioideae) including description of two new species [Dataset]. http://doi.org/10.5281/zenodo.15140822
    Available download formats: html
    Dataset updated
    Apr 5, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Russell L. Barrett; Matthew D. Barrett
    Description

    Table 2. Taxa analysed, vouchers, and GenBank and ENA reference numbers.

    | Species | Voucher | GenBank and ENA accession numbers |
    | Erythrophleum arenarium R.L.Barrett & M.D.Barrett | Western Australia (WA), Barrett 9080 (NSW, PERTH) | MT581272 |
    |  | WA, Bean 25079 (BRI) | MT581273 |
    |  | WA, Byrne 1271 (PERTH 07148453) | MT581270 |
    |  | WA, Forbes 2465 (PERTH 01958534) | MT581269 |
    |  | WA, Sweedman 8997 (PERTH 08786607) | MT581271 |
    | Erythrophleum chlorostachys (F.Muell.) Baill. | WA, Byrne 3693 (PERTH 08793034) | MT581275 |
    |  | Northern Territory (NT), Lazarides 8845 (CANB 295340) | MT581276 |
    |  | WA, Weston 12284 (PERTH 02211750) | MT581274 |
    |  | NT, Larcombe 2 (DNA D0057605) | MT581277 |
    |  | NT, Wightman 5205 (DNA D0051920) | MT581278 |
    | Erythrophleum fordii Oliv. | China, no voucher | ITS1 only, consensus assembly from SRR8191117 (Wang et al. 2019) A |
    |  | China, no voucher | Contiguous nearly complete (with gaps) 18S–ITS1–5.8S–ITS2–28S sequence, consensus assembly from SRR8191118 (Wang et al. 2019) |
    | Erythrophleum ivorense A.Chev. | Gabon, Wieringa 5487 (WAG) | OQ572325; contiguous complete 18S–ITS1–5.8S–ITS2–28S sequence, consensus assembly from ERR4363217 (Koenen et al. 2020) |
    | Erythrophleum pubescens R.L.Barrett & M.D.Barrett | WA, Barrett MDB5902 (plant 1) (PERTH) | MT581285 |
    |  | WA, Barrett MDB5902 (plant 2) (PERTH) | MT581286 |
    |  | WA, Byrne 3721 (PERTH 08760632) | MT581282 |
    |  | WA, Coate 224 (PERTH 02888580) | MT581279 |
    |  | WA, Dauncey H666 (PERTH 08422060) B | MT581297 |
    |  | WA, Foulkes 340 (PERTH 02523922) | MT581280 |
    |  | NT, Brennan 4583 (DNA D0146433) | MT581290 |
    |  | NT, Clark 1670 (DNA D0034313) | MT581291 |
    |  | NT, Cowie 5303 (DNA D0121980) | MT581292 |
    |  | NT, Dunlop 7145 (NSW 451601) | MT581293 |
    |  | NT, Egan 2873 (DNA D0077644) | MT581294 |
    |  | NT, Evans 3270 (NSW 451600) C | MT581289 |
    |  | NT, Smith 101 (DNA D0044224) | MT581295 |
    |  | NT, Smith 128 (DNA D0029224) | MT581296 |
    |  | NT, Whaite 3979 & Whaite (NSW 415299) | MT581283 |
    |  | Queensland (Qld), Blake 23184 (PERTH 02211556) | MT581281 |
    |  | Qld, Leitch QDA003815 (BRI AQ854110) | MT581288 |
    |  | Qld, McDonald KRM9767 (BRI AQ846978) | MT581287 |
    |  | Qld, Wannan 213 & Lynch (NSW 396373) | MT581284 |
    |  | Qld, McDonald KRM17554 (MEL 2416964A) | OQ471964 (dominant copy, contiguous complete 18S–ITS1–5.8S–ITS2–28S sequence) and OQ396764 (minor copy, ITS1–5.8S–ITS2 only), consensus assembly from ERR7599610 |
    | Pachyelasma tessmannii (Harms) Harms | Gabon, Wieringa 5229 (WAG) | OQ572326; contiguous complete 18S–ITS1–5.8S–ITS2–28S sequence, consensus assembly from ERR4363236 (Koenen et al. 2020) |


    A Sample was assembled from short-read archives, but could not be uploaded to GenBank because third-party assemblies require wet-lab experiments to meet requirements. Sequences are provided in the ribosomal alignment (see File S1.nex in the Supplementary material).

    B The only full-length ITS sequence obtained from Sanger sequencing.

    C The sample Evans 3270 was excluded from further analyses. Although the sample was resolved with E. pubescens (77% BS in RAxML tree, data not shown) as expected from its morphology, the sequence chromatograms were messy, resulting in ambiguous base calls, and the edited sequence still somewhat divergent from other E. pubescens. Because it was not possible to distinguish among hybridisation, paralogy or contamination as the cause of different base calls to other Erythrophleum haplotypes, the sequence was excluded from analyses. A second sample (Egan 2873) from the same locality (Cutta Cutta Caves, NT) produced a haplotype belonging to the E. pubescens clade, as expected from its morphological features, and it is likely that Evans 3270, likewise, represents E. pubescens; however a hybrid origin for this specimen with E. chlorostachys cannot be confidently excluded.

  11. ABS - Census of Population and Housing - Country of birth of person by age -...

    • data.peclet.com.au
    • data.cumberland.nsw.gov.au
    csv, excel, json
    Updated Jul 31, 2024
    + more versions
    Cite
    (2024). ABS - Census of Population and Housing - Country of birth of person by age - Suburb Level - G09 [Dataset]. https://data.peclet.com.au/explore/dataset/abs-g09-suburb-level-by-lga-and-state/
    Explore at:
    Available download formats: json, excel, csv
    Dataset updated
    Jul 31, 2024
    Description

    ABS Census data extract - G09 COUNTRY OF BIRTH OF PERSON BY AGE providing a breakdown of population at Suburb level and by:
    • age groups
    • country of birth of person(a): Australia(b); China (excludes SARs and Taiwan)(c); Hong Kong (SAR of China)(c); Born elsewhere(d)

    This data is based on place of usual residence.

    (a) This list consists of the most common 50 Country of Birth responses reported in the 2016 Census and 2011 Census.
    (b) Includes 'Australia', 'Australia (includes External Territories), nfd', 'Norfolk Island' and 'Australian External Territories, nec'.
    (c) Special Administrative Regions (SARs) comprise 'Hong Kong (SAR of China)' and 'Macau (SAR of China)'.
    (d) Includes countries not identified individually, 'Inadequately described', and 'At sea'. Excludes not stated.

    Please note that there are small random adjustments made to all cell values to protect the confidentiality of data. These adjustments may cause the sum of rows or columns to differ by small amounts from table totals.

  12. Redefining the Australian Anthrax Belt: Modeling the Ecological Niche and...

    • plos.figshare.com
    jpeg
    Updated Jun 1, 2023
    Cite
    Alassane S. Barro; Mark Fegan; Barbara Moloney; Kelly Porter; Janine Muller; Simone Warner; Jason K. Blackburn (2023). Redefining the Australian Anthrax Belt: Modeling the Ecological Niche and Predicting the Geographic Distribution of Bacillus anthracis [Dataset]. http://doi.org/10.1371/journal.pntd.0004689
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Neglected Tropical Diseases
    Authors
    Alassane S. Barro; Mark Fegan; Barbara Moloney; Kelly Porter; Janine Muller; Simone Warner; Jason K. Blackburn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ecology and distribution of B. anthracis in Australia is not well understood, despite the continued occurrence of anthrax outbreaks in the eastern states of the country. Efforts to estimate the spatial extent of the risk of disease have been limited to a qualitative definition of an anthrax belt extending from southeast Queensland through the centre of New South Wales and into northern Victoria. This definition of the anthrax belt does not consider the role of environmental conditions in the distribution of B. anthracis. Here, we used the genetic algorithm for rule-set prediction model system (GARP), historical anthrax outbreaks and environmental data to model the ecological niche of B. anthracis and predict its potential geographic distribution in Australia. Our models reveal the niche of B. anthracis in Australia is characterized by a narrow range of ecological conditions concentrated in two disjunct corridors. The most dominant corridor, used to redefine a new anthrax belt, parallels the Eastern Highlands and runs from north Victoria to central east Queensland through the centre of New South Wales. This study has redefined the anthrax belt in eastern Australia and provides insights about the ecological factors that limit the distribution of B. anthracis at the continental scale for Australia. The geographic distributions identified can help inform anthrax surveillance strategies by public and veterinary health agencies.

  13. Data from: Molecular phylogeny and phylogeography of the Australian...

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Jun 14, 2012
    Cite
    Peter J. Unmack; Justin C. Bagley; Mark Adams; Michael P. Hammer; Jerald B. Johnson (2012). Molecular phylogeny and phylogeography of the Australian freshwater fish genus Galaxiella, with an emphasis on dwarf Galaxias (G. pusilla) [Dataset]. http://doi.org/10.5061/dryad.c3g8h
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 14, 2012
    Dataset provided by
    Dryad
    Authors
    Peter J. Unmack; Justin C. Bagley; Mark Adams; Michael P. Hammer; Jerald B. Johnson
    Time period covered
    May 13, 2012
    Area covered
    Australia
    Description

    Cytochrome b data file for all Galaxiella individuals sequenced
    Cytochrome b data for all individuals included in our study in nexus format. Sequence names match the species initials and the field codes shown in locality code column in our locality data table. Samples lacking a field code used the first four letters of the waterbody name, while samples ending in gb were from GenBank.
    galaxiella.allspp.cytb.all.ind.nex

    S7 data file for Galaxiella
    S7 data for all individuals included in our study in nexus format. Data were aligned with MAFFT and phased in DnaSP (each allele is indicated by .a or .b at the end of the OTU code). Sequence names match the species initials and the field codes shown in locality code column in our locality data table. Samples lacking a field code used the first four letters of the waterbody name, while samples ending in gb were from GenBank.
    galaxiella.s7.allspp.matft.fftnsi.byphase.nex

    Arlequin analysis file (part 1 of 2) for Galaxiella pusilla Cytb sequences
    The inp...

  14. Australia AU: Iron Ore: Class A and B: Closing Stock

    • ceicdata.com
    Updated Jul 15, 2024
    Cite
    CEICdata.com (2024). Australia AU: Iron Ore: Class A and B: Closing Stock [Dataset]. https://www.ceicdata.com/en/australia/environmental-mineral-and-energy-resources-by-commodity-oecd-member-annual/au-iron-ore-class-a-and-b-closing-stock
    Explore at:
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2010 - Dec 1, 2021
    Area covered
    Australia
    Description

    Australia Iron Ore: Class A and B: Closing Stock data was reported at 51.850 Tonne bn in 2021. This records an increase from the previous number of 51.450 Tonne bn for 2020. Australia Iron Ore: Class A and B: Closing Stock data is updated yearly, averaging 17.950 Tonne bn (median) from Dec 1989 to 2021, with 33 observations. The data reached an all-time high of 53.250 Tonne bn in 2014 and a record low of 12.700 Tonne bn in 2002. Australia Iron Ore: Class A and B: Closing Stock data remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s Australia – Table AU.OECD.ESG: Environmental: Mineral and Energy Resources: by Commodity: OECD Member: Annual. Class A refers to commercially recoverable resources; Class B refers to potentially commercially recoverable resources; Class C refers to non-commercial and other known deposits.

  15. Motor Vehicle Body and Trailer Manufacturing in Australia - Market Research...

    • ibisworld.com
    Updated May 15, 2025
    Cite
    IBISWorld (2025). Motor Vehicle Body and Trailer Manufacturing in Australia - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/au/industry/motor-vehicle-body-and-trailer-manufacturing/251/
    Explore at:
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    IBISWorld
    License

    https://www.ibisworld.com/about/termsofuse/

    Time period covered
    2015 - 2030
    Area covered
    Australia
    Description

    Manufacturers in the Motor Vehicle Body and Trailer Manufacturing industry have faced mixed conditions over the past decade. After Australian car manufacturing collapsed in 2017, industry manufacturers were forced to significantly shift their operations and strategies. Motor vehicle body manufacturers have declined as a source of overall revenue, while RV manufacturing has filled the gaps. The pandemic added further volatility, constraining manufacturing output and generating sharp fluctuations in demand. However, Australia's international border closures during the pandemic were a boon to RV manufacturers, as interest in domestic holidays soared. Overall, revenue is expected to have climbed at an annualised 1.7% over the five years through 2024-25 to $6.7 billion. This includes an anticipated drop of 6.2% in 2024-25 as cost-of-living pressures resulting from elevated interest rates and high inflation weigh on downstream demand. In 2022, the ACCC took regulatory action against Australian caravan manufacturers following an extensive review of consumer complaints. The review raised issues around the quality of information given to customers when they purchase a defective product. Some customers reported struggling to get properly reimbursed after finding faults with the caravan they purchased. Despite this high-profile action, locally manufactured caravans are more popular than ever. Consumer interest in domestic holidays surged during the pandemic, and domestically manufactured caravans are still overwhelmingly more popular than imported substitutes. Although demand has softened after a spike in 2022-23, industrywide profitability has risen amid falling input costs. Revenue is projected to rise over the coming years, but demand conditions will be challenged by rising outbound travel by Australians and robust import competition. These trends are set to make it more difficult for smaller manufacturers to survive, resulting in an uptick in industry exits. Overall, revenue is forecast to rise at an annualised 1.1% over the five years through 2029-30 to $7.0 billion.

  16. Demographics for HIV-1 B subtype, non-B subtypes and sequences in a network...

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Alison Castley; Shailendra Sawleshwarkar; Rick Varma; Belinda Herring; Kiran Thapa; Dominic Dwyer; Doris Chibo; Nam Nguyen; Karen Hawke; Rodney Ratcliff; Roger Garsia; Anthony Kelleher; David Nolan (2023). Demographics for HIV-1 B subtype, non-B subtypes and sequences in a network including age, gender, AMEN centre and sequencing era. [Dataset]. http://doi.org/10.1371/journal.pone.0170601.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alison Castley; Shailendra Sawleshwarkar; Rick Varma; Belinda Herring; Kiran Thapa; Dominic Dwyer; Doris Chibo; Nam Nguyen; Karen Hawke; Rodney Ratcliff; Roger Garsia; Anthony Kelleher; David Nolan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Demographics for HIV-1 B subtype, non-B subtypes and sequences in a network including age, gender, AMEN centre and sequencing era.

  17. Australia AU: Crude Oil: Class A and B: Closing Stock

    • ceicdata.com
    Cite
    CEICdata.com, Australia AU: Crude Oil: Class A and B: Closing Stock [Dataset]. https://www.ceicdata.com/en/australia/environmental-mineral-and-energy-resources-by-commodity-oecd-member-annual/au-crude-oil-class-a-and-b-closing-stock
    Explore at:
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2012 - Dec 1, 2023
    Area covered
    Australia
    Description

    Australia Crude Oil: Class A and B: Closing Stock data was reported at 0.743 Bar bn in 2023. This records a decrease from the previous number of 0.773 Bar bn for 2022. Australia Crude Oil: Class A and B: Closing Stock data is updated yearly, averaging 1.111 Bar bn (median) from Dec 1988 to 2023, with 36 observations. The data reached an all-time high of 1.784 Bar bn in 1995 and a record low of 0.743 Bar bn in 2023. Australia Crude Oil: Class A and B: Closing Stock data remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s Australia – Table AU.OECD.ESG: Environmental: Mineral and Energy Resources: by Commodity: OECD Member: Annual. Class A refers to commercially recoverable resources; Class B refers to potentially commercially recoverable resources; Class C refers to non-commercial and other known deposits.

  18. Australia AU: Copper: Class A and B: Closing Stock

    • ceicdata.com
    Updated Mar 19, 2025
    Cite
    CEICdata.com (2025). Australia AU: Copper: Class A and B: Closing Stock [Dataset]. https://www.ceicdata.com/en/australia/environmental-mineral-and-energy-resources-by-commodity-oecd-member-annual/au-copper-class-a-and-b-closing-stock
    Explore at:
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2010 - Dec 1, 2021
    Area covered
    Australia
    Description

    Australia Copper: Class A and B: Closing Stock data was reported at 98.500 Tonne mn in 2021. This records an increase from the previous number of 96.250 Tonne mn for 2020. Australia Copper: Class A and B: Closing Stock data is updated yearly, averaging 41.850 Tonne mn (median) from Dec 1989 to 2021, with 33 observations. The data reached an all-time high of 98.500 Tonne mn in 2021 and a record low of 6.700 Tonne mn in 1990. Australia Copper: Class A and B: Closing Stock data remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s Australia – Table AU.OECD.ESG: Environmental: Mineral and Energy Resources: by Commodity: OECD Member: Annual. Class A refers to commercially recoverable resources; Class B refers to potentially commercially recoverable resources; Class C refers to non-commercial and other known deposits.

  19. Australian Episodic GNSS Survey Data Collection

    • researchdata.edu.au
    Updated Jan 6, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Commonwealth of Australia (Geoscience Australia); Manager Client Services (2021). Australian Episodic GNSS Survey Data Collection [Dataset]. https://researchdata.edu.au/australian-episodic-gnss-data-collection/3404175
    Explore at:
    Dataset updated
    Jan 6, 2021
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Authors
    Commonwealth of Australia (Geoscience Australia); Manager Client Services
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    http://creativecommons.org/licenses/

    Area covered
    Description

    This collection includes Global Navigation Satellite System (GNSS) observations from short-term occupations at multiple locations across Australia and its external territories, including the Australian Antarctic Territory.

    Value: The datasets within this collection are available to support a myriad of scientific applications, including research into the crustal deformation of the Australian continent.

    Scope: Data from selected areas of interest across Australia and its external territories, including the Australian Antarctic Territory. Over time there has been a focus on areas with increased risk of seismic activity or areas with observed natural or anthropogenic deformation.

    Access: The datasets within this collection are currently stored offline; to access them, please send a request to gnss@ga.gov.au

  20. Coding framework: Drivers of lung cancer screening participation in...

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Kate L. A. Dunlop; Henry M. Marshall; Emily Stone; Ashleigh R. Sharman; Rachael H. Dodd; Joel J. Rhee; Sue McCullough; Nicole M. Rankin (2023). Coding framework: Drivers of lung cancer screening participation in Australia using the COM-B (capability, opportunity, motivation-behaviour) model. [Dataset]. http://doi.org/10.1371/journal.pone.0275361.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kate L. A. Dunlop; Henry M. Marshall; Emily Stone; Ashleigh R. Sharman; Rachael H. Dodd; Joel J. Rhee; Sue McCullough; Nicole M. Rankin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    Coding framework: Drivers of lung cancer screening participation in Australia using the COM-B (capability, opportunity, motivation-behaviour) model.

Cite
FutureBee AI (2022). Australian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-australia

Australian English General Conversation Speech Dataset for ASR

Australian English General Conversation Speech Corpus

Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License

https://www.futurebeeai.com/policies/ai-data-license-agreement

Area covered
Australia
Dataset funded by
FutureBeeAI
Description

Introduction

Welcome to the Australian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Australian English communication.

Curated by FutureBeeAI, this 40-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Australian accents and dialects.

Speech Data

The dataset comprises 40 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Australian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

Participant Diversity:
Speakers: 80 verified native Australian English speakers from FutureBeeAI’s contributor community.
Regions: Representing various states and territories of Australia to ensure dialectal diversity and demographic balance.
Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
Recording Details:
Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
Duration: Each conversation ranges from 15 to 60 minutes.
Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
Environment: Quiet, echo-free settings with no background noise.
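
The stated recording format (stereo, 16-bit, 16 kHz WAV) can be checked on delivery before any training run. The following is a minimal sketch using only Python's standard `wave` module; the function names `check_wav_format` and `make_silent_wav` are illustrative assumptions and not part of any FutureBeeAI tooling.

```python
import io
import wave


def check_wav_format(source, channels=2, sample_width_bytes=2, sample_rate=16000):
    """Return a list of mismatches between a WAV file and the expected format.

    `source` may be a file path or a binary file-like object.
    The defaults mirror the format stated above: stereo, 16-bit, 16 kHz.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(f"channels: expected {channels}, got {wav.getnchannels()}")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width: expected {sample_width_bytes} bytes, got {wav.getsampwidth()}"
            )
        if wav.getframerate() != sample_rate:
            problems.append(f"sample rate: expected {sample_rate}, got {wav.getframerate()}")
    return problems


def make_silent_wav(channels, sample_width_bytes, sample_rate, seconds=1):
    """Build an in-memory silent WAV, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (channels * sample_width_bytes * sample_rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # A conforming file yields no problems; a mono 8 kHz file yields two.
    print(check_wav_format(make_silent_wav(2, 2, 16000)))
    print(check_wav_format(make_silent_wav(1, 2, 8000)))
```

In practice the same function could be pointed at each delivered file path, with any non-empty result flagged for review.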

Topic Diversity

The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

Sample Topics Include:
Family & Relationships
Food & Recipes
Education & Career
Healthcare Discussions
Social Issues
Technology & Gadgets
Travel & Local Culture
Shopping & Marketplace Experiences, and many more.

Transcription

Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

Transcription Highlights:
Speaker-segmented dialogues
Time-coded utterances
Non-speech elements (pauses, laughter, etc.)
High transcription accuracy, achieved through a double QA pass, with an average WER below 5%

These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
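
For readers unfamiliar with the WER figure quoted above, word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level edit distance. The sketch below shows that standard calculation; it assumes simple whitespace tokenisation, and it is not a description of how FutureBeeAI measures its own figure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words; curr is the row being filled for i.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Real evaluations usually normalise case and punctuation before scoring, which this sketch deliberately leaves out.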

Metadata

The dataset comes with granular metadata for both speakers and recordings:

Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
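
As an illustration of the use-case-specific filtering mentioned above, the sketch below selects recordings by speaker and recording attributes. The page does not publish the metadata schema, so the `RecordingMeta` fields, their names, and the example records are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RecordingMeta:
    # Hypothetical field names; the real metadata schema is not given here.
    participant_id: str
    gender: str
    age: int
    state: str
    topic: str
    duration_minutes: float


def select_recordings(
    records: Iterable[RecordingMeta],
    gender: Optional[str] = None,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    topic: Optional[str] = None,
) -> List[RecordingMeta]:
    """Keep records matching every criterion that is not None."""
    selected = []
    for rec in records:
        if gender is not None and rec.gender != gender:
            continue
        if min_age is not None and rec.age < min_age:
            continue
        if max_age is not None and rec.age > max_age:
            continue
        if topic is not None and rec.topic != topic:
            continue
        selected.append(rec)
    return selected


# Made-up records for demonstration only; they do not come from the dataset.
EXAMPLE_RECORDS = [
    RecordingMeta("P001", "female", 29, "NSW", "Food & Recipes", 22.5),
    RecordingMeta("P002", "male", 54, "Qld", "Travel & Local Culture", 41.0),
    RecordingMeta("P003", "female", 63, "Vic", "Food & Recipes", 18.0),
]

if __name__ == "__main__":
    for rec in select_recordings(EXAMPLE_RECORDS, topic="Food & Recipes", max_age=40):
        print(rec.participant_id, rec.state, rec.duration_minutes)
```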

Usage and Applications

This dataset is a versatile resource for multiple English speech and language AI applications:

ASR Development: Train accurate speech-to-text systems for Australian English.
Voice Assistants: Build smart assistants capable of understanding natural Australian conversations.