66 datasets found

MD COVID-19 - Number of Persons Tested Negative
catalog.data.gov
opendata.maryland.gov
Updated Sep 2, 2022
opendata.maryland.gov (2022). MD COVID-19 - Number of Persons Tested Negative [Dataset]. https://catalog.data.gov/dataset/md-covid-19-number-of-persons-tested-negative
Dataset updated
Sep 2, 2022
Dataset provided by
opendata.maryland.gov
Area covered
Maryland
Description
NOTE: This layer is deprecated (last updated 2/16/2022). It was formerly updated daily.

Summary: The cumulative number of Maryland residents who tested negative for COVID-19.

Description: The MD COVID-19 - Number of Persons Tested Negative data layer is a collection of the number of people statewide who have tested negative for COVID-19, as reported each day by each local health department via the NEDSS system.

Terms of Use: The Spatial Data, and the information therein, (collectively the "Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
COVID-19 Daily Testing - By Person - Historical
datasets.ai
data.cityofchicago.org
Updated Sep 11, 2024
City of Chicago (2024). COVID-19 Daily Testing - By Person - Historical [Dataset]. https://datasets.ai/datasets/covid-19-daily-testing-by-person
Dataset updated
Sep 11, 2024
Dataset authored and provided by
City of Chicago
Description
This dataset is historical only and ends at 5/7/2021. For more information, please see http://dev.cityofchicago.org/open%20data/data%20portal/2021/05/04/covid-19-testing-by-person.html. The recommended alternative dataset for similar data beyond that date is https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test/gkdw-2tgv.

This is the source data for some of the metrics available at https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html.

For all datasets related to COVID-19, see https://data.cityofchicago.org/browse?limitTo=datasets&sortBy=alpha&tags=covid-19.

This dataset contains counts of people tested for COVID-19 and their results. This dataset differs from https://data.cityofchicago.org/d/gkdw-2tgv in that each person is in this dataset only once, even if tested multiple times. In the other dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear in that dataset more than once on the same day unless he/she had both a positive and not-positive test.
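The difference between per-test and per-person counting can be shown with a small sketch. The record layout (`LabResult` with `person_id`, `collected`, `positive`) is hypothetical and is not the dataset's actual schema. The description does not say which of a person's tests is kept, so the rule used here (a person counts as positive if any of their tests was positive) is an assumption for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LabResult:
    # Hypothetical record layout; the real dataset's columns differ.
    person_id: str
    collected: date
    positive: bool


def count_per_test(results):
    """Every test counts, including repeat tests of the same person."""
    total = len(results)
    positives = sum(r.positive for r in results)
    return total, positives


def count_per_person(results):
    """Each person counts once; assumed positive if any of their tests was positive."""
    people = {}
    for r in results:
        people[r.person_id] = people.get(r.person_id, False) or r.positive
    return len(people), sum(people.values())
```

With one person tested twice (negative, then positive) and another tested once (negative), the per-test view reports three tests and one positive, while the per-person view reports two people and one positive.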

Only Chicago residents are included based on the home address as provided by the medical provider.

Molecular (PCR) and antigen tests are included, and only one test is counted for each individual. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.

Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. Chicago Department of Public Health (CDPH) does not receive all not-positive results.

Demographic data are more complete for those who test positive; care should be taken when calculating percentage positivity among demographic groups.

All data are provisional and subject to change. Information is updated as additional details are received.

Data Source: Illinois National Electronic Disease Surveillance System
COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE
data.ct.gov
gimi9.com
Updated Jun 24, 2022
Department of Public Health (2022). COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/COVID-19-Tests-Cases-Hospitalizations-and-Deaths-S/rf3k-f8fg
Available download formats: tsv, application/rdf+xml, xml, json, csv, application/rss+xml
Dataset updated
Jun 24, 2022
Dataset authored and provided by
Department of Public Health
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
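The town-level suppression rule can be sketched as a simple filter. The note is ambiguous about whether the threshold applies to each count separately; this sketch assumes a town is suppressed when either its 7-day case count or its 7-day positive NAAT count is below 5. The function names and the `town -> (cases_7d, positive_naat_7d)` mapping are hypothetical.

```python
SUPPRESSION_THRESHOLD = 5  # "fewer than 5" over the past 7 days, per the dataset note


def is_suppressed(cases_7d: int, positive_naat_7d: int) -> bool:
    """Return True if a town's 7-day figures would be withheld.

    Assumption: suppression applies when either 7-day count is below the threshold.
    """
    return cases_7d < SUPPRESSION_THRESHOLD or positive_naat_7d < SUPPRESSION_THRESHOLD


def publishable_rows(towns: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Drop suppressed towns from a mapping of town -> (cases_7d, positive_naat_7d)."""
    return {town: counts for town, counts in towns.items() if not is_suppressed(*counts)}
```

Note the strict inequality: a town with exactly 5 cases and 5 positive NAAT tests is still published under this reading.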

This dataset includes COVID-19 tests, cases, and associated deaths reported among Connecticut residents. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Hospitalization data were collected by the Connecticut Hospital Association and reflect the number of patients currently hospitalized with laboratory-confirmed COVID-19. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the daily COVID-19 update.

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related deaths, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines, along with a reminder of the required reporting to OCME [25, 26]. As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain whether COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom an antemortem diagnosis could not be made [31]. The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause-of-death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause-of-death codes according to the International Classification of Diseases, 10th Revision (ICD-10) [25, 26]. COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics

Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes.

Starting in July 2020, this dataset will be updated every weekday.

Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to the reported test counts in this dataset, so the test counts cover both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.

A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.

A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamps on the datasets on the Open Data Portal differ from the timestamps in DPH's daily PDF reports.

Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously, the date field listed the date the data was pulled by DPH, which was typically the day before the data was published on data.ct.gov. This change was made to standardize the COVID-19 datasets on data.ct.gov.

Starting April 4, 2022, negative rapid antigen and rapid PCR test results for SARS-CoV-2 are no longer required to be reported to the Connecticut Department of Public Health. Negative results from laboratory-based molecular (PCR/NAAT) tests are still required to be reported, as are all positive results from both molecular (PCR/NAAT) and antigen tests.

On 5/16/2022, 8,622 historical cases were included in the data. The date range for these cases was August 2021 to April 2022.
Chicago Contracts
kaggle.com
zip
Updated Nov 7, 2019
City of Chicago (2019). Chicago Contracts [Dataset]. https://www.kaggle.com/chicago/chicago-contracts
Available download formats: zip (7,885,959 bytes)
Dataset updated
Nov 7, 2019
Dataset authored and provided by
City of Chicago
License
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Chicago
Description
Content

Contracts and modifications awarded by the City of Chicago since 1993. This data is currently maintained in the City’s Financial Management and Purchasing System (FMPS), which is used throughout the City for contract management and payment.

Legacy System Records: Purchase Order/Contract Numbers that begin with alpha characters identify records imported from legacy systems. Records with a null value in the Contract Type field were imported from legacy systems.

"Comptroller-Other" Contract Type: Some records where the Contract Type is "COMPTROLLER-OTHER" are ordinance-based agreements and may have start dates earlier than 1993.

Depends Upon Requirements Contracts: If the contract Award Amount is $0, the contract is not cancelled, and the contract is a blanket contract, then the contract award total Depends Upon Requirements. A Depends Upon Requirements contract is an indefinite quantities contract in which the City places orders as needed and the vendor is not guaranteed any particular contract award amount.
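The two record-classification rules above translate directly into predicates. This is a minimal sketch; the function and parameter names are hypothetical and do not correspond to the dataset's actual column names, and contract-type values such as `"STANDARD"` in the usage below are illustrative only.

```python
from decimal import Decimal
from typing import Optional


def is_depends_upon_requirements(award_amount: Decimal, cancelled: bool, is_blanket: bool) -> bool:
    """A $0, non-cancelled blanket contract has an award total that Depends Upon Requirements."""
    return award_amount == 0 and not cancelled and is_blanket


def is_legacy_record(contract_number: str, contract_type: Optional[str]) -> bool:
    """Either signal marks a record imported from a legacy system."""
    starts_with_letter = contract_number[:1].isalpha()  # "" for an empty number -> False
    return starts_with_letter or contract_type is None
```

`Decimal` is used so that award amounts such as `Decimal("0.00")` compare exactly with zero, avoiding floating-point rounding on currency values.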

Blanket vs. Standard Contracts: Only blanket contracts (contracts for repeated purchases) have FMPS end dates. Standard contracts (for example, construction contracts) terminate upon completion and acceptance of all deliverables. These dates are tracked outside of FMPS.

Negative Modifications: Some contracts are modified to delete scope and money from a contract. These reductions are indicated by negative numbers in the Award Amount field of this dataset.

Data Owner: Procurement Services. Time Period: 1993 to present. Frequency: Data is updated daily.

Context

This is a dataset hosted by the City of Chicago. The city has an open data platform (https://data.cityofchicago.org) and updates its information according to the amount of data that is brought in. Explore the City of Chicago using Kaggle and all of the data sources available through the City of Chicago organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

Cover photo by rawpixel on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
The Interpersonal Nature of Self-Talk Dataset
borealisdata.ca
search.dataone.org
Updated Dec 23, 2021
Jean Paul Lefebvre; Pamela Sadler; Ailill Hall; Erik Woody (2021). The Interpersonal Nature of Self-Talk Dataset [Dataset]. http://doi.org/10.5683/SP2/GN357A
Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/GN357A
Dataset updated
Dec 23, 2021
Dataset provided by
Borealis
Authors
Jean Paul Lefebvre; Pamela Sadler; Ailill Hall; Erik Woody
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset funded by
Social Sciences and Humanities Research Council of Canada
Description
This is the dataset associated with a daily diary study designed to explore the stylistic qualities of people’s reflective self-talk and the effects of those qualities on emotion immediately following that self-talk. Each day, participants engaged in reflective self-talk about one positive and one negative event from that day. Participants rated the emotional intensity of each daily event, which served as the inspiration for an occasion of self-talk. Occasions of self-talk were rated on the interpersonal circumplex octants with a novel measure created for this study (OLIPS), both by the participants themselves and later by independent raters. Participants also reported their positive and negative affect following each instance of self-talk (I-PANAS-SF; Thompson, 2007).

Participants also completed a number of trait-level measures, either before the diary study or at a follow-up visit after the study: the Interpersonal Self-Talk Scale (Price, 2015); the Forms of Self-Criticising/Attacking & Self-Reassuring Scale (FSCRS; Gilbert et al., 2004); the Revised Interpersonal Adjective Scales (IAS-R; Wiggins, Trapnell, & Phillips, 1988); and the Neuroticism Scale from the Big Five Version of the Revised Interpersonal Adjective Scales (IASR-B5; Trapnell & Wiggins, 1990).

Data were collected from university undergraduates in southern Ontario, Canada. The dataset is associated with the following paper: Lefebvre, J.P., Sadler, S., Hall, A., & Woody, E. (2022). The interpersonal nature of self-talk: Variations across individuals and occasions. Journal of Personality and Social Psychology: Personality Processes and Individual Differences. https://doi.org/10.1037/pspp0000405
INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET
data.niaid.nih.gov
zenodo.org
Updated Jul 19, 2024
Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
Dataset updated
Jul 19, 2024
Dataset provided by
Kishor Datta Gupta
Nafiz Sadman
Nishat Anjum
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh, United States
Description
Introduction

There are several works based on Natural Language Processing of newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al. [2] created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

However, little NLP research has focused on COVID-19. Most applications involve classifying chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Jelodar et al. [9] did similar work, using an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NNK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

2 Dataset Introduction

2.1 Data Collection

We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports on the issue from Bangladesh and named this collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All of these newspapers are among the top providers and most widely read in their respective countries. The collection was done manually by 10 human data collectors of age group 23- with university degrees. This approach was preferred over automation to ensure that the news reports were highly relevant to the subject: the newspapers' websites had dynamic content with advertisements in no particular order, so online scrapers had a high chance of collecting inaccurate news reports. One of the challenges in collecting the data was the subscription requirement; each newspaper required a $1 subscription. The criteria provided to the human data collectors as guidelines were as follows:

The headline must have one or more words directly or indirectly related to COVID-19.

The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

Avoid taking duplicate reports.

Maintain a consistent time frame for the above-mentioned newspapers.

To collect these data, we used a Google Form for both USA and BD. Two human editors went through each entry to check for spam or troll entries.

2.2 Data Pre-processing and Statistics

Some pre-processing steps performed on the newspaper report dataset are as follows:

Remove hyperlinks.

Remove non-English alphanumeric characters.

Remove stop words.

Lemmatize text.

While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. Although these steps were performed with a script, we also assigned the same human collectors to cross-check the output against the criteria above.
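The first three pre-processing steps can be sketched with the standard library alone. This is a minimal sketch, not the authors' script: the stop-word list is a small illustrative subset, "non-English alphanumeric characters" is interpreted as anything outside `A-Z`, `a-z`, `0-9`, and whitespace, and the lowercasing step is an added assumption. Lemmatization is omitted because it needs a lexical resource; NLTK's `WordNetLemmatizer` is one common option.

```python
import re

# Small illustrative stop-word list (assumption); real pipelines use a fuller list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to",
              "is", "are", "was", "were", "for", "with"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_PATTERN = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text: str) -> list[str]:
    text = URL_PATTERN.sub(" ", text)        # 1. remove hyperlinks
    text = NON_ALNUM_PATTERN.sub(" ", text)  # 2. keep only English letters, digits, whitespace
    tokens = text.lower().split()            # lowercase and tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words


print(preprocess("Cases rise in Minnesota: see https://example.com/covid for details!"))
```

Hyperlinks are removed before punctuation so that URL fragments such as `example` and `com` do not survive as tokens.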

The primary statistics of the two datasets are shown in Tables 1 and 2.

Table 1: Covid-News-USA-NNK data statistics

No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100

Table 2: Covid-News-BD-NNK data statistics

No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500

2.3 Dataset Repository

We used GitHub as our primary data repository, under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON files using a Python script. We also provide a Python script for essential operations. We welcome all outside collaboration to enrich the dataset.

3 Literature Review

Natural Language Processing (NLP) deals with text data (often treated as categorical data) in computer science, using numerous methods, such as one-hot encoding and word embeddings, that transform text into numerical representations that can be fed to machine learning and deep learning algorithms.

Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed specifically for NLP tasks. The two most prominent are BERT [14], which uses a bidirectional Transformer encoder and performs strongly on tasks such as text classification, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. These models are typically used in pre-trained form, since training them carries a huge computational cost. Information extraction is the general task of retrieving information from data. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of the image; from speech, it could mean retrieving information about names, places, etc. [16]. Information extraction from text could mean identifying named entities, locations, or other essential data. Topic modeling is a sub-task of NLP and also a form of information extraction: it clusters words and phrases that share a context into groups. Topic modeling is an unsupervised learning method that gives a brief idea of the contents of a set of texts. One commonly used topic model is Latent Dirichlet Allocation (LDA) [17].

Keyword extraction is a form of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that builds a graph of words, computes a weight for each word, and selects the words with the highest weights.
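A simplified TextRank can be sketched in a few lines: words that co-occur within a small window are linked in an undirected graph, and PageRank-style iteration assigns each word a score. This sketch assumes the input is already tokenized and filtered (for example, the output of a pre-processing step); the original TextRank also filters candidates by part of speech and merges adjacent keywords into phrases, which are omitted here. The parameter defaults (`window=2`, `damping=0.85`, `iterations=50`) are illustrative choices.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    # Link two words if they appear together in a window of `window` consecutive tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    nodes = list(neighbors)
    if not nodes:
        return []
    # PageRank iteration on the undirected, unweighted co-occurrence graph.
    score = {w: 1.0 for w in nodes}
    for _ in range(iterations):
        score = {
            w: (1 - damping) + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in nodes
        }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(nodes, key=lambda w: (-score[w], w))[:top_k]
```

A word that co-occurs with many different words, such as `virus` in `["virus", "economy", "virus", "masks", "virus", "jobs"]`, receives the highest score.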

Word clouds are a great visualization technique for understanding the overall 'talk of the topic'. The clustered words give us a quick understanding of the content.

4 Our experiments and Result analysis

We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month, from February to May. From Figures 1, 2, and 3, we can note the following:

In February, both newspapers discussed China and the source of the outbreak.

StarTribune emphasized Minnesota as the state of greatest concern, and this concern appears to have grown in April.

Both newspapers discussed the virus's impact on the economy, e.g., banks, elections, administrations, and markets.

Washington Post discussed global issues more than StarTribune.

In February, StarTribune mentioned the first precautionary measure (wearing masks) and the uncontrollable spread of the virus throughout the nation.

While both newspapers mentioned the outbreak in China in February, the spread in the United States is more heavily highlighted from March through May, reflecting the critical impact of the virus.

We used a script to extract all numbers related to certain keywords, such as 'Deaths', 'Infected', 'Died', 'Infections', 'Quarantined', 'Lock-down', and 'Diagnosed', from the news reports and built a series of case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used VADER sentiment analysis to extract the sentiment of the headlines and the body. On average, the sentiment scores ranged from -0.5 to -0.9. The VADER sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic. We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: 'China', 'Government', 'Masks', 'Economy', 'Crisis', 'Theft', 'Stock market', 'Jobs', 'Election', 'Missteps', 'Health', 'Response'. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
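The authors' extraction script is not included here, so the following is only a minimal sketch of keyword-anchored number extraction under stated assumptions: a number counts when it is directly followed by one of the keywords, optionally with one intervening word (as in "12 people died"). The keyword list mirrors the examples in the text; the regular expression and the `extract_counts` name are hypothetical.

```python
import re

KEYWORDS = ("deaths", "died", "infected", "infections", "quarantined", "lock-down", "diagnosed")

# A number (commas allowed), optionally one intervening word, then a keyword.
PATTERN = re.compile(
    r"\b(\d[\d,]*)\s+(?:[A-Za-z]+\s+)?(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def extract_counts(text: str) -> list[tuple[str, int]]:
    """Return (keyword, number) pairs found in the text, in order of appearance."""
    return [(kw.lower(), int(num.replace(",", ""))) for num, kw in PATTERN.findall(text)]


print(extract_counts("Officials reported 1,200 deaths, 45 infected patients and 12 people died."))
```

A heuristic like this misses phrasings with longer gaps ("45 people were infected") and can pick up unrelated numbers, which is one reason manual cross-checking remains useful.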
Temporary Foreign Worker Program Labour Market Impact Assessment Statistics 2023Q1-2024Q3
open.canada.ca
csv, doc
Updated Dec 20, 2024
Employment and Social Development Canada (2024). Temporary Foreign Worker Program Labour Market Impact Assessment Statistics 2023Q1-2024Q3 [Dataset]. https://open.canada.ca/data/en/dataset/e8745429-21e7-4a73-b3f5-90a779b78d1e
Available download formats: csv, doc
Dataset updated
Dec 20, 2024
Dataset provided by
Ministry of Employment and Social Development of Canada: http://esdc-edsc.gc.ca/
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Jan 1, 2023 - Sep 30, 2024
Description
Overview: Each quarter, the Temporary Foreign Worker Program (TFWP) publishes Labour Market Impact Assessment (LMIA) statistics on the Open Government Data Portal, including quarterly and annual LMIA data related to, but not limited to, requested and approved TFW positions, employment location, employment occupations, sectors, TFWP stream, and temporary foreign workers by country of origin. The TFWP does not collect data on the number of TFWs who are hired by an employer and have arrived in Canada. The decision to issue a work permit rests with Immigration, Refugees and Citizenship Canada (IRCC), and not all positions on a positive LMIA result in a work permit. For these reasons, data provided in the LMIA statistics cannot be used to calculate the number of TFWs that have entered or will enter Canada. IRCC publishes annual statistics on the number of foreign workers who are issued a work permit: https://open.canada.ca/data/en/dataset/360024f2-17e9-4558-bfc1-3616485d65b9. Please note that all quarterly tables have been updated to NOC 2021 (5-digit and based on training, education, experience and responsibilities (TEER)). As such, Tables 5, 8, 17, and 24 will no longer be updated but will remain as archived tables.

Frequency of Publication: Quarterly LMIA statistics cover data for the four quarters of the previous calendar year and the quarter(s) of the current calendar year. Quarterly data is released within two to three months of the most recent quarter. The release dates for quarterly data are as follows: Q1 (January to March) will be published by early June of the current year; Q2 (April to June) by early September of the current year; Q3 (July to September) by early December of the current year; and Q4 (October to December) by early March of the next year. Annual statistics cover eight consecutive years of LMIA data and are scheduled to be released in March of the next year.
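The quarterly release schedule amounts to a small lookup table in which only Q4 rolls over into the following year. The sketch below encodes the schedule as stated above; the names `RELEASE_SCHEDULE` and `scheduled_release` are hypothetical, and actual release dates may differ from the published target.

```python
# Quarter -> (year offset, month) of the scheduled release, per the publication schedule.
RELEASE_SCHEDULE = {
    1: (0, "June"),       # Q1 (Jan-Mar): early June, same year
    2: (0, "September"),  # Q2 (Apr-Jun): early September, same year
    3: (0, "December"),   # Q3 (Jul-Sep): early December, same year
    4: (1, "March"),      # Q4 (Oct-Dec): early March, following year
}


def scheduled_release(year: int, quarter: int) -> tuple[int, str]:
    """Return (release year, 'early <Month>') for a quarter's LMIA statistics."""
    if quarter not in RELEASE_SCHEDULE:
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    offset, month = RELEASE_SCHEDULE[quarter]
    return year + offset, f"early {month}"
```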
Published Data: As part of the quarterly release, the TFWP updates LMIA data for 28 tables broken down by: TFW positions: Tables 1 to 10, 12, 13, and 22 to 24; LMIA applications: Tables 14 to 18; Employers: Tables 11, and 19 to 21; and Seasonal Agricultural Worker Program (SAWP): Tables 25 to 28. In addition, the TFWP publishes 2 lists of employers who were issued a positive or negative LMIA: Employers who were issued a positive LMIA by Program Stream, NOC, and Business Location (https://open.canada.ca/data/en/dataset/90fed587-1364-4f33-a9ee-208181dc0b97/resource/b369ae20-0c7e-4d10-93ca-07c86c91e6fe); and Employers who were issued a negative LMIA by Program Stream, NOC, and Business Location (https://open.canada.ca/data/en/dataset/f82f66f2-a22b-4511-bccf-e1d74db39ae5/resource/94a0dbee-e9d9-4492-ab52-07f0f0fb255b).

Things to Remember:

1. When data are presented on positive or negative LMIAs, the decision date is used to allocate which quarter the data falls into. However, when data are presented on when LMIAs are requested, they are based on the date the LMIA was received by ESDC.

2. As of the publication of 2022Q1-2023Q4 data (published in April 2024) and going forward, all LMIAs in support of 'Permanent Residence (PR) Only' are included in TFWP statistics, unless indicated otherwise. All quarterly data in this report include PR Only LMIAs. Dual-intent LMIAs and corresponding positions are included under their respective TFWP stream (e.g., low-wage, high-wage). This may affect program reporting over time.

3. Care should be taken when manipulating data presented by ‘Unique Employers’ within a given table. One employer could be counted towards multiple groups if they have multiple positive LMIAs across categories such as program stream, province or territory, or economic region. For example, an employer could request TFWs for two different business locations, and this employer would be counted in the statistics of both economic regions. As such, the sum of the rows within these ‘Unique Employer’ tables will not add up to the aggregate total.
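The reason row sums exceed the total can be shown with a small sketch that counts distinct employers per region and overall. The `(employer_id, region)` pair layout and both function names are hypothetical; they stand in for whatever fields the published tables are built from.

```python
from collections import defaultdict


def unique_employers_by_region(positive_lmias):
    """Count distinct employers per economic region from (employer_id, region) pairs."""
    employers = defaultdict(set)
    for employer_id, region in positive_lmias:
        employers[region].add(employer_id)
    return {region: len(ids) for region, ids in employers.items()}


def unique_employers_total(positive_lmias):
    """Count distinct employers overall, regardless of region."""
    return len({employer_id for employer_id, _ in positive_lmias})


lmias = [("E1", "Region A"), ("E1", "Region B"), ("E2", "Region A")]
by_region = unique_employers_by_region(lmias)
print(by_region, sum(by_region.values()), unique_employers_total(lmias))
```

Employer `E1` appears once in each region's row, so the rows sum to 3 while the aggregate count of unique employers is 2.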
Bahasa Conversation Chat Dataset for Travel Domain
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). Bahasa Conversation Chat Dataset for Travel Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/bahasa-travel-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 10,000 chat conversations, each focusing on specific Travel-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
•Participant Details: 150+ native Bahasa participants from the FutureBeeAI community.
•Word Count & Length: Chats are diverse, typically ranging from 300 to 700 words and 50 to 150 turns across both speakers.

Topic Diversity
The chat dataset covers a wide range of conversations on Travel topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Travel use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
•Inbound Calls:
•Booking Inquiries & Assistance
•Destination Information & Recommendations
• Flight Delays or Cancellation Assistance
•Assistance for Disabled Passengers
•Travel-related Health & Safety Inquiry
•Lost or Delayed Baggage Assistance, and many more
•Outbound Calls:
•Promotional Offers & Package Deals
•Customer Satisfaction Surveys
•Booking Confirmations & Updates
•Flight Schedule Changes & Notifications
•Customer Feedback Collection
•Visa Expiration Reminders, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in Bahasa Travel interactions. This diversity ensures the dataset accurately represents the language used by Bahasa speakers in Travel contexts.
The dataset encompasses a wide array of language elements, including:
•Naming Conventions: Chats include a variety of Bahasa personal and business names.
•Localized Details: Real-world addresses, emails, phone numbers, and other contact information according to different Bahasa-speaking regions.
•Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Bahasa forms, adhering to local conventions.
•Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases present in Bahasa Travel conversations.

This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Bahasa Travel interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Travel customer-agent interactions.
•Simple Inquiries
•Detailed Discussions
•Transactional Interactions
•Problem-Solving Dialogues
•Advisory Sessions
•Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
•Greetings
•Authentication
•Information gathering
•Resolution identification
•Solution Delivery
Data from: Dataset for the paper
figshare.com
txt
Updated Oct 9, 2023
Valeriya Koncha; Anton Kazun (2023). Dataset for the paper [Dataset]. http://doi.org/10.6084/m9.figshare.24274720.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24274720.v1
Dataset updated
Oct 9, 2023
Dataset provided by
figshare
Authors
Valeriya Koncha; Anton Kazun
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An increase in morbidity and mortality due to COVID-19 in 2020-2022 forced various countries to introduce lockdowns. Because of its unfavorable economic consequences, this measure often provoked a negative attitude among the population, leading to sabotage and even protests. In this study, we ask whether it is possible to change the population's attitude towards lockdown by emphasizing economic loss prevention. Based on the results of an online survey of 23,064 residents of Russia, we show that mentioning the negative economic consequences of a lockdown reduces the level of support for it. In contrast, mentioning the possibility of avoiding long-term negative consequences for the economy reinforces this support. The influence of the economic loss prevention treatment holds for the poor and for people in full-time employment, although these are the groups a lockdown is likely to affect first. Moreover, we show that the economic loss prevention treatment can even influence the opinions of people who were initially firmly against the lockdown. However, the loss prevention treatment is not significant for people who have already experienced the pandemic's direct negative economic consequences.
Swedish Conversation Chat Dataset for BFSI Domain
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). Swedish Conversation Chat Dataset for BFSI Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/swedish-bfsi-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 10,000 chat conversations, each focusing on specific BFSI-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
•Participant Details: 150+ native Swedish participants from the FutureBeeAI community.
•Word Count & Length: Chats are diverse, typically ranging from 300 to 700 words and 50 to 150 turns across both speakers.

Topic Diversity
The chat dataset covers a wide range of conversations on BFSI topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various BFSI use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
•Inbound Chats:
•Account Opening
•Account Management
•Transactions
•Loan Inquiries & Applications
•Credit Card Services, and many more
•Outbound Chats:
•Product & Service Promotions
•Cross-selling & Upselling
•Customer Retention & Loyalty Programs
•Loan Application Follow-ups
•Insurance Policy Renewals/Reminders, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in Swedish BFSI interactions. This diversity ensures the dataset accurately represents the language used by Swedish speakers in BFSI contexts.
The dataset encompasses a wide array of language elements, including:
•Naming Conventions: Chats include a variety of Swedish personal and business names.
•Localized Details: Real-world addresses, emails, phone numbers, and other contact information according to different Swedish-speaking regions.
•Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Swedish forms, adhering to local conventions.
•Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases present in Swedish BFSI conversations.

This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Swedish BFSI interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of BFSI customer-agent interactions.
•Simple Inquiries
•Detailed Discussions
•Transactional Interactions
•Problem-Solving Dialogues
•Advisory Sessions
•Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
•Greetings
•Authentication
•Information gathering
•Resolution identification
•Solution Delivery
•Closing and Follow-ups
•Feedback, etc
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
Data Format and Structure
English Conversation Chat Dataset for Real Estate Domain
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). English Conversation Chat Dataset for Real Estate Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-realestate-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 12,000 chat conversations, each focusing on specific Real Estate-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
•Participant Details: 200+ native English participants from the FutureBeeAI community.
•Word Count & Length: Chats are diverse, typically ranging from 300 to 700 words and 50 to 150 turns across both speakers.

Topic Diversity
The chat dataset covers a wide range of conversations on Real Estate topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Real Estate use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
•Inbound Chats:
•Property Inquiry
•Rental Property Search & Availability
•Renovation Inquiries
•Property Features & Amenities Inquiry
•Investment Property Analysis & Advice
•Property History & Ownership Details, and many more
•Outbound Chats:
•New Property Listing Update
•Post Purchase Follow-ups
•Investment Opportunities & Property Recommendations
•Property Value Updates
•Customer Satisfaction Surveys, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in English Real Estate interactions. This diversity ensures the dataset accurately represents the language used by English speakers in Real Estate contexts.
The dataset encompasses a wide array of language elements, including:
•Naming Conventions: Chats include a variety of English personal and business names.
•Localized Details: Real-world addresses, emails, phone numbers, and other contact information according to different English-speaking regions.
•Temporal and Numeric Expressions: Dates, times, currencies, and numbers in English forms, adhering to local conventions.
•Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases present in English Real Estate conversations.

This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to English Real Estate interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Real Estate customer-agent interactions.
•Simple Inquiries
•Detailed Discussions
•Transactional Interactions
•Problem-Solving Dialogues
•Advisory Sessions
•Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
•Greetings
•Authentication
•Information gathering
•Resolution identification
•Solution Delivery
•Closing and Follow-ups
Heart Disease Health Indicators Dataset
kaggle.com
Updated Mar 10, 2022
Alex Teboul (2022). Heart Disease Health Indicators Dataset [Dataset]. https://www.kaggle.com/datasets/alexteboul/heart-disease-health-indicators-dataset/code
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 10, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alex Teboul
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Heart disease is among the most prevalent chronic diseases in the United States, affecting millions of Americans each year and imposing a significant financial burden on the economy. In the United States alone, heart disease claims roughly 647,000 lives each year, making it the leading cause of death. The buildup of plaques inside larger coronary arteries, molecular changes associated with aging, chronic inflammation, high blood pressure, and diabetes are all causes of and risk factors for heart disease.

While there are different types of coronary heart disease, the majority of individuals only learn they have the disease following symptoms such as chest pain, a heart attack, or sudden cardiac arrest. This fact highlights the importance of preventative measures and tests that can accurately predict heart disease in the population prior to negative outcomes like myocardial infarctions (heart attacks) taking place.

The Centers for Disease Control and Prevention has identified high blood pressure, high blood cholesterol, and smoking as three key risk factors for heart disease. Roughly half of Americans have at least one of these three risk factors. The National Heart, Lung, and Blood Institute highlights a wider array of factors such as Age, Environment and Occupation, Family History and Genetics, Lifestyle Habits, Other Medical Conditions, Race or Ethnicity, and Sex for clinicians to use in diagnosing coronary heart disease. Diagnosis tends to be driven by an initial survey of these common risk factors followed by bloodwork and other tests.

Content

The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, I downloaded a csv of the dataset available on Kaggle for the year 2015. This original dataset contains responses from 441,455 individuals and has 330 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.

This dataset contains 253,680 survey responses from the cleaned BRFSS 2015 data, to be used primarily for the binary classification of heart disease. Note that there is strong class imbalance in this dataset: 229,787 respondents do not have/have not had heart disease, while 23,893 have had heart disease. The questions to be explored are:

1. To what extent can survey responses from the BRFSS be used to predict heart disease risk?

and

2. Can a subset of questions from the BRFSS be used for preventative health screening for diseases like heart disease?
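Given the class counts stated above (229,787 without and 23,893 with heart disease), a few lines of Python can confirm the total and quantify the imbalance before any model is trained. The class-weight heuristic shown is one common choice, mirroring the formula behind scikit-learn's `class_weight="balanced"`; it is an assumption for illustration, not something the dataset author prescribes.

```python
# Class counts as stated in the dataset description (cleaned BRFSS 2015).
NEGATIVE = 229_787  # respondents who do not have / have not had heart disease
POSITIVE = 23_893   # respondents who have had heart disease
total = NEGATIVE + POSITIVE  # should equal the 253,680 responses reported

# Share of the positive class, and the accuracy of always predicting "no heart disease".
prevalence = POSITIVE / total
majority_baseline_accuracy = NEGATIVE / total

# "Balanced" weights: n_samples / (n_classes * n_samples_in_class),
# so each class contributes the same total weight during training.
n_classes = 2
weights = {
    0: total / (n_classes * NEGATIVE),
    1: total / (n_classes * POSITIVE),
}

print(f"total={total}, prevalence={prevalence:.4f}, "
      f"majority-class baseline accuracy={majority_baseline_accuracy:.4f}")
print({label: round(w, 3) for label, w in weights.items()})
```

A classifier that simply predicts the majority class already scores about 90% accuracy here, which is why metrics such as recall, precision, or ROC AUC are usually more informative on this data.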

Acknowledgements

It is important to reiterate that I did not create this dataset; it is just a cleaned and consolidated dataset created from the BRFSS 2015 dataset already on Kaggle. That dataset can be found here and the notebook I used for the data cleaning can be found here.

Inspiration

Let's build some predictive models for heart disease.
Norwegian Conversation Chat Dataset for Healthcare Domain
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). Norwegian Conversation Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/norwegian-healthcare-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
•Participant Details: 150+ native Norwegian participants from the FutureBeeAI community.
•Word Count & Length: Chats are diverse, typically ranging from 300 to 700 words and 50 to 150 turns across both speakers.

Topic Diversity
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
•Inbound Chats:
•Appointment Scheduling
•New Patient Registration
•Surgery Consultation
•Consultation regarding Diet, and many more
•Outbound Chats:
•Appointment Reminder
•Health & Wellness Subscription Programs
•Lab Test Results
•Health Risk Assessments
•Preventive Care Reminders, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in Norwegian Healthcare interactions. This diversity ensures the dataset accurately represents the language used by Norwegian speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
•Naming Conventions: Chats include a variety of Norwegian personal and business names.
•Localized Details: Real-world addresses, emails, phone numbers, and other contact information according to different Norwegian-speaking regions.
•Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Norwegian forms, adhering to local conventions.
•Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases present in Norwegian Healthcare conversations.

This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Norwegian Healthcare interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
•Simple Inquiries
•Detailed Discussions
•Transactional Interactions
•Problem-Solving Dialogues
•Advisory Sessions
•Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
•Greetings
•Authentication
•Information gathering
•Resolution identification
•Solution Delivery
•Closing and Follow-ups
•Feedback, etc
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
Data Format and Structure
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers
Why do we need crossing structures? An agent based modeling approach. -...
pre.iepnb.es
iepnb.es
Updated Dec 2, 2024
(2024). Why do we need crossing structures? An agent based modeling approach. - Dataset - CKAN [Dataset]. https://pre.iepnb.es/catalogo/dataset/why-do-we-need-crossing-structures-an-agent-based-modeling-approach
Dataset updated
Dec 2, 2024
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Road-kill and barrier effects are among the most important negative effects of roads. Mammalian carnivores may be particularly vulnerable to these effects given their typically longer dispersal distances and larger home ranges, which increase the probability of individuals encountering roads. Consequently, given their commonly low density and fecundity, high mortality rates and low connectivity may increase their vulnerability to local extinction. However, there are virtually no data on how road-kill and barrier effects influence carnivore population persistence. We developed the REPoP model (Road Effects on Population Persistence), a spatial-dynamic agent-based model that can be adjusted and parameterized to capture the specific life-history and landscape characteristics associated with a variety of species, to test for population persistence in roaded landscapes. Here we applied the model to the stone marten (Martes foina), a Mediterranean species typically associated with well-conserved agro-forestry areas called montado. Recent research showed that this species, although a generalist and once abundant throughout its range, may be vulnerable to road mortality. We were interested in identifying which biological features (‘reproduction success’ (60%, 70%) and ‘number of kits per litter’ (2, 3)) and road-related characteristics (‘road-kill probability’ (10%, 30%), ‘road-crossing avoidance’ (20%, 80%), and ‘avoidance in settling territories in roaded areas’ (‘true’, ‘false’)) may make carnivore species more or less vulnerable to roads. We simulated 30 x 30 km landscapes with no roads and with one road (road density ca. 0.02 km/km²). We assessed both population density and genetic differentiation through 150-year simulations. We then tested whether upgrading roads with crossing passages (50% of road segments), together with decreasing pavement access (simulating fencing), may overcome the effects on population size and genetic differentiation.
Each scenario (n = 16) was repeated 15 times. Regarding population size, several replicates in roaded landscapes went extinct. Implementing passages seemed to reduce the rate of extinction but did not eliminate it completely. Linear mixed-effects models revealed that the ‘number of kits per litter’ was more important than reproduction success for population persistence in roaded landscapes. Likewise, ‘avoidance in settling territories in roaded areas’ had the highest importance among species-road features. As expected, ‘road-kill probability’ had a significant effect, with higher rates leading to a lower probability of population persistence. ‘Road-crossing avoidance’ had no effect on final results. As for genetic differentiation, roaded scenarios showed significantly higher Fst values than roadless simulations. However, scenarios where roads were upgraded with passages showed significantly lower Fst values than simulations without passages. Our results clearly demonstrate that implementing crossing structures is necessary for mitigating road effects, but in some circumstances these measures are not sufficient to prevent population extinction and/or gene-flow breakdown.
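The abstract reports genetic differentiation as Fst but does not name the estimator. As a rough illustration only, the sketch below computes a heterozygosity-based form (Nei's G_ST, often reported as Fst), F_ST = (H_T - H_S) / H_T, for equally weighted subpopulations at a single locus. This estimator, the function names, and the allele frequencies are assumptions for illustration; REPoP may compute differentiation differently.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(subpop_freqs):
    """Heterozygosity-based Fst for equally weighted subpopulations at one locus.

    subpop_freqs: one list of allele frequencies per subpopulation,
    all with the same allele order.
    """
    k = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # H_S: mean within-subpopulation heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / k
    # H_T: heterozygosity of the pooled (averaged) allele frequencies.
    pooled = [sum(f[i] for f in subpop_freqs) / k for i in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical frequencies on either side of a road.
no_barrier = [[0.5, 0.5], [0.5, 0.5]]  # identical populations: no differentiation
barrier = [[0.9, 0.1], [0.1, 0.9]]     # diverged populations: high differentiation
print(fst(no_barrier), fst(barrier))
```

Higher values indicate that allele frequencies on the two sides have drifted apart, which is the pattern the abstract reports for roaded scenarios without passages.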
Spanish Conversation Chat Dataset for Real Estate Domain
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). Spanish Conversation Chat Dataset for Real Estate Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/spanish-realestate-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 10,000 chat conversations, each focusing on specific Real Estate-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
•Participant Details: 150+ native Spanish participants from the FutureBeeAI community.
•Word Count & Length: Chats are diverse, typically ranging from 300 to 700 words and 50 to 150 turns across both speakers.

Topic Diversity
The chat dataset covers a wide range of conversations on Real Estate topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Real Estate use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
•Inbound Chats:
•Property Inquiry
•Rental Property Search & Availability
•Renovation Inquiries
•Property Features & Amenities Inquiry
•Investment Property Analysis & Advice
•Property History & Ownership Details, and many more
•Outbound Chats:
•New Property Listing Update
•Post Purchase Follow-ups
•Investment Opportunities & Property Recommendations
•Property Value Updates
•Customer Satisfaction Surveys, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in Spanish Real Estate interactions. This diversity ensures the dataset accurately represents the language used by Spanish speakers in Real Estate contexts.
The dataset encompasses a wide array of language elements, including:
•Naming Conventions: Chats include a variety of Spanish personal and business names.
•Localized Details: Real-world addresses, emails, phone numbers, and other contact information according to different Spanish-speaking regions.
•Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Spanish forms, adhering to local conventions.
•Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases present in Spanish Real Estate conversations.

This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Spanish Real Estate interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Real Estate customer-agent interactions.
•Simple Inquiries
•Detailed Discussions
•Transactional Interactions
•Problem-Solving Dialogues
•Advisory Sessions
•Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
•Greetings
•Authentication
•Information gathering
•Resolution identification
•Solution Delivery
•Closing and Follow-ups
English Conversation Chat Dataset for Healthcare Domain
futurebeeai.com
wav
Updated Aug 1, 2022
English Conversation Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-healthcare-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 12,000 chat conversations, each focusing on specific Healthcare-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
•Participant Details: 200+ native English participants from the FutureBeeAI community.
•Word Count & Length: Chats are diverse, typically ranging from 300 to 700 words and 50 to 150 turns across both speakers.

Topic Diversity
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
•Inbound Chats:
•Appointment Scheduling
•New Patient Registration
•Surgery Consultation
•Consultation regarding Diet, and many more
•Outbound Chats:
•Appointment Reminder
•Health & Wellness Subscription Programs
•Lab Test Results
•Health Risk Assessments
•Preventive Care Reminders, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in English Healthcare interactions. This diversity ensures the dataset accurately represents the language used by English speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
•Naming Conventions: Chats include a variety of English personal and business names.
•Localized Details: Real-world addresses, emails, phone numbers, and other contact information according to different English-speaking regions.
•Temporal and Numeric Expressions: Dates, times, currencies, and numbers in English forms, adhering to local conventions.
•Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases present in English Healthcare conversations.

This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to English Healthcare interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
•Simple Inquiries
•Detailed Discussions
•Transactional Interactions
•Problem-Solving Dialogues
•Advisory Sessions
•Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
•Greetings
•Authentication
•Information gathering
•Resolution identification
•Solution Delivery
•Closing and Follow-ups
•Feedback, etc
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
Data Format and Structure
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat
French Conversation Chat Dataset for Healthcare Domain
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). French Conversation Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/french-healthcare-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Area covered
French
Dataset funded by
FutureBeeAI
Description
Introduction
The dataset comprises over 10,000 chat conversations, each focusing on specific Healthcare-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.
•Participant Details: 150+ native French participants from the FutureBeeAI community.
•Word Count & Length: Chats are diverse, typically ranging from 300 to 700 words and 50 to 150 turns across both speakers.

Topic Diversity
The chat dataset covers a wide range of conversations on Healthcare topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Healthcare use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.
•Inbound Chats:
•Appointment Scheduling
•New Patient Registration
•Surgery Consultation
•Consultation regarding Diet, and many more
•Outbound Chats:
•Appointment Reminder
•Health & Wellness Subscription Programs
•Lab Test Results
•Health Risk Assessments
•Preventive Care Reminders, and many more
Language Variety & Nuances
The conversations in this dataset capture the diverse language styles and expressions prevalent in French Healthcare interactions. This diversity ensures the dataset accurately represents the language used by French speakers in Healthcare contexts.
The dataset encompasses a wide array of language elements, including:
•Naming Conventions: Chats include a variety of French personal and business names.
•Localized Details: Real-world addresses, emails, phone numbers, and other contact information reflecting different French-speaking regions.
•Temporal and Numeric Expressions: Dates, times, currencies, and numbers in French forms, adhering to local conventions.
•Idiomatic Expressions and Slang: Local slang, idioms, and informal phrases found in French Healthcare conversations.
This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to French Healthcare interactions.
Conversational Flow and Interaction Types
The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Healthcare customer-agent interactions.
•Simple Inquiries
•Detailed Discussions
•Transactional Interactions
•Problem-Solving Dialogues
•Advisory Sessions
•Routine Checks and Follow-Ups
Each of these conversations contains various aspects of conversation flow like:
•Greetings
•Authentication
•Information gathering
•Resolution identification
•Solution Delivery
•Closing and Follow-ups
•Feedback, etc
This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.
Data Format and Structure
The dataset is available in JSON, CSV, and TXT formats, with each conversation containing attributes like participant identifiers and chat messages,
The World through the Eyes of the People of Today (The World through the...
b2find.dkrz.de
Updated Sep 29, 2023
Cite
(2023). The World through the Eyes of the People of Today (The World through the Eyes of Soviet People) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/7d48e903-c21b-5bd4-9a9e-0850f529a4e4
Explore at:
Dataset updated
Sep 29, 2023
Area covered
Soviet Union
Description
Political attitudes of Soviet citizens. Questions on French-Soviet relations. Topics: judgement on selected government measures in connection with the 27th Party Congress of the Communist Party of the Soviet Union; attitude to 'Perestroika'; expected influence of 'Perestroika' on rising food prices; perception of drug addiction as a danger to the country; judgement on the quality of television programs; knowledge about Sakharov; satisfaction with the achievements of the public health system; most significant historical and present-day personality of the Soviet Union; attitude to the death penalty; expected influence of the changes taking place in the USSR on relations with the West and on the international situation; preferred type of music; assessment of the probability of another accident at a Soviet nuclear power plant and of the Soviet Government's efforts to prevent further nuclear accidents; judgement on the USSR's relations with the USA, France, the Federal Republic of Germany, Great Britain, China and India; satisfaction with the state of relations between the USSR and France and judgement on changes in these relations over the last year; judgement on economic, cultural and political cooperation between the USSR and France; area with the greatest progress in cooperation; spontaneous naming of three French words; naming of preferred representatives of France; knowledge of selected events from French-Soviet history; country with the closest friendship to France; positive or negative judgement on the French people; spontaneous naming of persons associated with France; knowledge of the nationality of the spacecraft used for the first flight of a French cosmonaut; perceived threat to the Soviet Union or France from the nuclear and conventional (non-nuclear) weapons of the respective other country; knowledge of the French language; use of French-language media and type of media used; judgement on the objectivity of information from French sources or Soviet media; perceived interference by France in the internal affairs of the USSR; attitude to nuclear weapons and to a nuclear or conventional conflict in Europe, to an increase and modernization of the French nuclear arsenal, to first use of nuclear weapons by the USSR or France, to a peaceful solution of European problems, and to the contribution of abstaining from nuclear weapons tests to slowing the arms race; knowledge about the Berlin Wall and attitude to its removal; attitude to the removal of all nuclear weapons in Europe; preferred area of pan-European cooperation; judgement on the military balance of power between NATO and the Warsaw Pact; preferred travel destinations outside the Eastern Bloc; probability of the outbreak of a third world war; desire for a meeting between Gorbachev and Reagan and judgement on the chances of success of negotiations; perceived danger from a simultaneous reduction of Soviet and American medium-range missiles; judgement on whether progress in military technology brings greater security or additional danger of war; most important friend and greatest enemy of the Soviet Union. Demography: age (classified); sex; marital status; respondent has children; current education level; employment; institution at which respondent is studying (e.g. college, technical college, vocational technical school); occupational position; previous participation in surveys; optimistic or pessimistic expectations for the future. Sampling: random selection. In Moscow, respondents were drawn from telephone lists; in Indjavino, from the voter list.
Psychophysiology of Positive and Negative Emotions (POPANE) – a dataset of...
sandbox.rohub.org
rdf, zip
Updated Apr 28, 2021
Cite
Szymon Kupiński (2021). Psychophysiology of Positive and Negative Emotions (POPANE) – a dataset of over 1000 participants [Dataset]. http://sandbox.rohub.org/rodl/ROs/POPANE/
Explore at:
Available download formats: zip, rdf
Dataset updated
Apr 28, 2021
Authors
Szymon Kupiński
Description
Subjective experience, along with physiological activity, is a fundamental component of emotional responding. We present a publicly available dataset of psychophysiological responses to positive and negative emotions from 1157 healthy participants, collected across seven studies. In our studies, we continuously recorded affect and physiological activity during a resting baseline and during emotional responding. We recorded physiological responses using electrocardiography (EKG), impedance cardiography (ICG), electrodermal activity (EDA), photoplethysmography (PPG, blood pressure measures), respiratory, and temperature sensors. We elicited emotions with films, pictures, speech preparation, and expressive writing. We studied a wide range of positive and negative emotions, including amusement, anger, disgust, excitement, fear, gratitude, sadness, tenderness, and threat. To the best of our knowledge, the Psychophysiology of Positive and Negative Emotions (POPANE) database is the largest consistent psychophysiological dataset on emotions ever collected and publicly shared. We hope that POPANE will provide individuals, companies, and laboratories with the data they need to perform their own analyses, corroborate their results, and create robust psychophysiological models of emotions.
The Mitigating Role of Digital Communication Technologies on Negative Affect...
b2find.dkrz.de
Updated Apr 25, 2023
Cite
(2023). The Mitigating Role of Digital Communication Technologies on Negative Affect During the COVID-19 Outbreak in Italy (2020) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/6b2666aa-b901-5889-aa72-1b7d05358db0
Explore at:
Dataset updated
Apr 25, 2023
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
The data come from a study of the role and impact of digital communication technologies during the COVID-19 outbreak. The COVID-19 pandemic forced governments to impose restrictions (lockdown), and many people suddenly had to reduce their social relations drastically and, as a result, to increase their use of digital communication technology. The study investigates how the increased use of this technology for virtual meetings (i.e., voice and video calls, online board games and multiplayer video games, or watching movies in party mode) during the lockdown promoted the perception of social support, which in turn mitigated the psychological effects of the lockdown in Italy. Data were collected through a web survey during the lockdown imposed to curb the spread of COVID-19. Using a snowball sampling technique, participants were recruited through social media and instant messaging systems: they were sent a link to the web survey and asked to forward it to their contacts. The questionnaire asked how much participants used technology during and before the lockdown, specifically how many times a week they had used different tools to stay connected for leisure or for work/school activities. Participants' emotional state during the lockdown was assessed with measures of loneliness, boredom, anxiety, anger, irritability, and belongingness. Sample: 465 individuals (convenience sampling). Data collection mode: web-based self-administered questionnaire (CAWI). For further information, see the attached documentation.

MD COVID-19 - Number of Persons Tested Negative

COVID-19 Daily Testing - By Person - Historical

COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE

Chicago Contracts

Content

Context

Acknowledgements

The Interpersonal Nature of Self-Talk Dataset

INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

Temporary Foreign Worker Program Labour Market Impact Assessment Statistics...

Bahasa Conversation Chat Dataset for Travel Domain

Introduction

Topic Diversity

Language Variety & Nuances

Conversational Flow and Interaction Types

Data from: Dataset for the paper

Swedish Conversation Chat Dataset for BFSI Domain

Introduction

Topic Diversity

Language Variety & Nuances

Conversational Flow and Interaction Types

Data Format and Structure

English Conversation Chat Dataset for Real Estate Domain

Introduction

Topic Diversity

Language Variety & Nuances

Conversational Flow and Interaction Types

Heart Disease Health Indicators Dataset

Context

Content

1. To what extent can survey responses from the BRFSS be used for predicting heart disease risk?

2. Can a subset of questions from the BRFSS be used for preventative health screening for diseases like heart disease?

Acknowledgements

Inspiration

Norwegian Conversation Chat Dataset for Healthcare Domain

Introduction

Topic Diversity

Language Variety & Nuances

Conversational Flow and Interaction Types

Data Format and Structure

Why do we need crossing structures? An agent based modeling approach. -...

Spanish Conversation Chat Dataset for Real Estate Domain

Introduction

Topic Diversity

Language Variety & Nuances

Conversational Flow and Interaction Types

English Conversation Chat Dataset for Healthcare Domain

Introduction

Topic Diversity

Language Variety & Nuances

Conversational Flow and Interaction Types

Data Format and Structure

French Conversation Chat Dataset for Healthcare Domain

Introduction

Topic Diversity

Language Variety & Nuances

Conversational Flow and Interaction Types

Data Format and Structure

The World through the Eyes of the People of Today (The World through the...

Psychophysiology of Positive and Negative Emotions (POPANE) – a dataset of...

The Mitigating Role of Digital Communication Technologies on Negative Affect...

MD COVID-19 - Number of Persons Tested Negative