CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Are Suggestions about Coupled Changes Interesting? – Software Repository Mining Case Study
Survey questions
Interview questions
Grounded theory open coding
Analysis results
questionnaire results
The study in three coal mining regions (Lower Silesia, Upper Silesia and Lublin; N=500 each) was conducted using Computer Assisted Web Interviewing (CAWI). The questionnaire includes blocks of questions concerning mine water awareness, climate change and local/place attachment. The online survey took 15 to 20 minutes and was prepared after in-depth pilot research among participants with different education levels from the mining regions. We used an uninformed approach to the survey, so there were no additional questions or requirements for participants prior to the survey. Since mine water energy extraction is a technical issue that is neither well known nor commonly used in the narratives of Poles, we tested the survey questions in pilot cognitive interviews to remove technical language and reduce the number of replies given without understanding. The interviews were conducted with 10 participants in July 2020 and, following the pilot's recommendations and results, we implemented additional changes in the final version of the questionnaire. Specifically, some questions were simplified and the background information on mine water extraction was simplified and shortened. The CAWI survey was carried out by the Kantar Research Agency and completed by adults aged 18-65 (N=1500) between 14 and 19 August 2020. The sample was constructed using Kantar’s internet panel, profiled for basic demographics such as gender, age, and town size. Particular attention paid to the quality of the panel is reflected in its structure. Kantar’s internet panel reflects the profile of the Polish population of Internet users in terms of its participants’ demographic characteristics. The sample from each region comprised 500 respondents, and in the full sample (N=1500) we reached only 192 people who chose “mining area” as the best description of the area where they live.
Although the three voivodships were chosen because of their mining industry, the selected sample covers each region in general, so mining communities are statistically not fully represented. We also asked about respondents' subjective perception of the area they live in, which we further analysed in terms of spatial distribution. The dataset was created within the SECURe project (Subsurface Evaluation of CCS and Unconventional Risks): https://www.securegeoenergy.eu/. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 764531.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Rare diseases (RD) result in a wide variety of clinical presentations, and this creates a significant diagnostic challenge for health care professionals. We hypothesized that there exists a set of consistent and shared phenomena among all individuals affected by (different) RD during the time before a diagnosis is established.
Objective: We aimed to identify commonalities between different RD and developed a machine learning diagnostic support tool for RD.
Methods: 20 interviews with individuals affected by different RD, focusing on the time period before their diagnosis, were performed and qualitatively analyzed. Out of these pre-diagnostic experiences, we distilled key phenomena and created a questionnaire, which was then distributed among individuals with an established diagnosis of (i) RD, (ii) other common non-rare diseases (NRO), (iii) common chronic diseases (CD), or (iv) psychosomatic/somatoform disorders (PSY). Finally, four combined single machine learning methods and a fusion algorithm were used to distinguish the different answer patterns of the questionnaires.
Results: The questionnaire contained 53 questions. A total of 1763 questionnaires (758 RD, 149 CD, 48 PSY, 200 NRO, 34 healthy individuals and 574 not evaluable questionnaires) were collected. Based on 3 independent data sets, the 10-fold stratified cross-validation method for the answer-pattern recognition resulted in sensitivity values of 88.9% to detect the answer pattern of an RD, 86.6% for NRO, 87.7% for CD and 84.2% for PSY.
Conclusion: Despite the great diversity in presentation and pathogenesis of each RD, patients with RD share surprisingly similar pre-diagnosis experiences. Our questionnaire and data-mining based approach successfully detected unique patterns in groups of individuals affected by a broad range of different rare diseases. Therefore, these results indicate distinct patterns that may be used for diagnostic support in RD.
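To make the evaluation terms in this abstract concrete, the following is a minimal sketch of stratified 10-fold assignment and per-class sensitivity in plain Python. It is not the authors' pipeline: the function names, the round-robin fold assignment, and the choice to include only the four diagnosed groups (leaving out healthy and non-evaluable questionnaires) are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def sensitivity(true_labels, predicted_labels, target):
    """Sensitivity (recall) for one class: TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Group sizes as reported in the abstract (diagnosed groups only, an assumption).
group_sizes = {"RD": 758, "CD": 149, "PSY": 48, "NRO": 200}
labels = [group for group, n in group_sizes.items() for _ in range(n)]
folds = stratified_folds(labels, k=10)

# Each fold holds roughly one tenth of every group, e.g. 75 or 76 RD questionnaires.
rd_per_fold = [sum(1 for i in fold if labels[i] == "RD") for fold in folds]
```

In a full evaluation, each fold would serve once as the held-out set while a classifier is trained on the other nine, and `sensitivity` would be computed per group on the held-out predictions.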
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper outlines the compilation and annotation process of the CORHOH: Text Corpus of Holocaust Oral Histories. The corpus consists of 500 oral histories from Holocaust survivors, with each narrative retrieved from the Let Them Speak Project (Toth 2021). The text is processed and annotated with metadata detailing both the testimony givers and the interviews themselves. All technical content has been removed, and a unique identifier has been assigned to each question (posed by the interviewer) and answer (provided by the survivor). The corpus complies with TEI guidelines (TEI Consortium 2023). The dataset includes 106,519 questions and 107,125 answers, making it a valuable interdisciplinary resource. Researchers can retrieve and analyse questions and answers separately based on their specific research objectives. This corpus is particularly suited for studies on trauma expression and psychological concepts embedded in survivors' narratives. Additionally, it offers potential for data mining to uncover patterns (e.g., migration trends) and supports natural language processing techniques such as topic modelling, sentiment analysis, and named entity recognition. The CORHOH data is sourced from the United States Holocaust Memorial Museum (USHMM) and is publicly available under the CC BY-NC-SA 4.0 license.
This research was carried out in Lao PDR between May and October 2012 as a joint Enterprise Survey and Skills Toward Employment and Productivity (STEP) survey, and included a large panel component based on the 2009 data collection efforts.
The objective of Enterprise Surveys is to obtain feedback from businesses on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.
For the Lao PDR 2012 study, additional interviews were conducted in the following sectors: mining and quarrying, electricity, gas and water supply, financial intermediation, real estate, and education. The observations collected in these sectors were not used to compute the indicators shown on the Enterprise Surveys website (www.enterprisesurveys.org), as they are not comparable to other countries surveyed.
Vientiane Capital, Champasack, Luang Prabang, Luang Namtha, Khammouane, and Savannakhet.
The primary sampling unit of the study is an establishment. The establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
In addition to the sectors common to the global methodology for the Enterprise Survey, additional interviews were conducted in the following sectors: Mining and Quarrying (group C), Electricity, gas and water supply (group E), Financial intermediation (group J), Real estate (group K), and Education (group M).
Sample survey data [ssd]
The sample for Lao PDR was selected using stratified random sampling. Three levels of stratification were used in this country: industry, establishment size, and region.
Industry stratification was designed as follows: the universe was stratified into 23 manufacturing industries and 2 services industries (retail and other services), as defined in the sampling manual. Additional stratification took place in the following sectors: mining and quarrying (group C), electricity, gas and water supply (group E), financial intermediation (group J), real estate (group K), and education (group M).
Size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
Regional stratification was defined in six regions: Vientiane Capital, Champasack, Luang Prabang, Luang Namtha, Khammouane, and Savannakhet.
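The three stratification levels described above can be sketched in a few lines of Python. This is an illustrative sketch only: the record fields (`industry`, `workers`, `region`), the function names, and the equal allocation of `per_stratum` units to every stratum are assumptions, since the actual allocation rules are defined in the sampling manual and are not given here. The size thresholds are the ones stated in the text.

```python
import random
from collections import defaultdict

def size_class(permanent_full_time_workers):
    """Map a head count to a size stratum; None means out of scope (fewer than 5)."""
    if permanent_full_time_workers < 5:
        return None
    if permanent_full_time_workers <= 19:
        return "small"
    if permanent_full_time_workers <= 99:
        return "medium"
    return "large"

def stratified_sample(frame, per_stratum, seed=0):
    """Draw up to per_stratum units at random from each (industry, size, region) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for establishment in frame:
        size = size_class(establishment["workers"])
        if size is None:
            continue  # establishments with fewer than five employees are excluded
        key = (establishment["industry"], size, establishment["region"])
        strata[key].append(establishment)
    sample = []
    for key in sorted(strata):
        units = strata[key]
        sample.extend(rng.sample(units, min(per_stratum, len(units))))
    return sample

# Hypothetical frame records, for illustration only.
frame = [
    {"industry": "retail", "workers": w, "region": "Vientiane Capital"}
    for w in (3, 6, 7, 25, 150)
]
sample = stratified_sample(frame, per_stratum=1)
```

With this toy frame the sample contains one small, one medium, and one large retail establishment, and the three-employee unit is dropped as out of scope.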
One frame was used for Lao PDR. The sample frame used in Lao PDR was obtained from the "Preliminary Survey in the Business Sector" (2008), maintained by the National Statistic Centre, Department of Statistics under the Ministry of Planning and Investment, Government of Lao PDR. This listing was updated by the Department of Statistics in 2012 as part of the implementation of this survey.
The enumerated establishments were then used as the frame for the selection of a sample with the aim of obtaining interviews at 380 establishments with five or more employees.
The quality of the frame was assessed at the onset of the project through calls to a random subset of firms and local contractor knowledge. The sample frame was not immune from the typical problems found in establishment surveys: positive rates of non-eligibility, repetition, non-existent units, etc. Due to response rate and ineligibility issues, additional sample had to be extracted by DCS and the World Bank in order to obtain enough eligible contacts and meet the sample targets.
Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 14% (116 out of 830 establishments).
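The reported rate can be reproduced directly from the two counts, and the sketch below also shows one common way an ineligibility rate might feed into a design weight. The adjustment formula and the stratum figures passed to `adjusted_weight` are assumptions for illustration; the text does not state how the weights were actually computed.

```python
def ineligibility_rate(non_eligible, contacted):
    """Share of contacted establishments confirmed as non-eligible."""
    return non_eligible / contacted

def adjusted_weight(frame_count, interviews, rate):
    """Shrink the frame count by the estimated eligible share, then divide by interviews."""
    return frame_count * (1.0 - rate) / interviews

# Counts reported in the text: 116 non-eligible out of 830 contacted establishments.
rate = ineligibility_rate(116, 830)
print(f"{rate:.1%}")  # 14.0%

# Hypothetical stratum: 200 establishments in the frame, 20 completed interviews.
weight = adjusted_weight(frame_count=200, interviews=20, rate=rate)
```

Without the adjustment the hypothetical stratum weight would be 200 / 20 = 10; scaling by the eligible share lowers it, reflecting that part of the frame does not belong to the target population.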
Face-to-face [f2f]
Only one questionnaire was used for all sectors. This questionnaire had two versions: one for manufacturing and one for services firms. This questionnaire was also split into two sections with one containing the standard Enterprise Survey questions and the second containing the STEP skills, training, and education questions.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country’s business environment. The remaining questions assess the survey respondents’ opinions on what are the obstacles to firm growth and performance.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
The number of contacted establishments per realized interview was 2.27. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 0.043.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as a different option from don’t know. b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.
Title: Life Strategies of Workers in Poland’s New Capitalism
Description: The presented dataset includes 174 biographical interviews with manual workers, gathered in 2001–2004 by Adam Mrozowicki himself and as part of his students’ field practice at the Institute of Sociology, University of Wrocław. The main data collection method was a biographical narrative interview (modelled after Fritz Schütze). Each respondent was asked to tell the whole story of his/her life, from childhood until the moment of the interview. Subsequently, the respondents were asked to answer additional questions related to unclear or concise parts of the narratives. The last part focused on answers to questions related to the thematic sections analysed in the project in case the answers were not provided in the biographical sections. These questions were related to the social background, career, subjectively perceived living standard, leisure activities, involvement in non-governmental organisations (including trade unions), class identity, political orientations and perception of the transformations after 1989. The interviews are varied in quality: about 60 of them can be described as quality narrative interviews (high level of speech indexability, expanded part one of the interview), whereas others can be better described as in-depth interviews with a biographical component. The research was carried out in Lower Silesia (primarily in Wrocław), Upper Silesia (primarily in the Upper Silesian agglomeration), the Cieszyn Silesia region (mainly in Cieszyn) and the Opole Silesia region (the whole area of the Opole Province). Workers were defined as hired labourers performing physical or mental-and-physical work, with limited control over their work process. Moreover, narratives from a limited number of foremen and masters were also collected (22 cases).
The sample had a prevalence of men (131 cases), skilled workers (95 cases), industrial workers (107 interviews), people employed in mining (16 interviews) and, to a lesser extent, in the services sector (35 interviews) and construction (15 interviews). The collected narratives represent three basic age cohorts of workers: (a) 35 years or less at the time of the interview – 57 cases; (b) 36–49 years – 94 cases; (c) 50 or more years – 23 cases.
So far (until 2013), the interviews have been used for analysing life strategies, the workers’ ethos and resources, identities of trade union activists and the collective memory of workers. However, the presented data clearly have considerable potential to be used for other types of social analysis. Among other things, the data could be used for research on the regional identity in Silesia, the class identity of workers, workers’ lifestyles and political awareness, as well as the analysis of the early stages of precarization of labour (interviews in the services sector) and research on selected occupational categories of workers (e.g. in mining, metallurgy or services).
All interviews were recorded entirely on tapes. The interviews lasted between 30 minutes and 1 hour (the shortest ones, representing about 20% of the total database), 1–1.5 hours (about 60%), and over 1.5 hours (maximum 4 hours) (about 20% of the database). The audio recordings, however, have been preserved only for 50 interviews, primarily from the Wrocław part of the research (2001–2002) and research carried out in Upper Silesia, in the Dąbrowa Basin and Cieszyn Silesia region (2003). Transcripts of almost all interviews are available in electronic format. The student transcripts have been partially cleaned in terms of spelling. The transcripts also include: (1) the basic information about the respondents’ employers (legal form, employment); (2) information about the context of interviewing.
For some cases, summaries of the most important life events were prepared in the form of ‘biographical portraits’. In addition to the interviews in Wrocław (2001–2002), interviews were not anonymised at the stage of transcription. Anonymisation of basic data was carried out in these cases at the time of archiving.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Borax use among the Artisanal small scale gold miners (follow-up questions used).