This document presents the Concise Experiment Plan for NASA's Arctic-Boreal Vulnerability Experiment (ABoVE) to serve as a guide to the Program as it identifies the research to be conducted under this study. Research for ABoVE will link field-based, process-level studies with geospatial data products derived from airborne and satellite remote sensing, providing a foundation for improving the analysis and modeling capabilities needed to understand and predict ecosystem responses and societal implications. The ABoVE Concise Experiment Plan (ACEP) outlines the conceptual basis for the Field Campaign and expresses the compelling rationale explaining the scientific and societal importance of the study. It presents both the science questions driving ABoVE research and the top-level requirements for a study design to address them.
This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.
This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.
The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.
Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
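To make the idea concrete, the sketch below generates synthetic records by sampling each column from its empirical distribution. It is a deliberately simplified illustration, not the model used for this dataset: the column names and toy records are hypothetical, and sampling columns independently discards the relationships between columns that a real synthesis model is meant to preserve.

```python
import random
from collections import Counter

# Hypothetical toy records standing in for confidential data.
confidential = [
    {"age_group": "18-34", "service": "housing"},
    {"age_group": "18-34", "service": "behavioral_health"},
    {"age_group": "35-64", "service": "housing"},
    {"age_group": "65+", "service": "aging_services"},
]

def fit_marginals(records):
    """Count how often each value occurs in each column."""
    return {col: Counter(row[col] for row in records) for col in records[0]}

def sample_synthetic(marginals, n, seed=0):
    """Draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts)
            weights = [counts[v] for v in values]
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows

marginals = fit_marginals(confidential)
synthetic = sample_synthetic(marginals, n=10)
for row in synthetic[:3]:
    print(row)
```

Generating several candidates (for example with different seeds or models) and comparing them against the confidential data is the kind of utility-versus-privacy evaluation the paragraph above refers to.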
For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.
Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.
Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).
1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
License: https://data.gov.tw/license
To assist our country's industrial innovation, the Intellectual Property Office of the Ministry of Economic Affairs used its patent database to compile the number of patent applications filed by legal entities and individuals in our country from 2007 to 2009. It then conducted an in-depth analysis of patent application trends, including analysis by patent classification and analysis of six emerging industries: "green energy," "biotechnology," "medical care," "refined agriculture," "cultural creativity," and "tourism." Drawing on patent application status, analytical observations, and concise charts, the analysis identifies the products these industries plan to launch, and the associated market trends, over the next two to three years.
There has been an exponential increase of interest in the dark side of human nature during the last decade. To better understand this dark side, the authors developed and validated a concise, 12-item measure of the Dark Triad: narcissism, psychopathy, Machiavellianism. In 4 studies involving 1,085 participants, they examined its structural reliability, convergent and discriminant validity (Studies 1, 2, and 4), and test–retest reliability (Study 3). Their measure retained the flexibility needed to measure these 3 independent-yet-related constructs while improving its efficiency by reducing its item count by 87% (from 91 to 12 items). The measure retained its core of disagreeableness, short-term mating, and aggressiveness. They call this measure the Dirty Dozen, but it cleanly measures the Dark Triad.
It was constructed with several scales; this data covers two of them.
These two yielded the following items, which were rated on a five-point scale (1 labelled as "Disagree", 3 labelled as "Neutral", 5 labelled as "Agree"; 0 = missed):
HSNS1 I can become entirely absorbed in thinking about my personal affairs, my health, my cares or my relations to others.
HSNS2 My feelings are easily hurt by ridicule or the slighting remarks of others.
HSNS3 When I enter a room I often become self conscious and feel that the eyes of others are upon me.
HSNS4 I dislike sharing the credit of an achievement with others.
HSNS5 I feel that I have enough on my hands without worrying about other people's troubles.
HSNS6 I feel that I am temperamentally different from most people.
HSNS7 I often interpret the remarks of others in a personal way.
HSNS8 I easily become wrapped up in my own interests and forget the existence of others.
HSNS9 I dislike being with a group unless I know that I am appreciated by at least one of those present.
HSNS10 I am secretly "put out" or annoyed when other people come to me with their troubles, asking me for my time and sympathy.
DDM1 I tend to manipulate others to get my way.
DDM2 I have used deceit or lied to get my way.
DDM3 I have used flattery to get my way.
DDM4 I tend to exploit others towards my own end.
DDP1 I tend to lack remorse.
DDP2 I tend to not be too concerned with morality or the morality of my actions.
DDP3 I tend to be callous or insensitive.
DDP4 I tend to be cynical.
DDN1 I tend to want others to admire me.
DDN2 I tend to want others to pay attention to me.
DDN3 I tend to seek prestige or status.
DDN4 I tend to expect special favors from others.
On the next page the following information was requested:
age: entered as text
gender: chosen from list where 1=Male, 2=Female, 3=Other (0=missed)
accuracy: participants were instructed to rate how accurate their answers were about themselves on a scale from 0 (random) to 100 (tried their best); participants were instructed to answer 0 if they did not want their data used. Only participants who gave a positive non-zero answer were retained.
And then:
country: the location of the participant, determined with technical information. ISO country code.
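The coding described above (items scored 1 to 5 with 0 marking a missed item, and an accuracy of 0 meaning the respondent opted out) can be handled as in the sketch below. The response dictionary is a hypothetical example, and averaging the answered items of each subscale is one common scoring convention, assumed here rather than taken from the source.

```python
# Item-code prefixes and item counts, as listed above.
SUBSCALES = {"HSNS": 10, "DDM": 4, "DDP": 4, "DDN": 4}

def is_retained(response):
    """Keep only respondents who gave a positive, non-zero accuracy rating."""
    return response.get("accuracy", 0) > 0

def subscale_mean(response, prefix, n_items):
    """Average the answered items of one subscale; 0 means the item was missed."""
    answers = [response.get(f"{prefix}{i}", 0) for i in range(1, n_items + 1)]
    answered = [a for a in answers if a != 0]
    return sum(answered) / len(answered) if answered else None

def score(response):
    """Return the mean score for every subscale."""
    return {p: subscale_mean(response, p, n) for p, n in SUBSCALES.items()}

# Hypothetical respondent: DDM2 was missed (coded 0) and is skipped.
response = {"accuracy": 90}
response.update({f"HSNS{i}": 3 for i in range(1, 11)})
response.update({"DDM1": 4, "DDM2": 0, "DDM3": 2, "DDM4": 3})
response.update({"DDP1": 1, "DDP2": 1, "DDP3": 2, "DDP4": 2})
response.update({"DDN1": 5, "DDN2": 4, "DDN3": 5, "DDN4": 4})

if is_retained(response):
    scores = score(response)
    print(scores)
```

Treating 0 as missing rather than as a low score matters here: including it in the average would pull the DDM score toward "Disagree" for an item the respondent never answered.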
License: Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset, titled "📚 Data Science Books: Goodreads," offers a curated snapshot of 185 data science-related books as cataloged on the Goodreads platform. It has been meticulously compiled to serve enthusiasts, professionals, and researchers interested in exploring the diverse applications of data science through literature. Each entry within this dataset not only reflects the community's engagement but also serves as a beacon for emerging trends and foundational knowledge in the field of data science.
Given its concise nature, this dataset is perfectly suited for:
The dataset is structured with the following key columns, each providing valuable insights:
- authorName: The author(s) of the book, offering insights into prolific contributors in the data science literature space.
- bookFormat: The format of the book (e.g., paperback, ebook), useful for format preference studies.
- bookId: A unique identifier for each book on Goodreads, facilitating easy reference.
- description: A brief overview of the book's content, ripe for text analysis and NLP applications.
- numberOfPages: The total page count, offering a measure of the book's length.
- numberOfRatings: The total number of ratings the book has received on Goodreads, indicative of its popularity and engagement.
- numberOfReviews: The total number of reviews, providing a deeper look into reader engagement.
- publishedBy: The publishing entity, offering insights into the book's distribution and reach.
- publishedDate: The date the book was published, useful for temporal analysis and trend mapping.
- rating: The average Goodreads rating, key for performance and preference analysis.
- title: The title of the book, essential for identification and thematic analysis.
This dataset has been ethically mined, adhering to best practices in data collection and respecting the terms of service of the Goodreads platform. It serves as a testament to responsible data handling and usage.
We extend our gratitude to the Goodreads platform for hosting a wealth of valuable data, enabling the creation of this dataset. Goodreads has proven to be an indispensable resource for literature enthusiasts and researchers alike, fostering a community that values knowledge sharing and discovery.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Territory view based on families’ vulnerability strata allows identifying different health needs that can guide healthcare at primary care scope. Despite the availability of tools designed to measure family vulnerability, there is still a need for substantial validity evidence, which limits the use of these tools in a country showing multiple socioeconomic and cultural realities, such as Brazil. The primary objective of this study is to develop and gather evidence on the validity of the Family Vulnerability Scale for Brazil, commonly referred to as EVFAM-BR (in Portuguese).
Methods: Items were generated through exploratory qualitative study carried out by 123 health care professionals. The data collected supported the creation of 92 initial items, which were then evaluated by a panel of multi-regional and multi-disciplinary experts (n = 73) to calculate the Content Validity Ratio (CVR). This evaluation process resulted in a refined version of the scale, consisting of 38 items. Next, the scale was applied to 1,255 individuals to test the internal-structure validity by using the Exploratory Factor Analysis (EFA). Dimensionality was evaluated using Robust Parallel Analysis, and the model underwent cross-validation to determine the final version of EVFAM-BR.
Results: This final version consists of 14 items that are categorized into four dimensions, accounting for an explained variance of 79.02%. All indicators were within adequate and satisfactory limits, without any cross-loading or Heywood Case issues. Reliability indices also reached adequate levels (α = 0.71; ω = 0.70; glb = 0.83 and ORION ranging from 0.80 to 0.93, between domains). The instrument scores underwent a normalization process, revealing three distinct vulnerability strata: low (0 to 4), moderate (5 to 6), and high (7 to 14).
Conclusion: The scale exhibited satisfactory validity evidence, demonstrating consistency, reliability, and robustness. It resulted in a concise instrument that effectively measures and distinguishes levels of family vulnerability within the primary care setting in Brazil.
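The score bands reported in the results map directly onto a small lookup, sketched below. The function name is hypothetical; only the cut-offs (0 to 4, 5 to 6, 7 to 14) come from the abstract.

```python
def vulnerability_stratum(score):
    """Map an EVFAM-BR total score (0 to 14) to its reported stratum."""
    if not 0 <= score <= 14:
        raise ValueError("EVFAM-BR scores range from 0 to 14")
    if score <= 4:
        return "low"
    if score <= 6:
        return "moderate"
    return "high"

# Classify a few example scores.
for s in (0, 4, 5, 6, 7, 14):
    print(s, vulnerability_stratum(s))
```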
License: Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Dive into the vibrant world of gaming culture with the "Amazon Best Sellers 2024: Gaming Merchandise 🎮" dataset. This concise yet insightful collection features around 300 top-rated gaming merchandise items from Amazon, spanning a variety of categories from innovative gaming gear to quirky gamer-centric novelties. It's a snapshot of what's trending in the gaming community, offering a peek into the preferences of gamers in 2024.
Given its focused nature, this dataset is ideal for targeted analyses, such as:
- Trend Identification: Pinpoint popular themes and items within the gaming merchandise space, identifying what resonates with the gaming community.
- Brand Analysis: Assess which brands are leading in the gaming merchandise sector and how brand presence correlates with customer ratings.
- Consumer Sentiment Analysis: Perform a qualitative analysis of the reviews and ratings to understand consumer satisfaction levels and common feedback themes.
The dataset includes the following columns:
- title: The product name, reflecting the variety and creativity in gaming merchandise.
- brand: The producing brand, indicating the diversity in manufacturers catering to the gaming market.
- description: Detailed product descriptions (where available), providing deeper insights into the product features.
- stars & reviewsCount: Reflective of consumer opinions, these columns offer a quantitative measure of product popularity and quality.
- price/currency & price/value: While pricing data might be sparse, available entries can shed light on the pricing landscape for gaming merchandise.
- breadCrumbs: Product categorization data, useful for understanding how items are grouped on Amazon.
- asin: The unique identifier for each product, facilitating further individual product research.
- url: Direct links to the product pages, allowing for easy access to additional product details.
This dataset aggregates publicly available information from Amazon, compiled with a commitment to ethical data practices, respecting user privacy and Amazon's data policies.
A heartfelt thank you to Amazon for being a treasure trove of data and a hub for gaming enthusiasts worldwide. This dataset celebrates the rich tapestry of gaming merchandise available on their platform.
Special thanks for the thumbnail image inspiration go to an intriguing product listed on Amazon. For more details, visit the Geek Alerts Game Controller LED Backlight Decoration on Amazon.
The Core Welfare Indicators Questionnaire (CWIQ) currently constitutes one of the largest socio-economic household survey databases on Tanzania. Since 2003 EDI has interviewed roughly 20,000 households in 35 different districts. For 9 districts repeat surveys were organised to track changes over time.
Rationale: The absence of district-level survey data is at odds with the devolution of power to districts. Tanzania is undergoing a decentralisation process whereby each of its roughly 128 districts is becoming an increasingly important policy actor. A district taking on this challenge needs accurate information to monitor and develop its own policies. Much relevant information is currently not available, as national statistics are not representative at district level and many of the routine data collection mechanisms are still under development. CWIQ therefore provides an attractive, one-stop survey-based method to collect basic development indicators. Furthermore, the survey results can be disseminated, through Swahili briefs and posters, to a district's population, thus increasing the extent to which people are able to hold their local governments accountable. Exciting new ground is being broken on such population-wide dissemination by the Prime Minister's Office.
Methodology: The data are collected through a small 10-page questionnaire, called the Core Welfare Indicators Questionnaire (CWIQ). The questionnaire and data software constitute an off-the-shelf survey package developed by the World Bank to produce standardised monitoring indicators of welfare. The questionnaire is purposively concise and is designed to collect information on household demographics, employment, education, health and nutrition as well as utilisation and satisfaction with social services. Questionnaires are scannable, with interviewers shading bubbles and writing numbers later recognised by the scanning software. The data system is fully automated allowing the results to roll out within weeks of the fieldwork.
Funding: Projects are typically funded by organisations that care about making decentralisation work in Tanzania. CWIQ is a method to promote evidence-based policy formulation and debate in the district and a tool for the population to hold their local governments accountable. With funding from the RNE (Royal Netherlands Embassy) and SNV (Stichting Nederlands Vrijwilligers), CWIQ surveys were implemented between 2003 and 2005 in 16 districts. In 2006/07 PMO-RALG (Prime Minister's Office - Regional Administration and Local Government) commissioned EDI to cover a further 28 districts. In 9 of these districts this constituted a repeat survey, and thus a unique opportunity to monitor changes that occurred in the district over this time period.
Dissemination: EDI disseminated the results of CWIQ on posters and briefs to district-level stakeholders (councillors, district officials, NGOs, CBOs, Advocacy Groups, MPs, 'interested citizens', etc.), with the aim, at district level, to: (i) promote evidence-based policy debate, (ii) promote evidence-based policy formulation, (iii) provide tools for district-level M&E and (iv) increase accountability of LGA to citizens.
Subnational
Sample survey data [ssd]
The CWIQ surveys were sampled to be representative at district level. Data from the 2002 Census was used to put together a list of all villages in each district. In the first stage of the sampling process villages were chosen proportional to their population size. In a second stage the subvillage (kitongoji) was chosen within the village through simple random sampling. In the selected sub-village (also referred to as cluster or enumeration area), all households were listed and 15 households were randomly selected. In total 450 households in 30 clusters were visited. All households were given statistical weights reflecting the number of households that they represent.
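The selection stages described above can be sketched as follows. This is an illustration under stated assumptions, not the survey's actual procedure: the village populations and household listings are made up, systematic selection is just one common way to draw units with probability proportional to size, and the sub-village stage is collapsed into a single hypothetical listing per selected village.

```python
import random

def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size (systematic PPS)."""
    total = sum(sizes.values())
    step = total / n                 # sampling interval
    start = rng.random() * step      # random start within the first interval
    chosen, cumulative, k = [], 0, 0
    for unit, size in sizes.items():
        cumulative += size
        # Select this unit once for every sampling point that falls inside it.
        while k < n and start + k * step < cumulative:
            chosen.append(unit)
            k += 1
    return chosen

rng = random.Random(2006)

# Hypothetical sampling frame: 200 villages with made-up populations.
villages = {f"village_{i}": rng.randint(200, 3000) for i in range(200)}

# Stage 1: choose 30 villages proportional to population size.
clusters = pps_systematic(villages, n=30, rng=rng)

# Final stage: list the households of the selected sub-village (hypothetical
# listing here) and draw 15 of them at random.
sample = {}
for village in clusters:
    listing = [f"{village}_hh{j}" for j in range(rng.randint(40, 120))]
    sample[village] = rng.sample(listing, 15)

total_households = sum(len(hh) for hh in sample.values())
print(len(clusters), "clusters,", total_households, "households")
```

With 30 clusters of 15 households each, the total matches the 450 households per district stated above. The statistical weights mentioned in the text would then be derived from each household's overall selection probability, which this sketch does not compute.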
Face-to-face [f2f]
CWIQ is an off-the-shelf survey package developed by the World Bank to produce standardised monitoring indicators of welfare. The questionnaire is purposively concise and is designed to collect information on household demographics, employment, education, health and nutrition, as well as utilisation of and satisfaction with social services. An extra section on governance and satisfaction with people in public office was added specifically for this survey.
The standardised nature of the questionnaire allows comparison between districts and regions within and across countries, as well as monitoring change in a district or region over time.
The 2006/7 questionnaire is in Swahili, but it closely follows the 2000 generic CWIQ questionnaire, which is included in external resources, and all variables and values are labeled in English.
The data entry was done by scanning the questionnaires, to minimise data entry errors and thus ensure high quality in the final dataset.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description:
The "Global News Articles" dataset was acquired through the NewsAPI, a powerful tool that provides access to a vast collection of news articles from various sources around the world. The dataset contains a curated selection of news articles covering a wide range of topics, including politics, business, technology, health, and more.
Context:
In today's fast-paced world, staying informed about global events is essential. This dataset aims to provide researchers, journalists, and analysts with a comprehensive source of news articles for analysis and insight generation. By leveraging the NewsAPI, we have gathered a diverse set of articles to facilitate research, trend analysis, sentiment analysis, and other data-driven tasks.
Inspiration:
The inspiration behind creating this dataset stems from the growing need for reliable and easily accessible news data for analytical purposes. With the proliferation of digital media and the abundance of news sources available online, there is a wealth of information waiting to be tapped into. This dataset serves as a valuable resource for anyone interested in studying trends, patterns, and developments in the global news landscape.
Sources:
The primary source of the data is the NewsAPI, which aggregates news articles from thousands of sources worldwide. The dataset includes articles from reputable news outlets, blogs, and online publications. Only the title, content, and headlines features have been extracted from the articles to provide concise yet informative data for analysis.
Acquisition of Data through NewsAPI:
By leveraging the capabilities of NewsAPI, we have curated a valuable dataset that provides insights into global news trends, enabling informed decision-making and analysis in diverse fields.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset on 460,452 individuals employed by the Dutch East India Company (Verenigde Oost-Indische Compagnie, VOC) in the seventeenth and especially the eighteenth centuries, developed from 774,200 muster records in the ‘VOC-opvarenden’ collection. The original data has been enhanced through the disambiguation of individual records, the standardization of 44,152 unique place names, and the addition of wage details and rank structure.
This collection includes the original ‘VOC-opvarenden’ dataset (comprising three files), enriched files (totaling nine), integrated external data, and Jupyter notebooks documenting the transformation from original to enriched datasets. The accompanying data paper provides an in-depth overview of the original dataset, the enhancement process, and potential applications. Additionally, it features appendices serving as codebooks, offering concise descriptions of each variable present in the enriched data files.
Enabling research into career patterns, network structures, and migration trends, this resource is of significant value to the study of early modern history, social and economic history, and sociology.
A collection of concise summaries of scientific articles that link back to the physical article. Users can contribute resources to the site and may choose to provide a summary; if they do not, Useful Science will do so. Summaries are then tagged with the appropriate topic. Topics that relate to everyday life such as health, fitness, nutrition, and sleep are the main focus of the site.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full dataset: Factor loading, communality, and Eta.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training dataset: Factorial loads, communality, and Eta.
This dataset contains different people from Germany and different variables with specific values. The concise features allow for an interesting analysis. I collected this data while studying political science in Germany for two semesters. It was interesting for me to see the differences in income or living space with respect to the former division.
D_Nr - ID number
Gender - 1 female / 0 male
Height - Height in meters
BMI - Body mass index
Grade - High school final grade
Age - Age
Federal State - The federal state the person lives in
Income - Monthly income after taxes in €
Former GDR or FRG - Whether the federal state was formerly GDR/East (1) or FRG/West (0)
Living space - Living space in sqm
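A comparison of the kind mentioned above, average income and living space by former division, could be sketched as follows. The rows are made-up values for illustration only, and the dictionary keys are simplified stand-ins for the column names in the dataset.

```python
from statistics import mean

# Hypothetical rows; former_gdr follows the coding above (1 = GDR/East, 0 = FRG/West).
rows = [
    {"income": 2100, "living_space": 62, "former_gdr": 1},
    {"income": 1900, "living_space": 58, "former_gdr": 1},
    {"income": 2600, "living_space": 75, "former_gdr": 0},
    {"income": 2400, "living_space": 85, "former_gdr": 0},
]

def mean_by_division(rows, column):
    """Average one numeric column separately for East and West."""
    groups = {}
    for row in rows:
        label = "GDR/East" if row["former_gdr"] == 1 else "FRG/West"
        groups.setdefault(label, []).append(row[column])
    return {label: mean(values) for label, values in groups.items()}

income_means = mean_by_division(rows, "income")
space_means = mean_by_division(rows, "living_space")
print(income_means)
print(space_means)
```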