100+ datasets found

Main challenges affecting data analytics for CX in the U.S. 2021
statista.com
Updated Dec 10, 2024
Cite
Statista (2024). Main challenges affecting data analytics for CX in the U.S. 2021 [Dataset]. https://www.statista.com/statistics/1196851/main-challenges-affecting-data-analytics-for-cx-in-the-us/
Dataset updated
Dec 10, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2021 - Jun 2021
Area covered
United States
Description
According to the results of a survey on customer experience (CX) among businesses conducted in the United States in 2021, the main challenge affecting data analysis capability for CX is the lack of reliability and integrity of available data. Data security followed, being chosen by almost 46 percent of the respondents.
Top challenges of merging linear TV and digital campaign data in the U.S....
statista.com
Updated Dec 6, 2024
Cite
Statista (2024). Top challenges of merging linear TV and digital campaign data in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/1401528/leading-challenges-merging-linear-tv-digital-campaign-data-us/
Dataset updated
Dec 6, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In a survey of TV marketers in the United States released in May 2023, 53 percent of respondents identified the lack of common metrics across channels as the main challenge of merging linear and digital data. The creation of a holistic framework for planning and measurement was mentioned by 41 percent of respondents, while 40 percent cited data-sharing restrictions by walled gardens.
Challenges to health data sharing in the U.S. in 2020, by payers and...
statista.com
Updated Jul 5, 2022
Cite
Statista (2022). Challenges to health data sharing in the U.S. in 2020, by payers and providers [Dataset]. https://www.statista.com/statistics/1314771/barriers-to-health-data-sharing-in-the-us-by-healthcare-actor/
Dataset updated
Jul 5, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2020, 54 percent of healthcare providers and 50 percent of healthcare payers surveyed in the United States indicated that lack of technical interoperability was the biggest challenge around health data sharing. Some 52 percent of providers noted that the timeliness of shared data was a challenge; in comparison, only 21 percent of payers shared the same concern.
Large Scale International Boundaries
catalog.data.gov
geodata.state.gov
Updated Feb 28, 2025
Cite
U.S. Department of State (Point of Contact) (2025). Large Scale International Boundaries [Dataset]. https://catalog.data.gov/dataset/large-scale-international-boundaries
Dataset updated
Feb 28, 2025
Dataset provided by
United States Department of State (http://state.gov/)
Description
Overview

The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.

National Geospatial Data Asset

This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.

Dataset Source Details

Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.

Cartographic Visualization

The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below.
Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://hiu.state.gov/data/cartographic_guidance_bulletins/

Contact

Direct inquiries to internationalboundaries@state.gov.

Direct download: https://data.geodata.state.gov/LSIB.zip

Attribute Structure

The dataset uses the following attributes divided into two categories:

ATTRIBUTE NAME | ATTRIBUTE STATUS
CC1 | Core
CC1_GENC3 | Extension
CC1_WPID | Extension
COUNTRY1 | Core
CC2 | Core
CC2_GENC3 | Extension
CC2_WPID | Extension
COUNTRY2 | Core
RANK | Core
LABEL | Core
STATUS | Core
NOTES | Core
LSIB_ID | Extension
ANTECIDS | Extension
PREVIDS | Extension
PARENTID | Extension
PARENTSEG | Extension

These attributes have external data sources that update separately from the LSIB:

ATTRIBUTE NAME | EXTERNAL SOURCE
CC1 | GENC
CC1_GENC3 | GENC
CC1_WPID | World Polygons
COUNTRY1 | DoS Lists
CC2 | GENC
CC2_GENC3 | GENC
CC2_WPID | World Polygons
COUNTRY2 | DoS Lists
LSIB_ID | BASE
ANTECIDS | BASE
PREVIDS | BASE
PARENTID | BASE
PARENTSEG | BASE

The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of “Extension” represents a field containing data interoperability information. Other attributes not listed above include “FID”, “Shape_length” and “Shape.” These are components of the shapefile format and do not form an intrinsic part of the LSIB.

Core Attributes

The eight core attributes listed above contain unique information which, when combined with the line geometry, comprise the LSIB dataset. These Core Attributes are further divided into Country Code and Name Fields and Descriptive Fields.
Country Code and Country Name Fields

“CC1” and “CC2” fields are machine readable fields that contain political entity codes. These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. “CC1_GENC3” and “CC2_GENC3” fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The codes “Q2” or “QX2” denote a line in the LSIB representing a boundary associated with areas not contained within the GENC standard.

The “COUNTRY1” and “COUNTRY2” fields contain the names of corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.

Descriptive Fields

The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:

ATTRIBUTE NAME | CONTAINS NULLS
RANK | No
STATUS | No
LABEL | Yes
NOTES | Yes

Neither the "RANK" nor "STATUS" fields contain null values; the "LABEL" and "NOTES" fields do. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line. A value of “1” in the “RANK” field corresponds to an "International Boundary" value in the “STATUS” field.
Values of “2” and “3” correspond to “Other Line of International Separation” and “Special Line,” respectively. The “LABEL” field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps. The “NOTES” field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.

Use of Core Attributes in Cartographic Visualization

Several of the Core Attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between:

- International Boundaries (Rank 1);
- Other Lines of International Separation (Rank 2); and
- Special Lines (Rank 3).

Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “Label” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.

The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, a proper use of this dataset would encourage the application of that label. Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend but is otherwise not to be used for labeling.
Use of the “CC1,” “CC1_GENC3,” “CC2,” “CC2_GENC3,” “RANK,” or “NOTES” fields for cartographic labeling purposes is prohibited.

Extension Attributes

Certain elements of the attributes within the LSIB dataset extend data functionality to make the data more interoperable or to provide clearer linkages to other datasets. The fields “CC1_GENC3” and “CC2_GENC3” contain the corresponding three-character GENC code to the “CC1” and “CC2” attributes. The code “QX2” is the three-character counterpart of the code “Q2,” which denotes a line in the LSIB representing a boundary associated with a geographic area not contained within the GENC standard.

To allow for linkage between individual lines in the LSIB and World Polygons dataset, the “CC1_WPID” and “CC2_WPID” fields contain a Universally Unique Identifier (UUID), version 4, which provides a stable description of each geographic entity in a boundary pair relationship. Each UUID corresponds to a geographic entity listed in the World Polygons dataset.

Five additional fields in the LSIB expand on the UUID concept and either describe features that have changed across space and time or indicate relationships between previous versions of the feature. The “LSIB_ID” attribute is a UUID value that defines a specific instance of a feature. Any change to the feature in a lineset requires a new “LSIB_ID.” The “ANTECIDS,” or antecedent ID, is a UUID that references line geometries from which a given line is descended in time. It is used when there is a feature that is entirely new, not when there is a new version of a previous feature. This is generally used to reference countries that have dissolved. The “PREVIDS,” or Previous ID, is a UUID field that contains old versions of a line. This is an additive field that houses all Previous IDs.
A new version of a feature is defined by any change to the feature—either line geometry or attribute—but it is still conceptually the same feature. The “PARENTID” field
Large Scale International Boundaries (LSIB)
data.amerigeoss.org
shp
Updated Jan 17, 2024
Cite
UN Humanitarian Data Exchange (2024). Large Scale International Boundaries (LSIB) [Dataset]. https://data.amerigeoss.org/dataset/large-scale-international-boundaries-lsib
Explore at:
Available download formats: shp (46321649)
Dataset updated
Jan 17, 2024
Dataset provided by
United Nations (http://un.org/)
Description
Large Scale International Boundaries

Version 11.1 Release Date: August 22, 2022

Overview

The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. These data and their derivatives are the only international boundary lines approved for U.S. Government use. They reflect U.S. Government policy, and not necessarily de facto limits of control. This dataset is a National Geospatial Data Asset.

Details

Sources for these data include treaties, relevant maps, and data from boundary commissions and national mapping agencies. Where available, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery of the data involves analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.

Attributes

The dataset uses the following attributes:

Attribute Name | Explanation
Country Code | Country-level codes are from the Geopolitical Entities, Names, and Codes Standard (GENC). The Q2 code denotes a line representing a boundary associated with an area not in GENC.
Country Names | Names approved by the U.S. Board on Geographic Names (BGN). Names for lines associated with a Q2 code are descriptive and are not necessarily BGN-approved.
Label | Required text label for the line segment where scale permits
Rank/Status | Rank 1: International Boundary; Rank 2: Other Line of International Separation; Rank 3: Special Line
Notes | Explanation of any applicable special circumstances

Cartographic Usage

Depiction of the LSIB requires a visual differentiation between the three categories of boundaries: International Boundaries (Rank 1), Other Lines of International Separation (Rank 2), and Special Lines (Rank 3). Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “Label” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Additional cartographic information can be found in Guidance Bulletins (https://hiu.state.gov/data/cartographic_guidance_bulletins/) published by the Office of the Geographer and Global Issues. Please direct inquiries to internationalboundaries@state.gov.
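
The rank rules above can be sketched as a small styling helper. This is only an illustration: the rank-to-status mapping comes from the description, while the line widths, the `style_for` function, and the `label_legible` flag are hypothetical names and values chosen for the example, not part of the LSIB download package.

```python
# Rank-to-status mapping as stated in the LSIB description.
RANK_STATUS = {
    1: "International Boundary",
    2: "Other Line of International Separation",
    3: "Special Line",
}

# Assumed line widths in points. Any strictly decreasing values would
# satisfy the stated rule that Rank 1 > Rank 2 > Rank 3 in prominence.
RANK_WIDTH = {1: 1.5, 2: 1.0, 3: 0.5}


def style_for(feature, label_legible=True):
    """Return a hypothetical style dict for one LSIB line feature.

    `feature` is assumed to be a dict with "RANK" and "LABEL" keys.
    `label_legible` stands in for the "where scale permits" condition.
    """
    rank = int(feature["RANK"])
    style = {
        "status": RANK_STATUS[rank],  # legend text, not a map label
        "width": RANK_WIDTH[rank],
        "label": None,
    }
    # Rank 2 and 3 lines are labeled from the LABEL field when legible.
    if rank in (2, 3) and label_legible and feature.get("LABEL"):
        style["label"] = feature["LABEL"]
    return style
```

The sketch only applies labels to Rank 2 and 3 lines because those are the ones the description requires; the real style files in the download package may handle more cases.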

Credits

The lines in the LSIB dataset are the product of decades of collaboration between geographers at the Department of State and the National Geospatial-Intelligence Agency with contributions from the Central Intelligence Agency and the UK Defence Geographic Centre. Attribution is welcome: U.S. Department of State, Office of the Geographer and Global Issues.

Changes from Prior Release

This version of the LSIB contains changes and accuracy refinements for the following line segments. These changes reflect improvements in spatial accuracy derived from newly available source materials, an ongoing review process, or the publication of new treaties or agreements. Changes to lines include: • Akrotiri (UK) / Cyprus • Albania / Montenegro • Albania / Greece • Albania / North Macedonia • Armenia / Turkey • Austria / Czechia • Austria / Slovakia • Austria / Hungary • Austria / Slovenia • Austria / Germany • Austria / Italy • Austria / Switzerland • Azerbaijan / Turkey • Azerbaijan / Iran • Belarus / Latvia • Belarus / Russia • Belarus / Ukraine • Belarus / Poland • Bhutan / India • Bhutan / China • Bulgaria / Turkey • Bulgaria / Romania • Bulgaria / Serbia • Bulgaria / Romania • China / Tajikistan • China / India • Croatia / Slovenia • Croatia / Hungary • Croatia / Serbia • Croatia / Montenegro • Czechia / Slovakia • Czechia / Poland • Czechia / Germany • Finland / Russia • Finland / Norway • Finland / Sweden • France / Italy • Georgia / Turkey • Germany / Poland • Germany / Switzerland • Greece / North Macedonia • Guyana / Suriname • Hungary / Slovenia • Hungary / Serbia • Hungary / Romania • Hungary / Ukraine • Iran / Turkey • Iraq / Turkey • Italy / Slovenia • Italy / Switzerland • Italy / Vatican City • Italy / San Marino • Kazakhstan / Russia • Kazakhstan / Uzbekistan • Kosovo / North Macedonia • Kosovo / Serbia • Kyrgyzstan / Tajikistan • Kyrgyzstan / Uzbekistan • Latvia / Russia • Latvia / Lithuania • Lithuania / Poland • Lithuania / Russia • Moldova / Ukraine • Moldova / Romania • Norway / Russia • Norway / Sweden • Poland / Russia • Poland / Ukraine • Poland / Slovakia • Romania / Ukraine • Romania / Serbia • Russia / Ukraine • Syria / Turkey • Tajikistan / Uzbekistan

This release also contains topology fixes, land boundary terminus refinements, and tripoint adjustments.

Copyright Notice and Disclaimer

While U.S. Government works prepared by employees of the U.S. Government as part of their official duties are not subject to Federal copyright protection (see 17 U.S.C. § 105), copyrighted material incorporated in U.S. Government works retains its copyright protection. The works on or made available through download from the U.S. Department of State’s website may not be used in any manner that infringes any intellectual property rights or other proprietary rights held by any third party. Use of any copyrighted material beyond what is allowed by fair use or other exemptions may require appropriate permission from the relevant rightsholder. With respect to works on or made available through download from the U.S. Department of State’s website, neither the U.S. Government nor any of its agencies, employees, agents, or contractors make any representations or warranties—express, implied, or statutory—as to the validity, accuracy, completeness, or fitness for a particular purpose; nor represent that use of such works would not infringe privately owned rights; nor assume any liability resulting from use of such works; and shall in no way be liable for any costs, expenses, claims, or demands arising out of use of such works.
United States SBOI: sa: Most Pressing Problem: Survey High: Competit'n frm...
ceicdata.com
Cite
CEICdata.com, United States SBOI: sa: Most Pressing Problem: Survey High: Competit'n frm Big Bus [Dataset]. https://www.ceicdata.com/en/united-states/nfib-index-of-small-business-optimism/sboi-sa-most-pressing-problem-survey-high-competitn-frm-big-bus
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Mar 1, 2024 - Feb 1, 2025
Area covered
United States
Variables measured
Business Confidence Survey
Description
United States SBOI: sa: Most Pressing Problem: Survey High: Competit'n frm Big Bus data was reported at 14.000 % in Feb 2025. This stayed constant from the previous number of 14.000 % for Jan 2025. United States SBOI: sa: Most Pressing Problem: Survey High: Competit'n frm Big Bus data is updated monthly, averaging 14.000 % from Jan 2014 (Median) to Feb 2025, with 130 observations. The data reached an all-time high of 14.000 % in Feb 2025 and a record low of 14.000 % in Feb 2025. United States SBOI: sa: Most Pressing Problem: Survey High: Competit'n frm Big Bus data remains in active status in CEIC and is reported by the National Federation of Independent Business. The data is categorized under Global Database’s United States – Table US.S032: NFIB Index of Small Business Optimism. [COVID-19-IMPACT]
Coronavirus (Covid-19) Data in the United States
github.com
openicpsr.org
+3more
csv
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
Explore at:
Available download formats: csv
Dataset provided by
New York Times
License
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
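
Because the files hold cumulative counts, a common first step is to difference them into per-day new cases. The sketch below assumes a CSV with `date`, `state`, and `cases` columns; the sample rows after the first are made-up illustrative values, and `daily_new_cases` is a hypothetical helper, not something shipped in the repository.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: the first row reflects the first reported case
# mentioned in the description; the later counts are invented for the example.
SAMPLE = """date,state,cases
2020-01-21,Washington,1
2020-01-22,Washington,1
2020-01-23,Washington,3
"""


def daily_new_cases(csv_text):
    """Convert cumulative case counts into per-day increases per state."""
    rows = sorted(
        csv.DictReader(io.StringIO(csv_text)),
        key=lambda r: (r["state"], r["date"]),  # ISO dates sort correctly as text
    )
    previous = defaultdict(int)  # last cumulative total seen for each state
    result = []
    for row in rows:
        total = int(row["cases"])
        result.append((row["date"], row["state"], total - previous[row["state"]]))
        previous[row["state"]] = total
    return result
```

Differences can come out negative when a source revises its cumulative totals downward, so downstream code should not assume non-negative values.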
Final Report of the Asian American Quality of Life (AAQoL)
datahub.austintexas.gov
data.austintexas.gov
+5more
application/rdfxml +5
Updated Jul 12, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Final Report of the Asian American Quality of Life (AAQoL) [Dataset]. https://datahub.austintexas.gov/dataset/Final-Report-of-the-Asian-American-Quality-of-Life/hc5t-p62z
Explore at:
Available download formats: xml, csv, application/rssxml, json, tsv, application/rdfxml
Dataset updated
Jul 12, 2018
Dataset authored and provided by
City of Austin, Texas - data.austintexas.gov
Area covered
Asia
Description
The U.S. Census defines Asian Americans as individuals having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent (U.S. Office of Management and Budget, 1997). As a broad racial category, Asian Americans are the fastest-growing minority group in the United States (U.S. Census Bureau, 2012). The growth rate of 42.9% in Asian Americans between 2000 and 2010 is phenomenal given that the corresponding figure for the U.S. total population is only 9.3% (see Figure 1). Currently, Asian Americans make up 5.6% of the total U.S. population and are projected to reach 10% by 2050. It is particularly notable that Asians have recently overtaken Hispanics as the largest group of new immigrants to the U.S. (Pew Research Center, 2015). The rapid growth rate and unique challenges as a new immigrant group call for a better understanding of the social and health needs of the Asian American population.
Geochemical Database for the Brackish Groundwater Assessment of the United...
data.usgs.gov
catalog.data.gov
+1more
Updated Apr 30, 2024
Cite
Sharon Qi; Alta Harris (2024). Geochemical Database for the Brackish Groundwater Assessment of the United States: Major-Ions Dataset [Dataset]. http://doi.org/10.5066/F72F7KK1
Explore at:
Unique identifier
https://doi.org/10.5066/F72F7KK1
Dataset updated
Apr 30, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Sharon Qi; Alta Harris
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Time period covered
Oct 11, 1901 - Sep 5, 2013
Area covered
United States
Description
Brackish groundwater (BGW), defined for this assessment as having a dissolved-solids concentration between 1,000 and 10,000 milligrams per liter, is an unconventional source of water that may offer a partial solution to current (2016) and future water challenges. In support of the National Water Census, the U.S. Geological Survey has completed a BGW assessment to gain a better understanding of the occurrence and character of BGW resources of the United States as an alternative source of water. Analyses completed as part of this assessment relied on previously collected data from multiple sources, and no new data were collected. One of the most important contributions of this assessment was the creation of a database containing chemical data and aquifer information for the known quantities of BGW in the United States. Data were compiled from single publications to large datasets and from local studies to national assessments, and include chemical data on the concentrations of disso ...
atcoder_contests
huggingface.co
Cite
atcoder_contests [Dataset]. https://huggingface.co/datasets/Nan-Do/atcoder_contests
Authors
Fernando Tarin Morales
Description
Dataset Summary

This dataset aims to facilitate the creation of sophisticated, multi-turn dialogue datasets focused on coding for Large Language Models (LLMs).

It also serves as a robust foundation for problem-solving in Large Language Models (LLMs).

The dataset includes both accepted and failed solutions from AtCoder's (ABC) contests.

In total, it features 1911 unique problems and 384,536 submissions across over 50 different programming languages.

It covers contests from ABC… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/atcoder_contests.
INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET
data.niaid.nih.gov
zenodo.org
Updated Jul 19, 2024
Cite
Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
Dataset updated
Jul 19, 2024
Dataset provided by
Nishat Anjum
Nafiz Sadman
Kishor Datta Gupta
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Bangladesh, United States
Description
Introduction

There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [ 1 ] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low resource data available for training machine learning algorithms. Doumit et al. [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT-scans to detect the presence of pneumonia in the lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [ 9 ]. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

2 Data-set Introduction

2.1 Data Collection

We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named this dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was more suitable than automation for ensuring that the news was highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were likely to collect inaccurate news reports. One of the challenges while collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some of the criteria provided as guidelines to the human data-collectors were as follows:

The headline must have one or more words directly or indirectly related to COVID-19.

The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

Avoid taking duplicate reports.

Maintain a time frame for the above mentioned newspapers.
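
The first, second, and fourth criteria above are mechanical enough to sketch as a filter, even though the authors applied them manually. Everything named here is hypothetical: the keyword list, the `accept` function, and the choice to count keyword occurrences rather than distinct keywords are assumptions made for illustration.

```python
# Hypothetical keyword list; the source does not publish the one it used.
KEYWORDS = {"covid-19", "coronavirus", "pandemic", "quarantine", "lockdown", "vaccine"}


def count_keywords(text):
    """Count keyword occurrences, ignoring case and surrounding punctuation."""
    tokens = text.lower().split()
    return sum(1 for tok in tokens if tok.strip(".,;:!?\"'()") in KEYWORDS)


def accept(report, seen_headlines):
    """Apply the headline, body, and duplicate criteria to one report.

    `report` is assumed to be a dict with "headline" and "body" keys.
    `seen_headlines` is a set that is updated when a report is accepted.
    """
    headline, body = report["headline"], report["body"]
    if headline.lower() in seen_headlines:  # avoid duplicate reports
        return False
    if count_keywords(headline) < 1:        # headline needs one or more keywords
        return False
    if count_keywords(body) < 5:            # body needs five or more keywords
        return False
    seen_headlines.add(headline.lower())
    return True
```

A filter like this could only pre-screen candidates; judging whether a word is "indirectly related" to COVID-19 still needs the human review the authors describe.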

To collect these data we used a Google form for USA and BD. Two human editors went through each entry to check for spam or troll entries.

2.2 Data Pre-processing and Statistics

Some pre-processing steps performed on the newspaper report dataset are as follows:

Remove hyperlinks.

Remove non-English alphanumeric characters.

Remove stop words.

Lemmatize text.

While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
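
The four pre-processing steps can be sketched with the standard library alone. This is not the authors' script: the stop-word set is a tiny illustrative sample, "non-English alphanumeric characters" is interpreted here as anything other than ASCII letters, digits, and whitespace, and `toy_lemmatize` is a crude plural-stripping stand-in for a real lemmatizer.

```python
import re

# Tiny illustrative stop-word set; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "and", "to", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")  # assumed reading of step 2


def toy_lemmatize(word):
    """Stand-in only: strips a trailing plural 's'. Not a real lemmatizer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(text):
    """Apply the four listed steps in order and return a list of tokens."""
    text = URL_RE.sub(" ", text)          # 1. remove hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)    # 2. remove other characters
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]  # 3. stop words
    return [toy_lemmatize(t) for t in tokens]                           # 4. lemmatize
```

Hyperlinks are removed first because stripping punctuation beforehand would break URLs into fragments that the pattern no longer matches.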

The primary data statistics of the two datasets are shown in Tables 1 and 2.

Table 1: Covid-News-USA-NNK data statistics

No of words per headline | 7 to 20
No of words per body content | 150 to 2100

Table 2: Covid-News-BD-NNK data statistics

No of words per headline | 10 to 20
No of words per body content | 100 to 1500

2.3 Dataset Repository

We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
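
Regenerating JSON from a CSV file takes only a few lines with the standard library. The sketch below is a guess at the general shape of such a script, not the authors' code; the `headline` and `body` column names are assumptions, and `csv_to_json` is a hypothetical helper.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ensure_ascii=False keeps any non-ASCII characters readable in the output.
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Assumed column names, for illustration only.
EXAMPLE_CSV = "headline,body\nSample headline,Sample body text\n"
EXAMPLE_JSON = csv_to_json(EXAMPLE_CSV)
```

Treating the CSV as the single source of truth and always regenerating the JSON from it avoids the two formats drifting apart.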

3 Literature Review

Natural Language Processing (NLP) is the area of computer science that deals with text data (often treated as categorical data). It uses methods such as one-hot encoding and word embedding to transform text into numerical representations that can be fed to machine learning and deep learning algorithms.

Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for NLP tasks. The two most prominent are BERT [14], which builds on a bidirectional Transformer encoder and performs strongly on classification and prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are generally used as pre-trained models, since training them carries a huge computational cost.

Information extraction is the general concept of retrieving information from a dataset. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of the image; information extraction from speech could mean retrieving names, places, etc. [16]. Information extraction from text could mean identifying named entities, locations, or other essential data.

Topic modeling is a sub-task of NLP and also a form of information extraction. It clusters words and phrases that share a context into groups. Topic modeling is an unsupervised learning method that gives a brief idea of what a set of texts is about. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].

Keyword extraction is a form of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that builds a graph over the words of a text, computes a weight for each word, and picks the words with the highest weights.
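A simplified version of this graph-based idea can be sketched in plain Python. It is an illustration under assumptions, not the reference TextRank implementation: words that occur within a small window of each other are linked, and a PageRank-style score is iterated over that undirected graph. The window size, damping factor, iteration count, and the optional `stop_words` filter are hypothetical parameter choices, and the part-of-speech filtering used in the original method is omitted.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50,
                      top_k=5, stop_words=frozenset()):
    """Return up to top_k words ranked by a PageRank-style score."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]

    # Build an undirected co-occurrence graph: link words within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the PageRank update; each node shares its score equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


top = textrank_keywords("virus economy virus jobs virus masks virus election",
                        window=1, top_k=1)
```

In the toy sentence, "virus" is adjacent to every other word, so it becomes the hub of the graph and receives the highest score.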

Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

4 Our experiments and Result analysis

We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can make a few observations:

In February, both newspapers talked about China and the source of the outbreak.

StarTribune emphasized Minnesota as the most concerned state. In April, it appeared to be even more concerned.

Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

Washington Post discussed global issues more than StarTribune.

StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

While both newspapers mentioned the outbreak in China in February, the spread in the United States was highlighted more heavily from March through May, displaying the critical impact caused by the virus.

We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, as the count rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keywords extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
gsm8k
huggingface.co
Updated Aug 11, 2022
Cite
OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 11, 2022
Dataset authored and provided by
OpenAI (http://openai.com/)
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for GSM8K

Dataset Summary

GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − ×÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.

Alternative Data Market Analysis North America, Europe, APAC, South America,...

technavio.com

Cite

Technavio, Alternative Data Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, Canada, China, UK, Mexico, Germany, Japan, India, Italy, France - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/alternative-data-market-industry-analysis

Explore at:

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

Europe, Germany, Canada, France, United Kingdom, Mexico, United States, Global

Description

Alternative Data Market Size 2025-2029

The alternative data market size is forecast to increase by USD 60.32 billion at a CAGR of 52.5% between 2024 and 2029.
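As a rough arithmetic check of how a forecast increase and a CAGR relate, the sketch below back-solves the base-year market size implied by the two stated figures. It assumes five annual compounding periods (2024 to 2029) and that the USD 60.32 billion is the difference between the 2029 and 2024 values; neither assumption is confirmed by the report, and the function names are illustrative.

```python
def implied_base(increase, cagr, years):
    """Base value V0 such that V0 * (1 + cagr) ** years - V0 == increase."""
    growth = (1 + cagr) ** years
    return increase / (growth - 1)


def compound_annual_growth_rate(start, end, years):
    """Standard CAGR definition: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text; the 5-year horizon is an assumption.
base = implied_base(increase=60.32, cagr=0.525, years=5)
end_value = base + 60.32
check = compound_annual_growth_rate(base, end_value, years=5)  # recovers about 0.525
```

The round trip through `compound_annual_growth_rate` only confirms that the two formulas are consistent with each other; it does not validate the report's figures.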

The market is experiencing significant growth due to the increased availability and diversity of data sources. This trend is driven by the rise of alternative data-driven investment strategies, which offer unique insights and opportunities for businesses and investors. However, challenges persist in the form of issues related to data quality and standardization. Big data analytics and machine learning help businesses gain insights from vast amounts of data, enabling data-driven innovation and competitive advantage. Data governance, data security, and data ethics are crucial aspects of managing alternative data.
As more data becomes available, ensuring its accuracy and consistency is crucial for effective decision-making. The market analysis report provides an in-depth examination of these factors and their impact on the growth of the market. With the increasing importance of data-driven strategies, staying informed about the latest trends and challenges is essential for businesses looking to remain competitive in today's data-driven economy.

What will be the Size of the Alternative Data Market During the Forecast Period?

Alternative data, the non-traditional information sourced from various industries and domains, is revolutionizing business landscapes by offering new opportunities for data monetization. This trend is driven by the increasing availability of data from various sources such as credit card transactions, IoT devices, satellite data, social media, and more. Data privacy is a critical consideration in the market. With the increasing focus on data protection regulations, businesses must ensure they comply with stringent data privacy standards. Data storytelling and data-driven financial analysis are essential applications of alternative data, providing valuable insights for businesses to make informed decisions. Data-driven product development and sales prediction are other significant areas where alternative data plays a pivotal role.
Moreover, data management platforms and analytics tools facilitate data integration, data quality, and data visualization, ensuring data accuracy and consistency. Predictive analytics and data-driven risk management help businesses anticipate trends and mitigate risks. Data enrichment and data-as-a-service are emerging business models that enable businesses to access and utilize alternative data. Economic indicators and data-driven operations are other areas where alternative data is transforming business processes.

How is the Alternative Data Market Segmented?

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Type

  Credit and debit card transactions
  Social media
  Mobile application usage
  Web scraped data
  Others


End-user

  BFSI
  IT and telecommunication
  Retail
  Others


Geography

  North America

    Canada
    Mexico
    US


  Europe

    Germany
    UK
    France
    Italy


  APAC

    China
    India
    Japan


  South America



  Middle East and Africa

By Type Insights

The credit and debit card transactions segment is estimated to witness significant growth during the forecast period.

Alternative data derived from credit and debit card transactions offers valuable insights into consumer spending behaviors and lifestyle choices. This data is essential for market analysts, financial institutions, and businesses seeking to enhance their strategies and customer experiences. The two primary categories of card transactions are credit and debit. Credit card transactions provide information on discretionary spending, luxury purchases, and credit management skills. In contrast, debit card transactions reveal essential spending habits, budgeting strategies, and daily expenses. By analyzing this data using advanced methods, businesses can gain a competitive advantage, understand market trends, and cater to consumer needs effectively. IT and telecommunications companies, hedge funds, and other organizations rely on web scraped data, social and sentiment analysis, and public data to supplement their internal data sources. Adhering to GDPR regulations ensures ethical data usage and compliance.

The credit and debit card transactions segment was valued at USD 228.40 million in 2019 and showed a gradual increase during the forecast period.

Regional Analysis

North America is estimated to contribute 56% to the growth of the global market during the forecast period.

Problems of the Presence of American Troops in Germany - Dataset - B2FIND
b2find.dkrz.de
Updated Apr 7, 2023
Cite
(2023). Problems of the Presence of American Troops in Germany - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/d9fefcd2-77ab-559a-ba74-12a77f7d219a
Explore at:
Dataset updated
Apr 7, 2023
Area covered
Germany
Description
Judgement on the presence of American troops in West Germany. Topics: Most important problems of the FRG; attitude to participation of the FRG in the costs of stationing NATO military forces and to American troops remaining in the FRG; attitude to a reduction in American military forces; general judgement on the American soldiers; perceived changes in the relationship of American soldiers to the German civilian population; criticism of the way of life of American soldiers; frequency of contact with American soldiers after the war; attitude to construction of housing settlements for the families living in Germany; perception of the Americans as occupying forces or protective forces; attitude to children of members of the occupying forces and their mothers; judgement on the confiscation of buildings by Americans; residency; participation in the world war and deployment in battle against the Americans. Demography: membership in clubs, trade unions or a party and offices held there; party preference; age (classified); sex; marital status; religious denomination; school education; occupation; employment; household income; head of household; state; refugee status. Interviewer rating: social class and willingness of respondent to cooperate; number of contact attempts; city size. Also encoded was: identification of interviewer; sex of interviewer and age of interviewer.
United States SBOI: sa: Most Pressing Problem: Competition from Large...
ceicdata.com
Cite
CEICdata.com, United States SBOI: sa: Most Pressing Problem: Competition from Large Businesses [Dataset]. https://www.ceicdata.com/en/united-states/nfib-index-of-small-business-optimism/sboi-sa-most-pressing-problem-competition-from-large-businesses
Explore at:
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Feb 1, 2024 - Jan 1, 2025
Area covered
United States
Variables measured
Business Confidence Survey
Description
United States SBOI: sa: Most Pressing Problem: Competition from Large Businesses data was reported at 5.000 % in Jan 2025. This records an increase from the previous number of 4.000 % for Dec 2024. United States SBOI: sa: Most Pressing Problem: Competition from Large Businesses data is updated monthly, averaging 8.000 % from Jan 2014 (Median) to Jan 2025, with 129 observations. The data reached an all-time high of 11.000 % in Dec 2019 and a record low of 0.000 % in May 2022. United States SBOI: sa: Most Pressing Problem: Competition from Large Businesses data remains active status in CEIC and is reported by National Federation of Independent Business. The data is categorized under Global Database’s United States – Table US.S032: NFIB Index of Small Business Optimism. [COVID-19-IMPACT]
Civic Innovation Challenge Inventory
data.cambridgema.gov
application/rdfxml +5
Updated Mar 16, 2025
Cite
City of Cambridge (2025). Civic Innovation Challenge Inventory [Dataset]. https://data.cambridgema.gov/widgets/x96z-hdnh
Explore at:
Available download formats: csv, tsv, application/rdfxml, application/rssxml, xml, json
Dataset updated
Mar 16, 2025
Dataset authored and provided by
City of Cambridge
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
Use Cambridge's open data to help our city come up with innovative solutions to its biggest challenges. This dataset lists city issues that you can help us solve by analyzing or hacking on our open data. It's certainly not an exhaustive list, but we hope it will at least point you in the right direction. Feel free to reach out at OpenData@cambridgema.gov with questions or ideas. Thanks for your help. We're glad you're on our team!
English Poor Law Cases, 1690-1815
datacatalogue.cessda.eu
beta.ukdataservice.ac.uk
Updated Mar 26, 2025
Cite
Deakin, S; Shuku, L; Cheok, V (2025). English Poor Law Cases, 1690-1815 [Dataset]. http://doi.org/10.5255/UKDA-SN-856924
Explore at:
Unique identifier
https://doi.org/10.5255/UKDA-SN-856924
Dataset updated
Mar 26, 2025
Dataset provided by
University of Cambridge
Authors
Deakin, S; Shuku, L; Cheok, V
Time period covered
Jan 1, 2020 - Jan 31, 2023
Area covered
United Kingdom
Variables measured
Text unit
Measurement technique
The cases were sourced from original texts of legal judgments. A text file was first created for each judgment and a separate word file was then created. The word files were annotated for subsequent use in computational analysis. In the current dataset the cases are ordered alphabetically in a single word document. The annotations (colour coding for words (yellow) and certain longer phrases (green) of interest) have been retained.
Description
This dataset of historical poor law cases was created as part of a project aiming to assess the implications of the introduction of Artificial Intelligence (AI) into legal systems in Japan and the United Kingdom. The project was jointly funded by the UK’s Economic and Social Research Council, part of UKRI, and the Japanese Society and Technology Agency (JST), and involved collaboration between Cambridge University (the Centre for Business Research, Department of Computer Science and Faculty of Law) and Hitotsubashi University, Tokyo (the Graduate Schools of Law and Business Administration). As part of the project, a dataset of historic poor law cases was created to facilitate the analysis of legal texts using natural language processing methods. The dataset contains judgments of cases which have been annotated to facilitate computational analysis. Specifically, they make it possible to see how legal terms have evolved over time in the area of disputes over the law governing settlement by hiring.
A World Economic Forum meeting at Davos 2019 heralded the dawn of 'Society 5.0' in Japan. Its goal: creating a 'human-centred society that balances economic advancement with the resolution of social problems by a system that highly integrates cyberspace and physical space.' Using Artificial Intelligence (AI), robotics and data, 'Society 5.0' proposes to '...enable the provision of only those products and services that are needed to the people that need them at the time they are needed, thereby optimizing the entire social and organizational system.' The Japanese government accepts that realising this vision 'will not be without its difficulties,' but intends 'to face them head-on with the aim of being the first in the world as a country facing challenging issues to present a model future society.' The UK government is similarly committed to investing in AI and likewise views AI as central to engineering a more profitable economy and prosperous society.

This vision is, however, starting to crystallise in the rhetoric of LegalTech developers who have the data-intensive (and thus target-rich) environment of law in their sights. Buoyed by investment and claims of superior decision-making capabilities over human lawyers and judges, LegalTech is now being deputised to usher in a new era of 'smart' law built on AI and Big Data. While a number of bold claims are made about the capabilities of these technologies, comparatively little attention has been directed to more fundamental questions about how we might assess the feasibility of using them to replicate core aspects of legal process, and about ensuring the public has a meaningful say in their development and implementation.

This innovative and timely research project intends to approach these questions from a number of vectors. At a theoretical level, we consider the likely consequences of this step using a Horizon Scanning methodology developed in collaboration with our Japanese partners and an innovative systemic-evolutionary model of law. Many aspects of legal reasoning have algorithmic features which could lend themselves to automation. However, an evolutionary perspective also points to features of legal reasoning which are inconsistent with ML, including the reflexivity of legal knowledge and the incompleteness of legal rules at the point where they encounter the 'chaotic' and unstructured data generated by other social sub-systems. We will test our theory by developing a hierarchical model (or ontology), derived from our legal expertise and publicly available datasets, for classifying employment relationships under UK law. This will let us probe the extent to which legal reasoning can be modelled using less computationally intensive methods such as Markov Models and Monte Carlo Trees.

Building upon these theoretical innovations, we will then turn our attention from modelling a legal domain using historical data to exploring whether the outcome of legal cases can be reliably predicted using various techniques for optimising datasets. For this we will use a data set comprised of 24,179 cases from the High Court of England and Wales. This will allow us to harness Natural Language Processing (NLP) techniques such as named entity recognition (to identify relevant parties) and sentiment analysis (to analyse opinions and determine the disposition of a party) in addition to identifying the main legal and factual points of the dispute, remedies, costs, and trial durations. By trialling various predictive heuristics and ML techniques against this dataset we hope to develop a more granular understanding of the feasibility of predicting dispute outcomes and insight into what factors are relevant for legal decision-making. This will allow us to then undertake a comparative analysis with the results of existing studies and shed light on the legal contexts and questions where AI can and cannot be used to produce accurate and repeatable results.
Attitude Study about Present International and National Questions - Dataset...
b2find.dkrz.de
Updated Oct 23, 2023
Cite
(2023). Attitude Study about Present International and National Questions - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/2ad6d3cb-f4ae-5fae-a864-0b27ec8cdfc6
Explore at:
Dataset updated
Oct 23, 2023
Description
Attitudes to current national and international questions. Topics: most important national problem; most important international problem; countries in conflict with the FRG; major problems and differences between FRG and USA; major problems between FRG and other countries; opinion on France, Great Britain, USA, USSR, Red China; reasons for negative and positive attitude to countries USA, USSR and China; trust in USA and USSR in treatment of world problems; reasons for little trust in USA and USSR; effort of USA and USSR for world peace; relationship of USA to USSR; strongest current nuclear power; strongest nuclear power in 5 years; desired strongest nuclear power; reasons for desire for balanced nuclear potential between USA and USSR; knowledge about the SALT negotiations; countries participating in the SALT negotiations; purpose and chances for success of the SALT negotiations; beneficiary of a treaty between USA and USSR; relying on USA in negotiations; security conference; threat to national security of Germany; support for FRG in the case of conflict; knowledge of international organizations; purpose of NATO; membership in NATO; reasons for desired membership; trust in defense ability of NATO; stationing troops in Western Europe; reduction of US troop strength in Europe; necessity of USA for security of Western Europe; defense budget of FRG; navy forces in the Mediterranean; strongest naval power in the Mediterranean; relationship of Israel and Arab nations; support of FRG for Israel; significance of result of the Middle East Conflict for FRG; peace process in the Middle East; European unification process; powers of a European Government; attitude of the USA to European integration; solving the problem of environmental pollution by international organizations; economic aid for other countries. Demography: age; marital status; education; occupation; income; religious denomination; church attendance; sex; city size; state.
Also encoded was: length of interview; number of contact attempts; presence of others during interview; willingness to cooperate; difficulty; end time; date of interview; interviewer number.
Big-Math-RL-Verified
huggingface.co
Updated Mar 6, 2025
Cite
Big-Math-RL-Verified [Dataset]. https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified
Explore at:
Dataset updated
Mar 6, 2025
Dataset authored and provided by
SynthLabs
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.

Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
PLACES: Local Data for Better Health, Census Tract Data 2023 release
healthdata.gov
data.virginia.gov
+2more
application/rdfxml +5
Updated Aug 24, 2024
Cite
data.cdc.gov (2024). PLACES: Local Data for Better Health, Census Tract Data 2023 release [Dataset]. https://healthdata.gov/dataset/PLACES-Local-Data-for-Better-Health-Census-Tract-D/m68h-h4e7
Explore at:
Available download formats: application/rdfxml, csv, tsv, json, application/rssxml, xml
Dataset updated
Aug 24, 2024
Dataset provided by
data.cdc.gov
Description
This dataset contains model-based census tract estimates. PLACES covers the entire United States—50 states and the District of Columbia—at county, place, census tract, and ZIP Code Tabulation Area levels. It provides information uniformly on this large scale for local areas at four geographic levels. Estimates were provided by the Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch. PLACES was funded by the Robert Wood Johnson Foundation in conjunction with the CDC Foundation. The dataset includes estimates for 36 measures: 13 for health outcomes, 9 for preventive services use, 4 for chronic disease-related health risk behaviors, 7 for disabilities, and 3 for health status. These estimates can be used to identify emerging health problems and to help develop and carry out effective, targeted public health prevention activities. Because the small area model cannot detect effects due to local interventions, users are cautioned against using these estimates for program or policy evaluations. Data sources used to generate these model-based estimates are Behavioral Risk Factor Surveillance System (BRFSS) 2021 or 2020 data, Census Bureau 2010 population data, and American Community Survey 2015–2019 estimates. The 2023 release uses 2021 BRFSS data for 29 measures and 2020 BRFSS data for seven measures (all teeth lost, dental visits, mammograms, cervical cancer screening, colorectal cancer screening, core preventive services among older adults, and sleeping less than 7 hours) that the survey collects data on every other year. More information about the methodology can be found at www.cdc.gov/places.

