100+ datasets found

text-stats
huggingface.co
Updated Dec 14, 2024
Cite
Alan Tseng (2024). text-stats [Dataset]. https://huggingface.co/datasets/agentlans/text-stats
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 14, 2024
Authors
Alan Tseng
Description
Text statistics

This dataset is a combination of the following datasets:

agentlans/text-quality-v2
agentlans/readability
agentlans/twitter-sentiment-meta-analysis

The main purpose is to gather this large body of data in one place for easy training and evaluation.

Data Preparation and Transformation

Quality Score Normalization

The dataset was enhanced with additional columns, and quality scores (n = 909 533) were normalized using Ordered Quantile… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/text-stats.
SMS Marketing Statistics By Effectiveness, Sales, Benefits and Facts
electroiq.com
Updated Mar 24, 2025
Cite
Electro IQ (2025). SMS Marketing Statistics By Effectiveness, Sales, Benefits and Facts [Dataset]. https://electroiq.com/stats/sms-marketing-statistics/
Dataset updated
Mar 24, 2025
Dataset authored and provided by
Electro IQ
License
https://electroiq.com/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

SMS Marketing Statistics: With mobile marketing on the rise, it’s no wonder SMS marketing is becoming more popular among businesses. SMS, which stands for short message service, is a key part of mobile and text message marketing. This tool helps you boost brand awareness, engage with customers, and increase sales. It lets e-commerce marketers build a stronger relationship with their customers and connect with them more personally. SMS marketing is both practical and smart, changing how you grow your business.

Using SMS ensures quick delivery and efficiency, as your message can be sent instantly and reach customers’ phones within seconds. SMS is especially effective for advertising time-sensitive sales or special promotions exclusive to your mobile customers. We shall shed more light on SMS Marketing Statistics through this article.
Use of AI for text analysis in Denmark in 2023, by industry
statista.com
Updated Jul 9, 2025
Cite
Statista (2025). Use of AI for text analysis in Denmark in 2023, by industry [Dataset]. https://www.statista.com/statistics/1455135/artificial-intelligence-text-analysis-usage-industry-denmark/
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
2023
Area covered
Denmark
Description
Information and communication was the industry that used artificial intelligence (AI) for text analysis the most in Denmark in 2023, with ** enterprises. Construction had the smallest share, with only * company.
Lingsoft Text Analysis NN NER
live.european-language-grid.eu
Updated Jul 28, 2021
+ more versions
Cite
Lingsoft (2021). Lingsoft Text Analysis NN NER [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/7301
Dataset updated
Jul 28, 2021
Dataset provided by
Lingsoft Oy: https://lingsoft.ai/
Authors
Lingsoft
License
https://lss01.lingsoft.fi/assets/docs/Lingsoft%20Language%20Management%20Central%20-%20Terms%20of%20Service.pdf
Description
Named entity recognition
text-clustering-example-data
huggingface.co
Updated Nov 20, 2024
Cite
Jacob Moore (2024). text-clustering-example-data [Dataset]. https://huggingface.co/datasets/billingsmoore/text-clustering-example-data
Explore at:
Croissant
Dataset updated
Nov 20, 2024
Authors
Jacob Moore
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for Dataset Name

This dataset consists of 925 sentences in English paired with a broad topic descriptor for use as example data in product demonstrations or student projects.

Curated by: billingsmoore
Language(s) (NLP): English
License: Apache License 2.0

Direct Use

This data can be loaded using the following Python code:

    from datasets import load_dataset

    ds = load_dataset('billingsmoore/text-clustering-example-data')

It can then be clustered using the… See the full description on the dataset page: https://huggingface.co/datasets/billingsmoore/text-clustering-example-data.
Text Analysis
kaggle.com
zip
Updated Apr 17, 2021
Cite
DEBJYOTI SAHA (2021). Text Analysis [Dataset]. https://www.kaggle.com/datasets/debjyotisaha/text-analysis
Explore at:
Available download formats: zip (58574497 bytes)
Dataset updated
Apr 17, 2021
Authors
DEBJYOTI SAHA
Description
Dataset

This dataset was created by DEBJYOTI SAHA

Contents
emotion analysis based on text Dataset
cubig.ai
Updated Feb 25, 2025
Cite
CUBIG (2025). emotion analysis based on text Dataset [Dataset]. https://cubig.ai/store/products/139/emotion-analysis-based-on-text-dataset
Dataset updated
Feb 25, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data introduction
• The emotion-analysis dataset contains data for analyzing the emotions expressed in text.

2) Data utilization
(1) Characteristics of the emotion-analysis data:
• It contains a variety of texts that convey emotions ranging from happiness to anger to sadness. The goal is to build an efficient model for detecting emotions in text.
(2) Uses of the emotion-analysis data:
• Sentiment classification models: The dataset can be used to train machine learning models that classify text by sentiment, which helps companies and researchers understand public opinion and sentiment trends.
• Market research: Researchers can analyze sentiment data to understand consumer preferences and market trends and to support data-driven decision-making.
Natural Language Processing Text Data from Final Contractor/Grantee Reports...
catalog.data.gov
Updated Jul 12, 2024
Cite
data.usaid.gov (2024). Natural Language Processing Text Data from Final Contractor/Grantee Reports and Evaluation Reports (2011-2021) [Dataset]. https://catalog.data.gov/dataset/natural-language-processing-text-data-from-final-contractor-grantee-reports-and-evalu-2011
Dataset updated
Jul 12, 2024
Dataset provided by
United States Agency for International Development: http://usaid.gov/
Description
This data asset contains data files of text extracted from PDF reports on the Development Experience Clearinghouse (DEC) for the years 2011 to 2021 (as of July 2021). It includes three specific "Document types" identified by the DEC: Final Contractor/Grantee Report, Final Evaluation Report, and Special Evaluation. Each PDF document labeled as one of these three document types and labeled with a publication year from 2011 to 2021 was downloaded from the DEC in July 2021. The dataset includes text data files from 2,579 Final Contractor/Grantee Reports, 1,299 Final Evaluation reports, and 1,323 Special Evaluation reports. Raw text from each of these PDFs was extracted and saved as an individual CSV file whose name corresponds to the Document ID of the PDF document on the DEC. Within each CSV file, the raw text is split into paragraphs and corresponding sentences. In addition, to enable Natural Language Processing of the data, the sentences are cleaned by removing unnecessary special characters, punctuation, and numbers, and each word is stemmed to its root to remove inflections (e.g. pluralization and conjugation). This data could be used to analyze trends in USAID's programming approaches and terminology. This data was compiled for USAID/PPL/LER with the Program Cycle Mechanism.
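The cleaning steps this description mentions (removing special characters, punctuation, and numbers, then stemming each word) can be sketched roughly as follows. This is a minimal illustration and not USAID's actual pipeline: the function names clean_sentence, naive_stem, and preprocess are hypothetical, and the suffix-stripping stemmer is a toy stand-in, since the description does not name the stemming algorithm that was used.

```python
import re

def clean_sentence(sentence):
    """Lowercase a sentence and keep only alphabetic words."""
    # Replace digits, punctuation, and other special characters with spaces.
    letters_only = re.sub(r"[^A-Za-z\s]", " ", sentence)
    return letters_only.lower().split()

def naive_stem(word):
    """Toy suffix stripper (an assumption, NOT the Porter stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(sentence):
    """Clean a sentence, then stem each remaining word."""
    return [naive_stem(word) for word in clean_sentence(sentence)]

print(preprocess("USAID funded 12 programs, evaluating outcomes in 2019!"))
```

A real pipeline would more likely use an established stemmer from an NLP library; the toy version here only shows where stemming sits relative to the cleaning step.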
Number of text messages sent in the U.S. 2004-2014
statista.com
Updated Dec 10, 2015
Cite
Statista (2015). Number of text messages sent in the U.S. 2004-2014 [Dataset]. https://www.statista.com/statistics/215776/mobile-messaging-volumes-in-the-us/
Dataset updated
Dec 10, 2015
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
2004 - 2014
Area covered
United States
Description
This statistic shows mobile messaging volumes in the U.S. for selected years between 2004 and 2014. In 2010, approximately ***** billion messages were sent in total, up from ** billion in 2004.

U.S. mobile messaging volumes - additional information

A total of around *** trillion text messages were sent in the United States in 2012, marking an almost tenfold increase on the figure from 2006. A further ** million MMS messages were sent in the country in 2012, an increase from * million in 2006. In 2013, the United States was the country with the highest average number of text messages sent per month and per mobile connection. Over *** messages were sent monthly per mobile connection in the United States, in comparison to *** in the United Kingdom and *** in Germany.

The most active age group for sending and receiving text messages in the United States were those aged 18 to 29, as ** percent of respondents said that they did use mobile messaging in 2013. By comparison, only ** percent of those aged 65 and older said that they used their mobile phone for text messaging in 2013.

Rather than using a mobile phone’s integrated text messaging service, many users are opting for third party apps to communicate. As of January 2015, mobile messaging service WhatsApp had around 700 million monthly active users, marking double the amount of users it had in October 2013. Within the U.S. market, iOS and Android users spent a total of 680 million minutes on WhatsApp in February 2013, with those aged between 25 and 34 years most likely to use the service in 2014.
A Sentiment Analysis Dataset for Code-Mixed Malayalam-English
live.european-language-grid.eu
zenodo.org
tsv
Updated Dec 13, 2021
Cite
(2021). A Sentiment Analysis Dataset for Code-Mixed Malayalam-English [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7634
Explore at:
Available download formats: tsv
Dataset updated
Dec 13, 2021
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
There is an increasing demand for sentiment analysis of social media text, which is mostly code-mixed. Systems trained on monolingual data fail on code-mixed data because of the complexity of mixing at different levels of the text. However, very few resources are available for building models specific to code-mixed data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still perform better. Only a few datasets for popular language pairs such as English-Spanish, English-Hindi, and English-Chinese are available, and there are no resources for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed Malayalam-English text, annotated by voluntary annotators. The corpus obtained a Krippendorff’s alpha above 0.8. We use this new corpus to provide a benchmark for sentiment analysis of Malayalam-English code-mixed text.
Text Analysis System Report
archivemarketresearch.com
doc, pdf, ppt
Updated May 17, 2025
Cite
Archive Market Research (2025). Text Analysis System Report [Dataset]. https://www.archivemarketresearch.com/reports/text-analysis-system-561979
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
May 17, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Text Analysis System market is experiencing robust growth, driven by the increasing need for businesses to extract actionable insights from unstructured textual data. The market, valued at approximately $5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This growth is fueled by several key factors, including the rising adoption of cloud-based solutions offering scalability and cost-effectiveness, the expanding use of text analytics in various sectors like customer service, marketing, and risk management, and the increasing availability of sophisticated AI-powered tools capable of handling complex natural language processing tasks. Large enterprises are currently the dominant segment, but the SME sector is demonstrating rapid growth potential, driven by the accessibility of user-friendly and cost-effective solutions. While data privacy and security concerns present a restraint, the overall market trajectory remains positive, fueled by continued technological advancements and growing data volumes. The competitive landscape is marked by a mix of established players like SAP, Microsoft, and IBM, alongside innovative technology providers such as RapidMiner and Luminoso. Regional analysis indicates North America currently holds the largest market share, driven by early adoption of advanced analytics and a strong technology infrastructure. However, significant growth opportunities exist in the Asia Pacific region, particularly in countries like China and India, due to their burgeoning digital economies and increasing demand for data-driven decision-making across various industries. The ongoing development of sophisticated algorithms capable of handling multilingual text and sentiment analysis, along with the integration of text analysis into broader business intelligence platforms, will further propel market expansion in the forecast period (2025-2033). 
This continuous evolution ensures the text analysis system market remains dynamic and highly lucrative.
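The description gives a 2025 base of roughly $5 billion and a 15% CAGR through 2033 but states no end value. As a rough worked example, the sketch below compounds the base forward. The function name project_value is hypothetical, and the choice of eight annual compounding periods (2025 to 2033) is an assumption, so the result is an implied figure and not one reported by the source.

```python
def project_value(start_value, cagr, periods):
    """Compound start_value at a constant annual growth rate for `periods` years."""
    return start_value * (1.0 + cagr) ** periods

# Figures from the description: about $5 billion in 2025, 15% CAGR.
# Assumption: 8 compounding periods between 2025 and 2033.
implied_2033 = project_value(5.0, 0.15, 8)
print(f"Implied 2033 market size: about ${implied_2033:.1f} billion")
```

Under these assumptions the implied 2033 figure is roughly $15.3 billion, about three times the 2025 base.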
Text Analysis System Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 17, 2025
Cite
Archive Market Research (2025). Text Analysis System Report [Dataset]. https://www.archivemarketresearch.com/reports/text-analysis-system-31151
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Feb 17, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global text analysis system market is estimated to be valued at USD XXX million in 2025 and is projected to reach USD XXX million by 2033, at a CAGR of XX%. The increasing need for extracting insights from unstructured data, government initiatives promoting digital transformation, and the growing adoption of cloud-based solutions drive market growth. Various industries, including healthcare, retail, and finance, are increasingly using text analysis systems to analyze customer feedback, monitor social media trends, and improve product development. Key market trends include the rising adoption of artificial intelligence (AI) and machine learning (ML) algorithms, the proliferation of cloud-based solutions, and the growing emphasis on data privacy and security. The increasing availability of open-source text analysis tools and the emergence of low-code/no-code platforms are also expected to fuel market expansion. Moreover, the growing adoption of text analysis systems in emerging economies presents significant growth opportunities. Key players in the market include SAP SE, Microsoft Corporation, RapidMiner Inc., OpenText Corporation, Luminoso Technologies Inc., Lexalytics Inc., Infegy Inc., Micro Focus International PLC, IBM Corporation, Clarabridge Inc., Medallia Inc., SAS Institute Inc., and others. These companies offer a wide range of solutions, from on-premise to cloud-based, to cater to the diverse needs of various industries. Mergers and acquisitions, strategic partnerships, and new product launches are some of the key growth strategies adopted by these companies.
text-analysis
kaggle.com
Updated May 7, 2024
Cite
Ankush Tiwari (2024). text-analysis [Dataset]. https://www.kaggle.com/datasets/tiwariankush/text-analysis
Explore at:
Croissant
Dataset updated
May 7, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Ankush Tiwari
Description
Dataset

This dataset was created by Ankush Tiwari

Contents
Data from: A Neural Approach for Text Extraction from Scholarly Figures
data.uni-hannover.de
zip
Updated Jan 20, 2022
Cite
TIB (2022). A Neural Approach for Text Extraction from Scholarly Figures [Dataset]. https://data.uni-hannover.de/dataset/a-neural-approach-for-text-extraction-from-scholarly-figures
Explore at:
Available download formats: zip (798357692)
Dataset updated
Jan 20, 2022
Dataset authored and provided by
TIB
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
A Neural Approach for Text Extraction from Scholarly Figures

This is the readme for the supplemental data for our ICDAR 2019 paper.

You can read our paper via IEEE here: https://ieeexplore.ieee.org/document/8978202

If you found this dataset useful, please consider citing our paper:

@inproceedings{DBLP:conf/icdar/MorrisTE19,
  author    = {David Morris and Peichen Tang and Ralph Ewerth},
  title     = {A Neural Approach for Text Extraction from Scholarly Figures},
  booktitle = {2019 International Conference on Document Analysis and Recognition, {ICDAR} 2019, Sydney, Australia, September 20-25, 2019},
  pages     = {1438--1443},
  publisher = {{IEEE}},
  year      = {2019},
  url       = {https://doi.org/10.1109/ICDAR.2019.00231},
  doi       = {10.1109/ICDAR.2019.00231},
  timestamp = {Tue, 04 Feb 2020 13:28:39 +0100},
  biburl    = {https://dblp.org/rec/conf/icdar/MorrisTE19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

This work was financially supported by the German Federal Ministry of Education and Research (BMBF) and European Social Fund (ESF) (InclusiveOCW project, no. 01PE17004).

Datasets

We used different sources of data for testing, validation, and training. Our testing set was assembled in the work by Böschen et al. that we cite. We excluded the DeGruyter dataset from it and used it as our validation dataset.

Testing

These datasets contain a readme with license information. Further information about the associated project can be found in the authors' published work we cited: https://doi.org/10.1007/978-3-319-51811-4_2

Validation

The DeGruyter dataset does not include the labeled images due to license restrictions. As of writing, the images can still be downloaded from DeGruyter via the links in the readme. Note that depending on what program you use to strip the images out of the PDF they are provided in, you may have to re-number the images.

Training

We used label_generator's generated dataset, which the author made available on a requester-pays Amazon S3 bucket. We also used the Multi-Type Web Images dataset, which is mirrored here.

Code

We have made our code available in code.zip. We will upload code, announce further news, and field questions via the GitHub repo.

Our text detection network is adapted from Argman's EAST implementation. The EAST/checkpoints/ours subdirectory contains the trained weights we used in the paper.

We used a Tesseract script to run text extraction from detected text rows. This is inside our code archive code.tar as text_recognition_multipro.py.

We used a Java program provided by Falk Böschen, adapted to our file structure. We included this as evaluator.jar.

Parameter sweeps are automated by param_sweep.rb. This file also shows how to invoke all of these components.
all-document-text-data
huggingface.co
Updated Nov 4, 2024
Cite
Climate Policy Radar (2024). all-document-text-data [Dataset]. http://doi.org/10.57967/hf/5426
Unique identifier
https://doi.org/10.57967/hf/5426
Dataset updated
Nov 4, 2024
Dataset provided by
Climate Policy Radar CIC
Authors
Climate Policy Radar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Climate Policy Radar Open Data

This repo contains the full text data of all of the documents from the Climate Policy Radar database (CPR), which is also available at Climate Change Laws of the World (CCLW). Please note that this replaces the Global Stocktake open dataset: that data, including all NDCs and IPCC reports is now a subset of this dataset.

What’s in this dataset

This dataset contains two corpus types (groups of the same types or sources of documents) which… See the full description on the dataset page: https://huggingface.co/datasets/ClimatePolicyRadar/all-document-text-data.
SMS Marketing Statistics By Timings, Frequency, Business, Companies,...
coolest-gadgets.com
Updated Jan 20, 2025
Cite
Coolest Gadgets (2025). SMS Marketing Statistics By Timings, Frequency, Business, Companies, Conversion Rate, Retention and Facts [Dataset]. https://coolest-gadgets.com/sms-marketing-statistics/
Dataset updated
Jan 20, 2025
Dataset authored and provided by
Coolest Gadgets
License
https://coolest-gadgets.com/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

SMS Marketing Statistics: SMS marketing is one of the most effective tools for businesses to connect with their customers. With nearly everyone owning a mobile phone, text messages offer a direct and personal way to share information. Statistics show that SMS messages have an incredibly high open rate, often exceeding 90%. Unlike emails that can go unread or calls that may be ignored, texts are usually seen within minutes.

Businesses are using SMS marketing to promote products, share offers, and provide updates quickly. It’s a cost-effective method that works for companies of all sizes. As mobile phone usage continues to grow, SMS marketing is becoming an essential part of any marketing strategy. Understanding its impact can help businesses improve customer engagement.
Tutorial Package for: Text as Data in Economic Analysis
dataverse.nl
Updated Jun 26, 2025
Cite
Tarek Hassan; Stephan Hollander; Aakash Kalyani; Laurence Van Lent; Markus Schwedeler; Ahmed Tahoun (2025). Tutorial Package for: Text as Data in Economic Analysis [Dataset]. http://doi.org/10.34894/KNDZ9T
Explore at:
Available download formats: text/markdown(148), bin(493802528), text/markdown(405), csv(6678744), application/x-ipynb+json(56525), text/markdown(136), csv(8712017), txt(1706), text/x-python(3800), text/markdown(131), txt(194), text/markdown(179), csv(89054804), bin(43909246), csv(1600), xlsx(10436), bin(952), text/markdown(1743)
Unique identifier
https://doi.org/10.34894/KNDZ9T
Dataset updated
Jun 26, 2025
Dataset provided by
DataverseNL
Authors
Tarek Hassan; Stephan Hollander; Aakash Kalyani; Laurence Van Lent; Markus Schwedeler; Ahmed Tahoun
License
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Time period covered
Jan 1, 2002 - May 31, 2023
Dataset funded by
Institute for New Economic Thinking
Deutsche Forschungsgemeinschaft (403041268-TRR 266)
Description
This tutorial package, comprising both data and code, accompanies the article and is designed primarily to allow readers to explore the various vocabulary-building methods discussed in the paper. The article discusses how to apply computational linguistics techniques to analyze largely unstructured corporate-generated text for economic analysis. As a core example, we illustrate how textual analysis of earnings conference call transcripts can provide insights into how markets and individual firms respond to economic shocks, such as a nuclear disaster or a geopolitical event: insights that often elude traditional non-text data sources. This approach enables extracting actionable intelligence, supporting both policy-making and strategic corporate decision-making. We also explore applications using other sources of corporate-generated text, including patent documents and job postings. By incorporating computational linguistics techniques into the analysis of economic shocks, new opportunities arise for real-time economic data, offering a more nuanced understanding of market and firm responses in times of economic volatility.
Data from: Critical biblical studies via word frequency analysis: Unveiling...
research.repository.duke.edu
Updated Aug 12, 2024
Cite
Römer, Thomas; Bühler, Axel; Piasetzky, Eli; Faigenbaum-Golovin, Shira; Finkelstein, Israel; Kipnis, Alon (2024). Data from: Critical biblical studies via word frequency analysis: Unveiling text authorship [Dataset]. http://doi.org/10.7924/r42b97k21
Explore at:
Unique identifier
https://doi.org/10.7924/r42b97k21, https://identifiers.org/ark:/87924/r42b97k21
Dataset updated
Aug 12, 2024
Dataset provided by
Duke Research Data Repository
Authors
Römer, Thomas; Bühler, Axel; Piasetzky, Eli; Faigenbaum-Golovin, Shira; Finkelstein, Israel; Kipnis, Alon
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We address the question of the authorship of biblical texts by applying statistical analysis to word frequencies, using a new method that is particularly sensitive to deviations in the frequencies of a few words out of potentially many. The data below consist of the "discriminating words" that have the biggest effect on the value of the Higher Criticism statistic in the analyses of 50 chapters.
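The description does not say which variant of the Higher Criticism (HC) statistic was used. As a hedged illustration, the sketch below computes one widely used form (the Donoho-Jin formulation) from a list of per-word p-values. The function name higher_criticism, the gamma cutoff, and the example p-values are all assumptions made for illustration and are not taken from the dataset.

```python
import math

def higher_criticism(p_values, gamma=0.5):
    """Return (HC score, 1-based rank of the maximizing sorted p-value).

    HC = max over i <= gamma * n of
         sqrt(n) * (i / n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted p-values.
    """
    sorted_p = sorted(p_values)
    n = len(sorted_p)
    limit = max(1, int(gamma * n))  # only the smallest p-values are scanned
    best_score, best_rank = float("-inf"), 0
    for rank, p in enumerate(sorted_p[:limit], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid dividing by zero
        score = math.sqrt(n) * (rank / n - p) / math.sqrt(p * (1.0 - p))
        if score > best_score:
            best_score, best_rank = score, rank
    return best_score, best_rank

# Hypothetical p-values, one per word; one word deviates strongly.
score, rank = higher_criticism([0.6, 0.001, 0.9, 0.5])
print(round(score, 2), rank)
```

In HC thresholding as it is commonly described, the features whose p-values fall at or below the maximizing one are the ones flagged, which is one plausible reading of "discriminating words" here.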
Data from: A recent overview of the state-of-the-art elements of text...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Marcin Mirończuk (2020). A recent overview of the state-of-the-art elements of text classification - dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1010057
Explore at:
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Marcin Mirończuk
Description
Two datasets were used to conduct the quantitative analysis of the text classification area:

biblio.bib contains all articles, grouped into categories

biblio.csv contains processed records from biblio.bib; the statistics presented in the article were built from it
Data from: ViTexOCR; a script to extract text overlays from digital video
catalog.data.gov
data.usgs.gov
+5 more
Updated Jul 6, 2024
+ more versions
Cite
U.S. Geological Survey (2024). ViTexOCR; a script to extract text overlays from digital video [Dataset]. https://catalog.data.gov/dataset/vitexocr-a-script-to-extract-text-overlays-from-digital-video
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
The ViTexOCR script presents a new method for extracting navigation data from videos with text overlays using optical character recognition (OCR) software. Over the past few decades, it was common for videos recorded during surveys to be overlaid with real-time geographic positioning satellite chyrons including latitude, longitude, date and time, as well as other ancillary data (such as speed, heading, or user input identifying fields). Embedding these data into videos provides them with utility and accuracy, but using the location data for other purposes, such as analysis in a geographic information system, is not possible when only available on the video display. Extracting the text data from imagery using software allows these videos to be located and analyzed in a geospatial context. The script allows a user to select a video, specify the text data types (e.g. latitude, longitude, date, time, or other), text color, and the pixel locations of overlay text data on a sample video frame. The script’s output is a data file containing the retrieved geospatial and temporal data. All functionality is bundled in a Python script that incorporates a graphical user interface and several other software dependencies.

text-stats (agentlans/text-stats)
260 scholarly articles cite this dataset.