Data Set Information:
Relevant Information: All data is fully anonymized.
Data was originally collected from 19 participants, but the TAC readings of 6 participants were deemed unusable by SCRAM [1]. The data included is from the remaining 13 participants.
Accelerometer data was collected from smartphones at a sampling rate of 40 Hz (file: all_accelerometer_data_pids_13.csv). The file contains 5 columns: a timestamp, a participant ID, and a sample from each axis of the accelerometer. Data was collected from a mix of 11 iPhones and 2 Android phones, as noted in phone_types.csv. TAC data was collected using SCRAM [2] ankle bracelets at 30-minute intervals. The raw TAC readings are in the raw_tac directory. TAC readings which are more readily usable for processing are in the clean_tac directory and have two columns: a timestamp and a TAC reading. The cleaned TAC readings: (1) were processed with a zero-phase low-pass filter to smooth noise without shifting phase; (2) were shifted backwards by 45 minutes so the labels more closely match the true intoxication of the participant (since alcohol takes about 45 minutes to exit through the skin). Please see the above referenced study for more details on how the data was processed (http://ceur-ws.org/Vol-2429/paper6.pdf).
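A minimal sketch of those two cleaning steps, assuming the two-column clean_tac layout described above; the participant file name and the filter parameters are illustrative (the exact values are given in the referenced paper, not here):

```python
import pandas as pd
from scipy.signal import butter, filtfilt

# Load one participant's cleaned TAC series (timestamp in seconds, TAC in g/dl).
tac = pd.read_csv("clean_tac/BK7610_clean_TAC.csv")  # hypothetical participant file name

# Zero-phase low-pass filter: filtfilt runs the filter forward and backward,
# smoothing noise without shifting the signal in time.
# The cutoff is illustrative; the paper specifies the actual parameters.
b, a = butter(N=2, Wn=0.1)  # 2nd-order Butterworth, normalized cutoff
tac["TAC_smoothed"] = filtfilt(b, a, tac["TAC_Reading"].to_numpy())

# Shift labels backwards by 45 minutes so TAC better reflects intoxication at
# the time of the accelerometer samples (alcohol takes ~45 min to reach skin).
tac["timestamp"] = tac["timestamp"] - 45 * 60
```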
1 - https://www.scramsystems.com/ 2 - J. Robert Zettl. The determination of blood alcohol concentration by transdermal measurement. https://www.scramsystems.com/images/uploads/general/research/the-determination-of-blood-alcohol-concentrationby-transdermal-measurement.pdf, 2002.
Number of Instances:
- Accelerometer readings: 14,057,567
- TAC readings: 715
- Participants: 13
Number of Attributes:
- Time series: 3 axes of accelerometer data (columns x, y, z in all_accelerometer_data_pids_13.csv)
- Static: 1 phone-type feature (in phone_types.csv)
- Target: 1 time series of TAC for each of the 13 participants (in clean_tac directory)
For Each Attribute:
(Main) all_accelerometer_data_pids_13.csv:
- time: integer, unix timestamp, milliseconds
- pid: symbolic, 13 categories listed in pids.txt
- x: continuous, time-series
- y: continuous, time-series
- z: continuous, time-series
clean_tac/*.csv:
- timestamp: integer, unix timestamp, seconds
- TAC_Reading: continuous, time-series
phone_types.csv:
- pid: symbolic, 13 categories listed in pids.txt
- phonetype: symbolic, 2 categories (iPhone, Android)
(Other) raw/*.xlsx:
- TAC Level: continuous, time-series
- IR Voltage: continuous, time-series
- Temperature: continuous, time-series
- Time: datetime
- Date: datetime
Missing Attribute Values: None
Target Distribution: TAC is measured in g/dl, where 0.08 is the legal limit for intoxication while driving.
- Mean TAC: 0.065 +/- 0.182
- Max TAC: 0.443
- TAC inner quartiles: 0.002, 0.029, 0.092
- Mean time-to-last-drink: 16.1 +/- 6.9 hrs
Relevant Papers:
Past Usage:
(a) Complete reference of article where it was described/used: Killian, J.A., Passino, K.M., Nandi, A., Madden, D.R. and Clapp, J., Learning to Detect Heavy Drinking Episodes Using Smartphone Accelerometer Data. In Proceedings of the 4th International Workshop on Knowledge Discovery in Healthcare Data co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019) (pp. 35-42). http://ceur-ws.org/Vol-2429/paper6.pdf
(b) Indication of what attribute(s) were being predicted: Features: three-axis time-series accelerometer data. Target: time-series transdermal alcohol content (TAC) data (a real-time measure of intoxication).
(c) Indication of study's results: The study decomposed each time series into 10-second windows and performed binary classification to predict whether windows corresponded to an intoxicated participant (TAC >= 0.08) or a sober participant (TAC < 0.08). The study tested several models and achieved a test accuracy of 77.5% with a random forest.
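A minimal sketch of that windowing-plus-classifier setup, not the authors' pipeline: the per-window features and the stand-in labels below are illustrative, and real labels come from aligning window timestamps with the cleaned TAC series.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

WINDOW = 400  # 10 s of samples at 40 Hz

acc = pd.read_csv("all_accelerometer_data_pids_13.csv")  # columns: time, pid, x, y, z
one = acc[acc["pid"] == acc["pid"].iloc[0]]  # single participant, for simplicity

# Simple per-window features (mean and standard deviation per axis).
# The published study used a much richer feature set; these are placeholders.
n = len(one) // WINDOW
xyz = one[["x", "y", "z"]].to_numpy()[: n * WINDOW].reshape(n, WINDOW, 3)
features = np.hstack([xyz.mean(axis=1), xyz.std(axis=1)])

# Stand-in labels: in the real task, a window is positive if its matched
# TAC reading is >= 0.08 g/dl; alignment with clean_tac is omitted here.
labels = np.random.randint(0, 2, size=n)

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.25)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```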
Citation Request:
When using this dataset, please cite: Killian, J.A., Passino, K.M., Nandi, A., Madden, D.R. and Clapp, J., Learning to Detect Heavy Drinking Episodes Using Smartphone Accelerometer Data. In Proceedings of the 4th International Workshop on Knowledge Discovery in Healthcare Data co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019) (pp. 35-42). http://ceur-ws.org/Vol-2429/paper6.pdf
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The aim of this study was to compare biomechanical, coordinative and physiological parameters in the front crawl during interval training series performed at two submaximal intensities until exhaustion. Eleven swimmers, mean age 21.0 ± 7.3 years, performed two sets of interval training with repetitions of 400 m (40 s of passive rest) at 90% (s90) and 95% (s95) of the 400 m front crawl mean speed (s400), which was previously determined during a maximal 400 m test. The results were: (i) stroke frequency increased and stroke length decreased between the trials and between the initial and final repetitions in the s90 and s95 series; (ii) the index of coordination and propulsive time increased between the initial and final trials in the s95 series; (iii) the absolute and relative durations of the pull phase increased between the initial and final repetitions of the s95 series; (iv) perceived exertion, lactate concentration and heart rate increased between the initial and final repetitions in s90 and s95. Maintaining speed in the s90 and s95 series of s400 leads to changes in the motor organization of the front crawl stroke.
This dataset contains laptop prices in the Indonesian market. The data was crawled from the two biggest Indonesian e-commerce platforms, Tokopedia and Shopee.
In this dataset, I only took data from 6 laptop brands:
1. Asus: ROG, TUF, Vivobook, Zenbook
2. Acer: Aspire, Nitro, Swift
3. Dell: Inspiron
4. HP: Envy, Omen, Pavilion
5. Lenovo: Ideapad, Legion, Yoga
6. MSI: Alpha, Gfthin (new in January 2021: MSI Prestige, MSI Modern)
The data in this set was culled from the Directory of Open Access Journals (DOAJ), the Proquest database Library and Information Science Abstracts (LISA), and a sample of peer-reviewed scholarly journals in the field of Library Science. The data include journals that are open access, which was first defined by the Budapest Open Access Initiative:

By ‘open access’ to [scholarly] literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself.

Starting with a batch of 377 journals, we focused our dataset to include journals that met the following criteria: 1) peer-reviewed, 2) written in English or abstracted in English, 3) actively published at the time of...

Data Collection: In the spring of 2023, researchers gathered 377 scholarly journals whose content covered the work of librarians, archivists, and affiliated information professionals. This data encompassed 221 journals from the Proquest database Library and Information Science Abstracts (LISA), widely regarded as an authoritative database in the field of librarianship. From the Directory of Open Access Journals, we included 144 LIS journals. We also included 12 other journals not indexed in DOAJ or LISA, based on the researchers’ knowledge of existing OA library journals. The data is separated into several different sets representing the different indices and journals we searched. The first set includes journals from the database LISA. The following fields are in this dataset:
Journal: title of the journal
Publisher: title of the publishing company
Open Data Policy: lists whether an open data policy exists and what the policy is
Country of publication: country where the journal is published
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Albanian-English parallel corpus MaCoCu-sq-en 1.0 was built by crawling the “.al” internet top-level domain in 2022, extending the crawl dynamically to other domains as well.
The entire crawling process was carried out by the MaCoCu crawler (https://github.com/macocu/MaCoCu-crawler). Websites containing documents in both target languages were identified and processed using the tool Bitextor (https://github.com/bitextor/bitextor). Considerable effort was devoted to cleaning the extracted text to provide a high-quality parallel corpus. This was achieved by removing boilerplate and near-duplicated paragraphs and documents that are not in one of the targeted languages. Document and segment alignment as implemented in Bitextor were carried out, and Bifixer (https://github.com/bitextor/bifixer) and BicleanerAI (https://github.com/bitextor/bicleaner-ai) were used for fixing, cleaning, and deduplicating the final version of the corpus.
The corpus is available in three formats: two sentence-level formats, TXT and TMX, and a document-level TXT format. TMX is an XML-based format and TXT is a tab-separated format. Both consist of pairs of source and target segments (one or several sentences) and additional metadata. The following metadata is included in both sentence-level formats:
- source and target document URL;
- paragraph ID, which encodes the position of the sentence in the paragraph and in the document (e.g., “p35:77s1/3” means “paragraph 35 out of 77, sentence 1 out of 3”);
- quality score as provided by the tool Bicleaner AI (the likelihood of a pair of sentences being mutual translations, given as a score between 0 and 1);
- similarity score as provided by the sentence alignment tool Bleualign (a value between 0 and 1);
- personal information identification (“biroamer-entities-detected”): segments containing personal information are flagged, so final users of the corpus can decide whether to use these segments;
- translation direction and machine translation identification (“translation-direction”): the source segment in each segment pair was identified using a probabilistic model (https://github.com/RikVN/TranslationDirection), which also determines whether the translation was produced by a machine-translation system;
- DSI class (“dsi”): whether the segment is connected to any of the Digital Service Infrastructure (DSI) classes (e.g., cybersecurity, e-health, e-justice, open-data-portal) defined by the Connecting Europe Facility (https://github.com/RikVN/DSI);
- English language variant: the variant of English (British or American), identified on document and domain level using a lexicon-based English variety classifier (https://pypi.org/project/abclf/).
Furthermore, the sentence-level TXT format provides additional metadata:
- web domain of the text;
- source and target document title;
- the date when the original file was retrieved;
- the original type of the file (e.g., “html”) from which the sentence was extracted;
- paragraph quality (labels such as “short” or “good”, assigned based on paragraph length, URL and stopword density via the jusText tool - https://corpus.tools/wiki/Justext);
- whether the sentence is a heading in the original document.
The document-level TXT format provides pairs of documents identified to contain parallel data. In addition to the parallel documents (in base64 format), the corpus includes the following metadata: source and target document URL, a DSI category and the English language variant (British or American).
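A minimal sketch of reading the document-level TXT format described above. The file is tab-separated; the file name and the column positions (URLs first, base64-encoded documents last) are assumptions for illustration, since the exact column order is defined by the corpus distribution itself:

```python
import base64
import csv

with open("MaCoCu-sq-en.documents.txt", encoding="utf-8") as f:  # hypothetical file name
    for row in csv.reader(f, delimiter="\t"):
        src_url, trg_url = row[0], row[1]          # assumed: URL columns come first
        src_doc = base64.b64decode(row[-2]).decode("utf-8")  # assumed: documents last
        trg_doc = base64.b64decode(row[-1]).decode("utf-8")
        print(src_url, "->", trg_url, len(src_doc), len(trg_doc))
        break  # just inspect the first document pair
```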
Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material. (4) Please write to the contact person for this resource whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author’s view. The Agency is not responsible for any use that may be made of the information it contains.
https://crawlfeeds.com/privacy_policy
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.
Dataset Specifications:
Last crawled: not specified.

Richness of Detail (11 Comprehensive Fields):
Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:
Review Content:
- review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
- title: The title given to the review by the user, often summarizing their main point.
- isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.

Reviewer & Rating Information:
- username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
- rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.

App & Origin Context:
- app_name: The name of the application being reviewed.
- app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
- country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.

Metadata & Timestamps:
- _id: A unique identifier for the specific review record in the dataset.
- crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
- date: The original date the review was posted by the user on the App Store.
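A minimal sketch of one review record as a typed structure, using the 11 fields listed above; the types are inferred from the field descriptions, not from a published schema:

```python
from dataclasses import dataclass

@dataclass
class AppStoreReview:
    """One record from the App Store reviews dataset.

    Field names follow the listing above; types (e.g., rating as int,
    isEdited as bool) are inferred from the descriptions, not a schema.
    """
    _id: str          # unique identifier for the review record
    review: str       # full text of the user's written feedback
    title: str        # user-supplied review title
    isEdited: bool    # whether the review was edited after submission
    username: str     # public reviewer username
    rating: int       # star rating, typically 1-5
    app_name: str     # name of the reviewed application
    app_id: str       # App Store identifier for the application
    country: str      # storefront country where the review was left
    crawled_at: str   # when the record was collected (ISO timestamp assumed)
    date: str         # original posting date of the review
```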
Expanded Use Cases & Analytical Applications:

This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:
Product Development & Improvement:
- Mine the review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
- Mine the review text to inform future product roadmap decisions and develop features users actively desire.
- Track rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.

Market Research & Competitive Intelligence:
Marketing & App Store Optimization (ASO):
- Analyze the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
- Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.

Academic & Data Science Research:
- The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
- Study the rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
- Compare country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.

This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs, ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
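As a concrete starting point for the engagement analyses listed above, a few lines of pandas suffice; the JSON-lines file name is hypothetical, and the field names follow the listing:

```python
import pandas as pd

# Hypothetical export of the dataset as JSON lines; fields as listed above.
reviews = pd.read_json("app_store_reviews.jsonl", lines=True)

# Average star rating and share of edited reviews per storefront country.
by_country = reviews.groupby("country").agg(
    mean_rating=("rating", "mean"),
    edited_share=("isEdited", "mean"),
    n_reviews=("rating", "size"),
)
print(by_country.sort_values("n_reviews", ascending=False).head(10))
```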
According to our latest research, the global Pipe Crawler Robot HD market size reached USD 1.19 billion in 2024, driven by increasing demand for advanced inspection solutions in pipeline infrastructure across various industries. The market is experiencing robust expansion, with a CAGR of 9.2% projected from 2025 to 2033. By the end of 2033, the Pipe Crawler Robot HD market is forecasted to attain a value of USD 2.72 billion. This growth is primarily attributed to the rising need for efficient, non-invasive inspection methods and the integration of high-definition imaging technologies that enhance operational reliability and safety in critical pipeline systems.
One of the primary growth factors propelling the Pipe Crawler Robot HD market is the aging infrastructure of pipelines in developed and developing regions alike. Many municipal and industrial pipeline networks, particularly in North America and Europe, are decades old and require regular inspection and maintenance to prevent failures and leaks. The adoption of pipe crawler robots equipped with high-definition cameras allows for precise, real-time assessment of pipeline conditions without the need for disruptive excavation or shutdowns. As governments and private entities ramp up investments in infrastructure renewal and preventive maintenance, the demand for these advanced robotic inspection solutions is expected to surge, supporting long-term market growth.
Another significant driver is the increasing regulatory scrutiny and safety standards imposed on industries such as oil & gas, water supply, and sewage management. Regulatory agencies worldwide are mandating more frequent and comprehensive pipeline inspections to prevent environmental hazards and ensure public safety. Pipe Crawler Robot HD systems facilitate compliance by providing detailed visual and analytical data from within pipelines, enabling operators to identify and address potential issues proactively. The technological advancements in robotics, miniaturization, and high-definition imaging further enhance the capabilities of these systems, making them indispensable tools for pipeline operators seeking to meet stringent regulatory requirements while optimizing operational efficiency.
Technological innovation remains a cornerstone of market expansion. The integration of artificial intelligence, machine learning, and advanced sensor technologies into Pipe Crawler Robot HD systems is elevating their performance and versatility. These innovations allow for automated defect detection, improved maneuverability in complex pipeline geometries, and enhanced data analytics for predictive maintenance. Furthermore, the development of wireless communication and remote operation capabilities is enabling real-time data transmission and remote diagnostics, reducing the need for manual intervention and improving safety for inspection teams. As technology continues to evolve, the market is poised to benefit from increased adoption across a broadening array of applications and industries.
Regionally, the Asia Pacific market is emerging as a significant growth engine for the Pipe Crawler Robot HD market, driven by rapid urbanization, expanding industrial activity, and substantial investments in water and energy infrastructure. Countries such as China, India, and Japan are witnessing accelerated deployment of advanced pipeline inspection technologies to address increasing demand for reliable utility services and to mitigate the risks associated with aging pipeline networks. Meanwhile, North America and Europe maintain strong positions due to their mature infrastructure and regulatory frameworks, while the Middle East & Africa and Latin America are gradually adopting these solutions as part of broader modernization initiatives.
The Pipe Crawler Robot HD market is segmented by product type into Wheeled Pipe Crawler Robots, Tracked Pipe Crawler Robots, HD Camera Pipe Crawler Robots, and Others. Wheeled Pipe Crawler Ro
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
rxivist.org allowed readers to sort and filter the tens of thousands of preprints posted to bioRxiv and medRxiv. Rxivist used a custom web crawler to index all papers posted to those two websites; this is a snapshot of the Rxivist production database. The version number indicates the date on which the snapshot was taken. See the included "README.md" file for instructions on how to use the "rxivist.backup" file to import data into a PostgreSQL database server.
Please note this is a different repository than the one used for the Rxivist manuscript—that is in a separate Zenodo repository. You're welcome (and encouraged!) to use this data in your research, but please cite our paper, now published in eLife.
Previous versions are also available pre-loaded into Docker images, available at blekhmanlab/rxivist_data.
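A sketch of getting started with a snapshot, assuming a local PostgreSQL server; the table and column names ("articles", "repo") come from the version notes below, but check the included README.md for the authors' exact import instructions:

```python
import subprocess

import psycopg2  # PostgreSQL client library

# Restore the snapshot into a fresh local database.
subprocess.run(["createdb", "rxivist"], check=True)
subprocess.run(["pg_restore", "--no-owner", "-d", "rxivist", "rxivist.backup"], check=True)

# Count preprints per source site using the "repo" field added to the
# "articles" table in the 2020-12-07 snapshot (see version notes below).
conn = psycopg2.connect(dbname="rxivist")
with conn, conn.cursor() as cur:
    cur.execute("SELECT repo, COUNT(*) FROM articles GROUP BY repo")
    for repo, count in cur.fetchall():
        print(repo, count)
```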
Version notes:
2023-03-01
The final Rxivist data upload, more than four years after the first and encompassing 223,541 preprints posted to bioRxiv and medRxiv through the end of February 2023.
2020-12-07
In addition to bioRxiv preprints, the database now includes all medRxiv preprints as well.
The website where a preprint was posted is now recorded in a new field in the "articles" table, called "repo".
We've significantly refactored the web crawler to take advantage of developments with the bioRxiv API.
The main difference is that preprints flagged as "published" by bioRxiv are no longer recorded on the same schedule that download metrics are updated: The Rxivist database should now record published DOI entries the same day bioRxiv detects them.
Twitter metrics have returned, for the most part. Improvements with the Crossref Event Data API mean we can once again tally daily Twitter counts for all bioRxiv DOIs.
The "crossref_daily" table remains where these are recorded, and daily numbers are now up to date.
Historical daily counts have also been re-crawled to fill in the empty space that started in October 2019.
There are still several gaps that are more than a week long due to missing data from Crossref.
We have recorded available Crossref Twitter data for all papers with DOI numbers starting with "10.1101," which includes all medRxiv preprints. However, there appears to be almost no Twitter data available for medRxiv preprints.
The download metrics for article id 72514 (DOI 10.1101/2020.01.30.927871) were found to be out of date for February 2020 and are now correct. This is notable because article 72514 is the most downloaded preprint of all time; we're still looking into why this wasn't updated after the month ended.
2020-11-18
Publication checks should be back on schedule.
2020-10-26
This snapshot fixes most of the data issues found in the previous version. Indexed papers are now up to date, and download metrics are back on schedule. The check for publication status remains behind schedule, however, and the database may not include published DOIs for papers that have been flagged on bioRxiv as "published" over the last two months. Another snapshot will be posted in the next few weeks with updated publication information.
2020-09-15
A crawler error caused this snapshot to exclude all papers posted after about August 29, with some papers having download metrics that were more out of date than usual. The "last_crawled" field is accurate.
2020-09-08
This snapshot is misconfigured and will not work without modification; it has been replaced with version 2020-09-15.
2019-12-27
Several dozen papers did not have dates associated with them; that has been fixed.
Some authors have had two entries in the "authors" table for portions of 2019, one profile that was linked to their ORCID and one that was not, occasionally with almost identical "name" strings. This happened after bioRxiv began changing author names to reflect the names in the PDFs, rather than the ones manually entered into their system. These database records are mostly consolidated now, but some may remain.
2019-11-29
The Crossref Event Data API remains down; Twitter data is unavailable for dates after early October.
2019-10-31
The Crossref Event Data API is still experiencing problems; the Twitter data for October is incomplete in this snapshot.
The README file has been modified to reflect changes in the process for creating your own DB snapshots if using the newly released PostgreSQL 12.
2019-10-01
The Crossref API is back online, and the "crossref_daily" table should now include up-to-date tweet information for July through September.
About 40,000 authors were removed from the author table because the name had been removed from all preprints they had previously been associated with, likely because their name changed slightly on the bioRxiv website ("John Smith" to "J Smith" or "John M Smith"). The "author_emails" table was also modified to remove entries referring to the deleted authors. The web crawler is being updated to clean these orphaned entries more frequently.
2019-08-30
The Crossref Event Data API, which provides the data used to populate the table of tweet counts, has not been fully functional since early July. While we are optimistic that accurate tweet counts will be available at some point, the sparse values currently in the "crossref_daily" table for July and August should not be considered reliable.
2019-07-01
A new "institution" field has been added to the "article_authors" table that stores each author's institutional affiliation as listed on that paper. The "authors" table still has each author's most recently observed institution.
We began collecting this data in the middle of May, but it has not been applied to older papers yet.
2019-05-11
The README was updated to correct a link to the Docker repository used for the pre-built images.
2019-03-21
The license for this dataset has been changed to CC-BY, which allows use for any purpose and requires only attribution.
A new table, "publication_dates," has been added and will be continually updated. This table will include an entry for each preprint that has been published externally for which we can determine a date of publication, based on data from Crossref. (This table was previously included in the "paper" schema but was not updated after early December 2018.)
Foreign key constraints have been added to almost every table in the database. This should not impact any read behavior, but anyone writing to these tables will encounter constraints on existing fields that refer to other tables. Most frequently, this means the "article" field in a table will need to refer to an ID that actually exists in the "articles" table.
The "author_translations" table has been removed. This was used to redirect incoming requests for outdated author profile pages and was likely not of any functional use to others.
The "README.md" file has been renamed "1README.md" because Zenodo only displays a preview for the file that appears first in the list alphabetically.
The "article_ranks" and "article_ranks_working" tables have been removed as well; they were unused.
2019-02-13.1
After consultation with bioRxiv, the "fulltext" table will not be included in further snapshots until (and if) concerns about licensing and copyright can be resolved.
The "docker-compose.yml" file was added, with corresponding instructions in the README to streamline deployment of a local copy of this database.
2019-02-13
The redundant "paper" schema has been removed.
BioRxiv has begun making the full text of preprints available online. Beginning with this version, a new table ("fulltext") is available that contains the text of preprints that have already been processed. The format in which this information is stored may change in the future; any deviation will be noted here.
This is the first version that has a corresponding Docker image.
According to our latest research, the global Robotic Pipe Inspection Crawler market size reached USD 1.37 billion in 2024, reflecting strong demand across critical infrastructure sectors. The market is experiencing robust momentum, growing at a CAGR of 8.6% from 2025 to 2033. By the end of 2033, the market is forecasted to achieve a value of USD 2.83 billion. This remarkable growth trajectory is primarily fueled by the increasing need for efficient, accurate, and safe inspection solutions in industries such as oil & gas, water & wastewater, power generation, and municipal infrastructure. As per our comprehensive market analysis, the integration of advanced robotics, artificial intelligence, and data analytics into pipe inspection processes is revolutionizing asset management and maintenance strategies globally.
The primary growth factor driving the Robotic Pipe Inspection Crawler market is the escalating demand for infrastructure maintenance and monitoring, especially in aging pipeline networks. Many industrialized nations are facing significant challenges due to deteriorating underground pipe systems, which are prone to leaks, corrosion, and blockages. Traditional inspection methods are often labor-intensive, time-consuming, and sometimes hazardous. Robotic crawlers, equipped with high-definition cameras, sensors, and sometimes even repair tools, offer a non-invasive, efficient, and reliable alternative. These systems can traverse complex pipe networks, providing real-time data and imagery, enabling operators to detect anomalies early, plan maintenance, and prevent costly failures. The growing emphasis on predictive maintenance and asset integrity management in sectors such as oil & gas and water utilities further accelerates the adoption of robotic inspection technologies.
Another significant driver is the rapid technological advancements in robotics and automation, which have substantially improved the capabilities and versatility of pipe inspection crawlers. Modern crawlers feature modular designs, allowing for customization based on pipe diameter, material, and environmental conditions. Integration with wireless communication technologies, artificial intelligence, and machine learning has enabled automated data analysis, defect recognition, and reporting, reducing the need for human intervention and minimizing errors. The miniaturization of components and improvements in battery technology have also expanded the operational range and duration of these devices. As a result, end-users across industrial, municipal, and commercial sectors are increasingly investing in robotic inspection solutions to enhance operational efficiency, reduce downtime, and comply with stringent regulatory standards for safety and environmental protection.
Additionally, the rising focus on environmental sustainability and regulatory compliance is propelling the Robotic Pipe Inspection Crawler market. Governments and regulatory bodies worldwide are implementing stricter guidelines for pipeline inspection, leak detection, and maintenance to mitigate environmental risks and ensure public safety. Robotic crawlers facilitate comprehensive inspections without the need for excavation or service interruptions, reducing environmental impact and operational costs. Their ability to access hard-to-reach or hazardous areas makes them indispensable for industries dealing with hazardous materials or critical infrastructure. The growing adoption of smart city initiatives and digital infrastructure management further augments market growth, as municipalities and utilities prioritize investments in innovative inspection and monitoring technologies.
The introduction of the Free-Swimming Inspection Tool for Water Pipelines marks a significant advancement in the realm of pipeline inspection technologies. Unlike traditional robotic crawlers that require direct contact with the pipe surface, free-swimming tools operate autonomously within the flow of water, offering a unique advantage in inspecting long-distance pipelines without the need for service interruptions. This technology is particularly beneficial for water utilities aiming to enhance the efficiency and accuracy of their inspection processes. By utilizing advanced sensors and data analytics, free-swimming tools can detect anomalies such as leaks, corrosion, and sediment bui
https://dataintelo.com/privacy-and-policy
According to our latest research, the global crawler camera for pipelines market size reached USD 510 million in 2024, with a robust compound annual growth rate (CAGR) of 6.9% observed in recent years. This growth is driven by the increasing need for efficient inspection and maintenance of critical pipeline infrastructure worldwide. Based on the projected CAGR, the market is forecasted to attain a value of approximately USD 988 million by 2033, reflecting sustained demand across multiple end-use sectors. The continuous expansion of pipeline networks and the emphasis on preventive maintenance are core factors propelling this market's upward trajectory as per the latest research.
One of the most significant growth drivers for the crawler camera for pipelines market is the aging infrastructure in developed economies and the rapid expansion of utility networks in developing regions. Many countries in North America and Europe are witnessing an urgent need to upgrade their water, wastewater, and oil & gas pipelines, which have been in operation for decades. This has created a substantial demand for advanced inspection technologies like crawler cameras, which can navigate complex pipeline systems and provide real-time visual data for maintenance planning. Simultaneously, emerging economies in Asia Pacific and Latin America are investing heavily in new pipeline construction to support urbanization and industrialization, further driving the need for reliable inspection solutions. The versatility and adaptability of crawler cameras to various pipe sizes and environments make them indispensable for both retrofitting old systems and ensuring the integrity of new infrastructure.
Technological advancements in crawler camera systems have also played a pivotal role in market growth. Modern crawler cameras are equipped with high-definition imaging, 360-degree rotation, and robust lighting systems, enabling detailed inspection even in challenging environments such as submerged or heavily corroded pipelines. Integration with digital recording, wireless data transfer, and cloud-based analytics has revolutionized the inspection process, making it more efficient, accurate, and accessible. These innovations not only reduce operational downtime but also minimize the risk of catastrophic failures by enabling early detection of defects such as cracks, blockages, and leaks. As a result, industries ranging from municipal water management to oil & gas are increasingly adopting crawler camera solutions as part of their routine maintenance and regulatory compliance efforts.
Another key factor fueling the crawler camera for pipelines market is the growing emphasis on environmental sustainability and regulatory compliance. Governments and regulatory bodies worldwide are imposing stringent standards on pipeline operators to prevent leaks, spills, and contamination events. Non-compliance can result in severe penalties, reputational damage, and environmental harm. Crawler cameras offer a non-invasive and cost-effective means of conducting thorough pipeline inspections, thereby supporting operators in meeting regulatory requirements and environmental goals. Moreover, the ability to document and archive inspection data provides traceability and accountability, which are crucial for both public and private sector stakeholders. This regulatory push is expected to remain a major catalyst for market expansion throughout the forecast period.
From a regional perspective, North America currently holds the largest share of the crawler camera for pipelines market, driven by extensive pipeline networks, high maintenance standards, and a mature regulatory framework. Europe follows closely, with a strong focus on upgrading aging infrastructure and adopting advanced inspection technologies. The Asia Pacific region, however, is projected to exhibit the highest growth rate, fueled by rapid urbanization, industrial development, and significant investments in water and energy infrastructure. Latin America and the Middle East & Africa are also emerging as important markets, supported by expanding oil & gas exploration activities and government initiatives to enhance water management systems. This dynamic regional landscape underscores the global relevance and growth potential of crawler camera solutions.
The crawler camera for pipelines market is segmented by product type into push rod cameras,
According to our latest research, the global magnetic crawler tank inspection market size reached USD 675.2 million in 2024, reflecting the sector’s robust adoption across critical infrastructure industries. The market is forecasted to grow at a CAGR of 8.1% from 2025 to 2033, reaching a projected value of USD 1,327.4 million by 2033. This growth is primarily driven by the increasing necessity for non-destructive, efficient, and safe inspection methods in hazardous and inaccessible tank environments, particularly in oil & gas, chemical, and municipal sectors.
The surge in demand for magnetic crawler tank inspection solutions is fundamentally underpinned by the growing focus on asset integrity management and regulatory compliance. Industries such as oil & gas, water & wastewater, and chemical processing are under mounting pressure to minimize operational risks, prevent leaks, and ensure safety. The deployment of advanced magnetic crawler systems enables operators to conduct thorough inspections of storage tanks, pipelines, and confined spaces without the need for human entry, thereby reducing health hazards and downtime. As industries continue to prioritize preventive maintenance and risk mitigation, the adoption of these robotic inspection technologies is expected to intensify, further propelling market expansion.
Technological advancements are another significant growth catalyst in the magnetic crawler tank inspection market. Innovations such as high-definition visual sensors, ultrasonic and eddy current testing modules, and enhanced magnetic adhesion systems have greatly improved the accuracy, efficiency, and versatility of these crawlers. Modern systems are capable of traversing complex geometries, vertical surfaces, and even submerged environments, expanding their applicability across a broader range of tank and vessel types. The integration of data analytics, remote monitoring, and real-time reporting has also enabled operators to make informed decisions quickly, streamlining maintenance workflows and reducing operational costs. These technological improvements are expected to attract further investments and foster the development of next-generation inspection solutions.
Additionally, the global shift towards sustainability and environmental stewardship is influencing the growth trajectory of the magnetic crawler tank inspection market. With stricter environmental regulations and increasing public scrutiny on industrial emissions and leaks, facility operators are compelled to adopt reliable inspection technologies to detect and address potential vulnerabilities proactively. Magnetic crawler systems, with their ability to access hard-to-reach areas and deliver precise diagnostic data, are becoming indispensable tools for ensuring compliance and minimizing environmental impact. This trend is particularly pronounced in regions with aging infrastructure, where the risk of catastrophic failures is higher, further driving demand for advanced inspection solutions.
From a regional perspective, North America currently leads the magnetic crawler tank inspection market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The dominance of North America can be attributed to the presence of a mature oil & gas industry, stringent regulatory frameworks, and early adoption of robotic inspection technologies. Europe’s market growth is fueled by its focus on industrial safety and sustainability, while Asia Pacific is experiencing rapid expansion due to increasing industrialization, infrastructure investments, and rising awareness of asset integrity management. Latin America and the Middle East & Africa are also witnessing gradual adoption, driven by investments in energy and water infrastructure. This regional diversification highlights the global relevance and growth potential of magnetic crawler tank inspection technologies.
The product type segm
https://choosealicense.com/licenses/unknown/
Dataset Card for "wmt19"
Dataset Summary
Warning: There are issues with the Common Crawl corpus data (training-parallel-commoncrawl.tgz):
Non-English files contain many English sentences.
Their "parallel" sentences in English are not aligned: they are uncorrelated with their counterpart.
We have contacted the WMT organizers, and in response, they have indicated that they do not have plans to update the Common Crawl corpus data. Their rationale pertains… See the full description on the dataset page: https://huggingface.co/datasets/wmt/wmt19.
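The corpus can be loaded with the Hugging Face datasets library; "cs-en" below is one of several language-pair configs, see the dataset page for the full list:

```python
from datasets import load_dataset

# Load one language pair of WMT19; each example holds a translation dict.
wmt = load_dataset("wmt19", "cs-en", split="validation")
print(wmt[0]["translation"])  # {'cs': '...', 'en': '...'}
```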
The purpose of the project is to make available a standard training and test setup for language modeling experiments.
The training/held-out data was produced from the WMT 2011 News Crawl data using a combination of Bash shell and Perl scripts distributed here.
This also means that your results on this data set are reproducible by the research community at large.
Besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the following baseline models:
- unpruned Katz (1.1B n-grams)
- pruned Katz (~15M n-grams)
- unpruned Interpolated Kneser-Ney (1.1B n-grams)
- pruned Interpolated Kneser-Ney (~15M n-grams)

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Link to paper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41880.pdf
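Given per-word log-probabilities like those distributed for the baselines above, perplexity follows directly. The file name and format here are assumptions for illustration (one base-10 log-probability per line):

```python
import math

# Compute corpus perplexity from per-word log-probabilities.
log10_probs = []
with open("heldout-00000.logprobs", encoding="utf-8") as f:  # hypothetical file name
    for line in f:
        log10_probs.append(float(line.strip()))

avg_log10 = sum(log10_probs) / len(log10_probs)
perplexity = 10 ** (-avg_log10)
print(f"{len(log10_probs)} words, perplexity = {perplexity:.1f}")
```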
Happy benchmarking!
Database Description:
(a) Title: Bar Crawl: Detecting Heavy Drinking
(b) Abstract: Accelerometer and transdermal alcohol content data from a college bar crawl. Used to predict heavy drinking episodes via mobile data.
Sources:
(a) Owner of database: Jackson A Killian (jkillian@g.harvard.edu, Harvard University); Danielle R Madden (University of Southern California); John Clapp (University of Southern California)
(b) Donor of database: Jackson A Killian (jkillian@g.harvard.edu, Harvard University); Danielle R Madden (University of Southern California); John Clapp (University of Southern California)
(c) Date collected: May 2017
(d) Date submitted: Jan 2020
Past Usage:
(a) Complete reference of article where it was described/used: Killian, J.A., Passino, K.M., Nandi, A., Madden, D.R. and Clapp, J., Learning to Detect Heavy Drinking Episodes Using Smartphone Accelerometer Data. In Proceedings of the 4th International Workshop on Knowledge Discovery in Healthcare Data co-located with the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019) (pp. 35-42). http://ceur-ws.org/Vol-2429/paper6.pdf
(b) Indication of what attribute(s) were being predicted: Features: three-axis time-series accelerometer data. Target: time-series transdermal alcohol content (TAC) data (a real-time measure of intoxication).
(c) Indication of study's results: The study decomposed each time series into 10-second windows and performed binary classification to predict whether windows corresponded to an intoxicated participant (TAC >= 0.08) or a sober participant (TAC < 0.08). The study tested several models and achieved a test accuracy of 77.5% with a random forest.
Relevant Information: All data is fully anonymized.
Data was originally collected from 19 participants, but the TAC readings of 6 participants were deemed unusable by SCRAM [1]. The data included is from the remaining 13 participants.
Accelerometer data was collected from smartphones at a sampling rate of 40 Hz (file: all_accelerometer_data_pids_13.csv). The file contains 5 columns: a timestamp, a participant ID, and a sample from each axis of the accelerometer. Data was collected from a mix of 11 iPhones and 2 Android phones, as noted in phone_types.csv. TAC data was collected using SCRAM [2] ankle bracelets at 30-minute intervals. The raw TAC readings are in the raw_tac directory. TAC readings which are more readily usable for processing are in the clean_tac directory and have two columns: a timestamp and a TAC reading. The cleaned TAC readings: (1) were processed with a zero-phase low-pass filter to smooth noise without shifting phase; (2) were shifted backwards by 45 minutes so the labels more closely match the true intoxication of the participant (since alcohol takes about 45 minutes to exit through the skin). Please see the above referenced study for more details on how the data was processed (http://ceur-ws.org/Vol-2429/paper6.pdf).
1 - https://www.scramsystems.com/ 2 - J. Robert Zettl. The determination of blood alcohol concentration by transdermal measurement. https://www.scramsystems.com/images/uploads/general/research/the-determination-of-blood-alcohol-concentrationby-transdermal-measurement.pdf, 2002.
Number of Instances:
- Accelerometer readings: 14,057,567
- TAC readings: 715
- Participants: 13
Number of Attributes:
- Time series: 3 axes of accelerometer data (columns x, y, z in all_accelerometer_data_pids_13.csv)
- Static: 1 phone-type feature (in phone_types.csv)
- Target: 1 time series of TAC for each of the 13 participants (in clean_tac directory)
For Each Attribute:
(Main) all_accelerometer_data_pids_13.csv:
- time: integer, unix timestamp, milliseconds
- pid: symbolic, 13 categories listed in pids.txt
- x: continuous, time-series
- y: continuous, time-series
- z: continuous, time-series
clean_tac/*.csv:
- timestamp: integer, unix timestamp, seconds
- TAC_Reading: continuous, time-series
phone_types.csv:
- pid: symbolic, 13 categories listed in pids.txt
- phonetype: symbolic, 2 categories (iPhone, Android)
(Other) raw/*.xlsx:
- TAC Level: continuous, time-series
- IR Voltage: continuous, time-series
- Temperature: continuous, time-series
- Time: datetime
- Date: datetime
Missing Attribute Values: None
Target Distribution: TAC is measured in g/dl, where 0.08 is the legal limit for intoxication while driving.
- Mean TAC: 0.065 +/- 0.182
- Max TAC: 0.443
- TAC inner quartiles: 0.002, 0.029, 0.092
- Mean time-to-last-drink: 16.1 +/- 6.9 hrs
https://choosealicense.com/licenses/odc-by/
📀 Falcon RefinedWeb
Falcon RefinedWeb is a massive English web dataset built by TII and released under an ODC-By 1.0 license. See the 📓 paper on arXiv for more details. RefinedWeb is built through stringent filtering and large-scale deduplication of CommonCrawl; we found models trained on RefinedWeb to achieve performance in-line or better than models trained on curated datasets, while only relying on web data. RefinedWeb is also "multimodal-friendly": it contains links and alt… See the full description on the dataset page: https://huggingface.co/datasets/tiiuae/falcon-refinedweb.
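The dataset can be loaded from the Hugging Face Hub; streaming mode avoids downloading the full multi-terabyte corpus up front:

```python
from datasets import load_dataset

# Stream RefinedWeb rather than downloading it in full (it is very large).
rw = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
for example in rw.take(3):
    print(example["content"][:200])  # "content" holds the page text
```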
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Catalan-English parallel corpus MaCoCu-ca-en 1.0 was built by crawling the ".cat", ".es", ".ad", ".fr", ".it" and ".eu" internet top-level domains in 2022, extending the crawl dynamically to other domains as well.
The entire crawling process was carried out by the MaCoCu crawler (https://github.com/macocu/MaCoCu-crawler). Websites containing documents in both target languages were identified and processed using the tool Bitextor (https://github.com/bitextor/bitextor). Considerable effort was devoted to cleaning the extracted text to provide a high-quality parallel corpus. This was achieved by removing boilerplate and near-duplicated paragraphs and documents that are not in one of the targeted languages. Document and segment alignment as implemented in Bitextor were carried out, and Bifixer (https://github.com/bitextor/bifixer) and BicleanerAI (https://github.com/bitextor/bicleaner-ai) were used for fixing, cleaning, and deduplicating the final version of the corpus.
The corpus is available in three formats: two sentence-level formats, TXT and TMX, and a document-level TXT format. TMX is an XML-based format and TXT is a tab-separated format. Both consist of pairs of source and target segments (one or several sentences) and additional metadata. The following metadata is included in both sentence-level formats:
- source and target document URL;
- paragraph ID, which encodes the position of the sentence in the paragraph and in the document (e.g., “p35:77s1/3” means “paragraph 35 out of 77, sentence 1 out of 3”);
- quality score as provided by the tool Bicleaner AI (the likelihood of a pair of sentences being mutual translations, given as a score between 0 and 1);
- similarity score as provided by the sentence alignment tool Bleualign (a value between 0 and 1);
- personal information identification (“biroamer-entities-detected”): segments containing personal information are flagged, so final users of the corpus can decide whether to use these segments;
- translation direction and machine translation identification (“translation-direction”): the source segment in each segment pair was identified using a probabilistic model (https://github.com/RikVN/TranslationDirection), which also determines whether the translation was produced by a machine-translation system;
- DSI class (“dsi”): whether the segment is connected to any of the Digital Service Infrastructure (DSI) classes (e.g., cybersecurity, e-health, e-justice, open-data-portal) defined by the Connecting Europe Facility (https://github.com/RikVN/DSI);
- English language variant: the variant of English (British or American), identified on document and domain level using a lexicon-based English variety classifier (https://pypi.org/project/abclf/).
Furthermore, the sentence-level TXT format provides additional metadata:
- web domain of the text;
- source and target document title;
- the date when the original file was retrieved;
- the original type of the file (e.g., “html”) from which the sentence was extracted;
- paragraph quality (labels such as “short” or “good”, assigned based on paragraph length, URL and stopword density via the jusText tool - https://corpus.tools/wiki/Justext);
- whether the sentence is a heading in the original document.
The document-level TXT format provides pairs of documents identified to contain parallel data. In addition to the parallel documents (in base64 format), the corpus includes the following metadata: source and target document URL, a DSI category and the English language variant (British or American).
Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material. (4) Please write to the contact person for this resource whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author’s view. The Agency is not responsible for any use that may be made of the information it contains.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Montenegrin-English parallel corpus MaCoCu-cnr-en 1.0 was built by crawling the “.me” internet top-level domain in 2021 and 2022, extending the crawl dynamically to other domains as well.
The entire crawling process was carried out by the MaCoCu crawler (https://github.com/macocu/MaCoCu-crawler). Websites containing documents in both target languages were identified and processed using the tool Bitextor (https://github.com/bitextor/bitextor). Considerable effort was devoted to cleaning the extracted text to provide a high-quality parallel corpus. This was achieved by removing boilerplate and near-duplicated paragraphs and documents that are not in one of the targeted languages. Document and segment alignment as implemented in Bitextor were carried out, and Bifixer (https://github.com/bitextor/bifixer) and BicleanerAI (https://github.com/bitextor/bicleaner-ai) were used for fixing, cleaning, and deduplicating the final version of the corpus.
The corpus is available in three formats: two sentence-level formats, TXT and TMX, and a document-level TXT format. In each format, the texts are separated by script into two files: a Latin and a Cyrillic subcorpus. TMX is an XML-based format and TXT is a tab-separated format. Both consist of pairs of source and target segments (one or several sentences) and additional metadata. The following metadata is included in both sentence-level formats:
- source and target document URL;
- paragraph ID, which encodes the position of the sentence in the paragraph and in the document (e.g., “p35:77s1/3” means “paragraph 35 out of 77, sentence 1 out of 3”);
- quality score as provided by the tool Bicleaner AI (the likelihood of a pair of sentences being mutual translations, given as a score between 0 and 1);
- similarity score as provided by the sentence alignment tool Bleualign (a value between 0 and 1);
- personal information identification (“biroamer-entities-detected”): segments containing personal information are flagged, so final users of the corpus can decide whether to use these segments;
- translation direction and machine translation identification (“translation-direction”): the source segment in each segment pair was identified using a probabilistic model (https://github.com/RikVN/TranslationDirection), which also determines whether the translation was produced by a machine-translation system;
- DSI class (“dsi”): whether the segment is connected to any of the Digital Service Infrastructure (DSI) classes (e.g., cybersecurity, e-health, e-justice, open-data-portal) defined by the Connecting Europe Facility (https://github.com/RikVN/DSI);
- English language variant: the variant of English (British or American), identified on document and domain level using a lexicon-based English variety classifier (https://pypi.org/project/abclf/).
Furthermore, the sentence-level TXT format provides additional metadata:
- web domain of the text;
- source and target document title;
- the date when the original file was retrieved;
- the original type of the file (e.g., “html”) from which the sentence was extracted;
- paragraph quality (labels such as “short” or “good”, assigned based on paragraph length, URL and stopword density via the jusText tool - https://corpus.tools/wiki/Justext);
- whether the sentence is a heading in the original document.
The document-level TXT format provides pairs of documents identified to contain parallel data. In addition to the parallel documents (in base64 format), the corpus includes the following metadata: source and target document URL, a DSI category and the English language variant (British or American).
Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material. (4) Please write to the contact person for this resource whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author’s view. The Agency is not responsible for any use that may be made of the information it contains.