The RIMES database (Reconnaissance et Indexation de données Manuscrites et de fac similÉS / Recognition and Indexing of handwritten documents and faxes) was created to evaluate automatic recognition and indexing systems for handwritten letters. Of particular interest are letters sent by postal mail or fax by individuals to companies or administrations.
The database was collected by asking volunteers to write handwritten letters in exchange for gift vouchers. Volunteers were given a fictitious identity (of the same sex as their real one) and up to 5 scenarios. Each scenario was chosen from among 9 realistic themes: change of personal information (address, bank account), information request, opening and closing (customer account), modification of a contract or order, complaint (poor service quality, etc.), payment difficulties (request for a delay, tax exemption, etc.), reminder letter, and damage declaration with further circumstances, together with a recipient (an administration or a service provider such as a telephone, power, bank or insurance company). The volunteers composed a letter from this information in their own words. The layout was free; the only requirements were to use white paper and to write legibly in black ink.
The collection was a success: more than 1,300 people took part in creating the RIMES database, each writing up to 5 letters. The resulting database contains 12,723 pages corresponding to 5605 letters of two to three pages each.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Introduction
The RIMES database (Reconnaissance et Indexation de données Manuscrites et de fac similÉS / Recognition and Indexing of handwritten documents and faxes) comprises handwritten correspondence letters, in French, “sent” by individuals to companies or administrations; all correspondence is fictitious and the records contain no personally identifiable information (PII).
The database was collected by asking volunteers to write handwritten letters in exchange for gift vouchers. Volunteers were given a fictitious identity (same sex as the real one) and up to 5 scenarios. Each scenario was chosen from among 9 realistic topics: change of personal data (address, bank account), request for information, opening and closing (customer account), change of contract or order, complaint (e.g. poor quality of service), payment difficulties (request for delay, tax exemption, etc.), reminder, claim with other circumstances and a target (administrations or service providers such as telephone, electricity, bank, insurance companies). The volunteers wrote a letter with this information in their own words. The layout was free and the only request was to use white paper and to write legibly in black ink.
Contents
There are several files in this record; they correspond to the whole dataset and to some subsets that were created during the project.
The Communications part stores the full document images and their annotations. Below we describe what the annotations contain and briefly describe the other subsets. NB: there was also a subset of cropped logo images, which is not distributed here because of issues with its annotations.
Communications -- Images_Courriers.zip
There are 5605 communications in total, each containing 2 or 3 pages:
One correspondence (mandatory)
One questionnaire (mandatory)
One fax (optional)
Filenames are composed as follows:
Communication number [1, 5605]
Underscore “_”
One letter [F, L, Q]
L for correspondence/Letter (Lettre, in French)
Q for Questionnaire
F for Fax
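The naming convention above can be parsed with a short helper. This is a minimal sketch: the function name is hypothetical, and because the text does not state which image extension or zero-padding the files use, the helper ignores the extension and accepts numbers with or without leading zeros.

```python
from pathlib import Path

# Page-type codes from the naming convention: L = letter (Lettre),
# Q = questionnaire, F = fax.
PAGE_TYPES = {"L": "letter", "Q": "questionnaire", "F": "fax"}


def parse_page_name(filename: str) -> tuple[int, str]:
    """Split a name such as '1234_L.png' into (1234, 'letter').

    Only the stem is checked; the extension (unknown here) is ignored.
    """
    stem = Path(filename).stem
    number_part, sep, code = stem.partition("_")
    if not sep or code not in PAGE_TYPES:
        raise ValueError(f"unexpected page name: {filename!r}")
    number = int(number_part)  # tolerates leading zeros, if any
    if not 1 <= number <= 5605:
        raise ValueError(f"communication number out of range: {number}")
    return number, PAGE_TYPES[code]
```

For example, `parse_page_name("1234_L.png")` returns `(1234, "letter")`, which makes it easy to group the 2 or 3 pages of each communication by number.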
The images and the corresponding annotations are split into 3 folders:
DVD1: images from 1 to 1799
DVD2: images from 1800 to 3699
DVD3: images from 3700 to 5605
There are 12610 images in total.
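Given a communication number, the folder that holds its images and annotations follows directly from the ranges listed above. A minimal sketch; the folder names are taken from the list, but the exact directory names inside the archive are an assumption to check after extraction.

```python
# Inclusive communication-number ranges per folder, as listed above.
DVD_RANGES = {
    "DVD1": (1, 1799),
    "DVD2": (1800, 3699),
    "DVD3": (3700, 5605),
}


def dvd_for(communication_number: int) -> str:
    """Return the folder holding a communication's images and annotations."""
    for folder, (first, last) in DVD_RANGES.items():
        if first <= communication_number <= last:
            return folder
    raise ValueError(f"communication number out of range: {communication_number}")
```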
The annotation files are in XML format and support different tasks:
Document Structure Identification
Handwritten text recognition
Writer recognition
Information Extraction
Document Structure Identification
For document structure, 8 block "types" are to be identified in the letters:
Sender address
Recipient address
Date/Place (each is annotated with its own tag)
Subject
Introduction (Ouverture in French)
Text body
Signature
PS / annex
Types 1 and 2 carry further detail: whether the party is a person or a corporate entity, and the address may also contain a telephone or fax number.
For example:
Coordonnées Expéditeur (Sender Address)
Maxime Granier 13 Grand rue 57370 Dames et Quatre Vents
For faxes, the types can additionally be qualified by the kind of text:
Dactylographié, which stands for Printed;
Manuscrit, for Handwritten text.
For instance:
Expéditeur_autre / Dactylographié
De :
Expéditeur_personne / Manuscrit
Lucie FOURES
Questionnaires have several other types, but no documentation about them remains.
Handwritten text recognition
Transcriptions are annotated at paragraph level; line breaks are indicated with " ". The transcriptions are verbatim and preserve the spelling and grammar errors found on the pages. When more than one spelling is possible (j’essaie/j’essaye, événement/évènement, ultrason/ultra son, and writing errors), the ground truth gives the options using a special construction:
¤{alternative_1/alternative_2}¤
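A transcription containing this construction can be expanded into every acceptable reading. This is a sketch, not an official RIMES tool; it assumes the alternatives themselves never contain "/" or "}".

```python
import itertools
import re

# Matches ¤{alternative_1/alternative_2}¤; the capture group holds "a/b".
ALTERNATIVES = re.compile(r"¤\{([^}]*)\}¤")


def expand_alternatives(text: str) -> list[str]:
    """Return every reading of a transcription, one per combination of options."""
    parts = ALTERNATIVES.split(text)
    # After split(), even indices are literal text, odd indices are "a/b" groups.
    choices = [part.split("/") if i % 2 else [part] for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]
```

When scoring a recognizer, one option (a design choice, not a documented RIMES protocol) is to compare the hypothesis against every reading and keep the best score. The number of readings grows multiplicatively with the number of constructions, so long paragraphs with many of them may call for an alignment that handles the options directly.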
Writer recognition
There is an identity for each writer in the database, so the usual writer identification and verification tasks can be performed. The writer is identified in the usual French form: the family name first, in capitals, followed by the given name, for instance:
GRANIER Maxime
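To index pages by writer, it can help to split the label into its two parts. A minimal sketch, assuming the family name is exactly the leading run of all-capital words, so that a hypothetical compound name such as "DE LA TOUR Anne" stays intact; a given name written entirely in capitals (or as an initial) would break this heuristic.

```python
def split_writer_name(writer: str) -> tuple[str, str]:
    """Split 'GRANIER Maxime' into (family name, given name).

    Heuristic: leading all-uppercase words form the family name.
    """
    words = writer.split()
    i = 0
    while i < len(words) and words[i].isupper():
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])
```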
Information Extraction
Nine scenarios are annotated for the different types of communication. We provide some free translations into English.
Scenario | Free translation
Changement de données personnelles | Change of personal data
Demande d'information | Request for information
Difficulté de paiement | Payment difficulties
Fermeture de compte | Account closure
Gestion de sinistre | Claims handling
Modification de contrat / Commande | Contract changes / Order
Ouverture de compte | Account opening
Réclamation | Complaint
Relance de courrier sans réponse | Correspondence reminder
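For training or reporting, the scenario labels can be mapped to English names and integer class IDs. The sketch below simply encodes the table above; it assumes the labels appear in the annotation files with exactly this spelling, which the text does not confirm.

```python
# French scenario labels mapped to the free English translations in the table.
SCENARIOS = {
    "Changement de données personnelles": "Change of personal data",
    "Demande d'information": "Request for information",
    "Difficulté de paiement": "Payment difficulties",
    "Fermeture de compte": "Account closure",
    "Gestion de sinistre": "Claims handling",
    "Modification de contrat / Commande": "Contract changes / Order",
    "Ouverture de compte": "Account opening",
    "Réclamation": "Complaint",
    "Relance de courrier sans réponse": "Correspondence reminder",
}

# Stable integer IDs for classifiers (dicts preserve insertion order).
SCENARIO_IDS = {label: i for i, label in enumerate(SCENARIOS)}


def translate_scenario(label: str) -> str:
    """Return the English translation, tolerating surrounding whitespace."""
    return SCENARIOS[label.strip()]
```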
Paragraphs -- images_blocs_de_texte.zip
The main body of each letter was cropped from the full-page images, stored as grayscale JPEG images, and split into 3 folders:
DVD1: images from 1 to 1799 (1796 images)
DVD2: images from 1800 to 3699 (1899 images)
DVD3: images from 3700 to 5605 (1905 images)
The transcriptions can be obtained from the corresponding Communications transcriptions; the numbers in the filenames correspond to the communications.
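The folder counts above can be cross-checked against the communication-number ranges: they add up to 5600 paragraph images for 5605 communications. The sketch below computes the gap per folder, under the assumption (not stated in the text) that each communication contributes at most one paragraph image, which would mean a handful of communications have no cropped paragraph.

```python
# Inclusive communication-number ranges and paragraph-image counts listed above.
PARAGRAPH_FOLDERS = {
    "DVD1": ((1, 1799), 1796),
    "DVD2": ((1800, 3699), 1899),
    "DVD3": ((3700, 5605), 1905),
}


def communications_without_paragraph() -> dict[str, int]:
    """Per folder, communication numbers minus paragraph images."""
    return {
        name: (last - first + 1) - images
        for name, ((first, last), images) in PARAGRAPH_FOLDERS.items()
    }


gaps = communications_without_paragraph()
print(gaps, "total:", sum(gaps.values()))
```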
Cursive words -- imagettes_mots_cursif.zip
The paragraphs were split into lines and each line was further split into words.
The images were split into 57 blocks (lot in French) organized in folders named:
lot_N_rimes_version_definitive, where N is the block number [1, 57]
Each folder has data from 100 letters, further organized into sub-folders following the convention:
_L
Then each sub-folder has one image per word, with naming:
L_.tiff
Where Line Number and Word position start from 0. The transcription should be inferred from the corresponding Communications transcriptions.
Character snippets -- imagettes_caracteres.zip
Words were split into characters (A to Z) and digits (0 to 9), totalling 95269 images. They are distributed in 3 folders:
characters_rimes_DVD1
characters_rimes_DVD2
characters_rimes_DVD3
Each is further divided into 4 blocks (lot in French):
lot_1
lot_2
lot_3
lot_4
The image files are named as follows:
_.png
Where Class is in [A-Z0-9].
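The folder layout (3 DVD folders, each with 4 lots) can be enumerated to check that an extraction is complete. A minimal sketch, assuming the archive is unpacked so that the `characters_rimes_DVD*` folders sit directly under a root directory of your choice; the function names are hypothetical.

```python
from pathlib import Path


def character_lot_dirs(root: Path) -> list[Path]:
    """List the 12 lot folders: characters_rimes_DVD1..3, each with lot_1..4."""
    return [
        root / f"characters_rimes_DVD{dvd}" / f"lot_{lot}"
        for dvd in range(1, 4)
        for lot in range(1, 5)
    ]


def count_pngs(root: Path) -> int:
    """Count .png snippets in the lot folders that exist under root."""
    return sum(
        len(list(d.glob("*.png")))
        for d in character_lot_dirs(root)
        if d.is_dir()
    )
```

On a complete extraction, `count_pngs(Path("path/to/imagettes_caracteres"))` would be expected to match the 95269 images stated above; a lower number points to missing folders or a different layout.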
Acknowledgments
This dataset was originally collected and prepared in 2007 by the following partners: DGA/CTA/DT/GIP - CEP Arcueil; TSP – ARTEMIS Télécom SudParis; and A2iA SA, as part of the Techno-Vision project. This project was funded by the French ministries for Research and Defense (Ministère de la Recherche and Ministère de la Défense).
After Mitek Systems, Inc. acquired A2iA SA in September 2018, it became a legal owner of the dataset. In 2024 it released the dataset publicly under a permissive license to encourage open science, which had been one of the project's objectives after its conclusion.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
RIMES-2011 - line level
Dataset Summary
The RIMES-2011 database (Recognition and Indexation of handwritten documents and faxes) was created to evaluate automatic recognition and indexing systems for handwritten letters. The database was collected by asking volunteers to write handwritten letters in exchange for gift certificates. Volunteers were given a fictitious identity (same gender as the real one) and up to 5 scenarios. Each scenario was chosen from among 9… See the full description on the dataset page: https://huggingface.co/datasets/Teklia/RIMES-2011-line.
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
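Because the files hold cumulative counts, daily new cases have to be derived by differencing consecutive dates within each county. The sketch below assumes rows already parsed (for example with `csv.DictReader`) into dicts with `date`, `state`, `county` and `cases` fields in ISO date format; these field names are assumptions to check against the actual files.

```python
from collections import defaultdict


def daily_new_cases(rows):
    """Convert cumulative case counts into daily new cases per (state, county).

    The first value for each county equals its first cumulative count.
    Downward revisions produce negative values, which are left as-is.
    """
    by_place = defaultdict(list)
    for row in rows:
        by_place[(row["state"], row["county"])].append((row["date"], int(row["cases"])))
    result = {}
    for place, series in by_place.items():
        series.sort()  # ISO 'YYYY-MM-DD' strings sort chronologically
        previous = 0
        daily = []
        for date, cumulative in series:
            daily.append((date, cumulative - previous))
            previous = cumulative
        result[place] = daily
    return result
```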
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Data collected by state and local health agencies. Compiled and visualized by The New York Times.
This is the US coronavirus data repository from The New York Times (see its U.S. coronavirus interactive site). This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available. For additional reporting and data visualizations, see The New York Times’ interactive coronavirus data tool.
Data source: https://github.com/nytimes/covid-19-data
This dataset includes TIMES model files associated with generating scenarios for the paper (https://iopscience.iop.org/article/10.1088/2753-3751/ad958b). The study utilized US EPA's TIMES database version: EPAUS9rT_v20.4. The other file includes underlying data used for figures in the manuscript (FigureData_formatted.xlsx). This dataset is associated with the following publication: Zalesak, A., N. Kittner, D. Loughlin, and P. Kaplanakman. Evaluation of energy, carbon dioxide, and air emission implications of medium- and heavy-duty truck electrification in the United States using EPA’s regional TIMES energy systems model. Environmental Research: Energy. IOP Publishing, BRISTOL, UK, 1: 045018, (2024).
Financial Times Interactive Data LLC offers a vast repository of economic and financial data, providing valuable insights into global markets and trading. With a focus on delivering timely and accurate information, the company has established itself as a go-to source for financial institutions, investors, and researchers seeking to stay ahead of the curve.
Our vast database comprises historical financial statements, economic indicators, and proprietary data from leading sources, including government agencies, regulatory bodies, and industry associations. By providing access to this trove of information, Financial Times Interactive Data LLC enables its clients to make informed decisions, identify trends, and uncover new opportunities in the rapidly evolving world of finance.
The table US Counties is part of the dataset New York Times US Coronavirus Database, available at https://columbia.redivis.com/datasets/mgcj-asjsw1awy. It contains 1079469 rows across 6 variables.
This dataset provides information about the number of properties, residents, and average property values for Rimes Court cross streets in Santa Maria, CA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a database containing techno-economic data for fuel production technologies, including data for stand-alone technologies, technologies co-producing district heating and technologies producing heat for integration with industries. The database is a compilation of information from literature, specifically tailored for use in TIMES models. Even though this specific database has been developed for TIMES-Sweden, the data can also be applied for other regions. The database is continuously updated as work progresses with the TIMES-Sweden model.
Preferably to be used in combination with TIMES-Sweden Industry database (10.5281/zenodo.4139800), and TIMES-Sweden (Industrial) Heat generation technologies database (10.5281/zenodo.6372930).
This database is also part of the IEA ETSAP SubRES project, which aims to make techno-economic data more accessible. More information about ETSAP can be found here: https://iea-etsap.org/
More information about TIMES-Sweden and the modelling team can be found here: http://www.ltu.se/TIMES-Sweden
The purpose of the data set is to provide multi-modal data and contextual information (weather and incidents) that can be used to research and develop applications for the USDOT Dynamic Mobility Applications (DMA) program. This legacy dataset was created before data.transportation.gov and is only currently available via the attached file(s). Please contact the dataset owner if there is a need for users to work with this data using the data.transportation.gov analysis features (online viewing, API, graphing, etc.) and the USDOT will consider modifying the dataset to fully integrate in data.transportation.gov. Additional related data can be found here: https://data.transportation.gov/Automobiles/Seattle-20-Second-Freeway/ixg2-6cni
This dataset provides information about the number of properties, residents, and average property values for Bohler Rimes Road cross streets in Statesboro, GA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a database containing techno-economic data for industrial technologies. The database is a compilation of information from literature, specifically tailored for use in TIMES models. Even though this specific database has been developed for TIMES-Sweden, the data can also be applied for other regions. The database is continuously updated as work progresses with the TIMES-Sweden model.
Preferably to be used in combination with TIMES-Sweden Fuel production technologies database (10.5281/zenodo.6372926), and TIMES-Sweden (Industrial) Heat generation technologies database (10.5281/zenodo.6372930).
This database is also part of the ETSAP SubRES project, which aims to make techno-economic data more accessible. More information about ETSAP can be found here: https://iea-etsap.org/
More information about TIMES-Sweden and the modelling team can be found here: http://www.ltu.se/TIMES-Sweden
Rosalia Times Series Database
The BOKU (University of Natural Resources and Life Sciences, Vienna) demonstration forest Rosalia, with an area of 950 ha, has been used for research and education since 1875. In 2013, on the initiative of a group of researchers from various disciplines, it was decided to extend the previously mainly forestry-oriented activities by establishing a hydrological experimental research watershed. The overall objective is to collect data that support the study of transport processes in the soil-water-plant-atmosphere system. More specifically, the emphasis is on bridging the gap between point measurements and the effective values and parameters required for modelling flow and transport processes in watersheds.
2 Objectives
The main objectives for the research watershed are
Operation is planned for a period of at least 10 years using only the university's internal resources, to avoid potential interruptions caused by the short-term, project-based availability of personnel and financial resources.
The objective of this article is to present the research watershed and the data collected, and to make these data accessible to the research community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the data and files used in: Lachlan Clapin and Thomas Longden (2024) Waiting to generate: an analysis of onshore wind and solar PV project development lead-times in Australia, Energy Economics (forthcoming).
Abstract: The feasibility of near-term renewable targets will depend upon the time taken to get projects completed. Very few studies assess total project lead-times for renewable energy projects. This study investigates the determinants of lead-times for 170 onshore wind and solar PV projects completed in Australia between 2000 and 2023. We track multiple project stages and estimate the impacts of changes in ownership, experience, approval processes, rule changes, and a commissioning process that differs by size of generation. Australia has had a notable improvement in lead-times. Solar projects that commenced before 2010 had an average lead-time of 83 months (min: 63, max: 102). This decreased to 41 months for solar projects (min: 19, max: 75) that commenced after 2016. Onshore wind projects took longer to develop. Project lead-times were 136 months (min: 50, max: 200) when they commenced before 2005. This decreased to 53 months (min: 20, max: 85) for projects starting after 2016. Pre-construction lead-times decreased notably for both solar and wind, which implies that the approval process did improve. There is evidence from one jurisdiction that this did occur, particularly for onshore wind projects. Over the same period, commissioning lead-times were similar for wind projects but have increased for solar projects. On average, commissioning took up to 7 months longer after a change in the re-iterative process of tests and equipment changes to meet generator performance standards. Changes in project ownership occurred often (38% of projects) but this had little impact on lead-times. Accurate estimates of lead-times are important for investors, project-owners and policymakers. Yet, they are rarely reported.
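To put the reported lead-time changes on a common scale, the relative reduction is (before - after) / before. The cohorts are not strictly comparable (solar compares pre-2010 with post-2016 starts, wind compares pre-2005 with post-2016 starts), so this is only a rough reading of the abstract's figures, not a result reported by the paper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Lead-times in months, as reported in the abstract above.
solar = percent_reduction(83, 41)   # started before 2010 vs after 2016
wind = percent_reduction(136, 53)   # started before 2005 vs after 2016
print(f"solar: {solar:.1f}%, wind: {wind:.1f}%")  # solar: 50.6%, wind: 61.0%
```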
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The creation of the database and research using its contents were funded by a project financed by the National Science Centre (Poland), grant no. 2021/43/B/HS3/02636.
Maintain the wait time between first contact and face-to-face visit for behavioral health treatment to less than 3 days every year through 2018.
This dataset includes data collected to validate a method of analysis for arsenic species As(III), As(V), dimethylarsinate (DMA), and monomethylarsonate (MMA) in surface water and groundwater samples. It also includes site information and water chemistry for samples used in the validation and used for other studies of arsenic species in groundwater. It includes speciation data measured at multiple time points to establish or verify hold times and stability of As species in surface water and groundwater samples.