2 datasets found
  1. Data from: Stillwater Complex, Montana—logs of core drilled by Stillwater...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). Stillwater Complex, Montana—logs of core drilled by Stillwater Mining Company and Anaconda Copper Corp. in the Stillwater Mine area, 1983 to 1989 [Dataset]. https://catalog.data.gov/dataset/stillwater-complex-montanalogs-of-core-drilled-by-stillwater-mining-company-and-anaconda-c
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This dataset includes TIFF (Tagged Image File Format) images of graphic drill core logs showing associated drill core information, a TIFF image of the explanation for the lithology and structure sections of the logs, an Esri shapefile of the locations of the drill holes, and 12 .csv files of tabular data that were compiled from handwritten drill core logs. The drill core is from the Stillwater Mine area of the Stillwater Complex, Montana, and was drilled from 1983 to 1989 by the Stillwater Mining Company and Anaconda Copper Corp. The data shown in the graphic drill logs and contained within the .csv files include lithologic, structure, percent recovery, grain size, sulfide, nickel, copper, platinum, and palladium mineralization information. The graphic drill logs were created using Golden Software's Strater 5 drill core visualization software and are provided with both logarithmic and linear scales where applicable. The graphic drill logs are plotted using the depth recorded in the drill logs and do not reflect stratigraphic true thickness. All instances of question marks ("?") represent original data as written by the geologist. Where the handwritten notes were unreadable, the notation "[unreadable]" was used. See USGS SIR 2014-5183 (https://pubs.usgs.gov/sir/2014/5183/) for the report and spatial data relating to the Stillwater Complex.

  2. Data from: The e-NDP project : collaborative digital edition of the Chapter...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 13, 2023
    Cite
    Claustre, Julie; Smith, Darwin; Torres Aguilar, Sergio; Bretthauer, Isabelle; Brochard, Pierre; Canteaut, Olivier; Cottereau, Emilie; Delivré, Fabrice; Denglos, Mathilde; Jolivet, Vincent; Julerot, Véronique; Kouamé, Thierry; Lusset, Elisabeth; Massoni, Anne; Nadiras, Sebastien; Perreaux, Nicolas; Regazzi, Hugo; Treglia, Mathilde (2023). The e-NDP project : collaborative digital edition of the Chapter registers of Notre-Dame of Paris (1326-1504). Ground-truth for handwriting text recognition (HTR) on late medieval manuscripts. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7575692
    Dataset updated
    Feb 13, 2023
    Dataset provided by
    LaMOP
    Université du Luxembourg
    Université de Franche-Comté
    Université de Limoges
    Université de Paris 1 Panthéon-Sorbonne
    École nationale des chartes, Paris Sciences et Lettres
    Archives nationales de France
    Authors
    Claustre, Julie; Smith, Darwin; Torres Aguilar, Sergio; Bretthauer, Isabelle; Brochard, Pierre; Canteaut, Olivier; Cottereau, Emilie; Delivré, Fabrice; Denglos, Mathilde; Jolivet, Vincent; Julerot, Véronique; Kouamé, Thierry; Lusset, Elisabeth; Massoni, Anne; Nadiras, Sebastien; Perreaux, Nicolas; Regazzi, Hugo; Treglia, Mathilde
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Paris
    Description

    The e-NDP project, funded by the ANR, is led by the LaMOP (Julie Claustre and Darwin Smith).

    The project's partners are the Archives nationales, the Bibliothèque nationale de France (Department of Manuscripts, Bibliothèque de l'Arsenal), the École nationale des chartes and the Bibliothèque Mazarine.

    The e-NDP project aims at renewing our knowledge of Notre-Dame de Paris cathedral through the creation of a collaborative digital edition of the registers of its Chapter (1326-1504, AN LL 105-128), the community of 51 canons meeting three times a week on set days to take all administrative, financial and practical decisions pertaining to the cathedral, its estate and the society living in its cloister. This corpus has never been the object of a comprehensive study to understand the workings and history of this urban enclave and powerful community. The collaborative digital edition is based on a process of handwriting text recognition (HTR), tested and supervised by scholars, researchers and engineers combining expertise in medieval history, paleography, philology and digital humanities. The edition shall allow better insight into the Chapter's administration, into its economic and political power within Paris, and into the relationships it maintained with other institutions in the city.

    Section 1 : The e-NDP ground-truth dataset for Handwriting text recognition.

    The full e-NDP corpus is kept today in the French National Archives and was entirely digitized and described in its catalog in 2022.

    The first major goal of the e-NDP project is to propose a first automatic transcription of the 14k pages composing the 26 chapter registers. To achieve this goal, representative samples from each of the volumes were selected and transcribed in order to train a specialized HTR model able to propose a high-quality automatic transcription. The collected ground truth released in this repository currently has 512 pages from the 26 registers of the cathedral chapter preserved in the National Archives (LL105 - LL128, 1326-1504). The transcriptions were manually completed in two rounds by a group of 12 contributors, historians and paleographers, over the course of 2021-2022 using eScriptorium as the annotation environment.

    Ground-truth features :

    Number of hands : according to our estimates no fewer than 18 main hands were involved in the writing of the registers during the medieval period.

    Language : More than 98% of the content of the registers was written in Latin, the rest in French. The exact percentage is hard to estimate because the vernacular language is often used in formulae, notes and comments. It is rare to find entire pages or blocks written in French.

    Script family : The registers were written in a cursive script (ca. late 13th to 16th century).

    Documentary typology : The volumes containing the chapter conclusions were conceived to serve as memorial records, but above all as documents for regular use and consultation in the daily practice of administration and management. In diplomatics, the notion of "documentary manuscripts" is used to describe this kind of source, as opposed to books and literary or normative manuscripts.

    Ground truth statistics

    Text units                           Count
    Pages                                512
    Annotated regions (see section 2)    2448
    Lines of text                        34231
    Tokens                               205083
    Characters                           3320407
    

    Rules of transcription :

    The abbreviations have been resolved, both those by suspension (facimꝰ --> facimus) and by contraction (dñi --> domini). Likewise, those using conventional signs (⁊ --> et ; ꝓ --> pro) have been resolved.

    The named entities (names of persons, places and institutions) have been capitalized. The beginning of a block of text as well as the original capitals used by the notary are also capitalized.

    The consonantal i and u characters have been transcribed as j and v in both French and Latin.

    The punctuation marks used in the text: . and / have been transcribed, but the transcription has not been standardized with modern punctuation.

    Corrections and words that appear cancelled in the manuscript have been transcribed surrounded by the sign $ at the beginning and at the end.

    More specific transcription rules can be found in the file transcription_guidelines.pdf
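
The rule about cancelled words lends itself to a small post-processing step. The following is a minimal sketch, assuming that each cancelled span is wrapped in a pair of `$` signs on a single line; the function name `split_cancelled` and the sample line are made up for illustration and do not come from the dataset.

```python
import re

# Assumption: a cancelled span opens and closes with "$" on the same line.
CANCELLED = re.compile(r"\$(.*?)\$")


def split_cancelled(line: str) -> tuple[str, list[str]]:
    """Return the line without cancelled spans, plus the cancelled spans."""
    cancelled = [span.strip() for span in CANCELLED.findall(line)]
    kept = CANCELLED.sub(" ", line)
    # Collapse the whitespace left behind by the removed spans.
    kept = re.sub(r"\s+", " ", kept).strip()
    return kept, cancelled


# Hypothetical line, not taken from the registers.
sample = "Item domini $concesserunt$ ordinaverunt quod"
kept, cancelled = split_cancelled(sample)
print(kept)
print(cancelled)
```

Keeping the cancelled spans in a separate list, rather than discarding them, leaves both the corrected reading and the scribe's deletions available for later analysis.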

    Section 2. e-NDP Layout Segmentation.

    Layout segmentation is a compulsory step before HTR in order to distinguish sections and regions inside a document. This process is intended to separate interdependent page zones so that recognition follows a section-sequence order rather than a line-sequence order, which mixes textual and peri-textual content.

    The regions of 364 pages (see GT-layout_list) of the e-NDP corpus were annotated using a vocabulary of five region types (see endp_layout_regions) in order to describe the page layout across all 26 volumes :

    Block : All the central text blocks, which normally correspond to the main content, called "conclusions" in the registers.

    Liste : List of names of the canons who were present during the meeting. Normally located before the conclusions.

    Entrée : Marginal notes or entries to inform about the content of conclusions.

    Date : Paragraph containing the date. Normally at the head of a conclusion, but separate from the main body.

    Numérotation : Page numbers in Roman or Arabic numerals. They usually appear in the top corners of the pages.

    Layout GT statistics

    Region          Count
    block           833
    liste           431
    date            448
    entrée          205
    numérotation    531
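
As a quick consistency check, the per-region counts can be summed and compared with the "Annotated regions" figure in the ground truth statistics. The sketch below simply re-enters the published counts by hand; the variable names are illustrative.

```python
# Counts copied from the layout GT statistics listed above.
layout_counts = {
    "block": 833,
    "liste": 431,
    "date": 448,
    "entrée": 205,
    "numérotation": 531,
}

total = sum(layout_counts.values())
print(total)  # 2448, the number of annotated regions reported in Section 1

# Share of each region type, as a percentage of all annotated regions.
shares = {name: round(100 * count / total, 1) for name, count in layout_counts.items()}
print(shares)
```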
    

    Section 3. The e-NDP HTR modeling.

    The e-NDP project has progressively trained several HTR models adapted to late medieval cursive in order to accelerate the production of ground truth. Currently the best model delivers an average CER (character error rate) of 9.7% on the 26 registers (see endp_learning_curve) and can serve as a generalist model for other manuscripts of the same period and a similar script family. These models and their training implementation details can be found in the project's GitHub repository.
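
The description does not say how the project computes its CER. A common definition, assumed here, is the Levenshtein edit distance between the model output and the reference transcription, divided by the length of the reference. The sketch below implements that definition in plain Python; the function names and the example strings are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against ref (assumed definition)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# One missing character over six reference characters.
print(cer("domini", "domni"))
```

Under this definition, a CER of 9.7% corresponds to roughly one character-level edit for every ten characters of reference text.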

    Additionally, the automatic HTR transcriptions of the 26 registers (14k pages, 4.5M tokens), enriched with lexical and semantic information, have been the subject of a first online publication using NoSketch Engine, which allows advanced data mining based on the combination of data, metadata and NLP features.

    Section 4. Dataset content.

    This zip dataset contains :

    • HTR_ground_truth : Two folders containing the jpg / jpeg images and their curated transcriptions in PAGE XML format.

    • images_docs : 4 files illustrating the different phases of the project (list of GT for layout segmentation, layout ontology, transcription guidelines and HTR evaluation curves)
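
Transcriptions stored as PAGE XML can be read with a standard XML parser. The following is a minimal sketch using Python's `xml.etree.ElementTree`, assuming the usual PAGE layout in which `TextLine` elements sit directly inside `TextRegion` elements and carry their text in `TextEquiv/Unicode`. The sample document is invented for illustration, and the namespace is read from the root element because it differs between PAGE schema versions.

```python
import xml.etree.ElementTree as ET

# Invented sample; real files in HTR_ground_truth may differ in detail.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.jpg">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>Item domini</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def read_page_lines(xml_text: str) -> list[tuple[str, str]]:
    """Return (region id, line text) pairs in document order."""
    root = ET.fromstring(xml_text)
    # Take the namespace from the root tag instead of hard-coding a schema version.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(name: str) -> str:
        return f"{{{ns}}}{name}" if ns else name

    rows = []
    for region in root.iter(q("TextRegion")):
        # Only direct child lines, so nested regions are not counted twice.
        for line in region.findall(q("TextLine")):
            unicode_el = line.find(f"{q('TextEquiv')}/{q('Unicode')}")
            text = (unicode_el.text or "") if unicode_el is not None else ""
            rows.append((region.get("id", ""), text))
    return rows


print(read_page_lines(SAMPLE))
```

For a file on disk, the same function can be applied to the file's contents read as text.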
