13 datasets found
  1. Gulf Shrimp Control Data Tables

    • catalog.data.gov
    Updated Jul 2, 2024
    + more versions
    Cite
    Southeast Fisheries Science Center (Resource Provider) (2024). Gulf Shrimp Control Data Tables [Dataset]. https://catalog.data.gov/dataset/gulf-shrimp-control-data-tables
    Explore at:
    Dataset updated
    Jul 2, 2024
    Dataset provided by
    Southeast Fisheries Science Center (Resource Provider)
    Description

    These are tables used to process the loads of Gulf shrimp data. They contain pre-validation tables, error tables and information about statistics on data loads. They contain no data tables and no code tables, and this information need not be published. The data set contains catch (landed catch) and effort for fishing trips made by the larger vessels that fish near and offshore for the various species of shrimp in the Gulf of Mexico. The data set also contains landings by the smaller boats that fish in the bays, lakes, bayous, and rivers for saltwater shrimp species; however, these landings data may be aggregated for multiple trips and may not provide effort data similar to the data for the larger vessels. The landings statistics in this data set consist of the quantity and value for the individual species of shrimp by size category, type and quantity of gear, fishing duration and fishing area. The data collection procedures for the catch/effort data for the large vessels consist of two parts: the landings statistics are collected from the seafood dealers after the trips are unloaded, whereas the data on fishing effort and area are collected by interviews with the captain or crew while the trip is being unloaded.

  2. PUDL Data Release v1.0.0

    • zenodo.org
    application/gzip, bin +1
    Updated Aug 28, 2023
    Cite
    Zane A. Selvans; Zane A. Selvans; Christina M. Gosnell; Christina M. Gosnell (2023). PUDL Data Release v1.0.0 [Dataset]. http://doi.org/10.5281/zenodo.3653159
    Explore at:
    Available download formats: application/gzip, bin, sh
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zane A. Selvans; Zane A. Selvans; Christina M. Gosnell; Christina M. Gosnell
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the first data release from the Public Utility Data Liberation (PUDL) project. It can be referenced & cited using https://doi.org/10.5281/zenodo.3653159

    For more information about the free and open source software used to generate this data release, see Catalyst Cooperative's PUDL repository on GitHub, and the associated documentation on Read The Docs. This data release was generated using v0.3.1 of the catalystcoop.pudl python package.

    Included Data Packages

    This release consists of three tabular data packages, conforming to the standards published by Frictionless Data and the Open Knowledge Foundation. The data are stored in CSV files (some of which are compressed using gzip), and the associated metadata is stored as JSON. These tabular data can be used to populate a relational database.

    • pudl-eia860-eia923:
      Data originally collected and published by the US Energy Information Administration (US EIA). The data from EIA Form 860 covers the years 2011-2018. The Form 923 data covers 2009-2018. A large majority of the data published in the original data sources has been included, but some parts, like fuel stocks on hand, and EIA 923 schedules 6, 7, & 8 have not yet been integrated.
    • pudl-eia860-eia923-epacems:
      This data package contains all of the same data as the pudl-eia860-eia923 package above, as well as the Hourly Emissions data from the US Environmental Protection Agency's (EPA's) Continuous Emissions Monitoring System (CEMS) from 1995-2018. The EPA CEMS data covers thousands of power plants at hourly resolution for decades, and contains close to a billion records.
    • pudl-ferc1:
      Seven data tables from FERC Form 1 are included, primarily relating to individual power plants, and covering the years 1994-2018 (the entire span of time for which FERC provides this data). These tables are the only ones which have been subjected to any cleaning or organization for programmatic use within PUDL. The complete, raw FERC Form 1 database contains 116 different tables with many thousands of columns of mostly financial data. We will archive a complete copy of the multi-year FERC Form 1 Database as a file-based SQLite database at Zenodo, independent of this data release. It can also be re-generated using the catalystcoop.pudl Python package and the original source data files archived as part of this data release.

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. You can also:

    Using the Data

    The data packages are just CSVs (data) and JSON (metadata) files. They can be used with a variety of tools on many platforms. However, the data is organized primarily with the idea that it will be loaded into a relational database, and the PUDL Python package that was used to generate this data release can facilitate that process. Once the data is loaded into a database, you can access that DB however you like.

    Make sure conda is installed

    None of these commands will work without the conda Python package manager installed, either via Anaconda or miniconda.

    Download the data

    First download the files from the Zenodo archive into a new empty directory. A couple of them are very large (5-10 GB), and depending on what you're trying to do you may not need them.

    • If you don't want to recreate the data release from scratch by re-running the entire ETL process yourself, and you don't want to create a full clone of the original FERC Form 1 database, including all of the data that has not yet been integrated into PUDL, then you don't need to download pudl-input-data.tgz.
    • If you don't need the EPA CEMS Hourly Emissions data, you do not need to download pudl-eia860-eia923-epacems.tgz.

    Load All of PUDL in a Single Line

    Use cd to get into your new directory at the terminal (in Linux or Mac OS), or open up an Anaconda terminal in that directory if you're on Windows.

    If you have downloaded all of the files from the archive, and you want it all to be accessible locally, you can run a single shell script, called load-pudl.sh:

    bash load-pudl.sh
    

    This will do the following:

    • Load the FERC Form 1, EIA Form 860, and EIA Form 923 data packages into an SQLite database which can be found at sqlite/pudl.sqlite.
    • Convert the EPA CEMS data package into an Apache Parquet dataset which can be found at parquet/epacems.
    • Clone all of the FERC Form 1 annual databases into a single SQLite database which can be found at sqlite/ferc1.sqlite.

    Selectively Load PUDL Data

    If you don't want to download and load all of the PUDL data, you can load each of the above datasets separately.

    Create the PUDL conda Environment

    This installs the PUDL software locally, and a couple of other useful packages:

    conda create --yes --name pudl --channel conda-forge \
      --strict-channel-priority \
      python=3.7 catalystcoop.pudl=0.3.1 dask jupyter jupyterlab seaborn pip
    conda activate pudl
    

    Create a PUDL data management workspace

    Use the PUDL setup script to create a new data management environment inside this directory. After you run this command you'll see some other directories show up, like parquet, sqlite, data etc.

    pudl_setup ./
    

    Extract and load the FERC Form 1 and EIA 860/923 data

    If you just want the FERC Form 1 and EIA 860/923 data that has been integrated into PUDL, you only need to download pudl-ferc1.tgz and pudl-eia860-eia923.tgz. Then extract them in the same directory where you ran pudl_setup:

    tar -xzf pudl-ferc1.tgz
    tar -xzf pudl-eia860-eia923.tgz
    

    To make use of the FERC Form 1 and EIA 860/923 data, you'll probably want to load them into a local database. The datapkg_to_sqlite script that comes with PUDL will do that for you:

    datapkg_to_sqlite \
      datapkg/pudl-data-release/pudl-ferc1/datapackage.json \
      datapkg/pudl-data-release/pudl-eia860-eia923/datapackage.json \
      -o datapkg/pudl-data-release/pudl-merged/
    

    Now you should be able to connect to the database (~300 MB) which is stored in sqlite/pudl.sqlite.
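    Once the load finishes, any SQLite client can inspect the result. For example, a minimal Python sketch (standard-library sqlite3 only, no PUDL-specific API assumed):

    import sqlite3

    # Open the database produced by datapkg_to_sqlite.
    conn = sqlite3.connect("sqlite/pudl.sqlite")
    cur = conn.cursor()

    # List the tables that were loaded from the data packages.
    cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;")
    for (name,) in cur.fetchall():
        print(name)

    conn.close()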

    Extract EPA CEMS and convert to Apache Parquet

    If you want to work with the EPA CEMS data, which is much larger, we recommend converting it to an Apache Parquet dataset with the included epacems_to_parquet script. Then you can read those files into dataframes directly. In Python you can use the pandas.read_parquet() function. If you need to work with more data than can fit in memory at one time, we recommend using Dask dataframes. Converting the entire dataset from data packages into Apache Parquet may take an hour or more:

    tar -xzf pudl-eia860-eia923-epacems.tgz
    epacems_to_parquet datapkg/pudl-data-release/pudl-eia860-eia923-epacems/datapackage.json
    

    You should find the Parquet dataset (~5 GB) under parquet/epacems, partitioned by year and state for easier querying.
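    For example, with pandas you might read a single year/state partition rather than the whole ~5 GB dataset; the exact partition directory names below (year=2018, state=CO) are an assumption based on the layout described above:

    import pandas as pd

    # Read one partition of the EPA CEMS Parquet dataset (requires pyarrow or fastparquet).
    # For analyses spanning many partitions, dask.dataframe.read_parquet works the same way
    # without loading everything into memory.
    epacems = pd.read_parquet("parquet/epacems/year=2018/state=CO")
    print(epacems.head())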

    Clone the raw FERC Form 1 Databases

    If you want to access the entire set of original, raw FERC Form 1 data (of which only a small subset has been cleaned and integrated into PUDL) you can extract the original input data that's part of the Zenodo archive and run the ferc1_to_sqlite script using the same settings file that was used to generate the data release:

    tar -xzf pudl-input-data.tgz
    ferc1_to_sqlite data-release-settings.yml
    

    You'll find the FERC Form 1 database (~820 MB) in sqlite/ferc1.sqlite.

    Data Quality Control

    We have performed basic sanity checks on much, but not all, of the data compiled in PUDL to ensure that we identify any major issues we might have introduced through our processing.

  3. Sentence/Table Pair Data from Wikipedia for Pre-training with...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 29, 2021
    Cite
    Huan Sun (2021). Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5612315
    Explore at:
    Dataset updated
    Oct 29, 2021
    Dataset provided by
    Alyssa Lees
    You Wu
    Cong Yu
    Yu Su
    Huan Sun
    Xiang Deng
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.

    There are two files:

    sentence_pairs_for_pretrain_no_tokenization.tar.gz -> contains only sentences as evidence (Text-only)

    table_pairs_for_pretrain_no_tokenization.tar.gz -> at least one piece of evidence is a table (Hybrid)

    The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.

    For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT

    Below is a sample code snippet to load the data

    import webdataset as wds

    # path to the uncompressed files, should be a directory with a set of tar files
    url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
    dataset = (
        wds.Dataset(url)
        .shuffle(1000)     # cache 1000 samples and shuffle
        .decode()
        .to_tuple("json")
        .batched(20)       # group every 20 examples into a batch
    )

    Please see the documentation for WebDataset for more details about how to use it as a dataloader for PyTorch.

    You can also iterate through all examples and dump them with your preferred data format.
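    For instance, a minimal sketch (assuming the same webdataset version and shard path as the snippet above) that walks through every example and writes it out as JSON Lines:

    import json
    import webdataset as wds

    url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
    dataset = wds.Dataset(url).decode().to_tuple("json")  # no shuffling or batching needed here

    with open('pretrain_examples.jsonl', 'w') as fout:
        for (example,) in dataset:               # each item is a 1-tuple holding the decoded JSON
            fout.write(json.dumps(example) + '\n')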

    Below we show how the data is organized with two examples.

    Text-only

    {
      's1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.',  # query sentence
      's1_all_links': {
        'Sils,_Girona': [[0, 4]],
        'municipality': [[10, 22]],
        'Comarques_of_Catalonia': [[30, 37]],
        'Selva': [[41, 46]],
        'Catalonia': [[51, 60]]
      },  # list of entities and their mentions in the sentence (start, end location)
      'pairs': [  # other sentences that share a common entity pair with the query, grouped by shared entity pair
        {
          'pair': ['Comarques_of_Catalonia', 'Selva'],  # the common entity pair
          's1_pair_locs': [[[30, 37]], [[41, 46]]],  # mentions of the entity pair in the query
          's2s': [  # list of other sentences that contain the common entity pair, i.e. the evidence
            {
              'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
              'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
              's_loc': [0, 27],  # in addition to the sentence containing the common entity pair, we also keep its surrounding context; 's_loc' is the start/end location of the actual evidence sentence
              'pair_locs': [  # mentions of the entity pair in the evidence
                [[19, 27]],  # mentions of entity 1
                [[0, 5], [288, 293]]  # mentions of entity 2
              ],
              'all_links': {
                'Selva': [[0, 5], [288, 293]],
                'Comarques_of_Catalonia': [[19, 27]],
                'Catalonia': [[40, 49]]
              }
            },
            ...
          ]  # there are multiple evidence sentences
        },
        ...
      ]  # there are multiple entity pairs in the query
    }

    Hybrid

    {
      's1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
      's1_all_links': {...},  # same as text-only
      'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}],  # same as text-only
      'table_pairs': [
        {
          'tid': 'Major_League_Baseball-1',
          'text': [
            ['World Series Records', 'World Series Records', ...],
            ['Team', 'Number of Series won', ...],
            ['St. Louis Cardinals (NL)', '11', ...],
            ...
          ],  # table content, list of rows
          'index': [
            [[0, 0], [0, 1], ...],
            [[1, 0], [1, 1], ...],
            ...
          ],  # index of each cell [row_id, col_id]; we keep only a table snippet, but the index here is from the original table
          'value_ranks': [
            [0, 0, ...],
            [0, 0, ...],
            [0, 10, ...],
            ...
          ],  # if the cell contains a numeric value/date, this is its rank ordered from small to large, following TAPAS
          'value_inv_ranks': [],  # inverse rank
          'all_links': {
            'St._Louis_Cardinals': {
              '2': [  # list of mentions in the second row; the key is row_id
                [[2, 0], [0, 19]]  # [[row_id, col_id], [start, end]]
              ]
            },
            'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]}
          },
          'name': '',  # table name, if it exists
          'pairs': {
            'pair': ['American_League', 'National_League'],
            's1_pair_locs': [[[137, 152]], [[162, 177]]],  # mentions in the query
            'table_pair_locs': {
              '17': [  # mentions of the entity pair in row 17
                [
                  [[17, 0], [3, 18]],
                  [[17, 1], [3, 18]],
                  [[17, 2], [3, 18]],
                  [[17, 3], [3, 18]]
                ],  # mentions of the first entity
                [
                  [[17, 0], [21, 36]],
                  [[17, 1], [21, 36]]
                ]  # mentions of the second entity
              ]
            }
          }
        }
      ]
    }

  4. Datasets in support of the Southampton doctoral thesis 'Applying large scale...

    • eprints.soton.ac.uk
    Updated Jun 12, 2025
    Cite
    Coke, Brandon; Ewing, Rob (2025). Datasets in support of the Southampton doctoral thesis 'Applying large scale metanalysis of transcriptomic data to uncover hyper-responsive genes and prediction via machine learning' [Dataset]. http://doi.org/10.5258/SOTON/D3221
    Explore at:
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    University of Southampton
    Authors
    Coke, Brandon; Ewing, Rob
    Description

    The SQLite databases contain the outputs from the large-scale analysis of pre-existing RNA-seq and microarray datasets performed in chapter 2. Both SQLite databases contain the outputs of limma, a package used to perform differentially expressed gene analysis on the datasets from Gene Expression Omnibus (GEO) - https://www.ncbi.nlm.nih.gov/geo/. The schema for both databases is as follows: the data table contains the outputs and statistics from limma, and the meta table contains metadata about the number of treated and control samples, the type of experiment conducted and the tissue used. These datasets were used to derive the priors used in chapters 3 to 5, based on the proportion of datasets wherein a given gene is identified as differentially expressed, i.e. p-value below 0.05. Due to the size of the file, this is only available on request; please use https://library.soton.ac.uk/datarequest

    The machine_learning_input.csv file is a comma-delimited file containing the genomic and transcript-based features used to predict a gene's prior in the machine learning models. For more information see the readme file.

    The RNK files are tab-delimited files. The .RNK files' first column is the gene whilst the second is the rank from 1 to 0. These files were used to assess the enrichment of desired DEGs across 22 perturbation studies in chapter 2 using GSEA - https://www.gsea-msigdb.org/gsea/index.jsp. 1 represents a gene with the lowest rank (highest priority), whilst 0 represents the lowest priority for a given gene.

    The .RDS images are the R images used for the novel GEOreflect approach for ranking DEGs in bulk transcriptomic data developed in chapter 3. They are also needed to run the RShiny application used to showcase the method, the code for which can be found at GitHub (https://github.com/brandoncoke/GEOreflect) as well as in the GEOreflect_bulk_DEG_analysis.tar. The .RDS files require R and the readRDS() function to load into the environment and contain the percentile matrices used to calculate a platform p-value rank. Within the GEOreflect_bulk_DEG_analysis.tar file is an R script, GEOreflect_functions.R, which, when sourced after loading one of the .RDS images into the R environment, enables the user to perform the GEOreflect method on bulk RNA-seq transcriptomic datasets by loading the percentile_matrix_p_value_RNAseq.RDS image. Alternatively, when analysing GPL570 microarray datasets, the percentile_matrix.RDS file needs to be loaded into the R environment and the appropriate R function then needs to be applied to the DEG list. To run the RShiny application, ensure both .RDS files are in the directory with the app.R file, i.e. after using git clone https://github.com/brandoncoke/GEOreflect move both .RDS files into the GEOreflect directory with the cloned repository.

    The csv files with scRNA-seq appended to the file name contain the normalised mutual index, adjusted Rand index and Silhouette coefficient obtained when using 6 single-cell RNA-sequencing techniques: GEOreflect, Seurat's vst method, CellBRF, genebasis and CellBRF with the 3 sigma rule imposed. This analysis was carried out in chapter 3. These .csvs use their GEO identifier in the file name, or, for Zheng et al.'s data from 10X Genomics, the name assigned to it via the DuoClustering2018 R package.

    The inputs from the machine_learning_input.csv file were used to develop the machine learning models used in chapter 5. First row: gene is the HGNC identifier for the genes, whilst the min_to_be_sig column represents a gene's CDF value at 0.05 for their p-value distribution obtained from the RNA-seq datasets, i.e. the target y for the regressor model. The sd column is unused, and was only relevant when calculating the priors using GPL570 microarray data, where there can be redundant probes resulting in multiple priors for the same gene. This column would represent the standard deviation.

  5. Bat Recording Manager 7.2

    • figshare.com
    Updated Mar 23, 2019
    + more versions
    Cite
    Barbastellus barbastellus; Justin Halls (2019). Bat Recording Manager 7.2 [Dataset]. http://doi.org/10.6084/m9.figshare.5972296.v34
    Explore at:
    Available download formats: application/x-dosexec
    Dataset updated
    Mar 23, 2019
    Dataset provided by
    figshare
    Authors
    Barbastellus barbastellus; Justin Halls
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    A program for managing collections of full spectrum recordings of bats.

    v6.2.6660 incorporates the import and export of collections of pictures in the image compare window.
    v6.2.6661 fixes some bugs and speed issues in 6660.
    v6.2.6680 tries to fix some database updating problems and adds additional debugging in this area.
    v7.0.6760 - Major improvements and changes. First define the additional shortcut key in Audacity - CTRL-SHIFT-M=Open menu in focussed track. A new item in the 'View' menu, Analyse and Import, will open a folder of .wav files and sequentially open them in Audacity. When annotated, the label file saved and Audacity closed, the next file will be opened. If the label file is not saved then the process stops and will resume on the next invocation of Analyse and Import on that folder. As each file is opened the label track will be automatically created and named, and the view will zoom to the first 5 seconds of the .wav track.
    v7.0.6764 also includes a new report format which (for one or more sessions) gives the number of minutes in each ten-minute window throughout the day in which a species of bat was detected. Rows are given for each species in the recordings. In Excel this looks good as a bar chart or a radar chart.
    v7.0.6789 hopefully fixes the problems when trying to update a database that caused the program to crash on startup if the database did not contain the more recent Version table.
    v7.0.6799 cosmetic changes to use the normal file selection dialog instead of the folder browser dialog; also, when using Analyse and Import, you no longer need to pick a file when selecting the .wav file folder.
    v7.0.6820 adds session data to all report formats, including pass statistics for all species found in that session.
    v7.0.6844 adds the ability to add, save, adjust and include in exported images fiducial lines. Lines can be added, deleted or adjusted in the image comparison window and are saved to the database when the window is closed. For exported images the lines are permanently overlaid on the image and are no longer adjustable.
    v7.0.6847 makes slight improvements to the aspect ratio of images in the comparison window; when images are exported the fiducial lines are only included if the FIDS button is depressed.
    v7.0.6850 fixes an occasional bug when saving images through Analyse and Import - using filenames in the caption has priority over bats' names. Also improvements in file handling when changing databases - now attempts to recognise if a db is the right type.
    v7.0.6858 makes some improvements to image handling, including a modification to the database structure to allow long descriptions for images (previously description+caption had to be less than 250 chars) and the ability to copy images within the application (but not to external applications). A single image may now be used simultaneously as a bat image, a call image or a segment image. Changes to it in one location will be reflected in all the other locations. On deletion the link is removed, and if there are no remaining links for the image then the image itself will be removed from the database.
    v7.0.6859 has some improvements to the image handling system. In the batReference view the COMP button now adds all bat and call images for all selected bats to the comparison window. Double-clicking on a bat adds all bat, call and segment images for all the bats selected to the comparison window.
    v7.0.6860 removed the COMP button from the bat reference view. Double-clicking in this view transfers all images of bats, calls and recordings to the comparison window. Double-clicking in the ListByBats view transfers all recording images but not the bat and call images to the comparison window. Exported images for recordings use the recording filename plus the start offset of the segment as a filename, or alternatively the image caption.
    v7.0.6866 improvements to the grids and to grid scaling and movement, especially for the sonagram grids.
    v7.0.6876 added the ability to right-click on a labelled segment in the recordings detail list control, to open that recording in Audacity and scroll to the location of that labelled segment. Only one instance of Audacity may be opened at a time or the scrolling does not work. Also made some improvements to the scrolling behaviour of the recording detail window.
    Version 7.1 makes significant changes to the way in which the RecordingSessions list is displayed. Because this list can get quite large and therefore takes a long time to load, it now loads the data in discrete pages. At the top of the RecordingSessions list is a new navigation bar with a set of buttons and two combo-boxes. The rightmost combo-box is used to set the number of items that will be loaded and displayed on a page; the selections are currently 10, 25, 50 and 100. Slower machines may find it advantageous to use smaller page sizes in order to speed up load times and reduce the demand for memory and CPU time. The other combo-box allows the selection of a sort field for the session list. Sessions are displayed in columns in a DataGrid which allows columns to be re-sized, moved and sorted; these functions now only apply to the subset of data that has been loaded as a page. The combo-box allows you to sort the full set of data in the database before loading the page. Thus if the combo-box is set to sort on DATE with a page size of 10, then only the 10 earliest (or the 10 latest, depending on the direction of sorting) sessions in the database will be loaded. The displayed set of sessions can be sorted on the screen by clicking the column headers, but this only changes the order on the screen; it does not load any other sessions from the database. The four buttons can be used to load the next or previous pages or to move to the start or end of the complete database collection. The Next or Previous buttons move the selection by 2/3 of the page size so that there will always be some visual overlap between pages. The sort combo-box has two entries for each field, one with a suffix of ^ and one with a suffix of v; these sort the database in ascending or descending order. Selecting a sort field will update the display and sort the displayed entries on the same field, but the sort direction of the displayed items will be whatever was last used. Clicking the column header will change the direction of sort for the displayed items.
    v7.1.6885 updates the database to DB version 6.2 by the addition of two link tables between bats and recordings and between bats and sessions. These tables enable much faster access to bat-specific data. Also various improvements to the speed of loading data when switching to the List By Bats view, especially with very large databases.
    v7.1.6891 further performance improvements in loading ListByBats and in loading images.
    v7.1.6901 has the ability to perform screen grabs of images without needing an external screen grabber program. Shift-click on the 'PASTE' button and drag and resize the semi-transparent window to select a screen area; right-click in the window to capture that portion of the screen. For details refer to Import/Import Pictures.
    v7.1.6913 fixed some scaling issues on fiducial lines in the comparison window.
    v7.1.6915 bugfix for adjusting fiducial lines - 7.1.6913 removed.
    v7.1.6941 improvements and adjustments to grid and fiducial line handling.
    v7.1.6951 fixes some problems with the Search dialog.
    v7.2.6970 introduces the ability to replay segments at reduced speed or in heterodyne 'bat detector' mode.
    v7.2.6971 when opening a recording or segment in Audacity the corresponding .txt file will be opened as a label track. NB this only works if there is only a single copy of Audacity open - subsequent calls with Audacity still open do not open the label track.
    v7.2.6978 improvements to heterodyne playback to use a pure sine wave.
    v7.2.6984 bug fixes and mods to image handling - image captions can now have a region appended in seconds after the file name.
    BRM-Aud-Setup_v7_2_7000.exe: this version includes its own private copy of Audacity 2.3.0 portable, which will be placed in the same folder as BRM and has its own pre-configured configuration file appropriate for use with BRM. This will not interfere with any existing installation of Audacity but provides all the Audacity features required by BRM with no further action by the user. BRM will use this version to display .wav files.
    v7.2.7000 also includes a new report format which is tailored to provide data for the Hertfordshire Mammals, Amphibians and Reptiles survey. It also displays the GPS co-ordinates for the Recording Session as an OS Grid Reference as well as latitude and longitude.
    v7.2.7010 speed improvements and bug fixes to opening and running Audacity through BRM. Audacity portable is now located in C:\audacity-win-portable instead of under the BRM program folder.
    v7.2.7012 fixed some bugs in report generation when producing the Frequency Table. Enabled the AddTag button in the BatReference pane.
    v7.2.7021 upgrades the Audacity component to version 2.3.1 and a few minor bug fixes.

  6. COKI Language Dataset

    • zenodo.org
    application/gzip, csv
    Updated Jun 16, 2022
    Cite
    James P. Diprose; James P. Diprose; Cameron Neylon; Cameron Neylon (2022). COKI Language Dataset [Dataset]. http://doi.org/10.5281/zenodo.6636625
    Explore at:
    Available download formats: application/gzip, csv
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    James P. Diprose; James P. Diprose; Cameron Neylon; Cameron Neylon
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.

    Methodology
    A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with Bigquery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
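    As a rough illustration of that prediction step, the sketch below runs the lid.176.bin model over one concatenated title/abstract string using the fasttext Python bindings; the local model path and the example text are assumptions, not part of the released dataset:

    import fasttext

    # Assumes lid.176.bin has been downloaded from the fastText site (reference [2] below).
    model = fasttext.load_model("lid.176.bin")

    # Hypothetical record: title and abstract concatenated, as described in the methodology.
    text = ("A survey of language identification methods "
            "We compare character n-gram models for identifying the language of short texts.")

    labels, probs = model.predict(text.replace("\n", " "), k=1)
    iso_code = labels[0].replace("__label__", "")  # e.g. 'en'
    print(iso_code, float(probs[0]))               # ISO language code and prediction probability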

    Query or Download
    The data is publicly accessible in BigQuery in the following two tables:

    When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.

    See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.

    Code
    The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language

    License
    COKI Language Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

    Attributions
    This work contains information from:

    References
    [1] https://doi.org/10.5281/zenodo.6366695
    [2] https://fasttext.cc/docs/en/language-identification.html
    [3] https://modelpredict.com/language-identification-survey

  7. Open Data Portal Catalogue

    • open.canada.ca
    • datasets.ai
    • +3more
    csv, json, jsonl, png +2
    Updated Aug 12, 2025
    Cite
    Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
    Explore at:
    Available download formats: csv, sqlite, json, png, jsonl, xlsx
    Dataset updated
    Aug 12, 2025
    Dataset provided by
    Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
    Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool (external link). Resources 2-8 are generated using the Flatterer (external link) utility.

    Description of resources:
    1. Dataset is a JSON Lines (external link) file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and recommended for users familiar with working with nested JSON.
    2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
    3. datasets metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
    4. resources metadata contains the metadata for the resources contained within each dataset.
    5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
    6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore-enabled CSVs.
    7. data package fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains.
    8. data package entity relation diagram displays the title and format for each column, in each table in the Data Package, in the form of an ERD diagram. The Data Package resource offers a text-based version.
    9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
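    As a sketch of how Resource 1 (the gzipped JSON Lines dump) might be read, the snippet below streams the file and prints one field per record; the local file name and the 'title' field are illustrative assumptions:

    import gzip
    import json

    # Hypothetical local copy of Resource 1 (one Dataset/Open Information Record per line).
    path = "open_data_catalogue.jsonl.gz"

    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            print(record.get("title"))  # 'title' is a typical CKAN package field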

  8. Consumer Expenditure Survey (CE)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Consumer Expenditure Survey (CE) [Dataset]. http://doi.org/10.7910/DVN/UTNJAH
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    analyze the consumer expenditure survey (ce) with r

    the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories. how else are you gonna know that the average american household spent $34 (±2) on bacon, $826 (±17) on cellular phones, and $13 (±2) on digital e-readers in 2011? an integral component of the market basket calculation in the consumer price index, this survey recently became available as public-use microdata and they're slowly releasing historical files back to 1996. hooray!

    for a taste of what's possible with ce data, look at the quick tables listed on their main page - these tables contain approximately a bazillion different expenditure categories broken down by demographic groups. guess what? i just learned that americans living in households with $5,000 to $9,999 of annual income spent an average of $283 (±90) on pets, toys, hobbies, and playground equipment (pdf page 3). you can often get close to your statistic of interest from these web tables. but say you wanted to look at domestic pet expenditure among only households with children between 12 and 17 years old. another one of the thirteen web tables - the consumer unit composition table - shows a few different breakouts of households with kids, but none matching that exact population of interest. the bureau of labor statistics (bls) (the survey's designers) and the census bureau (the survey's administrators) have provided plenty of the major statistics and breakouts for you, but they're not psychic. if you want to comb through this data for specific expenditure categories broken out by a you-defined segment of the united states' population, then let a little r into your life. fun starts now.

    fair warning: only analyze the consumer expenditure survey if you are nerd to the core. the microdata ship with two different survey types (interview and diary), each containing five or six quarterly table formats that need to be stacked, merged, and manipulated prior to a methodologically-correct analysis. the scripts in this repository contain examples to prepare 'em all, just be advised that magnificent data like this will never be no-assembly-required. the folks at bls have posted an excellent summary of what's available - read it before anything else. after that, read the getting started guide. don't skim. a few of the descriptions below refer to sas programs provided by the bureau of labor statistics. you'll find these in the C:\My Directory\CES\2011\docs directory after you run the download program.

    this new github repository contains three scripts:

    2010-2011 - download all microdata.R
    loop through every year and download every file hosted on the bls's ce ftp site. import each of the comma-separated value files into r with read.csv. depending on user-settings, save each table as an r data file (.rda) or a stata-readable file (.dta).

    2011 fmly intrvw - analysis examples.R
    load the r data files (.rda) necessary to create the 'fmly' table shown in the ce macros program documentation.doc file. construct that 'fmly' table, using five quarters of interviews (q1 2011 thru q1 2012). initiate a replicate-weighted survey design object and perform some lovely li'l analysis examples. replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using unimputed variables. replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using unimputed variables. create an rsqlite database (to minimize ram usage) containing the five imputed variable files, after identifying which variables were imputed based on pdf page 3 of the user's guide to income imputation. initiate a replicate-weighted, database-backed, multiply-imputed survey design object and perform a few additional analyses that highlight the modified syntax required for multiply-imputed survey designs. replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using imputed variables. replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using imputed variables. replicate the %proc_reg() and %proc_logistic() macros found in "ce macros.sas" and provide some examples of regressions and logistic regressions using both unimputed and imputed variables.

    replicate integrated mean and se.R
    match each step in the bls-provided sas program "integrated mean and se.sas" but with r instead of sas. create an rsqlite database when the expenditure table gets too large for older computers to handle in ram. export a table "2011 integrated mean and se.csv" that exactly matches the contents of the sas-produced "2011 integrated mean and se.lst" text file.

    click here to view these three scripts for...

  9. Code for EchoTables (IKILeUS) - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 9, 2025
    + more versions
    Cite
    (2025). Code for EchoTables (IKILeUS) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/47270cb0-d38d-53de-a147-e7a0bb0573ab
    Explore at:
    Dataset updated
    Aug 9, 2025
    Description

    EchoTables is an innovative accessibility tool developed as part of the IKILeUS project at the University of Stuttgart. It is designed to improve the usability of tabular data for visually impaired users by converting structured tables into concise, auditory-friendly textual summaries. Traditional screen readers navigate tables linearly, which imposes a high cognitive load on users. EchoTables alleviates this issue by summarizing tables, facilitating quicker comprehension and more efficient information retrieval. Initially utilizing RUCAIBox (LLM), EchoTables transitioned to Mistral-7B, a more powerful open-source model, to enhance processing efficiency and scalability. The tool has been tested with widely used screen readers such as VoiceOver to ensure accessibility. EchoTables has been adapted to process diverse data sources, including lecture materials, assignments, and WikiTables, making it a valuable resource for students navigating complex datasets.

  10. Data from: Fishing intensity in the Atlantic Ocean (from Global Fishing...

    • explore.openaire.eu
    • zenodo.org
    Updated Sep 19, 2024
    + more versions
    Cite
    Maria Mateo; Asier Anabitarte Riol; Igor Granado; Jose-A. Fernandes (2024). Fishing intensity in the Atlantic Ocean (from Global Fishing Watch) [Dataset]. http://doi.org/10.5281/zenodo.13791296
    Explore at:
    Dataset updated
    Sep 19, 2024
    Authors
    Maria Mateo; Asier Anabitarte Riol; Igor Granado; Jose-A. Fernandes
    Area covered
    Atlantic Ocean
    Description
    1. MISSION ATLANTIC

    The MISSION ATLANTIC project is an EU-funded initiative that focuses on understanding the impacts of climate change and human activities on Atlantic ecosystems. The project aims to map and assess the current and future status of Atlantic marine ecosystems, develop tools for sustainable management, and support ecosystem-based governance to ensure the resilience and sustainable use of ocean resources. The project brings together experts from 33 partner organizations across 14 countries, including Europe, Africa, North, and South America. MISSION ATLANTIC includes ten work packages. The present published dataset is included in WP3, which focuses on mapping the pelagic ecosystems, resources, and pressures in the Atlantic Ocean. This WP aims to collect extensive spatial and temporal data to create 3D maps of the water column, identify key vertical ecosystem domains, and assess the pressures from climate change and human activities. More specifically, the dataset corresponds to the fishing intensity presented in Deliverable 3.2, which integrates data from various sources to map the distribution and dynamics of present ecosystem pressures over time, providing crucial insights for sustainable management strategies.

    2. Data description

    2.1. Data Source

    Fishing intensity estimates from the Global Fishing Watch initiative (GFW) (Kroodsma et al. 2018), which applies machine learning algorithms to data from Automatic Identification Systems (AIS), Vessel Monitoring Systems (VMS), and vessel registries, have been used for the year 2020. This machine learning approach has been able to distinguish between fishing and routing activity of individual vessels, while using pattern recognition to differentiate seven main fishing gear types at the Atlantic Ocean scale (Taconet et al., 2019). The seven main fishing vessel types considered are: trawlers, purse seiners, drifting longliners, set gillnets, squid jiggers, pots and traps, and other. In this work we have aggregated these into pelagic, seabed and passive fishing activities to align with our grouping of ecosystem components.

    The GFW data has some limitations. AIS is only required for large vessels: the International Maritime Organization requires AIS use for all vessels of 300 gross tonnage and upward, although some jurisdictions mandate its use in smaller vessels; for example, within the European Union it is required for fishing vessels at least 15 m in length. This means that in some areas the fishing intensity estimates will not include the activity of small vessels operating near shore. AIS can be intentionally turned off, for example when vessels carry out illegal fishing activities (Kurekin et al. 2019). In the GFW dataset, vessels classified as trawlers include both pelagic and bottom trawlers. As trawlers are included in the bottom fishing category, it is highly likely that the data overestimates the effort on the seafloor and underestimates it on the water column.

    2.2. Data Processing

    1. Data download from the GFW portal.

    2. Using R: append the daily files and aggregate fishing hours by fishing gear and coordinates:

    library(data.table)

    ## Load data
    fileIdx = list.files(".../fleet-daily-csvs-100-v2-2020/", full.names = T)

    ## Loop
    colsIdx = c("geartype", "hours", "fishing_hours", "x", "y")
    lapply(fileIdx, function(xx) {
      out = data.table(x = NA_real_, y = NA_real_, geartype = NA_character_)
      tmp = fread(xx)
      tmp[, ":=" (y = floor(cell_ll_lat * 10L) / 10L,
                  x = floor(cell_ll_lon * 10L) / 10L)]
      tmp = tmp[, ..colsIdx]
      h = tmp[, c(.N, lapply(.SD, sum, na.rm = T)), by = .(x, y, geartype)]
      outh = data.table::merge.data.table(out, h, by = c("x", "y", "geartype"), all = TRUE)
      fwrite(outh, ".../GFW_2020_0.1_degrees_and_gear_all.csv", nThread = 14, append = T)
    })

    Group fishing gears into main fishing groups:

    library(dplyr)
    library(tidyr)

    ## Load data
    fishing <- ... %>%
      group_by(x, y, group) %>%
      summarise(gfishing_hours = sum(fishing_hours))

    Pivot the table in order to have fishing groups in columns. Each row corresponds to the coordinates of the left corner of the grid cell (0.1 decimal degrees):

    ## Pivoting table (fishing groups in columns)
    fish_gr3 <- ... %>%
      pivot_wider(names_from = "group", values_from = "gfishing_hours", values_fill = 0)

    ## Saving data (to import in PostgreSQL)
    write.csv(fish_gr3, ".../fishing.csv", row.names = FALSE)

    Export the table to our PostGIS spatial database using QGIS.

    3. Using PostgreSQL: create grid cell identifiers (gid):

    -- Generating a gid
    ALTER TABLE public.fishing ADD COLUMN gid uuid PRIMARY KEY DEFAULT uuid_generate_v4();

    Estimate the centroid of each grid cell:

    -- Create columns
    ALTER TABLE public.fishing ADD COLUMN cen_lat float;
    ALTER TABLE public.fishing ADD COLUMN cen_lon float;
    -- Calculate the grid centroid
    UPDATE public.fishing SET cen_lat = y + 0.05;
    UPDATE public.fishing SET cen_lon = x + 0.05;

    Create the geometry column based on the estimated centroids to provide the spatial component:

    -- (if necessary) S...
  11. Data from: Computational 3D resolution enhancement for optical coherence...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    George-Othon Glentis (2024). Computational 3D resolution enhancement for optical coherence tomography with a narrowband visible light source [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7870794
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Jos de Wit
    George-Othon Glentis
    Jeroen Kalkman
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the code and data underlying the publication "Computational 3D resolution enhancement for optical coherence tomography with a narrowband visible light source" in Biomedical Optics Express 14, 3532-3554 (2023) (doi.org/10.1364/BOE.487345).

    The reader is free to use the scripts and data in this depository, as long as the manuscript is correctly cited in their work. For further questions, please contact the corresponding author.

    Description of the code and datasets

    Table 1 describes all the Matlab and Python scripts in this depository. Table 2 describes the datasets. The input datasets are the phase corrected datasets, as the raw data is large in size and phase correction using a coverslip as reference is rather straightforward. Processed datasets are also added to the repository to allow for running only a limited number of scripts, or to obtain, for example, the aberration corrected data without the need to use Python. Note that the simulation input data (input_simulations_pointscatters_SLDshape_98zf_noise75.mat) is generated with random noise, so if this is overwritten the results may slightly vary. Also the aberration correction is done with random apertures, so the processed aberration corrected data (exp_pointscat_image_MIAA_ISAM_CAO.mat and exp_leaf_image_MIAA_ISAM_CAO.mat) will also slightly change if the aberration correction script is run anew. The current processed datasets are used as the basis for the figures in the publication. For details on the implementation we refer to the publication.

    Table 1: The Matlab and Python scripts with their description

    • MIAA_ISAM_processing.m: This script performs the DFT, RFIAA and MIAA processing of the phase-corrected data that can be loaded from the datasets. Afterwards it also applies ISAM on the DFT and MIAA data and plots the results in a figure (via the scripts plot_figure3, plot_figure5 and plot_simulationdatafigure).
    • resolution_analysis_figure4.m: This script loads the data from the point scatterers (absolute amplitude data), finds the point scatterers and fits them to obtain the resolution data. Finally it plots figure 4 of the publication.
    • fiaa_oct_c1.m, oct_iaa_c1.m, rec_fiaa_oct_c1.m, rfiaa_oct_c1.m: These four functions are used to apply fast IAA and MIAA. See script MIAA_ISAM_processing.m for their usage.
    • viridis.m, morgenstemning.m: These scripts define the colormaps for the figures.
    • plot_figure3.m, plot_figure5.m, plot_simulationdatafigure.m: These scripts are used to plot figures 3 and 5 and a figure with simulation data. They are executed at the end of script MIAA_ISAM_processing.m.
    • Python script: computational_adaptive_optics_script.py: Python script that applies computational adaptive optics to obtain the data for figure 6 of the manuscript.
    • Python script: zernike_functions2.py: Python script that gives the values and Cartesian derivatives of the Zernike polynomials.
    • figure6_ComputationalAdaptiveOptics.m: Script that loads the CAO data that was saved in Python, analyzes the resolution, and plots figure 6.
    • Python script: OCTsimulations_3D_script2.py: Python script that simulates OCT data, adds noise and saves it as a .mat file for use in the Matlab script above.
    • Python script: OCTsimulations2.py: Module that contains a Python class that can be used to simulate 3D OCT datasets based on a Gaussian beam.
    • Matlab toolbox DIPimage 2.9.zip: DIPimage is used in the scripts. The toolbox can be downloaded online or this zip can be used.
    
    Table 2: The datasets in this Zenodo repository

    • input_leafdisc_phasecorrected.mat: Phase corrected input image of the leaf disc (used in figure 5).
    • input_TiO2gelatin_004_phasecorrected.mat: Phase corrected input image of the TiO2 in gelatin sample.
    • input_simulations_pointscatters_SLDshape_98zf_noise75.mat: Input simulation data that, once processed, is used in figure 4.
    • exp_pointscat_image_DFT.mat, exp_pointscat_image_DFT_ISAM.mat, exp_pointscat_image_RFIAA.mat, exp_pointscat_image_MIAA_ISAM.mat, exp_pointscat_image_MIAA_ISAM_CAO.mat: Processed experimental amplitude data for the TiO2 point scattering sample with respectively DFT, DFT+ISAM, RFIAA, MIAA+ISAM and MIAA+ISAM+CAO. These datasets are used for fitting in figure 4 (except for CAO), and MIAA_ISAM and MIAA_ISAM_CAO are used for figure 6.
    • simu_pointscat_image_DFT.mat, simu_pointscat_image_RFIAA.mat, simu_pointscat_image_DFT_ISAM.mat, simu_pointscat_image_MIAA_ISAM.mat: Processed amplitude data from the simulation dataset, which is used in the script for figure 4 for the resolution analysis.
    • exp_leaf_image_MIAA_ISAM.mat, exp_leaf_image_MIAA_ISAM_CAO.mat: Processed amplitude data from the leaf sample, with and without aberration correction, which is used to produce figure 6.
    • exp_leaf_zernike_coefficients_CAO_normal_wmaf.mat, exp_pointscat_zernike_coefficients_CAO_normal_wmaf.mat: Estimated Zernike coefficients and the weighted moving average of them that is used for the computational aberration correction. Some of this data is plotted in figure 6 of the manuscript.
    • input_zernike_modes.mat: The reference Zernike modes corresponding to the data that is loaded, to give the modes the proper name.
    • exp_pointscat_MIAA_ISAM_complex.mat, exp_leaf_MIAA_ISAM_complex.mat: Complex MIAA+ISAM processed data that is used as input for the computational aberration correction.
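    The .mat files can also be inspected outside Matlab; a minimal Python sketch with scipy (the chosen file is one of those listed above, and the assumption is that it was not saved in the -v7.3/HDF5 format, which would require h5py instead):

    from scipy.io import loadmat

    # Load one of the processed amplitude datasets listed above.
    data = loadmat("exp_pointscat_image_DFT.mat")

    # Show the variables stored in the file (keys starting with '__' are MATLAB metadata).
    for key, value in data.items():
        if not key.startswith("__"):
            print(key, getattr(value, "shape", type(value)))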
    
  12. Connectomics of (part of) the MICrONS mm3 dataset

    • zenodo.org
    bin
    Updated Sep 21, 2023
    Cite
    M. W. Reimann; M. W. Reimann (2023). Connectomics of (part of) the MICrONS mm3 dataset [Dataset]. http://doi.org/10.5281/zenodo.8364070
    Explore at:
    Available download formats: bin
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    M. W. Reimann; M. W. Reimann
    License

    Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides the connectome of a large part of the IARPA MICrONS mm^3 dataset (https://www.microns-explorer.org/cortical-mm3). Specifically, it contains internal connectivity between most neurons of "portion 65" of the EM volume (see above link for details), i.e. synapses between neurons inside the volume, but no synapses from neurons extrinsic to the volume. The volume contains parts of the regions VISp, VISrl, VISal and VISlm.

    The file combines data from the following tables of the "minnie65_public_v117" release (see link above):
    - allen_soma_coarse_cell_class_model_v1_minnie3_v1 for neuron identifiers, tentative classes and soma locations
    - proofreading_status_public_release for information about axon / dendrite completeness
    - synapses_pni_2 for synaptic connection locations, sizes, source and target neurons


    The full description of those tables, as originally provided, can be found below. We converted locations given as voxel indices in the original data to locations in nm; additionally, we provide very tentative region annotations for neurons (but see below!).
    The main utility of this release lies in its formatting for the Connectome-utilities python package (https://github.com/BlueBrain/ConnectomeUtilities). As such, you can easily load it and use Connectome-utilities functionality for various analyses. Example notebooks are provided.

    The file contains two representations of the connectome:
    - "full" represents multiple synapses between a pair of neurons as multiple directed edges.
    - "condensed" has (at most) a single edge between neurons, associated with a property "count" that specifies the number of synapses. Other synapse properties (such as their locations) are mostly lost in the condensed representation; only the mean and sum of the "size" property are provided. You can also get the condensed version from the full version by using the .condense() function of Connectome-utilities (see the documentation and the loading sketch after this list).
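    A rough loading sketch is given below. It assumes the standard ConnectivityMatrix interface of Connectome-utilities; the file name and the attribute names used here are assumptions that should be checked against the example notebooks bundled with this record.

        import conntility

        # Load the "full" representation. The HDF5 file name is an assumption;
        # check the included example notebooks for the actual name.
        cmat = conntility.ConnectivityMatrix.from_h5("microns_mm3_connectome.h5")

        print(cmat.vertices.head())   # per-neuron properties: class, soma location, tentative region, ...
        print(cmat.matrix.shape)      # sparse adjacency, one entry per directed edge

        # Collapse multi-synapse connections into single edges carrying a "count"
        # property, as described above.
        condensed = cmat.condense()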

    Some notes:
    - The data in the "allen soma coarse cell class model" contains some neuron identifiers associated with multiple types. Manual inspection of their meshes indicated that they really are merges of several neurons. Since this only seemed to affect a few hundred neurons, we simply filtered them out for this dataset.
    - Neurons are annotated with brain region names ("visp", "visrl", etc.). Do NOT treat them as accurate. They were derived as follows: an image depicting the EM volume with region borders drawn is provided on the MICrONS page. We aligned the x and z coordinates of each neuron with the image coordinates and assigned regions based on which side of the borders they landed on (a simplified sketch of this assignment follows below). This is not overly accurate and assumes region borders are parallel to the y-axis (note, however, that the y-axis is closely aligned with the depth axis). If you can provide a transformation from the original voxel coordinates to the Allen CCF, please let me know and I will use that instead.
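    As an illustration of that procedure only: with made-up border positions (the real borders also depend on z and are not reproduced here), the assignment reduces to checking which side of each border a neuron's coordinate falls on.

        import numpy as np

        # Hypothetical border positions along x (in nm), read off an aligned
        # overview image; NOT the values actually used for this release.
        BORDERS_X_NM = [650_000, 850_000]
        REGIONS = ["visrl", "visp", "vislm"]   # illustrative ordering only

        def tentative_region(x_nm):
            """Very tentative region label based on which side of each border x falls."""
            return REGIONS[int(np.searchsorted(BORDERS_X_NM, x_nm))]

        print(tentative_region(700_000))   # -> "visp" under these made-up borders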

    Getting started:

    To start, check the documentation of Connectome-utilities or just dive into the included exemplary jupyter notebooks.

    Contact:

    If you have questions or notes: conntility.645co@simplelogin.com

    CREDIT
    All of this is based on MICrONS, with very little work by me. To give full credit, I will include below the original description of the datasets used. Many thanks to everyone mentioned below and everyone else that worked hard to provide that highly valuable data!

    allen_soma_coarse_cell_class_model_v1_minnie3_v1:
    This is a model developed by Leila Elabbady and Forrest Collman. It uses features extracted from the somatic region and nucleus segmentation (developed in collaboration with Shang Mu and Gayathri Mahalingam). Those features included the number of soma synapses, the somatic area, the somatic area to volume ratio, the density of somatic synapses, the volume of the soma, the depth in cortex of the cell (based upon the y coordinate after a 5 degree rotation), the nucleus area, the ratio of the nucleus area to nucleus volume, the average diameter of the proximal processes of the cell, the area of the nucleus with a fold, the fraction of the nucleus area within a fold, the volume of the nucleus, and the ratio of the volume of the nucleus to the volume of the soma. The model was trained using labels from the allen_v1_column_types_v2 table, supplemented with NP labels from allen_minnie_extra_types as of version 91. The model is an SVM classifier using an rbf kernel with class balance. On 20% of the data held out, the model has 77% accuracy, with principal confusion between layer 6 IT and CT, and layer 5 IT with layer 4 and many types. Please contact Forrest Collman for more information or questions about the model.
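    For readers who want a feel for that kind of classifier, a minimal scikit-learn sketch is shown below. The features and labels are placeholders, not the authors' actual training pipeline.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Placeholder data: 14 somatic/nucleus features per neuron and dummy class labels.
        rng = np.random.default_rng(0)
        X = rng.random((500, 14))
        y = rng.integers(0, 5, size=500)

        # Hold out 20% for evaluation, mirroring the description above.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

        # rbf-kernel SVM with class balancing, as described for the released model.
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
        model.fit(X_train, y_train)
        print("held-out accuracy:", model.score(X_test, y_test))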

    proofreading_status_public_release:
    The proofreading status of neurons that have been comprehensively proofread within this version. Axon and dendrite compartment status are marked separately under 'axon_status' and 'dendrite_status', as proofreading effort was applied differently to the different compartments in some cells. There are three possible status values for each compartment: 'non' indicates no comprehensive proofreading. 'clean' indicates that all false merges have been removed, but all tips have not necessarily been followed. 'extended' indicates that the cell is both clean and all tips have been followed as far as a proofreader was able to. The 'pt_position' is at a cell body or similar core position for the cell. The column 'valid_id' provides the root id when the proofreading was last checked. If the current root id in 'pt_root_id' is not the same as 'valid_id', there is no guarantee that the proofreading status is correct. Very small false axon merges (axon fragments approximately 5 microns or less in length) were considered acceptable for clean neurites. Note that this table does not list all edited cells, but only those with comprehensive effort toward the status mentioned here. Table compiled by Sven Dorkenwald and Casey Schneider-Mizell, including work by many proofreaders and data maintained by Stelios Papadopoulos.

    synapses_pni_2:
    Automated synapse detection performed by Nick Turner from the Seung Lab. size represents the number of (4x4x40 nm) voxels painted by the automated cleft segmentation, and the IDs reference the IDs of the cleft segmentation. Ctr_pt reflects the centroid of the cleft segmentation. The cleft segmentation volume is located in the flat_segmentation_source field.
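    Since size counts 4x4x40 nm voxels, converting it to a physical cleft volume is simple arithmetic; the sketch below also converts a ctr_pt voxel index to nm, assuming ctr_pt uses the same 4x4x40 nm grid (an assumption, as the description above does not state its units). The example coordinate is made up.

        import numpy as np

        VOXEL_NM = np.array([4.0, 4.0, 40.0])        # voxel edge lengths stated above
        VOXEL_VOLUME_NM3 = float(np.prod(VOXEL_NM))  # 640 nm^3 per voxel

        def cleft_volume_nm3(size_in_voxels):
            """Physical volume of a cleft segmentation from its 'size' (voxel count)."""
            return size_in_voxels * VOXEL_VOLUME_NM3

        def ctr_pt_to_nm(ctr_pt_voxels):
            """Convert a ctr_pt given as voxel indices to nanometres."""
            return np.asarray(ctr_pt_voxels, dtype=float) * VOXEL_NM

        print(cleft_volume_nm3(1000))                    # 640,000 nm^3 for a 1000-voxel cleft
        print(ctr_pt_to_nm([180_000, 120_000, 20_000]))  # hypothetical coordinate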

  13. GLA Population Projections - Custom Age Tables

    • data.wu.ac.at
    • data.europa.eu
    xls
    Updated Sep 26, 2015
    Cite
    London Datastore Archive (2015). GLA Population Projections - Custom Age Tables [Dataset]. https://data.wu.ac.at/odso/datahub_io/YTcxM2E0YmUtMDg5MS00MmYwLWI1ZDQtM2JjYjdlNzUyNWEw
    Explore at:
    Available download formats: xls(6428672.0), xls(6437376.0), xls(38683136.0), xls(2705408.0), xls(6410240.0), xls(2705920.0), xls(6427136.0), xls(2679808.0), xls(6431232.0), xls(35003904.0), xls(39437312.0), xls(38370304.0), xls(6435328.0)
    Dataset updated
    Sep 26, 2015
    Dataset provided by
    London Datastore Archive
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    Screenshot of the Excel age range creator: https://londondatastore-upload.s3.amazonaws.com/gla-custom-age-screen.JPG

    Excel age range creator for GLA Projections data

    This Excel-based tool enables users to query the raw single-year-of-age data so that any age range can easily be calculated, without having to carry out the often complex and time-consuming formulas that are also open to human error. Each year the GLA demography team produces sets of population projections. On this page each of these datasets since 2009 can be accessed, though please remember that the older versions have been superseded. From 2012, data includes population estimates and projections between 2001 and 2041 for each borough plus Central London (Camden, City of London, Kensington & Chelsea, and Westminster), Rest of Inner Boroughs, Inner London, Outer London and Greater London.

    The full raw data by single year of age (SYA) and gender are available as Datastore packages at the links below.

    How to use the tool: Simply select the lower and upper age range for both males and females (starting in cell C3) and the spreadsheet will return the total population for the range.
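    Outside Excel, the same custom age-range aggregation can be reproduced in a few lines of pandas. The file name and column names below (borough, sex, age, year, population) are assumptions about the single-year-of-age download, not the actual headers.

        import pandas as pd

        # Single-year-of-age projections; file and column names are illustrative only.
        sya = pd.read_csv("gla_sya_projections.csv")

        def age_range_total(df, lower, upper, year):
            """Total projected population for ages lower..upper (inclusive) in a given year."""
            in_range = df[(df["age"] >= lower) & (df["age"] <= upper) & (df["year"] == year)]
            return in_range.groupby(["borough", "sex"])["population"].sum()

        # Example: 16-64 year olds in 2021, by borough and sex.
        print(age_range_total(sya, 16, 64, 2021))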

    Tip: You can copy and paste the boroughs you are interested in to another worksheet by clicking: Edit then Go To (or Control + G), then Special, and Visible cells only. Then simply copy and 'paste values' of the cells to a new location.

    Warning: The ethnic group and ward files are large (around 35MB), and may take some time to download depending on your bandwidth.

    Find out more about GLA population projections on the GLA Demographic Projections page

    BOROUGH PROJECTIONS

    GLA 2009 Round London Plan Population Projections (January 2010) (SUPERSEDED)

    GLA 2009 Round (revised) London Plan Population Projections (August 2010) (SUPERSEDED)

    GLA 2009 Round (revised) SHLAA Population Projections (August 2010) (SUPERSEDED)

    GLA 2010 Round SHLAA Population Projections (February 2011) (SUPERSEDED)

    GLA 2011 Round SHLAA Population Projections, High Fertility (December 2011) (SUPERSEDED)

    GLA 2011 Round SHLAA Population Projections, Standard Fertility (January 2012) (SUPERSEDED)

    GLA 2012 Round SHLAA Population Projections, (December 2012) (SUPERSEDED)

    GLA 2012 Round Trend Based Population Projections, (December 2012) (SUPERSEDED)

    GLA 2012 Round SHLAA Borough Projections incorporating DCLG 2011 household formation rates, (June 2013) (SUPERSEDED)

    GLA 2013 Round Trend Based Population Projections - High (December 2013) (SUPERSEDED)

    GLA 2013 Round Trend Based Population Projections - Central (December 2013) (SUPERSEDED)

    GLA 2013 Round Trend Based Population Projections - Low (December 2013) (SUPERSEDED)

    GLA 2013 Round SHLAA Based Population Projections (February 2014) (SUPERSEDED) Spreadsheet now includes national comparator data from ONS.

    GLA 2013 Round SHLAA Based Capped Population Projections (March 2014) (SUPERSEDED) Spreadsheet includes national comparator data from ONS.

    GLA 2014 Round Trend-based, Short-Term Migration Scenario Population Projections (April 2015) Spreadsheet includes national comparator data from ONS.

    GLA 2014 Round Trend-based, Long-Term Migration Scenario Population Projections (April 2015) Spreadsheet includes national comparator data from ONS.

    GLA 2014 Round SHLAA DCLG Based Long Term Migration Scenario Population Projections (April 2015) Spreadsheet includes national comparator data from ONS.

    GLA 2014 Round SHLAA Capped Household Size Model Short Term Migration Scenario Population Projections (April 2015) Spreadsheet includes national comparator data from ONS. This is the recommended file to use.

    WARD PROJECTIONS

    GLA 2008 round (High) Ward Projections (March 2009) (SUPERSEDED)

    GLA 2009 round (revised) London Plan Ward Projections (August 2010) (SUPERSEDED)

    GLA 2010 round SHLAA Ward Projections (February 2011) (SUPERSEDED)

    GLA 2011 round SHLAA Standard Ward Projections (January 2012) (SUPERSEDED)

    GLA 2011 round SHLAA High Ward Projections (January 2012) (SUPERSEDED)

    GLA 2012 round SHLAA based Ward Projections (March 2013) (XLS) (SUPERSEDED)

    GLA 2012 round SHLAA Ward Projections (March 2013) (XLS) (SUPERSEDED)

    GLA 2013 round SHLAA Ward Projections (March 2014) (SUPERSEDED)

    GLA 2013 round SHLAA Capped Ward Projections (March 2014) (SUPERSEDED)

    GLA 2014 round SHLAA Capped Household Size Model Short Term Migration Scenario Ward Projections (April 2015) This is the recommended file to use.

    ETHNIC GROUP PROJECTIONS FOR LOCAL AUTHORITIES

    GLA 2012 Round SHLAA Ethnic Group Borough Projections - Interim (May 2013) (SUPERSEDED)

    GLA 2012 Round Trend Based Ethnic Group Borough Projections - Interim (May 2013) (SUPERSEDED)

    GLA 2012 Round SHLAA Based Ethnic Group Borough Projections - Final (Nov 2013) (SUPERSEDED)

    GLA 2012 Round Trend Based Ethnic Group Borough Projections - Final (Nov 2013) (SUPERSEDED)

    GLA 2013 Round SHLAA Capped Ethnic Group Borough Projections (August 2014)

