3 datasets found
  1. Data from: Kabat Database of Sequences of Proteins of Immunological Interest...

    • neuinfo.org
    • dknet.org
    Updated Jun 27, 2024
    Cite
    (2024). Kabat Database of Sequences of Proteins of Immunological Interest [Dataset]. http://identifiers.org/RRID:SCR_006465
    Dataset updated
    Jun 27, 2024
    Description

    The Kabat Database determines the combining site of antibodies based on the available amino acid sequences. The precise delineation of the complementarity determining regions (CDRs) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information about biological macromolecules. The Kabat Database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest.

    The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools and more. The tools are custom made for the aligned data sets, which are held in both SQL Server and ASCII text flat file formats, and may be run on a single PC workstation or in a distributed environment. They are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats.

    The Kabat Database was started in 1970 to determine the combining site of antibodies based on the amino acid sequences available at that time. Bence Jones proteins, mostly human, were aligned using what is now known as the Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the complementarity determining regions (CDRs) of light chains. Subsequently, antibody heavy chain amino acid sequences were also aligned, using a different numbering system because the locations of their CDRs (31-35B, 50-65 and 95-102) differ from those of the light chains. CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains.

    During the past 30 years, the Kabat Database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules and other proteins of immunological interest. It has been used extensively by immunologists to derive useful structural and functional information from the primary sequences of these proteins.
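The description mentions a per-position "variability" measure without defining it. A minimal sketch follows, assuming the commonly cited Wu and Kabat (1970) definition: the number of different amino acids seen at a position divided by the relative frequency of the most common one. The function names and the four-residue toy alignment are made up for illustration and do not come from the database or its tools.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Variability of one aligned position (assumed Wu-Kabat definition):
    distinct residues / relative frequency of the most common residue."""
    residues = [r for r in column if r != "-"]  # ignore alignment gaps
    if not residues:
        return 0.0
    counts = Counter(residues)
    most_common_freq = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / most_common_freq


def variability_profile(aligned_sequences):
    """Variability for every column of equal-length aligned sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned_sequences)]


# Hypothetical toy alignment, not real Bence Jones protein data.
toy_alignment = ["CRAS", "CKAS", "CRSS", "CQAT"]
profile = variability_profile(toy_alignment)
```

An invariant position (such as a conserved Cys) scores 1, while positions inside a CDR would be expected to show up as peaks in the profile.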

  2. U.S. Commercial Aviation Industry Metrics

    • kaggle.com
    zip
    Updated Jul 13, 2017
    Cite
    Franklin Bradfield (2017). U.S. Commercial Aviation Industry Metrics [Dataset]. https://www.kaggle.com/shellshock1911/us-commercial-aviation-industry-metrics
    Available download formats: zip (1573798 bytes)
    Dataset updated
    Jul 13, 2017
    Authors
    Franklin Bradfield
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Have you taken a flight in the U.S. in the past 15 years? If so, then you are part of the monthly data that the U.S. Department of Transportation's TranStats service makes available on various metrics for 15 U.S. airlines and 30 major U.S. airports. Their website unfortunately does not include a method for easily downloading and sharing files. Furthermore, the source is built in ASP.NET, so extracting the data is rather cumbersome. To allow easier community access to this rich source of information, I scraped the metrics for every airline / airport combination and stored them in separate CSV files.

    Occasionally, an airline doesn't serve a certain airport, or it didn't serve it for the entire duration that the data collection period covers*. In those cases, the data either doesn't exist or is typically too sparse to be of much use. As such, I've only uploaded complete files for airports that an airline served for the entire uninterrupted duration of the collection period. For these files, there should be 174 time series points for one or more of the nine columns below. I recommend any of the files for American, Delta, or United Airlines for outstanding examples of complete and robust airline data.

    * No data for Atlas Air exists, and Virgin America commenced service in 2007, so no folders for either airline are included.

    Content

    There are 13 airlines that have at least one complete dataset. Each airline's folder includes CSV file(s) for each airport that are complete as defined by the above criteria. I've double-checked the files, but if you find one that violates the criteria, please point it out. The file names have the format "AIRLINE-AIRPORT.csv", where both AIRLINE and AIRPORT are IATA codes. For a full listing of the airlines and airports that the codes correspond to, check out the airline_codes.csv or airport_codes.csv files that are included, or perform a lookup here. Note that the data in each airport file represents metrics for flights that originated at the airport.
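As an illustration of the "AIRLINE-AIRPORT.csv" naming convention, here is a small sketch that splits a file name into its two IATA codes. The function name and the example path are hypothetical; only the naming pattern comes from the description.

```python
from pathlib import Path


def parse_dataset_name(path):
    """Split 'AIRLINE-AIRPORT.csv' into (airline_code, airport_code)."""
    stem = Path(path).stem  # file name without directory or extension
    airline, sep, airport = stem.partition("-")
    if not sep or not airline or not airport:
        raise ValueError(f"not an AIRLINE-AIRPORT file name: {path!r}")
    return airline, airport


# Hypothetical path; the actual folder layout inside data.zip may differ.
codes = parse_dataset_name("data/AA/AA-DFW.csv")
```

Aggregate files such as all_airlines_all_airports.csv contain no hyphen and are rejected with a ValueError, so they need to be handled separately.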

    Among the 13 airlines in data.zip, there are a total of 161 individual datasets. There are also two special folders included - airlines_all_airports.csv and airports_all_airlines.csv. The first contains datasets for each airline aggregated over all airports, while the second contains datasets for each airport aggregated over all airlines. To preview a sample dataset, check out all_airlines_all_airports.csv, which contains industry-wide data.

    Each file includes the following metrics for each month from October 2002 to March 2017:

    1. Date (YYYY-MM-DD): All dates are set to the first of the month. The day value is just a placeholder and has no significance.
    2. ASM_Domestic: Available Seat-Miles in thousands (000s). Seats available on each domestic flight * Miles flown, summed over all domestic flights
    3. ASM_International*: Available Seat-Miles in thousands (000s). Seats available on each international flight * Miles flown, summed over all international flights
    4. Flights_Domestic
    5. Flights_International*
    6. Passengers_Domestic
    7. Passengers_International*
    8. RPM_Domestic: Revenue Passenger-Miles in thousands (000s). Paying passengers on each domestic flight * Miles flown, summed over all domestic flights
    9. RPM_International*: Revenue Passenger-Miles in thousands (000s). Paying passengers on each international flight * Miles flown, summed over all international flights

    * Frequently contains missing values
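The figure of 174 time series points follows from counting the months from October 2002 to March 2017 inclusive. The sketch below reproduces that count and reports which months a file lacks. It assumes the date column header is literally "Date" and that values are ISO formatted (YYYY-MM-DD); the function names and the one-row sample are illustrative only.

```python
import csv
import io
from datetime import date


def expected_months(start, end):
    """First-of-month dates from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


COLLECTION_MONTHS = expected_months(date(2002, 10, 1), date(2017, 3, 1))


def missing_months(csv_text):
    """Months of the collection period that have no row in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {date.fromisoformat(row["Date"]) for row in reader}  # assumed header
    return [m for m in COLLECTION_MONTHS if m not in seen]


# Made-up one-row sample, not taken from the dataset.
sample = "Date,Flights_Domestic\n2002-10-01,10\n"
gaps = missing_months(sample)
```

A complete file, as defined by the criteria above, would return an empty list.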

    Acknowledgements

    Thanks to the U.S. Department of Transportation for collecting this data every month and making it publicly available to us all.

    Source: https://www.transtats.bts.gov/Data_Elements.aspx

    Inspiration

    The airline / airport datasets are perfect for practicing and/or testing time series forecasting with classic statistical models such as autoregressive integrated moving average (ARIMA), or modern deep learning techniques such as long short-term memory (LSTM) networks. The datasets typically show evidence of trends, seasonality, and noise, so modeling and accurate forecasting can be challenging, but still more tractable than time series problems possessing more stochastic elements, e.g. stocks, currencies, commodities, etc. The source releases new data each month, so feel free to check your models' performances against new data as it comes out. I will update the files here every 3 to 6 months depending on how things go.
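Before fitting ARIMA or an LSTM, it can help to have a simple reference score to beat. The sketch below is not ARIMA or an LSTM; it is a seasonal-naive baseline that repeats the last 12 months, plus a mean absolute error helper. The function names, the 12-month season length, and the toy numbers are assumptions for illustration.

```python
def seasonal_naive_forecast(history, horizon, season_length=12):
    """Forecast by repeating the most recent full season of observations."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy series of 24 "monthly" values, not real airline data.
history = list(range(1, 25))
forecast = seasonal_naive_forecast(history, horizon=3)
error = mean_absolute_error([14, 15, 16], forecast)
```

Holding out the final months of a file and comparing a model's error against this baseline gives a quick sense of whether the model has learned more than the seasonal pattern.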

    A future plan is to build a SQLite database so that a wide range of queries can be run against the data. The data in its current time series format is not conducive to this, so coming up with a workable structure for the tables is the first step towards this goal. If you have any suggestions for how I can improve the data presentation, or anything that you would like me to add, please let me know. Looking forward to seeing the questions that we can answer together!
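One possible table structure for such a SQLite database, offered only as a sketch: a single long-format table with one row per airline, airport, month, and metric. The table name, column names, and sample values are hypothetical and are not the author's design.

```python
import sqlite3

# Hypothetical long-format schema: one row per (airline, airport, month, metric).
SCHEMA = """
CREATE TABLE IF NOT EXISTS metrics (
    airline TEXT NOT NULL,
    airport TEXT NOT NULL,
    month   TEXT NOT NULL,
    metric  TEXT NOT NULL,
    value   REAL,
    PRIMARY KEY (airline, airport, month, metric)
)
"""


def build_database(rows, path=":memory:"):
    """Create the metrics table and load (airline, airport, month, metric, value) rows."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.executemany("INSERT OR REPLACE INTO metrics VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# Made-up values for illustration only.
sample_rows = [
    ("AA", "DFW", "2002-10-01", "Flights_Domestic", 100.0),
    ("AA", "DFW", "2002-11-01", "Flights_Domestic", 120.0),
    ("DL", "ATL", "2002-10-01", "Flights_Domestic", 200.0),
]
conn = build_database(sample_rows)
total = conn.execute(
    "SELECT SUM(value) FROM metrics WHERE airline = ? AND metric = ?",
    ("AA", "Flights_Domestic"),
).fetchone()[0]
```

A long format keeps missing international values as absent rows or NULLs rather than empty columns, at the cost of needing a pivot to get back to one column per metric.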

  3. Files digitised by the National Archives of Australia, 25 February 2021 to...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 27, 2025
    Cite
    Tim Sherratt (2025). Files digitised by the National Archives of Australia, 25 February 2021 to 24 December 2022 [Dataset]. http://doi.org/10.5281/zenodo.7567138
    Available download formats: zip
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tim Sherratt
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains details of 731,079 files digitised by the National Archives of Australia in 2021 and 2022.

    The National Archives of Australia's online database, RecordSearch, includes a list of recently digitised files, but the list only covers files digitised in the last month. This dataset was created by combining regular harvests of this list into a continuous record of files digitised in 2021 and 2022. It was created and shared to help document long-term changes in access to files held by the NAA.

    The first harvest was run on 27 March 2021 and captured details back to 25 February. Since then I have automatically run weekly harvests and saved them in this repository. At the end of 2022, changes to RecordSearch broke the harvesting script, so I ran an extra harvest of the previous month on 20 January 2023 to make sure nothing was missed. I combined all the harvests into a single dataset, filtered it to include only 2022, and removed any duplicates. I also added series titles. The harvesting method is documented in this notebook.

    The dataset is saved in CSV format and includes the following columns:

    • title – the title of this file
    • item_id – the identifier of this file
    • series – the identifier of the series that contains this item
    • control_symbol – the control symbol of this file
    • date_range – the date range of the item's contents
    • date_digitised – the date the file was digitised
    • series_title – title of the series that contains this item
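The combine, filter, and de-duplicate step described above might look roughly like the sketch below. It is not the author's harvesting code. It assumes each harvest is CSV text with the columns listed here, that date_digitised is ISO formatted, and that item_id identifies a duplicate; the function name, date window, and sample rows are hypothetical.

```python
import csv
import io
from datetime import date


def combine_harvests(harvest_texts, start, end):
    """Merge overlapping harvests, keep rows digitised within [start, end],
    and drop duplicates by item_id (assumed to be a unique key)."""
    combined = {}
    for text in harvest_texts:
        for row in csv.DictReader(io.StringIO(text)):
            digitised = date.fromisoformat(row["date_digitised"])
            if start <= digitised <= end:
                combined[row["item_id"]] = row  # later harvests overwrite earlier
    return sorted(
        combined.values(),
        key=lambda r: (r["date_digitised"], r["item_id"]),
    )


# Made-up weekly harvests that overlap on item 2.
week1 = "item_id,title,date_digitised\n1,A,2021-03-01\n2,B,2021-03-08\n"
week2 = "item_id,title,date_digitised\n2,B,2021-03-08\n3,C,2023-01-05\n"
records = combine_harvests([week1, week2], date(2021, 2, 25), date(2022, 12, 24))
```

Because weekly harvests of a one-month list overlap heavily, keying on the identifier is what turns the stack of harvests into a single continuous record.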

    You can construct a URL to a digitised file using the item_id. For example:

    http://recordsearch.naa.gov.au/scripts/AutoSearch.asp?O=I&Number=[item_id]
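The URL pattern above can be filled in programmatically. In this short sketch the function name and the example identifier are made up; only the URL template comes from the description.

```python
RECORDSEARCH_TEMPLATE = (
    "http://recordsearch.naa.gov.au/scripts/AutoSearch.asp?O=I&Number={item_id}"
)


def recordsearch_url(item_id):
    """Build the RecordSearch link for one item_id value from the dataset."""
    return RECORDSEARCH_TEMPLATE.format(item_id=str(item_id).strip())


# 12345 is a placeholder, not a real item_id from the dataset.
url = recordsearch_url(12345)
```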

    For more information on harvesting data from RecordSearch, see the GLAM Workbench.


Data from: Kabat Database of Sequences of Proteins of Immunological Interest

RRID:SCR_006465, nif-0000-21233, Kabat Database of Sequences of Proteins of Immunological Interest (RRID:SCR_006465), Kabat Database
