3 datasets found

Data from: Kabat Database of Sequences of Proteins of Immunological Interest...
neuinfo.org
dknet.org
Updated Jun 27, 2024
Cite
(2024). Kabat Database of Sequences of Proteins of Immunological Interest [Dataset]. http://identifiers.org/RRID:SCR_006465
Unique identifier
https://identifiers.org/RRID:SCR_006465
Dataset updated
Jun 27, 2024
Description
The Kabat Database determines the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information about biological macromolecules. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest.

The Kabat Database searching and analysis tools package is an ASP.NET web-based portal containing lookup tools, sequence matching tools, alignment tools, length distribution tools, positional correlation tools and much more. The searching and analysis tools are custom made for the aligned data sets, which are provided in both SQL Server and ASCII text flat file formats. The tools may be run on a single PC workstation or in a distributed environment. They are written in ASP.NET and C# and are available in Visual Studio .NET 2003/2005/2008 formats.

The Kabat Database was started in 1970 to determine the combining site of antibodies based on the amino acid sequences available at that time. Bence Jones proteins, mostly human, were aligned using what is now known as the Kabat numbering system, and a quantitative measure, variability, was calculated for every position. Three peaks, at positions 24-34, 50-56 and 89-97, were identified and proposed to form the complementarity determining regions (CDR) of light chains. Subsequently, antibody heavy chain amino acid sequences were also aligned using a different numbering system, since the locations of their CDRs (31-35B, 50-65 and 95-102) are different from those of the light chains. CDRL1 starts right after the first invariant Cys 23 of light chains, while CDRH1 is eight amino acid residues away from the first invariant Cys 22 of heavy chains.

During the past 30 years, the Kabat database has grown to include nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules and other proteins of immunological interest. It has been used extensively by immunologists to derive useful structural and functional information from the primary sequences of these proteins.
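The description mentions a per-position variability measure without defining it. The sketch below assumes the commonly cited Wu and Kabat definition (number of different residues at a position divided by the frequency of the most common residue); that formula is not stated in the description itself. The toy alignment and the function names are illustrative only and are not taken from the Kabat Database.

```python
from collections import Counter


def wu_kabat_variability(column):
    """Variability of one alignment column (assumed Wu-Kabat definition).

    variability = (number of distinct residues) / (frequency of most common residue)
    Gap characters ("-") are ignored; an all-gap column returns 0.0.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    most_common_freq = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / most_common_freq


def variability_profile(aligned_sequences):
    """Variability for every position of equally long, pre-aligned sequences."""
    return [wu_kabat_variability(col) for col in zip(*aligned_sequences)]


# Made-up five-residue alignment, for illustration only.
toy_alignment = ["DIQMT", "DIVMT", "EIVLT", "DIELT"]
profile = variability_profile(toy_alignment)
```

An invariant position scores 1, and higher values mark more variable positions, which is how peaks such as the proposed CDR ranges would stand out in a real alignment.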
U.S. Commercial Aviation Industry Metrics
kaggle.com
zip
Updated Jul 13, 2017
Cite
Franklin Bradfield (2017). U.S. Commercial Aviation Industry Metrics [Dataset]. https://www.kaggle.com/shellshock1911/us-commercial-aviation-industry-metrics
Available download formats: zip (1573798 bytes)
Dataset updated
Jul 13, 2017
Authors
Franklin Bradfield
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Have you taken a flight in the U.S. in the past 15 years? If so, then you are a part of monthly data that the U.S. Department of Transportation's TranStats service makes available on various metrics for 15 U.S. airlines and 30 major U.S. airports. Their website unfortunately does not include a method for easily downloading and sharing files. Furthermore, the source is built in ASP.NET, so extracting the data is rather cumbersome. To allow easier community access to this rich source of information, I scraped the metrics for every airline / airport combination and stored them in separate CSV files.

Occasionally, an airline doesn't serve a certain airport, or it didn't serve it for the entire duration that the data collection period covers*. In those cases, the data either doesn't exist or is typically too sparse to be of much use. As such, I've only uploaded complete files for airports that an airline served for the entire uninterrupted duration of the collection period. For these files, there should be 174 time series points for one or more of the nine columns below. I recommend any of the files for American, Delta, or United Airlines for outstanding examples of complete and robust airline data.

* No data for Atlas Air exists, and Virgin America commenced service in 2007, so no folders for either airline are included.
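The figure of 174 points follows from counting months between October 2002 and March 2017 inclusive. A minimal sketch of that count and of a completeness check is shown here; the `is_complete` helper and the idea of passing a date-to-value mapping are assumptions for illustration, not the author's actual procedure.

```python
from datetime import date


def month_starts(first, last):
    """Return the first day of every month from first to last, inclusive."""
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


# Collection period stated in the description.
EXPECTED_MONTHS = month_starts(date(2002, 10, 1), date(2017, 3, 1))


def is_complete(values_by_month):
    """True if every expected month has a non-empty value (hypothetical check)."""
    return all(values_by_month.get(m) not in (None, "") for m in EXPECTED_MONTHS)
```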

Content

There are 13 airlines that have at least one complete dataset. Each airline's folder includes CSV file(s) for each airport that are complete as defined by the above criteria. I've double-checked the files, but if you find one that violates the criteria, please point it out. The file names have the format "AIRLINE-AIRPORT.csv", where both AIRLINE and AIRPORT are IATA codes. For a full listing of the airlines and airports that the codes correspond to, check out the airline_codes.csv or airport_codes.csv files that are included, or perform a lookup here. Note that the data in each airport file represents metrics for flights that originated at the airport.
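The "AIRLINE-AIRPORT.csv" naming convention described above can be split back into its two IATA codes. This is a small sketch; the example path `DL/DL-ATL.csv` and the per-airline folder layout it implies are assumptions, not confirmed contents of data.zip.

```python
from pathlib import PurePosixPath


def parse_dataset_name(path):
    """Split an 'AIRLINE-AIRPORT.csv' file name into (airline, airport) codes."""
    stem = PurePosixPath(path).stem
    airline, sep, airport = stem.partition("-")
    if not sep or not airline or not airport:
        raise ValueError(f"not an AIRLINE-AIRPORT file name: {path!r}")
    return airline, airport


# Hypothetical path, used only to show the call.
codes = parse_dataset_name("DL/DL-ATL.csv")
```

Aggregate files that do not follow the pattern raise `ValueError`, so they can be filtered out or handled separately.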

Among the 13 airlines in data.zip, there are a total of 161 individual datasets. There are also two special folders included - airlines_all_airports.csv and airports_all_airlines.csv. The first contains datasets for each airline aggregated over all airports, while the second contains datasets for each airport aggregated over all airlines. To preview a sample dataset, check out all_airlines_all_airports.csv, which contains industry-wide data.

Each file includes the following metrics for each month from October 2002 to March 2017:

Date (YYYY-MM-DD): All dates are set to the first of the month. The day value is just a placeholder and has no significance.

ASM_Domestic: Available Seat-Miles in thousands (000s). Seats available on domestic flights * miles flown

ASM_International*: Available Seat-Miles in thousands (000s). Seats available on international flights * miles flown

Flights_Domestic

Flights_International*

Passengers_Domestic

Passengers_International*

RPM_Domestic: Revenue Passenger-Miles in thousands (000s). Paying passengers on domestic flights * miles flown

RPM_International*: Revenue Passenger-Miles in thousands (000s). Paying passengers on international flights * miles flown

* Frequently contains missing values
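One derived quantity these columns allow is the passenger load factor, RPM divided by ASM. The sketch below reads a CSV with the column names listed above and skips months where either value is missing. The sample rows are made up for illustration, and the assumption that missing values appear as empty fields may not match the actual files.

```python
import csv
import io

# Made-up rows for illustration; real files have more columns and 174 months.
SAMPLE = """Date,ASM_Domestic,RPM_Domestic
2002-10-01,1000,700
2002-11-01,1200,
2002-12-01,800,600
"""


def load_factors(handle):
    """Map each Date to RPM_Domestic / ASM_Domestic, skipping incomplete rows."""
    result = {}
    for row in csv.DictReader(handle):
        asm, rpm = row["ASM_Domestic"], row["RPM_Domestic"]
        # Assumption: a missing value is an empty field.
        if not asm or not rpm or float(asm) == 0:
            continue
        result[row["Date"]] = float(rpm) / float(asm)
    return result


factors = load_factors(io.StringIO(SAMPLE))
```

For a real file, pass an open file object instead of the `io.StringIO` wrapper.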

Acknowledgements

Thanks to the U.S. Department of Transportation for collecting this data every month and making it publicly available to us all.

Source: https://www.transtats.bts.gov/Data_Elements.aspx

Inspiration

The airline / airport datasets are perfect for practicing and/or testing time series forecasting with classic statistical models such as autoregressive integrated moving average (ARIMA), or modern deep learning techniques such as long short-term memory (LSTM) networks. The datasets typically show evidence of trends, seasonality, and noise, so modeling and accurate forecasting can be challenging, but still more tractable than time series problems possessing more stochastic elements, e.g. stocks, currencies, commodities, etc. The source releases new data each month, so feel free to check your models' performance against new data as it comes out. I will update the files here every 3 to 6 months depending on how things go.
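Before fitting ARIMA or LSTM models to monthly series like these, it can help to have a simple baseline to beat. The sketch below is a seasonal-naive forecast (repeat the value from 12 months earlier) with a mean absolute error helper. It is a stand-in baseline, not ARIMA or LSTM, and the function names are illustrative.

```python
def seasonal_naive(history, horizon, season=12):
    """Forecast by repeating the last full season of observed values."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up series: two years of monthly values, for illustration only.
history = list(range(24))
forecast = seasonal_naive(history, horizon=3)
```

A model trained on one of the CSV files would be worth keeping only if its error on held-out months is lower than this baseline's.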

A future plan is to build a SQLite database so a vast array of queries can be run against the data. The data in its current time series format is not conducive to this, so coming up with a workable structure for the tables is the first step towards this goal. If you have any suggestions for how I can improve the data presentation, or anything that you would like me to add, please let me know. Looking forward to seeing the questions that we can answer together!
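One possible table structure for the SQLite plan mentioned above is a single "long" table with one row per airline, airport, month and metric. This is only a sketch of one option, not the author's design; the table name, column names and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical long-format schema: one row per (airline, airport, month, metric).
conn.execute(
    """
    CREATE TABLE metrics (
        airline TEXT NOT NULL,
        airport TEXT NOT NULL,
        month   TEXT NOT NULL,
        metric  TEXT NOT NULL,
        value   REAL,
        PRIMARY KEY (airline, airport, month, metric)
    )
    """
)

# Made-up values, for illustration only.
rows = [
    ("DL", "ATL", "2002-10-01", "Flights_Domestic", 100.0),
    ("DL", "ATL", "2002-11-01", "Flights_Domestic", 110.0),
    ("AA", "DFW", "2002-10-01", "Flights_Domestic", 90.0),
]
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)

# Example query: total domestic flights per airline.
totals = conn.execute(
    "SELECT airline, SUM(value) FROM metrics "
    "WHERE metric = ? GROUP BY airline ORDER BY airline",
    ("Flights_Domestic",),
).fetchall()
```

A long table keeps missing international values as absent rows or NULLs rather than empty columns, at the cost of more rows per file.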
Files digitised by the National Archives of Australia, 25 February 2021 to...
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 27, 2025
Cite
Tim Sherratt (2025). Files digitised by the National Archives of Australia, 25 February 2021 to 24 December 2022 [Dataset]. http://doi.org/10.5281/zenodo.7567138
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7567138
Dataset updated
Jan 27, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tim Sherratt
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset contains details of 731,079 files digitised by the National Archives of Australia in 2021 and 2022.

The National Archives of Australia's online database, RecordSearch, includes a list of recently digitised files, but the list only includes files digitised in the last month. This dataset was created by combining regular harvests of this list to create a continuous record of files digitised in 2021 and 2022. It was created and shared to help document long-term changes in access to files held by the NAA.

The first harvest was run on 27 March 2021 and captured details back to 25 February. Since then I have automatically run weekly harvests and saved them in this repository. At the end of 2022, changes to RecordSearch broke the harvesting script, so I ran an extra harvest of the previous month on 20 January 2023 to make sure nothing was missed. I combined all the harvests into a single dataset, filtered it to the period from 25 February 2021 to 24 December 2022, and removed any duplicates. I also added series titles. The harvesting method is documented in this notebook.

The dataset is saved in CSV format and includes the following columns:

title – the title of this file

item_id – the identifier of this file

series – the identifier of the series that contains this item

control_symbol – the control symbol of this file

date_range – the date range of the item's contents

date_digitised – the date the file was digitised

series_title – title of the series that contains this item
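The combine-and-deduplicate step described above can be sketched with the columns just listed. The sketch assumes that `item_id` identifies a file uniquely and that the first harvested copy of a row is kept. The description does not state the author's actual deduplication key, and the sample rows are made up.

```python
def combine_harvests(harvests):
    """Merge lists of row dicts, keeping the first row seen for each item_id."""
    combined = {}
    for harvest in harvests:
        for row in harvest:
            # Assumption: item_id is a unique identifier for a digitised file.
            combined.setdefault(row["item_id"], row)
    return list(combined.values())


# Made-up rows from two overlapping weekly harvests, for illustration only.
week_1 = [
    {"item_id": "1", "title": "File A", "date_digitised": "2021-03-01"},
    {"item_id": "2", "title": "File B", "date_digitised": "2021-03-02"},
]
week_2 = [
    {"item_id": "2", "title": "File B", "date_digitised": "2021-03-02"},
    {"item_id": "3", "title": "File C", "date_digitised": "2021-03-08"},
]
merged = combine_harvests([week_1, week_2])
```

Because weekly harvests of a one-month list overlap heavily, most rows appear several times before deduplication.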

You can construct a URL to a digitised file using the item_id. For example:

http://recordsearch.naa.gov.au/scripts/AutoSearch.asp?O=I&Number=[item_id]
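The URL pattern above can be filled in programmatically. This is a minimal sketch; the `item_url` helper name and the example identifier `12345` are made up for illustration.

```python
from urllib.parse import urlencode

# Base address and query parameters taken from the URL pattern in the description.
BASE_URL = "http://recordsearch.naa.gov.au/scripts/AutoSearch.asp"


def item_url(item_id):
    """Build a RecordSearch link for one item_id value."""
    return f"{BASE_URL}?{urlencode({'O': 'I', 'Number': item_id})}"


# 12345 is a placeholder, not a real item identifier.
example = item_url(12345)
```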

For more information on harvesting data from RecordSearch, see the GLAM Workbench.
Data from: Kabat Database of Sequences of Proteins of Immunological Interest

RRID:SCR_006465, nif-0000-21233, Kabat Database of Sequences of Proteins of Immunological Interest (RRID:SCR_006465), Kabat Database