27 datasets found
  1. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Servicehttps://www.ars.usda.gov/
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

  2. 2022 Bikeshare Data -Reduced File Size -All Months

    • kaggle.com
    zip
    Updated Mar 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kendall Marie (2023). 2022 Bikeshare Data -Reduced File Size -All Months [Dataset]. https://www.kaggle.com/datasets/kendallmarie/2022-bikeshare-data-all-months-combined
    Explore at:
    zip(98884 bytes)Available download formats
    Dataset updated
    Mar 8, 2023
    Authors
    Kendall Marie
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This is a condensed version of the raw data obtained through the Google Data Analytics Course, made available by Lyft and the City of Chicago under this license (https://ride.divvybikes.com/data-license-agreement).

    I originally did my study in another platform, and the original files were too large to upload to Posit Cloud in full. Each of the 12 monthly files contained anywhere from 100k to 800k rows. Therefore, I decided to reduce the number of rows drastically by performing grouping, summaries, and thoughtful omissions in Excel for each csv file. What I have uploaded here is the result of that process.

    Data is grouped by: month, day, rider_type, bike_type, and time_of_day. total_rides represent the sum of the data in each grouping as well as the total number of rows that were combined to make the new summarized row, avg_ride_length is the calculated average of all data in each grouping.

    Be sure that you use weighted averages if you want to calculate the mean of avg_ride_length for different subgroups as the values in this file are already averages of the summarized groups. You can include the total_rides value in your weighted average calculation to weigh properly.

    9 Columns:

    date - year, month, and day in date format - includes all days in 2022 day_of_week - Actual day of week as character. Set up a new sort order if needed. rider_type - values are either 'casual', those who pay per ride, or 'member', for riders who have annual memberships. bike_type - Values are 'classic' (non-electric, traditional bikes), or 'electric' (e-bikes). time_of_day - this divides the day into 6 equal time frames, 4 hours each, starting at 12AM. Each individual ride was placed into one of these time frames using the time they STARTED their rides, even if the ride was long enough to end in a later time frame. This column was added to help summarize the original dataset. total_rides - Count of all individual rides in each grouping (row). This column was added to help summarize the original dataset. avg_ride_length - The calculated average of all rides in each grouping (row). Look to total_rides to know how many original rides length values were included in this average. This column was added to help summarize the original dataset. min_ride_length - Minimum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset. max_ride_length - Maximum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.

    Please note: the time_of_day column has inconsistent spacing. Use mutate(time_of_day = gsub(" ", "", time_of _day)) to remove all spaces.

    Revisions

    Below is the list of revisions I made in Excel before uploading the final csv files to the R environment:

    • Deleted station location columns and lat/long as much of this data was already missing.

    • Deleted ride id column since each observation was unique and I would not be joining with another table on this variable.

    • Deleted rows pertaining to "docked bikes" since there were no member entries for this type and I could not compare member vs casual rider data. I also received no information in the project details about what constitutes a "docked" bike.

    • Used ride start time and end time to calculate a new column called ride_length (by subtracting), and deleted all rows with 0 and 1 minute results, which were explained in the project outline as being related to staff tasks rather than users. An example would be taking a bike out of rotation for maintenance.

    • Placed start time into a range of times (time_of_day) in order to group more observations while maintaining general time data. time_of_day now represents a time frame when the bike ride BEGAN. I created six 4-hour time frames, beginning at 12AM.

    • Added a Day of Week column, with Sunday = 1 and Saturday = 7, then changed from numbers to the actual day names.

    • Used pivot tables to group total_rides, avg_ride_length, min_ride_length, and max_ride_length by date, rider_type, bike_type, and time_of_day.

    • Combined into one csv file with all months, containing less than 9,000 rows (instead of several million)

  3. Vehicle licensing statistics data files

    • s3.amazonaws.com
    • gov.uk
    Updated May 24, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department for Transport (2022). Vehicle licensing statistics data files [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/181/1811927.html
    Explore at:
    Dataset updated
    May 24, 2022
    Dataset provided by
    GOV.UKhttp://gov.uk/
    Authors
    Department for Transport
    Description

    The following datafiles contain detailed information about vehicles in the UK, which would be too large to use as structured tables. They are provided as simple CSV text files that should be easier to use digitally.

    We welcome any feedback on the structure of our new datafiles, their usability, or any suggestions for improvements, please contact vehicles statistics.

    How to use CSV files

    CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).

    When using as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.

    Download data files

    Make and model by quarter

    df_VEH0120_GB: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077520/df_VEH0120_GB.csv">Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 37.6 MB)

    Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)

    Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]

    df_VEH0120_UK: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077521/df_VEH0120_UK.csv">Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 20.8 MB)

    Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)

    Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]

    df_VEH0160_GB: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077522/df_VEH0160_GB.csv">Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 17.1 MB)

    Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)

    Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]

    df_VEH0160_UK: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077523/df_VEH0160_UK.csv">Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 4.93 MB)

    Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)

    Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]

    Make and model by age

    df_VEH0124: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077524/df_VEH0124.csv">Vehicles at the end of the quarter by licence status, body type, make, generic model, model, year of first use and year of manufacture: United Kingdom (CSV, 28.2 MB)

    Scope: All licensed vehicles in the United Kingdom; 2021 Quarter 4 (end December) only

    Schema: BodyType, Make, GenModel, Model, YearFirstUsed, YearManufacture, Licensed (number of vehicles), SORN (number of vehicles)

    Make and model by engine size

    df_VEH0220: <a class="govu

  4. Excel file containing additional data too large to fit in a PDF,...

    • plos.figshare.com
    xlsx
    Updated Dec 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Odette Verdejo-Torres; David C. Klein; Lorena Novoa-Aponte; Jaime Carrazco-Carrillo; Denzel Bonilla-Pinto; Antonio Rivera; Arpie Bakhshian; Fa’alataitaua M. Fitisemanu; Martha L. Jiménez-González; Lyra Flinn; Aidan T. Pezacki; Antonio Lanzirotti; Luis Antonio Ortiz Frade; Christopher J. Chang; Juan G. Navea; Crysten E. Blaby-Haas; Sarah J. Hainer; Teresita Padilla-Benavides (2024). Excel file containing additional data too large to fit in a PDF, CUT&RUN–RNAseq merge analyses. [Dataset]. http://doi.org/10.1371/journal.pgen.1011495.s018
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Odette Verdejo-Torres; David C. Klein; Lorena Novoa-Aponte; Jaime Carrazco-Carrillo; Denzel Bonilla-Pinto; Antonio Rivera; Arpie Bakhshian; Fa’alataitaua M. Fitisemanu; Martha L. Jiménez-González; Lyra Flinn; Aidan T. Pezacki; Antonio Lanzirotti; Luis Antonio Ortiz Frade; Christopher J. Chang; Juan G. Navea; Crysten E. Blaby-Haas; Sarah J. Hainer; Teresita Padilla-Benavides
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel file containing additional data too large to fit in a PDF, CUT&RUN–RNAseq merge analyses.

  5. d

    COVID-19 Nursing Home Data

    • catalog.data.gov
    Updated Jul 22, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Centers for Disease Control and Prevention (2021). COVID-19 Nursing Home Data [Dataset]. https://catalog.data.gov/zh_Hant_TW/dataset/covid-19-nursing-home-data
    Explore at:
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    Centers for Disease Control and Prevention
    Description

    Submitted data as of the week ending 11/28/2021. The Nursing Home COVID-19 Public File includes data reported by nursing homes to the CDC’s National Healthcare Safety Network (NHSN) Long Term Care Facility (LTCF) COVID-19 Module: Surveillance Reporting Pathways and COVID-19 Vaccinations. For resources and ways to explore and visualize the data, please see the links to the left, as well as the buttons at the top of the page. Please note: Starting with week ending 9/12/2021, the full downloadable file has become too large to open in most spreadsheet programs, including Microsoft Excel. If you require smaller files, you can use the links below to download 2020 and 2021 data separately: Dataset for 2020 Dataset for 2021

  6. [Superseded] Intellectual Property Government Open Data 2019

    • data.gov.au
    • researchdata.edu.au
    csv-geo-au, pdf
    Updated Jan 26, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    IP Australia (2022). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://data.gov.au/data/dataset/activity/intellectual-property-government-open-data-2019
    Explore at:
    csv-geo-au(59281977), csv-geo-au(680030), csv-geo-au(39873883), csv-geo-au(37247273), csv-geo-au(25433945), csv-geo-au(92768371), pdf(702054), csv-geo-au(208449), csv-geo-au(166844), csv-geo-au(517357734), csv-geo-au(32100526), csv-geo-au(33981694), csv-geo-au(21315), csv-geo-au(6828919), csv-geo-au(86824299), csv-geo-au(359763), csv-geo-au(567412), csv-geo-au(153175), csv-geo-au(165051861), csv-geo-au(115749297), csv-geo-au(79743393), csv-geo-au(55504675), csv-geo-au(221026), csv-geo-au(50760305), csv-geo-au(2867571), csv-geo-au(212907250), csv-geo-au(4352457), csv-geo-au(4843670), csv-geo-au(1032589), csv-geo-au(1163830), csv-geo-au(278689420), csv-geo-au(28585330), csv-geo-au(130674), csv-geo-au(13968748), csv-geo-au(11926959), csv-geo-au(4802733), csv-geo-au(243729054), csv-geo-au(64511181), csv-geo-au(592774239), csv-geo-au(149948862)Available download formats
    Dataset updated
    Jan 26, 2022
    Dataset authored and provided by
    IP Australiahttp://ipaustralia.gov.au/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar.

    IP Data Platform

    IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software. IP Data Platform

    References

    The following pages can help you gain the understanding of the intellectual property administration and processes in Australia to help your analysis on the dataset.

    Updates

    Tables and columns

    Due to the changes in our systems, some tables have been affected.

    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements

    Data quality has been improved across all tables.

    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
  7. 2022 General Payment Data

    • healthdata.gov
    csv, xlsx, xml
    Updated Jul 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenPaymentsData.cms.gov (2023). 2022 General Payment Data [Dataset]. https://healthdata.gov/w/xgjv-zhkt/default?cur=ly0nnF2pd50
    Explore at:
    xlsx, csv, xmlAvailable download formats
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Centers for Medicare & Medicaid Services
    Description

    All general (non-research, non-ownership related) payments from the 2022 program year [January 1 – December 31, 2022]

    NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.

  8. g

    IP Australia - [Superseded] Intellectual Property Government Open Data 2019...

    • gimi9.com
    Updated Jul 20, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2018). IP Australia - [Superseded] Intellectual Property Government Open Data 2019 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_intellectual-property-government-open-data-2019
    Explore at:
    Dataset updated
    Jul 20, 2018
    Area covered
    Australia
    Description

    What is IPGOD? The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD. # How do I use IPGOD? IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar. # IP Data Platform IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software. IP Data Platform # References The following pages can help you gain the understanding of the intellectual property administration and processes in Australia to help your analysis on the dataset. * Patents * Trade Marks * Designs * Plant Breeder’s Rights # Updates ### Tables and columns Due to the changes in our systems, some tables have been affected. * We have added IPGOD 225 and IPGOD 325 to the dataset! * The IPGOD 206 table is not available this year. * Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use. ### Data quality improvements Data quality has been improved across all tables. * Null values are simply empty rather than '31/12/9999'. * All date columns are now in ISO format 'yyyy-mm-dd'. * All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0. * All tables are encoded in UTF-8. * All tables use the backslash \ as the escape character. * The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.

  9. f

    Excel form S1. Data reported in the paper. from Robotic modelling of snake...

    • rs.figshare.com
    xlsx
    Updated Feb 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Qiyuan Fu; Chen Li (2024). Excel form S1. Data reported in the paper. from Robotic modelling of snake traversing large, smooth obstacles reveals stability benefits of body compliance [Dataset]. http://doi.org/10.6084/m9.figshare.11881506.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Feb 19, 2024
    Dataset provided by
    The Royal Society
    Authors
    Qiyuan Fu; Chen Li
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Snakes can move through almost any terrain. Although their locomotion on flat surfaces using planar gaits is inherently stable, when snakes deform their body out of plane to traverse complex terrain, maintaining stability becomes a challenge. On trees and desert dunes, snakes grip branches or brace against depressed sand for stability. However, how they stably surmount obstacles like boulders too large and smooth to gain such ‘anchor points’ is less understood. Similarly, snake robots are challenged to stably traverse large, smooth obstacles for search and rescue and building inspection. Our recent study discovered that snakes combine body lateral undulation and cantilevering to stably traverse large steps. Here, we developed a snake robot with this gait and snake-like anisotropic friction and used it as a physical model to understand stability principles. The robot traversed steps as high as a third of its body length rapidly and stably. However, on higher steps, it was more likely to fail due to more frequent rolling and flipping over, which was absent in the snake with a compliant body. Adding body compliance reduced the robot's roll instability by statistically improving surface contact, without reducing speed. Besides advancing understanding of snake locomotion, our robot achieved high traversal speed surpassing most previous snake robots and approaching snakes, while maintaining high traversal probability.

  10. 2024 General Payment Data

    • healthdata.gov
    csv, xlsx, xml
    Updated Jul 1, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenPaymentsData.cms.gov (2025). 2024 General Payment Data [Dataset]. https://healthdata.gov/CMS/2024-General-Payment-Data/2fsj-j6dj
    Explore at:
    xml, xlsx, csvAvailable download formats
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Centers for Medicare & Medicaid Services
    Description

    All general (non-research, non-ownership related) payments from the 2024 program year [January 1 – December 31, 2024]

    NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.

  11. A

    Earnings Public-Use File, 2006

    • data.amerigeoss.org
    • data.wu.ac.at
    zip
    Updated Jul 27, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    United States (2019). Earnings Public-Use File, 2006 [Dataset]. https://data.amerigeoss.org/it/dataset/earnings-public-use-file-2006
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 27, 2019
    Dataset provided by
    United States
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Social Security Administration released Earnings Public-Use File (EPUF) for 2006. File contains earnings information for individuals drawn from a systematic random 1-percent sample of all Social Security numbers (SSNs) issued before January 2007. EPUF consists of two linkable subfiles. One contains selected demographic and aggregate earnings information for all 4,348,254 individuals in the file, and the second contains annual earnings records for the 3,131,424 individuals who had positive earnings in at least 1 year from 1951 through 2006. Please Note: This data set is very large and will not work properly in Microsoft Excel. Data software capable of handling large files should be used.

  12. 2023 General Payment Data

    • healthdata.gov
    csv, xlsx, xml
    Updated Jul 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenPaymentsData.cms.gov (2024). 2023 General Payment Data [Dataset]. https://healthdata.gov/CMS/2023-General-Payment-Data/rjgu-is5n
    Explore at:
    xml, xlsx, csvAvailable download formats
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    Centers for Medicare & Medicaid Services
    Description

    All general (non-research, non-ownership related) payments from the 2023 program year [January 1 – December 31, 2023]

    NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.

  13. u

    Electrification of Heat Demonstration Project, 2020-2023

    • datacatalogue.ukdataservice.ac.uk
    Updated Dec 19, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Energy Systems Catapult (2024). Electrification of Heat Demonstration Project, 2020-2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-9050-2
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    UK Data Servicehttps://ukdataservice.ac.uk/
    Authors
    Energy Systems Catapult
    Area covered
    United Kingdom
    Description

    The heat pump monitoring datasets are a key output of the Electrification of Heat Demonstration (EoH) project, a government-funded heat pump trial assessing the feasibility of heat pumps across the UK’s diverse housing stock. These datasets are provided in both cleansed and raw form and allow analysis of the initial performance of the heat pumps installed in the trial. From the datasets, insights such as heat pump seasonal performance factor (a measure of the heat pump's efficiency), heat pump performance during the coldest day of the year, and half-hourly performance to inform peak demand can be gleaned.

    For the second edition (December 2024), the data were updated to include cleaned performance data collected between November 2020 and September 2023. The only documentation currently available with the study is the Excel data dictionary. Reports and other contextual information can be found on the Energy Systems Catapult website.

    The EoH project was funded by the Department of Business, Energy and Industrial Strategy. From 2023, it is covered by the new Department for Energy Security and Net Zero.

    Data availability

    This study comprises the open-access cleansed data from the EoH project and a summary dataset, available in four zipped files (see the 'Access Data' tab). Users must download all four zip files to obtain the full set of cleansed data and accompanying documentation.

    When unzipped, the full cleansed data comprises 742 CSV files. Most of the individual CSV files are too large to open in Excel. Users should ensure they have sufficient computing facilities to analyse the data.

    The UKDS also holds an accompanying study, SN 9049 Electrification of Heat Demonstration Project: Heat Pump Performance Raw Data, 2020-2023, which is available only to registered UKDS users. This contains the raw data from the EoH project. Since the data are very large, only the summary dataset is available to download; an order must be placed for FTP delivery of the remaining raw data. Other studies in the set include SN 9209, which comprises 30-minute interval heat pump performance data, and SN 9210, which includes daily heat pump performance data.

    The Python code used to cleanse the raw data and then perform the analysis is accessible via the "https://github.com/ES-Catapult/electrification_of_heat" target="_blank"> Energy Systems Catapult Github

  14. Data from: Student Academic Performance Dataset

    • kaggle.com
    zip
    Updated Sep 2, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Olaniyan Julius (2024). Student Academic Performance Dataset [Dataset]. https://www.kaggle.com/datasets/olaniyanjulius/student-academic-performance-dataset
    Explore at:
    zip(1139508 bytes)Available download formats
    Dataset updated
    Sep 2, 2024
    Authors
    Olaniyan Julius
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset was carefully created to evaluate student performance using a more holistic approach that goes beyond the traditional metrics of Continuous Assessment (CA) and Examination (Exam) scores. This new dataset integrates additional variables to provide a comprehensive evaluation framework. The columns in the dataset are defined as follows: x1 (Attendance): Represents the student's attendance, measured on a scale of 0 to 10. x2 (Practical Skills): Captures the student's practical skills, also measured on a scale of 0 to 10. x3 (Demeanor): Reflects the student's demeanor, measured on a scale of 0 to 10. x4 (Presentation Quality): Assesses the quality of the student's presentations, with scores ranging from 0 to 10. x5 (Class Participation): Measures the level of participation in class, scored between 0 and 10. x6 (Continuous Assessment): Represents the continuous assessment scores, ranging from 0 to 10. x7 (Examination): Reflects the student's performance in examinations, with a range of 0 to 40 marks. total: This column is the sum of selected feature values, aggregating the performance across different metrics to provide a cumulative score. remarks: This column contains categorical values (1 to 3), which classify the overall performance based on the 'total' score or other feature values. The generated dataset contained approximately 72,000,000 records, which was too large to load as a single Excel file. To manage this, the dataset was divided into 200 files, each containing roughly 363,000 records. From each of these files, 1,000 records were randomly extracted, resulting in a final dataset of 200,000 records. These records span all the chosen grades, ensuring a comprehensive and balanced representation across different performance levels. This dataset provides a higher-dimensional representation of student performance, making it suitable for advanced analytical models and comprehensive evaluations of academic success.

  15. f

    CSV Data Dump for 31 Day Model Run

    • brunel.figshare.com
    xlsx
    Updated Jul 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ioana Pisica; Alex Gray (2023). CSV Data Dump for 31 Day Model Run [Dataset]. http://doi.org/10.17633/rd.brunel.23545038.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jul 14, 2023
    Dataset provided by
    Brunel University London
    Authors
    Ioana Pisica; Alex Gray
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The target company's hydraulic modelling package uses Innovyze InfoworksTM. This product enables third party integration through API’s and Ruby scripts when the ICM Exchange service is enabled. As a result, the research looked at opportunities to exploit scripting in order to run the chosen optimisation strategy. The first approach initially investigated the use of a CS-script tool that would export the results tables directly from the Innovyze InfoworksTM environment into CSV format workbooks. From here the data could then be inspected, with the application of mathematical tooling to optimise the pump start parameters before returning these back into the model and rerunning. Note, the computational resource the research obtained to deploy the modelling and analysis tools comprised the following specification. Hardware

    Dell Poweredge R720

    Intel Xeon Processor E5-2600 v2

    2x Processor Sockets

    32GB Memory random access memory (RAM) – 1866MT/s Virtual Machine

    Hosted on VMWare Hypervisor v6.0.

    Windows Server 2012R2.

    Microsoft Excel 64bit.

    16 virtual-central-processing-units (V-CPU’s).

    Full provision of 32GB RAM – 1866MT/s.

    were highlighted in the first round of data exports as, even with a dedicated

    Issues server offering 16-V-CPUs, and the specification as shown above, the Excel frontend environment was unable to process the very large data matrices being generated. There were regular failings of the Excel executable which led to an overall inability to inspect the data let alone run calculations on the matrices. When considering the five- second sample over 31 days this resulted in matrices in the order of [44x535682] per model run, with the calculations in (14-19) needing to be applied on a per cell basis.

  16. c

    LADOT Parking Meter Occupancy - Archive

    • s.cnmilf.com
    • data.lacity.org
    • +1more
    Updated Oct 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.lacity.org (2025). LADOT Parking Meter Occupancy - Archive [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/ladot-parking-meter-occupancy-archive
    Explore at:
    Dataset updated
    Oct 4, 2025
    Dataset provided by
    data.lacity.org
    Description

    Monthly archive of all parking meter sensor activity over the previous 36 months (3 years). Updated monthly for data 2 months prior (eg. January data will be published early March). For best-available current "live" status, see "LADOT Parking Meter Occupancy". For _location and parking policy details, see "LADOT Metered Parking Inventory & Policies". This dataset is geared towards database professionals and/or app developers. Each file is extremely large, over 300MB at minimum. Common applications like Microsoft Excel will not be able to open the file and show all data. ** For best results, import into a database or use advanced data access methods appropriate for processing large files.

  17. 2018 General Payment Data

    • healthdata.gov
    csv, xlsx, xml
    Updated Jan 21, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenPaymentsData.cms.gov (2022). 2018 General Payment Data [Dataset]. https://healthdata.gov/widgets/yfej-cxn5?mobile_redirect=true
    Explore at:
    csv, xml, xlsxAvailable download formats
    Dataset updated
    Jan 21, 2022
    Dataset provided by
    Centers for Medicare & Medicaid Services
    Description

    All general (non-research, non-ownership related) payments from the 2018 program year [January 1 – December 31, 2018]

    NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.

  18. All raw data files

    • figshare.com
    hdf
    Updated Jun 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jordan Saltzman (2025). All raw data files [Dataset]. http://doi.org/10.6084/m9.figshare.29260688.v1
    Explore at:
    hdfAvailable download formats
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Jordan Saltzman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All data files across all recording sessions in the Enhanced Naturalistic Habitat (ENH). Excel files contain data on behavior and USVs.mat files are the call detection filesAudio files are too large to be uploaded, please contact me for access to the audio files. The call detection files contain the spectrograms from each recording session. The spectrograms are boxed and labeled according to call type. To view these files, one needs to run DeepSqueak on Matlab. On DeepSqueak, select the detection file under the 'Select Detection Files' dropdownSelect the corresponding audio file under the 'Select Audio Files' dropdownAll files are labeled by date, so they can easily be cross-referenced.

  19. Z

    Measuring Bulk Crystallographic Texture from Ti-6Al-4V Hot-Rolled Sample...

    • data.niaid.nih.gov
    Updated Feb 3, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniel, Christopher Stuart; Zeng, Xiaohan; Quinta da Fonseca, João (2023). Measuring Bulk Crystallographic Texture from Ti-6Al-4V Hot-Rolled Sample Matrices using Synchrotron X-ray Diffraction (Analysis Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7437908
    Explore at:
    Dataset updated
    Feb 3, 2023
    Dataset provided by
    The Univeristy of Manchester
    The University of Manchester
    Authors
    Daniel, Christopher Stuart; Zeng, Xiaohan; Quinta da Fonseca, João
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset of synchrotron X-ray diffraction (SXRD) analysis files, recording the refinement of crystallographic texture from a number of Ti-6Al-4V (Ti-64) sample matrices, containing a total of 93 hot-rolled samples, from three different orthogonal sample directions. The aim of the work was to accurately quantify bulk macro-texture for both the α (hexagonal close packed, hcp) and β (body-centred cubic, bcc) phases across a range of different processing conditions.

    Material

    Prior to the experiment, the Ti-64 materials had been hot-rolled at a range of different temperatures, and to different reductions, followed by air-cooling, using a rolling mill at The University of Manchester. Rectangular specimens (6 mm x 5 mm x 2 mm) were then machined from the centre of these rolled blocks, and from the starting material. The samples were cut along different orthogonal rolling directions and are referenced according to alignment of the rolling directions (RD – rolling direction, TD – transverse direction, ND – normal direction) with the long horizontal (X) axis and short vertical (Y) axis of the rectangular specimens. Samples of the same orientation were glued together to form matrices for the synchrotron analysis. The material, rolling conditions, sample orientations and experiment reference numbers used for the synchrotron diffraction analysis are included in the data as an excel spreadsheet.

    SXRD Data Collection

    Data was recorded using a high energy 90 keV synchrotron X-ray beam and a 5 second exposure at the detector for each measurement point. The slits were adjusted to give a 0.5 x 0.5 mm beam area, chosen to optimally resolve both the α and β phase peaks. The SXRD data was recorded by stage-scanning the beam in sequential X-Y positions at 0.5 mm increments across the rectangular sample matrices, containing a number of samples glued together, to analyse a total of 93 samples from the different processing conditions and orientations. Post-processing of the data was then used to sort the data into a rectangular grid of measurement points from each individual sample.

    Diffraction Pattern Averaging

    The stage-scan diffraction pattern images from each matrix were sorted into individual samples, and the images averaged together for each specimen, using a Python notebook sxrd-tiff-summer. The averaged .tiff images each capture average diffraction peak intensities from an area of about 30 mm2 (equivalent to a total volume of ~ 60 mm3), with three different sample orientations then used to calculate the bulk crystallographic texture from each rolling condition.

    SXRD Data Analysis

    A new Fourier-based peak fitting method from the Continuous-Peak-Fit Python package was used to fit full diffraction pattern ring intensities, using a range of different lattice plane peaks for determining crystallographic texture in both the α and β phases. Bulk texture was calculated by combining the ring intensities from three different sample orientations.

    A .poni calibration file was created using Dioptas, through a refinement matching peak intensities from a LaB6 or CeO2 standard diffraction pattern image. Two calibrations were needed as some of the data was collected in July 2022 and some of the data was collected in August 2022. Dioptas was then used to determine peak bounds in 2θ for characterising a total of 22 α and 4 β lattice plane rings from the averaged Ti-64 diffraction pattern images, which were recorded in a .py input script. Using these two inputs, Continuous-Peak-Fit automatically converts full diffraction pattern rings into profiles of intensity versus azimuthal angle, for each 2θ section, which can also include multiple overlapping α and β peaks.

    The Continuous-Peak-Fit refinement can be launched in a notebook or from the terminal, to automatically calculate a full mathematical description, in the form of Fourier expansion terms, to match the intensity variation of each individual lattice plane ring. The results for peak position, intensity and half-width for all 22 α and 4 β lattice plane peaks were recorded at an azimuthal resolution of 1º and stored in a .fit output file. Details for setting up and running this analysis can be found in the continuous-peak-fit-analysis package. This package also includes a Python script for extracting lattice plane ring intensity distributions from the .fit files, matching the intensity values with spherical polar coordinates to parametrise the intensity distributions from each of the three different sample orientations, in the form of pole figures. The script can also be used to combine intensity distributions from different sample orientations. The final intensity variations are recorded for each of the lattice plane peaks as text files, which can be loaded into MTEX to plot and analyse both the α and β phase crystallographic texture.

    Metadata

    An accompanying YAML text file contains associated SXRD beamline metadata for each measurement. The raw data is in the form of synchrotron diffraction pattern .tiff images which were too large to upload to Zenodo and are instead stored on The University of Manchester's Research Database Storage (RDS) repository. The raw data can therefore be obtained by emailing the authors.

    The material data folder documents the machining of the samples and the sample orientations.

    The associated processing metadata for the Continuous-Peak-Fit analyses records information about the different packages used to process the data, along with details about the different files contained within this analysis dataset.

  20. Excel file containing additional data too large to fit in a PDF, RNAseq...

    • plos.figshare.com
    xlsx
    Updated Dec 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Odette Verdejo-Torres; David C. Klein; Lorena Novoa-Aponte; Jaime Carrazco-Carrillo; Denzel Bonilla-Pinto; Antonio Rivera; Arpie Bakhshian; Fa’alataitaua M. Fitisemanu; Martha L. Jiménez-González; Lyra Flinn; Aidan T. Pezacki; Antonio Lanzirotti; Luis Antonio Ortiz Frade; Christopher J. Chang; Juan G. Navea; Crysten E. Blaby-Haas; Sarah J. Hainer; Teresita Padilla-Benavides (2024). Excel file containing additional data too large to fit in a PDF, RNAseq mCrip2 KO DEG Prol and Diff. [Dataset]. http://doi.org/10.1371/journal.pgen.1011495.s017
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Odette Verdejo-Torres; David C. Klein; Lorena Novoa-Aponte; Jaime Carrazco-Carrillo; Denzel Bonilla-Pinto; Antonio Rivera; Arpie Bakhshian; Fa’alataitaua M. Fitisemanu; Martha L. Jiménez-González; Lyra Flinn; Aidan T. Pezacki; Antonio Lanzirotti; Luis Antonio Ortiz Frade; Christopher J. Chang; Juan G. Navea; Crysten E. Blaby-Haas; Sarah J. Hainer; Teresita Padilla-Benavides
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel file containing additional data too large to fit in a PDF, RNAseq mCrip2 KO DEG Prol and Diff.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Organization logo

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Related Article
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description

The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

Search
Clear search
Close search
Google apps
Main menu