The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing; it is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. The Ag Data Commons needs to anticipate the size and nature of the data it will be tasked with handling.
The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey on which an internal report is based. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.
From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to pass it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who selected the more than 10 to 100 TB range or the over 100 TB range for total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used the actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data would be considered inactive, or archival. To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
Resources in this dataset:
Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
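As a worked illustration of that per-person calculation, here is a minimal sketch in R (the function and example values are illustrative, not drawn from the survey data):

# Per-person storage need: high end of the reported range divided by the
# number of individuals the response covers (1 for an individual, G for a group)
per_person_storage_tb <- function(range_high_tb, group_size = 1) {
  range_high_tb / group_size
}

per_person_storage_tb(100, group_size = 5)  # group of 5 reporting the 10-100 TB range: 20 TB each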
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
This is a condensed version of the raw data obtained through the Google Data Analytics Course, made available by Lyft and the City of Chicago under this license (https://ride.divvybikes.com/data-license-agreement).
I originally did my study on another platform, and the original files were too large to upload to Posit Cloud in full. Each of the 12 monthly files contained anywhere from 100k to 800k rows. Therefore, I decided to reduce the number of rows drastically by performing grouping, summaries, and thoughtful omissions in Excel for each CSV file. What I have uploaded here is the result of that process.
Data is grouped by: month, day, rider_type, bike_type, and time_of_day. total_rides is the count of rides in each grouping, which is also the number of original rows that were combined to make the new summarized row; avg_ride_length is the calculated average of all data in each grouping.
Be sure to use weighted averages if you want to calculate the mean of avg_ride_length for different subgroups, as the values in this file are already averages of the summarized groups. Use the total_rides value as the weight; a worked example follows the column descriptions and cleaning note below.
date - year, month, and day in date format; includes all days in 2022
day_of_week - actual day of week as character. Set up a new sort order if needed.
rider_type - values are either 'casual', those who pay per ride, or 'member', for riders who have annual memberships.
bike_type - values are 'classic' (non-electric, traditional bikes) or 'electric' (e-bikes).
time_of_day - divides the day into 6 equal time frames, 4 hours each, starting at 12AM. Each individual ride was placed into one of these time frames using the time it STARTED, even if the ride was long enough to end in a later time frame. This column was added to help summarize the original dataset.
total_rides - count of all individual rides in each grouping (row). This column was added to help summarize the original dataset.
avg_ride_length - the calculated average of all rides in each grouping (row). Look to total_rides to know how many original ride length values were included in this average. This column was added to help summarize the original dataset.
min_ride_length - minimum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.
max_ride_length - maximum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.
Please note: the time_of_day column has inconsistent spacing. Use mutate(time_of_day = gsub(" ", "", time_of_day)) to remove all spaces.
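For example, to get a correctly weighted mean ride length by rider type in R (a minimal sketch; it assumes the CSV has been saved under the hypothetical name divvy_2022_summarized.csv):

library(dplyr)

rides <- read.csv("divvy_2022_summarized.csv")  # hypothetical file name
rides <- rides %>% mutate(time_of_day = gsub(" ", "", time_of_day))  # fix the spacing noted above

# Weight each pre-averaged row by the number of original rides it summarizes
rides %>%
  group_by(rider_type) %>%
  summarise(mean_ride_length = weighted.mean(avg_ride_length, w = total_rides))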
Below is the list of revisions I made in Excel before uploading the final CSV files to the R environment (a rough script equivalent follows the list):
Deleted station location columns and lat/long as much of this data was already missing.
Deleted ride id column since each observation was unique and I would not be joining with another table on this variable.
Deleted rows pertaining to "docked bikes" since there were no member entries for this type and I could not compare member vs casual rider data. I also received no information in the project details about what constitutes a "docked" bike.
Used ride start time and end time to calculate a new column called ride_length (by subtracting), and deleted all rows with 0 and 1 minute results, which were explained in the project outline as being related to staff tasks rather than users. An example would be taking a bike out of rotation for maintenance.
Placed start time into a range of times (time_of_day) in order to group more observations while maintaining general time data. time_of_day now represents a time frame when the bike ride BEGAN. I created six 4-hour time frames, beginning at 12AM.
Added a Day of Week column, with Sunday = 1 and Saturday = 7, then changed from numbers to the actual day names.
Used pivot tables to group total_rides, avg_ride_length, min_ride_length, and max_ride_length by date, rider_type, bike_type, and time_of_day.
Combined everything into one CSV file with all months, containing fewer than 9,000 rows (instead of several million).
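For reference, here is a rough R equivalent of those Excel steps for a single monthly file (a sketch only; the raw column names started_at, ended_at, member_casual, and rideable_type are assumptions about the original files):

library(dplyr)
library(lubridate)

raw <- read.csv("202201-divvy-tripdata.csv")  # hypothetical monthly file name

summarized <- raw %>%
  mutate(
    started = ymd_hms(started_at),
    ride_length = as.numeric(difftime(ymd_hms(ended_at), started, units = "mins")),
    time_of_day = cut(hour(started), breaks = seq(0, 24, by = 4), right = FALSE,
                      labels = c("12AM-4AM", "4AM-8AM", "8AM-12PM",
                                 "12PM-4PM", "4PM-8PM", "8PM-12AM"))
  ) %>%
  filter(ride_length > 1) %>%  # drop the 0- and 1-minute staff/maintenance rows
  group_by(date = as.Date(started), rider_type = member_casual,
           bike_type = rideable_type, time_of_day) %>%
  summarise(total_rides = n(),
            avg_ride_length = mean(ride_length),
            min_ride_length = min(ride_length),
            max_ride_length = max(ride_length),
            .groups = "drop")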
The following datafiles contain detailed information about vehicles in the UK, which would be too large to use as structured tables. They are provided as simple CSV text files that should be easier to use digitally.
Data tables containing aggregated information about vehicles in the UK are also available.
We welcome any feedback on the structure of our new datafiles, their usability, or any suggestions for improvements; please contact the vehicles statistics team.
CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).
When used as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
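For example, in R the largest of the files listed below can be read directly from its URL (a minimal sketch; data.table's fread copes with file sizes that defeat older spreadsheet software):

library(data.table)

vehicles <- fread("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077520/df_VEH0120_GB.csv")
dim(vehicles)  # rows and columns (one column per quarter after the schema columns)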
df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 37.6 MB) – https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077520/df_VEH0120_GB.csv
Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)
Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]
df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 20.8 MB) – https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077521/df_VEH0120_UK.csv
Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)
Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]
df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 17.1 MB) – https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077522/df_VEH0160_GB.csv
Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)
Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]
df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 4.93 MB) – https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077523/df_VEH0160_UK.csv
Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)
Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]
df_VEH0124: Vehicles at the end of the quarter by licence status, body type, make, generic model, model, year of first use and year of manufacture: United Kingdom (CSV, 28.2 MB) – https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077524/df_VEH0124.csv
Scope: All licensed vehicles in the United Kingdom; 2021 Quarter 4 (end December) only
Schema: BodyType, Make, GenModel, Model, YearFirstUsed, YearManufacture, Licensed (number of vehicles), SORN (number of vehicles)
df_VEH0220:
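Because the schemas above store the quarterly counts wide (one column per quarter), reshaping to long format is often the first analytical step. A minimal R sketch for df_VEH0120_GB (the quarter column names are assumed to be everything after the five schema columns):

library(data.table)

vehicles <- fread("df_VEH0120_GB.csv")  # as downloaded from the link above

long <- melt(vehicles,
             id.vars = c("BodyType", "Make", "GenModel", "Model", "LicenceStatus"),
             variable.name = "Quarter", value.name = "Vehicles")
head(long)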
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel file containing additional data too large to fit in a PDF: CUT&RUN–RNAseq merge analyses.
Submitted data as of the week ending 11/28/2021. The Nursing Home COVID-19 Public File includes data reported by nursing homes to the CDC's National Healthcare Safety Network (NHSN) Long Term Care Facility (LTCF) COVID-19 Module: Surveillance Reporting Pathways and COVID-19 Vaccinations. For resources and ways to explore and visualize the data, please see the links to the left, as well as the buttons at the top of the page. Please note: starting with the week ending 9/12/2021, the full downloadable file has become too large to open in most spreadsheet programs, including Microsoft Excel. If you require smaller files, you can use the links below to download 2020 and 2021 data separately:
Dataset for 2020
Dataset for 2021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.
IPGOD is large, with millions of data points across up to 40 tables, making it too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which needs specialised software for merging. We recommend that advanced users interact with the IPGOD data using tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.
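For instance, two IPGOD tables can be merged on a shared key in R without Excel (a minimal sketch; the file names and key column are illustrative, not the actual IPGOD table names):

library(dplyr)

applications <- read.csv("ipgod_table_applications.csv")  # hypothetical file names
applicants   <- read.csv("ipgod_table_applicants.csv")

# Join applicant details onto applications via a shared identifier
merged <- applications %>%
  left_join(applicants, by = "application_id")  # hypothetical key column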
IP Australia is also providing free trials of a cloud-based analytics platform, the IP Data Platform, with the capability to work with large intellectual property datasets such as IPGOD through the web browser, without installing any software.
The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset.
Due to changes in our systems, some tables have been affected.
Data quality has been improved across all tables.
All general (non-research, non-ownership related) payments from the 2022 program year [January 1 – December 31, 2022]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
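If Excel truncates the data, the full file can still be read programmatically; for example, in R (a minimal sketch; the file name is a placeholder for the downloaded CSV):

library(data.table)

# fread reads well past Excel's 1,048,576-row ceiling, given enough memory
payments <- fread("OP_DTL_GNRL_PGYR2022.csv")  # hypothetical local file name
nrow(payments)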
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Snakes can move through almost any terrain. Although their locomotion on flat surfaces using planar gaits is inherently stable, when snakes deform their body out of plane to traverse complex terrain, maintaining stability becomes a challenge. On trees and desert dunes, snakes grip branches or brace against depressed sand for stability. However, how they stably surmount obstacles like boulders too large and smooth to gain such ‘anchor points’ is less understood. Similarly, snake robots are challenged to stably traverse large, smooth obstacles for search and rescue and building inspection. Our recent study discovered that snakes combine body lateral undulation and cantilevering to stably traverse large steps. Here, we developed a snake robot with this gait and snake-like anisotropic friction and used it as a physical model to understand stability principles. The robot traversed steps as high as a third of its body length rapidly and stably. However, on higher steps, it was more likely to fail due to more frequent rolling and flipping over, which was absent in the snake with a compliant body. Adding body compliance reduced the robot's roll instability by statistically improving surface contact, without reducing speed. Besides advancing understanding of snake locomotion, our robot achieved high traversal speed surpassing most previous snake robots and approaching snakes, while maintaining high traversal probability.
All general (non-research, non-ownership related) payments from the 2024 program year [January 1 – December 31, 2024]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Social Security Administration released the Earnings Public-Use File (EPUF) for 2006. The file contains earnings information for individuals drawn from a systematic random 1-percent sample of all Social Security numbers (SSNs) issued before January 2007. EPUF consists of two linkable subfiles: one contains selected demographic and aggregate earnings information for all 4,348,254 individuals in the file, and the second contains annual earnings records for the 3,131,424 individuals who had positive earnings in at least one year from 1951 through 2006. Please note: this data set is very large and will not work properly in Microsoft Excel; data software capable of handling large files should be used.
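Because the two subfiles are linkable, a typical first step is joining them on the person-level identifier (a minimal sketch in R; the file names and the key column are assumptions, not the documented EPUF field names):

library(dplyr)

demographics <- read.csv("epuf_demographic.csv")      # hypothetical file names
earnings     <- read.csv("epuf_annual_earnings.csv")

# Attach demographic information to each annual earnings record
linked <- earnings %>%
  inner_join(demographics, by = "id")  # hypothetical person-level key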
All general (non-research, non-ownership related) payments from the 2023 program year [January 1 – December 31, 2023]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
The heat pump monitoring datasets are a key output of the Electrification of Heat Demonstration (EoH) project, a government-funded heat pump trial assessing the feasibility of heat pumps across the UK’s diverse housing stock. These datasets are provided in both cleansed and raw form and allow analysis of the initial performance of the heat pumps installed in the trial. From the datasets, insights such as heat pump seasonal performance factor (a measure of the heat pump's efficiency), heat pump performance during the coldest day of the year, and half-hourly performance to inform peak demand can be gleaned.
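As one example of the kind of analysis the datasets support, a seasonal performance factor is the total heat energy delivered divided by the total electricity consumed over the season. A minimal R sketch (the file and column names are hypothetical, not the EoH schema):

library(dplyr)

hp <- read.csv("heat_pump_half_hourly.csv")  # hypothetical file name

# SPF: total heat output divided by total electrical input
spf <- sum(hp$heat_output_kwh, na.rm = TRUE) / sum(hp$electricity_input_kwh, na.rm = TRUE)
spf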
For the second edition (December 2024), the data were updated to include cleaned performance data collected between November 2020 and September 2023. The only documentation currently available with the study is the Excel data dictionary. Reports and other contextual information can be found on the Energy Systems Catapult website.
The EoH project was funded by the Department for Business, Energy and Industrial Strategy. From 2023, it is covered by the new Department for Energy Security and Net Zero.
Data availability
This study comprises the open-access cleansed data from the EoH project and a summary dataset, available in four zipped files (see the 'Access Data' tab). Users must download all four zip files to obtain the full set of cleansed data and accompanying documentation.
When unzipped, the full cleansed data comprises 742 CSV files. Most of the individual CSV files are too large to open in Excel. Users should ensure they have sufficient computing facilities to analyse the data.
The UKDS also holds an accompanying study, SN 9049 Electrification of Heat Demonstration Project: Heat Pump Performance Raw Data, 2020-2023, which is available only to registered UKDS users. This contains the raw data from the EoH project. Since the data are very large, only the summary dataset is available to download; an order must be placed for FTP delivery of the remaining raw data. Other studies in the set include SN 9209, which comprises 30-minute interval heat pump performance data, and SN 9210, which includes daily heat pump performance data.
The Python code used to cleanse the raw data and then perform the analysis is accessible via the Energy Systems Catapult GitHub: https://github.com/ES-Catapult/electrification_of_heat
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The dataset was carefully created to evaluate student performance using a more holistic approach that goes beyond the traditional metrics of Continuous Assessment (CA) and Examination (Exam) scores. This new dataset integrates additional variables to provide a comprehensive evaluation framework. The columns in the dataset are defined as follows:
x1 (Attendance): the student's attendance, measured on a scale of 0 to 10.
x2 (Practical Skills): the student's practical skills, measured on a scale of 0 to 10.
x3 (Demeanor): the student's demeanor, measured on a scale of 0 to 10.
x4 (Presentation Quality): the quality of the student's presentations, with scores ranging from 0 to 10.
x5 (Class Participation): the level of participation in class, scored between 0 and 10.
x6 (Continuous Assessment): the continuous assessment scores, ranging from 0 to 10.
x7 (Examination): the student's performance in examinations, with a range of 0 to 40 marks.
total: the sum of selected feature values, aggregating the performance across different metrics to provide a cumulative score.
remarks: categorical values (1 to 3), which classify the overall performance based on the 'total' score or other feature values.
The generated dataset contained approximately 72,000,000 records, which was too large to load as a single Excel file. To manage this, the dataset was divided into 200 files, each containing roughly 363,000 records. From each of these files, 1,000 records were randomly extracted, resulting in a final dataset of 200,000 records (a sketch of this sampling step appears below). These records span all the chosen grades, ensuring a comprehensive and balanced representation across different performance levels. This dataset provides a higher-dimensional representation of student performance, making it suitable for advanced analytical models and comprehensive evaluations of academic success.
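A minimal R sketch of that sampling step (the file names are assumptions):

library(dplyr)

set.seed(42)  # for reproducible sampling
files <- sprintf("student_scores_part_%03d.csv", 1:200)  # hypothetical file names

# Randomly draw 1,000 records from each of the 200 files and combine
sample_list <- lapply(files, function(f) {
  read.csv(f) %>% slice_sample(n = 1000)
})
final <- bind_rows(sample_list)  # 200,000 records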
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The target company's hydraulic modelling package uses Innovyze Infoworks™. This product enables third-party integration through APIs and Ruby scripts when the ICM Exchange service is enabled. As a result, the research looked at opportunities to exploit scripting in order to run the chosen optimisation strategy. The first approach investigated the use of a CS-script tool that would export the results tables directly from the Innovyze Infoworks™ environment into CSV-format workbooks. From here the data could be inspected, with mathematical tooling applied to optimise the pump start parameters before returning these to the model and rerunning. Note: the computational resource the research obtained to deploy the modelling and analysis tools comprised the following specification.
Hardware
Dell Poweredge R720
Intel Xeon Processor E5-2600 v2
2x Processor Sockets
32 GB random access memory (RAM) – 1866 MT/s
Virtual Machine
Hosted on VMWare Hypervisor v6.0.
Windows Server 2012R2.
Microsoft Excel 64bit.
16 virtual central processing units (V-CPUs).
Full provision of 32GB RAM – 1866MT/s.
Issues were highlighted in the first round of data exports as, even with a dedicated server offering 16 V-CPUs and the specification shown above, the Excel frontend environment was unable to process the very large data matrices being generated. There were regular failures of the Excel executable, which led to an overall inability to inspect the data, let alone run calculations on the matrices. When considering the five-second sample over 31 days, this resulted in matrices in the order of [44 x 535682] per model run, with the calculations in (14-19) needing to be applied on a per-cell basis.
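By contrast, matrices of this shape are routine for data-frame libraries. A minimal R sketch (the file name is a placeholder for an exported results table, and the transform is a stand-in for the calculations in (14-19)):

library(data.table)

# fread handles the full [44 x 535682] export without Excel's limits
results <- fread("pump_model_run_export.csv")  # hypothetical export file

# Apply a per-cell calculation across all numeric columns at once
num_cols <- names(results)[sapply(results, is.numeric)]
results[, (num_cols) := lapply(.SD, function(x) x * 1.0),  # placeholder transform
        .SDcols = num_cols]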
Monthly archive of all parking meter sensor activity over the previous 36 months (3 years). Updated monthly for data two months prior (e.g. January data will be published early March). For the best-available current "live" status, see "LADOT Parking Meter Occupancy". For location and parking policy details, see "LADOT Metered Parking Inventory & Policies". This dataset is geared towards database professionals and/or app developers. Each file is extremely large, over 300 MB at minimum; common applications like Microsoft Excel will not be able to open the file and show all data. For best results, import into a database or use advanced data access methods appropriate for processing large files.
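Following that advice, one monthly file can be streamed into a SQLite database from R without ever holding it all in memory (a minimal sketch; the file and table names are hypothetical):

library(DBI)
library(RSQLite)
library(readr)

con <- dbConnect(SQLite(), "parking_meters.sqlite")

# Append the 300+ MB CSV to a database table 100,000 rows at a time
read_csv_chunked(
  "meter_activity_2024_01.csv",  # hypothetical monthly archive file
  callback = SideEffectChunkCallback$new(function(chunk, pos) {
    dbWriteTable(con, "meter_activity", chunk, append = TRUE)
  }),
  chunk_size = 100000
)
dbDisconnect(con)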
All general (non-research, non-ownership related) payments from the 2018 program year [January 1 – December 31, 2018]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All data files across all recording sessions in the Enhanced Naturalistic Habitat (ENH). Excel files contain data on behavior and USVs; .mat files are the call detection files. Audio files are too large to be uploaded; please contact me for access to them. The call detection files contain the spectrograms from each recording session. The spectrograms are boxed and labeled according to call type. To view these files, you need to run DeepSqueak in MATLAB. In DeepSqueak, select the detection file under the 'Select Detection Files' dropdown, then select the corresponding audio file under the 'Select Audio Files' dropdown. All files are labeled by date, so they can easily be cross-referenced.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of synchrotron X-ray diffraction (SXRD) analysis files, recording the refinement of crystallographic texture from a number of Ti-6Al-4V (Ti-64) sample matrices, containing a total of 93 hot-rolled samples, from three different orthogonal sample directions. The aim of the work was to accurately quantify bulk macro-texture for both the α (hexagonal close packed, hcp) and β (body-centred cubic, bcc) phases across a range of different processing conditions.
Material
Prior to the experiment, the Ti-64 materials had been hot-rolled at a range of different temperatures, and to different reductions, followed by air-cooling, using a rolling mill at The University of Manchester. Rectangular specimens (6 mm x 5 mm x 2 mm) were then machined from the centre of these rolled blocks, and from the starting material. The samples were cut along different orthogonal rolling directions and are referenced according to alignment of the rolling directions (RD – rolling direction, TD – transverse direction, ND – normal direction) with the long horizontal (X) axis and short vertical (Y) axis of the rectangular specimens. Samples of the same orientation were glued together to form matrices for the synchrotron analysis. The material, rolling conditions, sample orientations and experiment reference numbers used for the synchrotron diffraction analysis are included in the data as an Excel spreadsheet.
SXRD Data Collection
Data was recorded using a high energy 90 keV synchrotron X-ray beam and a 5 second exposure at the detector for each measurement point. The slits were adjusted to give a 0.5 x 0.5 mm beam area, chosen to optimally resolve both the α and β phase peaks. The SXRD data was recorded by stage-scanning the beam in sequential X-Y positions at 0.5 mm increments across the rectangular sample matrices, containing a number of samples glued together, to analyse a total of 93 samples from the different processing conditions and orientations. Post-processing of the data was then used to sort the data into a rectangular grid of measurement points from each individual sample.
Diffraction Pattern Averaging
The stage-scan diffraction pattern images from each matrix were sorted into individual samples, and the images averaged together for each specimen, using a Python notebook, sxrd-tiff-summer. The averaged .tiff images each capture average diffraction peak intensities from an area of about 30 mm² (equivalent to a total volume of ~60 mm³), with three different sample orientations then used to calculate the bulk crystallographic texture from each rolling condition.
SXRD Data Analysis
A new Fourier-based peak fitting method from the Continuous-Peak-Fit Python package was used to fit full diffraction pattern ring intensities, using a range of different lattice plane peaks for determining crystallographic texture in both the α and β phases. Bulk texture was calculated by combining the ring intensities from three different sample orientations.
A .poni calibration file was created using Dioptas, through a refinement matching peak intensities from a LaB6 or CeO2 standard diffraction pattern image. Two calibrations were needed as some of the data was collected in July 2022 and some of the data was collected in August 2022. Dioptas was then used to determine peak bounds in 2θ for characterising a total of 22 α and 4 β lattice plane rings from the averaged Ti-64 diffraction pattern images, which were recorded in a .py input script. Using these two inputs, Continuous-Peak-Fit automatically converts full diffraction pattern rings into profiles of intensity versus azimuthal angle, for each 2θ section, which can also include multiple overlapping α and β peaks.
The Continuous-Peak-Fit refinement can be launched in a notebook or from the terminal, to automatically calculate a full mathematical description, in the form of Fourier expansion terms, to match the intensity variation of each individual lattice plane ring. The results for peak position, intensity and half-width for all 22 α and 4 β lattice plane peaks were recorded at an azimuthal resolution of 1º and stored in a .fit output file. Details for setting up and running this analysis can be found in the continuous-peak-fit-analysis package. This package also includes a Python script for extracting lattice plane ring intensity distributions from the .fit files, matching the intensity values with spherical polar coordinates to parametrise the intensity distributions from each of the three different sample orientations, in the form of pole figures. The script can also be used to combine intensity distributions from different sample orientations. The final intensity variations are recorded for each of the lattice plane peaks as text files, which can be loaded into MTEX to plot and analyse both the α and β phase crystallographic texture.
Metadata
An accompanying YAML text file contains associated SXRD beamline metadata for each measurement. The raw data is in the form of synchrotron diffraction pattern .tiff images which were too large to upload to Zenodo and are instead stored on The University of Manchester's Research Database Storage (RDS) repository. The raw data can therefore be obtained by emailing the authors.
The material data folder documents the machining of the samples and the sample orientations.
The associated processing metadata for the Continuous-Peak-Fit analyses records information about the different packages used to process the data, along with details about the different files contained within this analysis dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel file containing additional data too large to fit in a PDF: RNAseq mCrip2 KO DEG Prol and Diff.