34 datasets found

2023 General Payment Data
healthdata.gov
csv, xlsx, xml
Updated Jul 1, 2024
Cite
OpenPaymentsData.cms.gov (2024). 2023 General Payment Data [Dataset]. https://healthdata.gov/CMS/2023-General-Payment-Data/rjgu-is5n
Available download formats: xml, xlsx, csv
Dataset updated
Jul 1, 2024
Dataset provided by
Centers for Medicare & Medicaid Services
Description
All general (non-research, non-ownership related) payments from the 2023 program year [January 1 – December 31, 2023]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
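For files too large for a spreadsheet, one alternative is to stream the CSV export row by row instead of opening it whole. The following is a minimal Python sketch, not part of the dataset's own documentation. The column name `amount` and the sample rows are hypothetical stand-ins; check the real column headers in the downloaded file.

```python
import csv
import io


def summarize_csv(stream, amount_column):
    """Stream rows from an open CSV text stream and return (row_count, total).

    Only one row is held in memory at a time, so the file size is not
    limited by spreadsheet row caps. `amount_column` is a hypothetical
    column name chosen for illustration.
    """
    reader = csv.DictReader(stream)
    row_count = 0
    total = 0.0
    for row in reader:
        row_count += 1
        value = row.get(amount_column)
        if value:  # skip blank or missing cells
            total += float(value)
    return row_count, total


# Tiny in-memory stand-in for the real download (hypothetical data).
sample = io.StringIO("id,amount\n1,10.50\n2,\n3,4.50\n")
rows, total = summarize_csv(sample, "amount")
print(rows, total)
```

For a real download, an open file handle (for example `open(path, newline="")`) can be passed in place of the in-memory sample.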
Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
Resources in this dataset:
Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
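The per-person storage rule in the description above (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched in Python as follows. This is an illustrative sketch only: the function name and the example values are hypothetical, and it does not reproduce the separate handling of Big Data users, who were given actual or estimated values.

```python
def per_person_storage_tb(range_high_tb, group_size=1):
    """Return per-person storage in TB for one survey response.

    range_high_tb: high end of the storage range the respondent selected.
    group_size: G, the number of individuals covered by a group response
                (1 for an individual response).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical examples: an individual and a four-person group,
# both reporting a range whose high end is 100 TB.
print(per_person_storage_tb(100))
print(per_person_storage_tb(100, group_size=4))
```

Using the high end of each range gives an upper-bound estimate, which is the conservative choice when provisioning storage.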
2022 General Payment Data
healthdata.gov
csv, xlsx, xml
Updated Jul 1, 2023
Cite
OpenPaymentsData.cms.gov (2023). 2022 General Payment Data [Dataset]. https://healthdata.gov/w/xgjv-zhkt/default?cur=ly0nnF2pd50
Available download formats: xlsx, csv, xml
Dataset updated
Jul 1, 2023
Dataset provided by
Centers for Medicare & Medicaid Services
Description
All general (non-research, non-ownership related) payments from the 2022 program year [January 1 – December 31, 2022]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
US Broadband Usage Across Counties
kaggle.com
zip
Updated Jan 6, 2023
Cite
The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes/code
Available download formats: zip (46127 bytes)
Dataset updated
Jan 6, 2023
Authors
The Devastator
Area covered
United States
Description
US Broadband Usage Across Counties

Utilizing Microsoft's Data to Estimate Access

By Amber Thomas [source]

About this dataset

This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.

According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This data set aims to address this contrast between those with estimated availability but no actual use by providing more accurate usage numbers downscaled to county and zip code levels. Who gets counted as having access is vastly important -- it determines who gets included in public funding opportunities dedicated solely toward closing this digital divide gap. The implications can be huge: millions around this country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.

This dataset includes aggregated information about these locations with fewer than 20 devices for increased accuracy when estimating broadband usage in the United States, allowing others to use it for developing solutions that improve internet access or to label accurately the problem areas where no real or reliable connectivity exists within communities large and small throughout the US mainland. Please review the license terms before using these data so that you adhere to the stipulations set forth under Microsoft's Open Use Of Data Agreement v1.0 before using this dataset, whether for professional or educational purposes.

How to use the dataset

This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/divisions within states as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps) – Average Mbps download speed derived from a combination of data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps calculated using [7].
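To make the throughput columns concrete, the Python sketch below derives an average and the fast and slow percentages from a list of per-device throughput values. The 25 Mbps and 3 Mbps thresholds come from the column descriptions above. Everything else is an assumption for illustration: the function name, the sample values, and the use of a simple share of devices (the methods cited as [6] and [7] are not given in this listing).

```python
def throughput_summary(throughputs_mbps, fast_above=25.0, slow_below=3.0):
    """Summarize per-device download throughput values (in Mbps).

    Returns the mean throughput plus the percentage of devices strictly
    above `fast_above` and strictly below `slow_below`.
    """
    n = len(throughputs_mbps)
    if n == 0:
        raise ValueError("at least one throughput value is required")
    fast = sum(1 for t in throughputs_mbps if t > fast_above)
    slow = sum(1 for t in throughputs_mbps if t < slow_below)
    return {
        "average_mbps": sum(throughputs_mbps) / n,
        "percent_fast": 100.0 * fast / n,
        "percent_slow": 100.0 * slow / n,
    }


# Hypothetical device measurements for one zip code.
print(throughput_summary([1.0, 2.0, 10.0, 30.0, 57.0]))
```

Devices between the two thresholds count toward neither percentage, so the fast and slow shares need not sum to 100.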

Research Ideas

Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.

Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.

Establishing public-private partnerships between local governments and telecom providers. Both need better data about gaps in service coverage and usage levels in order to decide on investments in new infrastructure buildouts that improve connectivity options for rural communities.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: broadband_data_2020October.csv

2024 General Payment Data
healthdata.gov
csv, xlsx, xml
Updated Jul 1, 2025
Cite
OpenPaymentsData.cms.gov (2025). 2024 General Payment Data [Dataset]. https://healthdata.gov/CMS/2024-General-Payment-Data/2fsj-j6dj
Available download formats: xml, xlsx, csv
Dataset updated
Jul 1, 2025
Dataset provided by
Centers for Medicare & Medicaid Services
Description
All general (non-research, non-ownership related) payments from the 2024 program year [January 1 – December 31, 2024]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
Individuals and Households Program - Valid Registrations
catalog.data.gov
Updated Jun 7, 2025
Cite
FEMA/Response and Recovery/Recovery Directorate (2025). Individuals and Households Program - Valid Registrations [Dataset]. https://catalog.data.gov/dataset/individuals-and-households-program-valid-registrations-nemis
Dataset updated
Jun 7, 2025
Dataset provided by
Federal Emergency Management Agency (http://www.fema.gov/)
Description
This dataset contains FEMA applicant-level data for the Individuals and Households Program (IHP). All PII information has been removed. The location is represented by county, city, and zip code. This dataset contains Individual Assistance (IA) applications from DR1439 (declared in 2002) to those declared over 30 days ago. The full data set is refreshed on an annual basis and refreshed weekly to update disasters declared in the last 18 months. This dataset includes all major disasters and includes only valid registrants (applied in a declared county, within the registration period, having damage due to the incident and damage within the incident period). Information about individual data elements and descriptions are listed in the metadata information within the dataset.
Valid registrants may be eligible for IA assistance, which is intended to meet basic needs and supplement disaster recovery efforts. IA assistance is not intended to return disaster-damaged property to its pre-disaster condition. Disaster damage to secondary or vacation homes does not qualify for IHP assistance.
Data comes from FEMA's National Emergency Management Information System (NEMIS) with raw, unedited, self-reported content and subject to a small percentage of human error.
Any financial information is derived from NEMIS and not FEMA's official financial systems. Due to differences in reporting periods, status of obligations and application of business rules, this financial information may differ slightly from official publication on public websites such as usaspending.gov. This dataset is not intended to be used for any official federal reporting.
Citation: The Agency’s preferred citation for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.
Due to the size of this file, tools other than a spreadsheet may be required to analyze, visualize, and manipulate the data. MS Excel will not be able to process files this large without data loss. It is recommended that a database (e.g., MS Access, MySQL, PostgreSQL, etc.) be used to store and manipulate data. Other programming tools such as R, Apache Spark, and Python can also be used to analyze and visualize data. Further, basic Linux/Unix tools can be used to manipulate, search, and modify large files.
If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.
This dataset is scheduled to be superseded by Valid Registrations Version 2 by early CY 2024.
MURA: MSK Xrays
stanfordaimi.azurewebsites.net
Updated Dec 7, 2020
Cite
Microsoft Research (2020). MURA: MSK Xrays [Dataset]. https://stanfordaimi.azurewebsites.net/datasets/3e00d84b-d86e-4fed-b2a4-bfe3effd661b
Dataset updated
Dec 7, 2020
Dataset authored and provided by
Microsoft Research
License
https://aimistanford-web-api.azurewebsites.net/licenses/f1f352a6-243f-4905-8e00-389edbca9e83/view
Description
MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal.

Musculoskeletal conditions affect more than 1.7 billion people worldwide, and are the most common cause of severe, long-term pain and disability, with 30 million emergency department visits annually and increasing. We hope that our dataset can lead to significant advances in medical imaging technologies which can diagnose at the level of experts, towards improving healthcare access in parts of the world where access to skilled radiologists is limited.

MURA is one of the largest public radiographic image datasets. We're making this dataset available to the community and hosting a competition to see if your models can perform as well as radiologists on the task.
2018 General Payment Data
healthdata.gov
csv, xlsx, xml
Updated Jan 21, 2022
Cite
OpenPaymentsData.cms.gov (2022). 2018 General Payment Data [Dataset]. https://healthdata.gov/widgets/yfej-cxn5?mobile_redirect=true
Available download formats: csv, xml, xlsx
Dataset updated
Jan 21, 2022
Dataset provided by
Centers for Medicare & Medicaid Services
Description
All general (non-research, non-ownership related) payments from the 2018 program year [January 1 – December 31, 2018]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
Badger
kaggle.com
zip
Updated Mar 16, 2025
Cite
Terry Eppler (2025). Badger [Dataset]. https://www.kaggle.com/datasets/terryeppler/badger/discussion?sort=undefined
Available download formats: zip (325078128 bytes)
Dataset updated
Mar 16, 2025
Authors
Terry Eppler
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Data sources for Badger, an open-source budget execution and data analysis tool for federal budget analysts with the Environmental Protection Agency. It is based on WPF and .NET 6 and is written in C#.

⚙️Features

Multiple data providers.

Datasets can be found on Kaggle

Charting and reporting.

Internal web browser, Baby, with queries optimized for searching .gov domains.

Pre-defined schema for more than 100 environmental data models.

Editors for SQLite, SQL Compact Edition, MS Access, SQL Server Express.

Excel-ish UI on top of real databases.

Mapping for congressional earmark reporting and monitoring of pollution sites.

Financial data bound to environmental programs and statutory authority.

Ad-hoc calculations.

Add agency/region/division-specific branding.

The Winforms version of Badger is Sherpa

📦 Database Providers

Databases play a critical role in environmental data analysis by providing a structured system to store, organize, and efficiently retrieve large amounts of data, allowing analysts to easily access and manipulate information needed to extract meaningful insights through queries and analysis tools; essentially acting as the central repository for data used in data analysis processes. Badger provides the following providers to store and analyze data locally.

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.

SQL CE is a discontinued but still useful relational database produced by Microsoft for applications that run on mobile devices and desktops.

SQL Server Express Edition is a scaled down, free edition of SQL Server, which includes the core database engine.

MS Access is a database management system (DBMS) from Microsoft that combines the relational Access Database Engine (ACE) with a graphical user interface and software-development tools.

💻 System requirements

You need VC++ 2019 Runtime 32-bit and 64-bit versions

You will need .NET 8.

You need to install the version of VC++ Runtime that Baby Browser needs. Since we are using CefSharp 106, according to this we need the above versions

📚Documentation

Compilation Guide - instructions on how to compile Badger.

Configuration Guide - instructions on how to configure Badger.

Distribution Guide - distributing Badger

📝 Code

Controls - main UI layer with numerous controls and related functionality.

Styles - XAML-based styles for the Badger UI layer.

Enumerations - various enumerations used for budgetary accounting.

Extensions- useful extension methods for budget analysis by type.

Clients - other tools used and available.

Ninja - models used in EPA budget data analysis.

IO - input output classes used for networking and the file system.

Static - static types used in the analysis of environmental budget data.

Interfaces - abstractions used in the analysis of environmental budget data.

bin - Binaries are included in the bin folder due to the complex Baby setup required. Don't empty this folder.

Badger uses CefSharp 106 for Baby Browser and is built on .NET 8

Badger supports x64 specific builds

bin/storage - HTML and JS required for downloads manager and custom error pages

Dashboards

Environmental...
Census of Population and Housing, 2010 [United States]: Summary File 2 With...
icpsr.umich.edu
search.datacite.org
Updated Jul 18, 2013
Cite
United States. Bureau of the Census (2013). Census of Population and Housing, 2010 [United States]: Summary File 2 With National Update [Dataset]. http://doi.org/10.3886/ICPSR34755.v1
Unique identifier
https://doi.org/10.3886/ICPSR34755.v1
Dataset updated
Jul 18, 2013
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
Authors
United States. Bureau of the Census
License
https://www.icpsr.umich.edu/web/ICPSR/studies/34755/terms
Time period covered
2010
Area covered
United States
Description
This data collection contains summary statistics on population and housing subjects derived from the responses to the 2010 Census questionnaire. Population items include sex, age, average household size, household type, and relationship to householder such as nonrelative or child. Housing items include tenure (whether a housing unit is owner-occupied or renter-occupied), age of householder, and household size for occupied housing units. Selected aggregates and medians also are provided. The summary statistics are presented in 71 tables, which are tabulated for multiple levels of observation (called "summary levels" in the Census Bureau's nomenclature), including, but not limited to, regions, divisions, states, metropolitan/micropolitan areas, counties, county subdivisions, places, ZIP Code Tabulation Areas (ZCTAs), school districts, census tracts, American Indian and Alaska Native areas, tribal subdivisions, and Hawaiian home lands. There are 10 population tables shown down to the county level and 47 population tables and 14 housing tables shown down to the census tract level. Every table cell is represented by a separate variable in the data. Each table is iterated for up to 330 population groups, which are called "characteristic iterations" in the Census Bureau's nomenclature: the total population, 74 race categories, 114 American Indian and Alaska Native categories, 47 Asian categories, 43 Native Hawaiian and Other Pacific Islander categories, and 51 Hispanic/not Hispanic groups. Moreover, the tables for some large summary areas (e.g., regions, divisions, and states) are iterated for portions of geographic areas ("geographic components" in the Census Bureau's nomenclature) such as metropolitan/micropolitan statistical areas and the principal cities of metropolitan statistical areas. The collection has a separate set of files for every state, the District of Columbia, Puerto Rico, and the National File. 
Each file set has 11 data files per characteristic iteration, a data file with geographic variables called the "geographic header file," and a documentation file called the "packing list" with information about the files in the file set. Altogether, the 53 file sets have 110,416 data files and 53 packing list files. Each file set is compressed in a separate ZIP archive (Datasets 1-56, 72, and 99). Another ZIP archive (Dataset 100) contains a Microsoft Access database shell and additional documentation files besides the codebook. The National File (Dataset 99) constitutes the National Update for Summary File 2. The National Update added summary levels for the United States as a whole, regions, divisions, and geographic areas that cross state lines such as Core Based Statistical Areas.
Data from: Alaska Geochemical Database Version 2.0 (AGDB2) - Including "Best...
dataone.org
data.wu.ac.at
Updated Dec 1, 2016
Cite
Matthew Granitto; Jeanine M. Schmidt; Nora B. Shew; Bruce M. Gamble; Keith A. Labay (2016). Alaska Geochemical Database Version 2.0 (AGDB2) - Including "Best Value" Data Compilations for Geochemical Data for Rock, Sediment, Soil, Mineral, and Concentrate Sample Media [Dataset]. https://dataone.org/datasets/922c44f3-a83b-473d-9407-02acdc5272e7
Dataset updated
Dec 1, 2016
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Matthew Granitto; Jeanine M. Schmidt; Nora B. Shew; Bruce M. Gamble; Keith A. Labay
Time period covered
Jan 1, 1962 - Jan 1, 2010
Area covered
Alaska
Variables measured
AU, au, id, ARS, BAR, CAS, CIN, CPY, FLR, GAL, and 605 more
Description
The Alaska Geochemical Database Version 2.0 (AGDB2) contains new geochemical data compilations in which each geologic material sample has one "best value" determination for each analyzed species, greatly improving speed and efficiency of use. Like the Alaska Geochemical Database (AGDB) before it, the AGDB2 was created and designed to compile and integrate geochemical data from Alaska in order to facilitate geologic mapping, petrologic studies, mineral resource assessments, definition of geochemical baseline values and statistics, environmental impact assessments, and studies in medical geology. This relational database, created from the Alaska Geochemical Database (AGDB) that was released in 2011, serves as a data archive in support of present and future Alaskan geologic and geochemical projects, and contains data tables in several different formats describing historical and new quantitative and qualitative geochemical analyses. The analytical results were determined by 85 laboratory and field analytical methods on 264,095 rock, sediment, soil, mineral and heavy-mineral concentrate samples. Most samples were collected by U.S. Geological Survey (USGS) personnel and analyzed in USGS laboratories or, under contracts, in commercial analytical laboratories. These data represent analyses of samples collected as part of various USGS programs and projects from 1962 through 2009. In addition, mineralogical data from 18,138 nonmagnetic heavy mineral concentrate samples are included in this database. The AGDB2 includes historical geochemical data originally archived in the USGS Rock Analysis Storage System (RASS) database, used from the mid-1960s through the late 1980s and the USGS PLUTO database used from the mid-1970s through the mid-1990s. All of these data are currently maintained in the National Geochemical Database (NGDB). Retrievals from the NGDB were used to generate most of the AGDB data set. 
These data were checked for accuracy regarding sample location, sample media type, and analytical methods used. This arduous process of reviewing, verifying and, where necessary, editing all USGS geochemical data resulted in a significantly improved Alaska geochemical dataset. USGS data that were not previously in the NGDB because the data predate the earliest USGS geochemical databases, or were once excluded for programmatic reasons, are included here in the AGDB2 and will be added to the NGDB. The AGDB2 data provided here are the most accurate and complete to date, and should be useful for a wide variety of geochemical studies. The AGDB2 data provided in the linked database may be updated or changed periodically.
High Performance Computational Analysis of Large-scale Proteome Data Sets to...
acs.figshare.com
zip
Updated May 31, 2023
Cite
Nadin Neuhauser; Nagarjuna Nagaraj; Peter McHardy; Sara Zanivan; Richard Scheltema; Jürgen Cox; Matthias Mann (2023). High Performance Computational Analysis of Large-scale Proteome Data Sets to Assess Incremental Contribution to Coverage of the Human Genome [Dataset]. http://doi.org/10.1021/pr400181q.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1021/pr400181q.s001
Dataset updated
May 31, 2023
Dataset provided by
ACS Publications
Authors
Nadin Neuhauser; Nagarjuna Nagaraj; Peter McHardy; Sara Zanivan; Richard Scheltema; Jürgen Cox; Matthias Mann
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Computational analysis of shotgun proteomics data can now be performed in a completely automated and statistically rigorous way, as exemplified by the freely available MaxQuant environment. The sophisticated algorithms involved and the sheer amount of data translate into very high computational demands. Here we describe parallelization and memory optimization of the MaxQuant software with the aim of executing it on a large computer cluster. We analyze and mitigate bottlenecks in overall performance and find that the most time-consuming algorithms are those detecting peptide features in the MS1 data as well as the fragment spectrum search. These tasks scale with the number of raw files and can readily be distributed over many CPUs as long as memory access is properly managed. Here we compared the performance of a parallelized version of MaxQuant running on a standard desktop, an I/O performance optimized desktop computer (“game computer”), and a cluster environment. The modified gaming computer and the cluster vastly outperformed a standard desktop computer when analyzing more than 1000 raw files. We apply our high performance platform to investigate incremental coverage of the human proteome by high resolution MS data originating from in-depth cell line and cancer tissue proteome measurements.
2020 General Payment Data
healthdata.gov
csv, xlsx, xml
Updated Jan 21, 2022
Cite
OpenPaymentsData.cms.gov (2022). 2020 General Payment Data [Dataset]. https://healthdata.gov/w/s9ha-fv52/default?cur=_6-yekKHZFV
Explore at:
Available download formats: xml, xlsx, csv
Dataset updated
Jan 21, 2022
Dataset provided by
Centers for Medicare & Medicaid Services
Description
All general (non-research, non-ownership related) payments from the 2020 program year [January 1 – December 31, 2020]
NOTE: This is a very large file and, depending on your network characteristics and software, may take a long time to download or fail to download. Additionally, the number of rows in the file may be larger than the maximum rows your version of Microsoft Excel supports. If you can't download the file, we recommend engaging your IT support staff. If you are able to download the file but are unable to open it in MS Excel or get a message that the data has been truncated, we recommend trying alternative programs such as MS Access, Universal Viewer, Editpad or any other software your organization has available for large datasets.
Brown trout 29-year dataset - Glenariffe Stream, New Zealand
search.dataone.org
Updated Aug 28, 2025
Cite
Phil Jellyman; Don Jellyman (2025). Brown trout 29-year dataset - Glenariffe Stream, New Zealand [Dataset]. http://doi.org/10.5061/dryad.s7h44j1hp
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.s7h44j1hp
Dataset updated
Aug 28, 2025
Dataset provided by
Dryad Digital Repository
Authors
Phil Jellyman; Don Jellyman
Area covered
New Zealand
Description
Globally, many large rivers are modified to meet human needs, often with adverse impacts on fish populations. In New Zealand, such rivers support important recreational brown trout (Salmo trutta) fisheries, but understanding of river alterations on trout is limited. This study utilized New Zealand's most extensive brown trout dataset from Glenariffe Stream (1965–1993). Annual brown trout spawning numbers varied eight-fold; larger runs had more small, first-time spawners, while smaller runs were sustained by return spawners. Spawning timing differed by sex, with larger fish arriving two months later than initial smaller spawners. Juvenile outmigration was driven by water level, time of year, and lunar phase. Tagged fish data highlighted the significance of longitudinal connectivity for post-spawning adults, particularly females, which travelled over 100 km downstream to estuarine habitats to rapidly regain condition. Collectively, our findings quantify the inherent annual variability in ...

Brown trout 29-year dataset - Glenariffe Stream, New Zealand

https://doi.org/10.5061/dryad.s7h44j1hp

Description of the data and file structure

The importance of river connectivity in maintaining headwater brown trout (Salmo trutta) stocks in a New Zealand river – results from a 29-year study.

Study authors: Phillip G. Jellyman and Donald J. Jellyman

Corresponding Author:

Phillip G. Jellyman, National Institute of Water and Atmospheric Research Ltd., P.O. Box 8602, Christchurch, New Zealand.

Email: phillip.jellyman@niwa.co.nz, Phone: +64 3 343 8052

Dates of data collection: 1965–1993

Location of primary data collection: a counting fence across Glenariffe Stream approximately 200 m above the confluence with the Rakaia River, Canterbury, New Zealand.

There are multiple datasets in this Microsoft Access Database file. These are primarily: adult salmonid trap, salmonid fry trap, tag return loca...
LADOT Parking Meter Occupancy - Archive
s.cnmilf.com
data.lacity.org
+1more
Updated Oct 4, 2025
Cite
data.lacity.org (2025). LADOT Parking Meter Occupancy - Archive [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/ladot-parking-meter-occupancy-archive
Explore at:
Dataset updated
Oct 4, 2025
Dataset provided by
data.lacity.org
Description
Monthly archive of all parking meter sensor activity over the previous 36 months (3 years). Updated monthly for data 2 months prior (e.g., January data will be published in early March). For the best-available current "live" status, see "LADOT Parking Meter Occupancy". For location and parking policy details, see "LADOT Metered Parking Inventory & Policies". This dataset is geared towards database professionals and/or app developers. Each file is extremely large, over 300MB at minimum. Common applications like Microsoft Excel will not be able to open the file and show all data. For best results, import into a database or use advanced data access methods appropriate for processing large files.
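One way to follow the "import into a database" advice is to stream the file into SQLite in batches, so the whole archive never has to fit in memory. The sketch below is only an illustration: the function name `load_csv_to_sqlite`, the table name `occupancy`, and the batch size are assumptions, not part of the LADOT documentation. It also assumes a comma-separated file whose first row is a header and whose rows all have the same number of fields.

```python
import csv
import sqlite3
from itertools import islice


def load_csv_to_sqlite(csv_path, db_path, table="occupancy", batch_size=50_000):
    """Stream a CSV file into a SQLite table in fixed-size batches.

    Hypothetical helper: table name and batch size are illustrative defaults.
    Every column is stored as TEXT; cast in SQL when querying.
    """
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)  # first row is assumed to hold column names
            cols = ", ".join(f'"{name}" TEXT' for name in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            placeholders = ", ".join("?" for _ in header)
            insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
            total = 0
            while True:
                # Pull at most batch_size rows from the reader at a time.
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(insert_sql, batch)
                conn.commit()
                total += len(batch)
        return total
    finally:
        conn.close()
```

Calling `load_csv_to_sqlite("archive.csv", "meters.db")` (both paths are placeholders) returns the number of rows inserted; the resulting database can then be queried with ordinary SQL instead of opening the file in a spreadsheet.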
ContosoTR
kaggle.com
zip
Updated Jun 1, 2025
Cite
Fatih Fidan (2025). ContosoTR [Dataset]. https://www.kaggle.com/datasets/kirshoff/contosotr
Explore at:
Available download formats: zip (55736552 bytes)
Dataset updated
Jun 1, 2025
Authors
Fatih Fidan
Description
The contoso_TR.accdb dataset is a Microsoft Access relational database representing a localized version of the well-known Contoso retail business scenario, tailored for the Turkish market (TR). It provides a rich, realistic sample of sales, product, customer, and financial data that can be used for learning, reporting, and analytics purposes.

🧾 Dataset Description

This dataset simulates the operations of Contoso Ltd., a fictitious retail company that sells electronic products and accessories through various sales channels across Turkey. The database is designed to support a wide range of data-driven tasks such as:

Data modeling and relationship design

SQL querying and data transformation

Business intelligence and reporting

Dashboard creation using Power BI or Excel

Training in Access VBA and macros

🌍 Localization

Language: Turkish (column names and values are adapted)

Currency: Turkish Lira (₺)

Region: Turkey-specific location data (e.g., cities, regions, and stores)

Date format: gg.aa.yyyy (Turkish date format)

✅ Use Cases

Practicing Access SQL queries

Creating forms and reports in Microsoft Access

Developing ETL pipelines using sample business data

Preparing Power BI dashboards with Turkish-language data

Learning how to normalize and relate data in a business context

📌 Notes

The dataset is static and does not reflect real-time data.

No real customer information is included; all data is synthetic.

It is ideal for educational and demonstration purposes.

Microsoft Stocks from 1986 to 2023
kaggle.com
zip
Updated May 16, 2023
Cite
Muhammad Bilal Hussain (2023). Microsoft Stocks from 1986 to 2023 [Dataset]. https://www.kaggle.com/bilalwaseer/microsoft-stocks-from-1986-to-2023
Explore at:
Available download formats: zip (122776 bytes)
Dataset updated
May 16, 2023
Authors
Muhammad Bilal Hussain
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This comprehensive dataset provides a detailed analysis of Microsoft Corporation's stock performance from 1986 to 2023. It encompasses various important parameters, including stock price, low price, high price, and trading volume, to provide a comprehensive overview of the company's market behavior throughout the years.

The dataset begins in 1986, marking the early years of Microsoft's presence in the stock market. As one of the pioneering companies in the technology industry, Microsoft's stock performance has been closely followed by investors, analysts, and enthusiasts alike. The dataset captures the fluctuations and trends in the stock market, reflecting the company's journey from its inception to its position as a global tech giant.

The stock price data offers a glimpse into the market valuation of Microsoft shares over time. By observing the daily closing prices, one can track the trajectory of the stock and identify key milestones in Microsoft's history. The dataset also includes the lowest and highest prices reached during each trading day, offering insight into the price range within which the stock fluctuated.

Trading volume data provides an additional dimension for understanding Microsoft's stock market activity. It highlights the level of investor interest and participation in buying and selling Microsoft shares during each trading day. Tracking trading volume can help identify periods of increased market activity or significant news events that influenced investor sentiment.

The dataset covers a span of several decades, enabling users to analyze long-term trends, market cycles, and historical patterns that have shaped Microsoft's stock performance. It can be used by researchers, investors, and analysts to conduct quantitative and qualitative studies, perform technical analyses, and gain insights into the dynamics of the technology industry and the broader market.

Please note that this dataset serves as a valuable historical resource and should be utilized alongside other relevant financial information and analysis to make informed decisions. The dataset captures Microsoft's stock performance up until 2023, ensuring that users have access to the latest available information.

Description: ChatGPT
Standardized Precipitation Index
datasets.ai
geohub.lio.gov.on.ca
+3more
21, 57
Updated Nov 27, 2020
+ more versions
Cite
Government of Ontario | Gouvernement de l'Ontario (2020). Standardized Precipitation Index [Dataset]. https://datasets.ai/datasets/522899b3-bce2-4244-935a-c2da2029c79a
Explore at:
Available download formats: 21, 57
Dataset updated
Nov 27, 2020
Dataset authored and provided by
Government of Ontario | Gouvernement de l'Ontario
Description
The Standardized Precipitation Index (SPI) was generated for certain Environment Canada long-term climate stations in Ontario. The SPI quantifies the precipitation deficit and surplus for multiple time scales, including:

* one month
* three months
* six months
* nine months
* 12 months
* 24 months

You can use the SPI to study the impact of dry and wet weather conditions to create comprehensive water management approaches. The SPI data package is distributed as a Microsoft Access Geodatabase.

This is a legacy dataset that we no longer maintain or support. The documents referenced in this record may contain URLs (links) that were valid when published, but now link to sites or pages that no longer exist.
BRAINTEASER ALS and MS Datasets
zenodo.org
data.europa.eu
Updated Jul 10, 2024
+ more versions
Cite
Guglielmo Faggioli; Alessandro Guazzo; Stefano Marchesin; Laura Menotti; Isotta Trescato; Helena Aidos; Roberto Bergamaschi; Giovanni Birolo; Paola Cavalla; Adriano Chiò; Arianna Dagliati; Mamede de Carvalho; Giorgio Maria Di Nunzio; Piero Fariselli; Jose Manuel García Dominguez; Marta Gromicho; Enrico Longato; Sara C. Madeira; Umberto Manera; Gianmaria Silvello; Eleonora Tavazzi; Erica Tavazzi; Marta Vettoretti; Barbara Di Camillo; Nicola Ferro (2024). BRAINTEASER ALS and MS Datasets [Dataset]. http://doi.org/10.5281/zenodo.8083181
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.8083181
Dataset updated
Jul 10, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Guglielmo Faggioli; Alessandro Guazzo; Stefano Marchesin; Laura Menotti; Isotta Trescato; Helena Aidos; Roberto Bergamaschi; Giovanni Birolo; Paola Cavalla; Adriano Chiò; Arianna Dagliati; Mamede de Carvalho; Giorgio Maria Di Nunzio; Piero Fariselli; Jose Manuel García Dominguez; Marta Gromicho; Enrico Longato; Sara C. Madeira; Umberto Manera; Gianmaria Silvello; Eleonora Tavazzi; Erica Tavazzi; Marta Vettoretti; Barbara Di Camillo; Nicola Ferro
Description
BRAINTEASER (Bringing Artificial Intelligence home for a better care of amyotrophic lateral sclerosis and multiple sclerosis) is a data science project that seeks to exploit the value of big data, including those related to health, lifestyle habits, and environment, to support patients with Amyotrophic Lateral Sclerosis (ALS) and Multiple Sclerosis (MS) and their clinicians. Taking advantage of cost-efficient sensors and apps, BRAINTEASER will integrate large, clinical datasets that host both patient-generated and environmental data.

As part of its activities, BRAINTEASER organized two open evaluation challenges on Intelligent Disease Progression Prediction (iDPP), iDPP@CLEF 2022 and iDPP@CLEF 2023, co-located with the Conference and Labs of the Evaluation Forum (CLEF).

The goal of iDPP@CLEF is to design and develop an evaluation infrastructure for AI algorithms able to:

better describe disease mechanisms;

stratify patients according to their phenotype assessed all over the disease evolution;

predict disease progression in a probabilistic, time dependent fashion.

The iDPP@CLEF challenges relied on retrospective ALS and MS patient data made available by the clinical partners of the BRAINTEASER consortium. The datasets contain data about 2,204 ALS patients (static variables, ALSFRS-R questionnaires, spirometry tests, environmental/pollution data) and 1,792 MS patients (static variables, EDSS scores, evoked potentials, relapses, MRIs).

In more detail, the BRAINTEASER project retrospective datasets were derived by merging existing datasets obtained by the clinical centers involved in the BRAINTEASER Project.

The ALS dataset was obtained by merging and homogenising the Piemonte and Valle d’Aosta Registry for Amyotrophic Lateral Sclerosis (PARALS, Chiò et al., 2017) and the Lisbon ALS clinic (CENTRO ACADÉMICO DE MEDICINA DE LISBOA, Centro Hospitalar Universitário de Lisboa-Norte, Hospital de Santa Maria, Lisbon, Portugal) dataset. Both datasets were initiated in 1995 and are currently maintained by researchers of the ALS Regional Expert Centre (CRESLA), University of Turin and of the CENTRO ACADÉMICO DE MEDICINA DE LISBOA-Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa. They include demographic and clinical data, comprising both static and dynamic variables.

The MS dataset was obtained from the Pavia MS clinical dataset, which was started in 1990 and contains demographic and clinical information that is continuously updated by the researchers of the Institute, and the Turin MS clinic dataset (Department of Neurosciences and Mental Health, Neurology Unit 1, Città della Salute e della Scienza di Torino).

Retrospective environmental data are accessible at various scales at the individual subject level. Thus, environmental data have been retrieved at different scales:

To gather macroscale air pollution data we’ve leveraged data coming from public monitoring stations that cover the whole extension of the involved countries, namely the European Air Quality Portal;

data from a network of air quality sensors (PurpleAir - Outdoor Air Quality Monitor / PurpleAir PA-II) installed at different points in the city of Pavia (Italy) were extracted as well. In both cases, the environmental data were already publicly available. To merge environmental data with individual subject location, we rely on postcodes (the postcode of the pollutant monitoring station and the postcode of the subject's address). Data were merged following an anonymization procedure based on hash keys. Environmental exposure trajectories have been pre-processed and aggregated in order to avoid fine temporal and spatial granularities, so that individual exposure information cannot disclose personal addresses.

The datasets are shared in two formats:

RDF (serialized in Turtle) modeled according to the BRAINTEASER Ontology (BTO);

CSV, as shared during the iDPP@CLEF 2022 and 2023 challenges, split into training and test.

Each format corresponds to a specific folder in the datasets, where a dedicated README file provides further details on the datasets. Note that the ALS dataset is split into multiple ZIP files due to the size of the environmental data.

The BRAINTEASER Data Sharing Policy section below reports the details for requesting access to the datasets.
Data from: St. Louis Geotechnical Database, v2003
datasets.ai
data.usgs.gov
+2more
55
Updated May 31, 2023
+ more versions
Cite
Department of the Interior (2023). St. Louis Geotechnical Database, v2003 [Dataset]. https://datasets.ai/datasets/st-louis-geotechnical-database-v2003
Explore at:
Available download formats: 55
Dataset updated
May 31, 2023
Dataset authored and provided by
Department of the Interior
Area covered
St. Louis
Description
The St. Louis area has experienced minor earthquake damage at least 12 times in the past 205 years. The St. Louis metropolitan area, with a population of about 2.8 million, faces earthquake hazard from large earthquakes in the New Madrid and Wabash Valley seismic zones, as well as a closer region of diffuse historical and prehistoric seismicity to its south and east. Also, low attenuation of seismic energy in the region and a substantial number of historic older unreinforced brick and stone buildings make the St. Louis area vulnerable to moderate earthquakes at relatively large distances compared to the western United States. This geotechnical database was compiled by James Palmer and others at the Missouri Department of Natural Resources as the product of a U.S. Geological Survey (USGS) Earthquake Hazards Program external grant through the National Earthquake Hazards Reduction Program (NEHRP) supporting urban seismic hazards mapping efforts for the St Louis metropolitan area (https://earthquake.usgs.gov/cfusion/external_grants/reports/05HQGR0019.pdf). The data in Tables.zip have been exported from the original Microsoft Access database and have been reviewed for completeness. See Appendix A in the aforementioned report for additional details. For archival purposes, the Microsoft Access database is also provided here, but the queries within have not been reviewed and the user assumes all responsibility.
