Master Beneficiary Summary Files (MBSF)
This dataset page includes some of the tables from the Medicare Data in PHS's possession. Other Medicare tables are included on other dataset pages on the PHS Data Portal. Depending upon your research question and your DUA with CMS, you may only need tables from a subset of the Medicare dataset pages, or you may need tables from all of them.
The location of each of the Medicare tables (i.e. a chart of which tables are included in each Medicare dataset page) is shown here.
All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Metadata access is required to view this section.
The Dockerfiles dataset is a set of approximately 130,000 Dockerfiles extracted in early summer 2018 across a sampling of search prefixes. The dataset is released under the MIT license.
$ find data -type f -name Dockerfile | wc -l
129,519
The files are hosted as public images on Docker Hub and thus freely available for download and parsing.
The files are currently provided in their raw format, each named Dockerfile, and are organized by Docker Hub username. For example, here is the top level of folders under "data" in the repository:
data
├── 0
├── 1
├── 2
├── 3
├── 4
├── 5
├── 6
├── 7
├── 8
├── 9
├── a
├── b
├── c
...
├── w
├── x
├── y
└── z
36 directories, 0 files
and within each, we have folders that represent Docker Hub usernames:
data/a
├── a13r
├── a13xx
├── a1exanderjung
...
├── azuresdk
├── azzanatsu
└── azzra
Each Docker Hub username folder then has subfolders named for containers, and each of those subfolders contains a Dockerfile (no pun intended).
data/a/a13r
├── waecm-2018-group-16-bsp-1-backend
│   └── Dockerfile
├── waecm-2018-group-16-bsp-1-frontend
│   └── Dockerfile
└── waecm-2018-group-16-bsp-1-revproxy
    └── Dockerfile
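The layout above can be walked with standard tools. As a minimal sketch, the hypothetical helper below (list_containers is not part of the repository) prints one username/container pair per Dockerfile. It assumes every Dockerfile sits directly inside a container folder, which in turn sits directly inside a username folder, as in the listings above.

```shell
# Hypothetical helper: print "username/container" for each Dockerfile.
# Assumes the layout <root>/<prefix>/<username>/<container>/Dockerfile.
list_containers() {
    root="${1:-data}"
    # Split each path on "/" and keep the two directories above the file:
    # the grandparent is the username, the parent is the container name.
    find "$root" -type f -name Dockerfile \
        | awk -F/ '{ print $(NF-2) "/" $(NF-1) }' \
        | sort
}
```

Running list_containers data | head from the repository root would print the first few pairs.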
Since this dataset (despite the huge number of files!) still fits in a GitHub repository, the files are provided as is under version control. No special download step is required beyond cloning the repo or downloading an archive.
git clone https://www.github.com/vsoch/datasets
wget https://github.com/vsoch/dockerfiles/archive/1.0.0.zip
wget https://github.com/vsoch/dockerfiles/archive/1.0.0.tar.gz
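Once the repository is cloned or an archive is unpacked, the raw files can be parsed directly. As an illustration only (top_base_images is a hypothetical helper, not part of the dataset's tooling), the sketch below tallies the image named on each FROM line. It assumes the image reference is the first argument after FROM that does not start with "--", and it does not resolve build arguments or stage aliases.

```shell
# Hypothetical helper: count base images named on FROM lines.
# Usage: top_base_images [root] [limit]
top_base_images() {
    root="${1:-data}"
    limit="${2:-10}"
    find "$root" -type f -name Dockerfile -exec awk '
        # Drop a trailing carriage return from files with CRLF endings.
        { sub(/\r$/, "") }
        # Match FROM case-insensitively and skip option flags such as --platform.
        toupper($1) == "FROM" {
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^--/) { print $i; break }
            }
        }
    ' {} + | sort | uniq -c | sort -rn | head -n "$limit"
}
```

Each output line is a count followed by an image reference, most frequent first.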
Thanks for reading! If you have other questions, or want help for your project, please don't hesitate to reach out. If the dataset is useful to you, we have a Zenodo reference:
Many of the same questions about signatures of software can be tested on, or are generally relevant to, this dataset. Additionally, we might ask the following: