6 datasets found
  1. World’s Top 2% of Scientists list by Stanford University: An Analysis of its Robustness

    • data.mendeley.com
    Updated Nov 17, 2023
    Cite
    JOHN Philip (2023). World’s Top 2% of Scientists list by Stanford University: An Analysis of its Robustness [Dataset]. http://doi.org/10.17632/td6tdp4m6t.1
    Explore at:
    Dataset updated
    Nov 17, 2023
    Authors
    JOHN Philip
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    John Ioannidis and co-authors [1] created a publicly available database of the world's top-cited scientists. This database, intended to address the misuse of citation metrics, has generated a lot of interest among the scientific community, institutions, and media. Many institutions have used it as a yardstick to assess the quality of researchers. At the same time, some view the list with skepticism, citing problems with the methodology. Two separate databases were created, based on career-long and single recent-year impact. The database is built from Scopus data from Elsevier [1-3]. The scientists included are classified into 22 scientific fields and 174 sub-fields. The parameters considered for this analysis are total citations from 1996 to 2022 (nc9622), h-index in 2022 (h22), c-score, and world rank based on c-score (Rank ns). Citations without self-citations are considered in all cases (indicated as ns). In the single-year case, citations during 2022 (nc2222) are considered instead of nc9622.

    To evaluate the robustness of c-score-based ranking, I carried out a detailed analysis of the metric parameters of the last 25 years (1998-2022) of Nobel laureates in physics, chemistry, and medicine, and compared them with the top 100 rank holders in the list. The latest career-long and single-year databases (2022) were used for this analysis. The details are presented below. Though the article says the selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field, the actual career-based ranking list has 204,644 names [1], and the single-year database contains 210,199 names; the published list therefore covers roughly the top 4% of scientists. In the career-based rank list, the person with the lowest rank (4,809,825) had nc9622, h22, and c-score values of 41, 3, and 1.3632, respectively, whereas the person ranked No. 1 had 345,061, 264, and 5.5927. Three people on the list had fewer than 100 citations during 1996-2022, 1,155 had an h22 below 10, and 6 had a c-score below 2.
    In the single-year rank list, the person with the lowest rank (6,547,764) had nc2222, h22, and c-score values of 1, 1, and 0.6, respectively, whereas the person ranked No. 1 had 34,582, 68, and 5.3368. On this list, 4,463 people had fewer than 100 citations in 2022, 71,512 had an h22 below 10, and 313 had a c-score below 2. The entry of many authors with single-digit h-indices and very meager citation totals points to serious shortcomings in the c-score-based ranking methodology.
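    The robustness check above amounts to scanning the ranking tables for entries whose underlying metrics are implausibly weak. A minimal sketch of that filter in Python, using hypothetical field names (`nc`, `h`, `c_score`) rather than the dataset's actual column headers:

```python
# Sketch: flag ranking entries with weak underlying metrics.
# Field names (nc, h, c_score) are illustrative, not the dataset's schema.

def flag_weak_entries(rows, min_citations=100, min_h=10, min_c=2.0):
    """Return entries whose metrics fall below any of the thresholds."""
    return [
        r for r in rows
        if r["nc"] < min_citations or r["h"] < min_h or r["c_score"] < min_c
    ]

sample = [
    {"name": "A", "nc": 345061, "h": 264, "c_score": 5.5927},  # top-ranked profile
    {"name": "B", "nc": 41, "h": 3, "c_score": 1.3632},        # lowest-ranked profile
]
weak = flag_weak_entries(sample)  # only "B" is flagged
```

    The thresholds mirror those used in the analysis above (100 citations, h-index of 10, c-score of 2).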

  2. Medicare 20% [2006-2018] Enrollment/Summary (MBSF)

    • redivis.com
    application/jsonl +7
    Updated Dec 17, 2021
    Cite
    Stanford Center for Population Health Sciences (2021). Medicare 20% [2006-2018] Enrollment/Summary (MBSF) [Dataset]. http://doi.org/10.57761/wnn9-b060
    Explore at:
    Available download formats: avro, spss, sas, application/jsonl, csv, arrow, parquet, stata
    Dataset updated
    Dec 17, 2021
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 1, 1999 - Dec 31, 2018
    Description

    Abstract

    Master Beneficiary Summary Files (MBSF)

    Usage

    This dataset page includes some of the tables from the Medicare Data in PHS's possession. Other Medicare tables are included on other dataset pages on the PHS Data Portal. Depending upon your research question and your DUA with CMS, you may only need tables from a subset of the Medicare dataset pages, or you may need tables from all of them.

    The location of each of the Medicare tables (i.e. a chart of which tables are included in each Medicare dataset page) is shown here.

    Before Manuscript Submission

    All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission. We will check your cell sizes and citations.

    For more information about how to cite PHS and PHS datasets, please visit:

    https://phsdocs.developerhub.io/need-help/citing-phs-data-core

    Documentation

    Metadata access is required to view this section.

    Section 2

    Metadata access is required to view this section.

    Usage Notes

    Metadata access is required to view this section.

  3. Medicare RIF 20%

    • redivis.com
    application/jsonl +7
    Updated Apr 9, 2018
    Cite
    Stanford Center for Population Health Sciences (2018). Medicare RIF 20% [Dataset]. http://doi.org/10.57761/2g49-b240
    Explore at:
    Available download formats: application/jsonl, stata, parquet, spss, avro, arrow, sas, csv
    Dataset updated
    Apr 9, 2018
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Description

    Abstract

    Stanford has a 20% sample of CMS data. These data are hosted on our secure servers and can only be accessed after completing a reuse application with CMS. You can explore these data using our CMS Public files which have no restrictions.

    Documentation

    **A checklist for the steps in gaining access to the CMS RIF 20% sample can be found here:** CMS RIF 20% Sample Access Checklist

    ResDAC has full and current **CMS File Availability and Documentation**

    The Stanford Center for Population Health Sciences has purchased a 20% sample (linked) of all records for the files listed below. Where available, we have purchased all data from 2006-2018, though for some files not all years are available. N/A indicates that we have not purchased the file.

    Medicare Claims
    • Inpatient: N/A
    • Outpatient: 2006-2018
    • SNF: N/A
    • Hospice: 2006-2018
    • Home Health: 2006-2018
    • Carrier: 2006-2018
    • DMERC: 2006-2018

    Part D (Event with actual Prescriber/Pharmacy identifiers)
    • Drug Characteristics: 2006-2018
    • Prescriber Characteristics File: N/A
    • Formulary File: 2010-2018
    • Plan Characteristics Files: 2006-2018

    MEDPAR
    • All (SS/LS/SNF): 2006-2018

    Enrollment/Summary Files
    • Master Beneficiary Summary File: All years
    • Base Beneficiary Summary File A/B/C/D: 2006-2018
    • Chronic Conditions: 2006-2018
    • Cost & Utilization: 2006-2018
    • Other Chronic or Potentially Disabling Conditions: 2006-2018
    • National Death Index: N/A
    • EDB User View: Current
    • Vital Status File: Current

    Miscellaneous
    • MDPPAS: 2008-2018

  4. Medicare 20% [2019-2020] Enrollment/Summary

    • redivis.com
    • stanford.redivis.com
    application/jsonl +7
    Updated Jul 27, 2023
    Cite
    Stanford Center for Population Health Sciences (2023). Medicare 20% [2019-2020] Enrollment/Summary [Dataset]. http://doi.org/10.57761/xg2t-1343
    Explore at:
    Available download formats: avro, arrow, application/jsonl, parquet, spss, sas, csv, stata
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Feb 11, 1815 - Dec 31, 2020
    Description

    Usage

    This dataset page includes some of the tables from the Medicare Data in PHS's possession. Other Medicare tables are included on other dataset pages on the PHS Data Portal. Depending upon your research question and your DUA with CMS, you may only need tables from a subset of the Medicare dataset pages, or you may need tables from all of them.

    The location of each of the Medicare tables (i.e. a chart of which tables are included in each Medicare dataset page) is shown here.

    Before Manuscript Submission

    All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission. We will check your cell sizes and citations.

    For more information about how to cite PHS and PHS datasets, please visit:

    https://phsdocs.developerhub.io/need-help/citing-phs-data-core

    Documentation

    Metadata access is required to view this section.

  5. Dockerfiles

    • kaggle.com
    Updated Jun 22, 2018
    Cite
    Stanford Research Computing Center (2018). Dockerfiles [Dataset]. https://www.kaggle.com/datasets/stanfordcompute/dockerfiles
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 22, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Stanford Research Computing Center
    Description

    Context

    The Dockerfiles dataset is a set of approximately 130,000 Dockerfiles extracted in early summer 2018 across a sampling of search prefixes. This dataset is released under an MIT license.

    $ find data -type f -name Dockerfile | wc -l
    129519
    

    The files are hosted as public images on Docker Hub and thus freely available for download and parsing.

    Content

    The files are currently provided in their raw format, each named Dockerfile under an organization by the Docker Hub username. For example, here is the top level of folders under "data" in the repository:

    data
    ├── 0
    ├── 1
    ├── 2
    ├── 3
    ├── 4
    ├── 5
    ├── 6
    ├── 7
    ├── 8
    ├── 9
    ├── a
    ├── b
    ├── c
    ...
    
    ├── w
    ├── x
    ├── y
    └── z
    36 directories, 0 files
    

    and within each, we have folders that represent Docker Hub usernames:

    data/a
    ├── a13r
    ├── a13xx
    ├── a1exanderjung
    ...
    ├── azuresdk
    ├── azzanatsu
    └── azzra
    

    And then each Dockerhub username has subfolders with container names, and the subfolders contain the Dockerfiles (no pun intended).

    data/a/a13r
    ├── waecm-2018-group-16-bsp-1-backend
    │  └── Dockerfile
    ├── waecm-2018-group-16-bsp-1-frontend
    │  └── Dockerfile
    └── waecm-2018-group-16-bsp-1-revproxy
      └── Dockerfile
    
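    Given the layout above, collecting every Dockerfile reduces to a recursive directory walk. A minimal sketch in Python (the `"data"` root path is illustrative):

```python
import os

def find_dockerfiles(root):
    """Walk root/<prefix>/<username>/<container>/ and yield each Dockerfile path."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == "Dockerfile":
                yield os.path.join(dirpath, name)

# Counting the results mirrors the `find data ... | wc -l` command above:
# total = sum(1 for _ in find_dockerfiles("data"))
```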

    Download

    Since this dataset (despite the huge number of files!) still fits in a GitHub repository, the files are provided as-is under version control and don't require any special downloading aside from cloning the repo or downloading an archive.

    git clone https://www.github.com/vsoch/datasets
    wget https://github.com/vsoch/dockerfiles/archive/1.0.0.zip
    wget https://github.com/vsoch/dockerfiles/archive/1.0.0.tar.gz
    

    Acknowledgements

    Thanks for reading! If you have other questions, or want help for your project, please don't hesitate to reach out. If the dataset is useful to you, we have a Zenodo reference:

    DOI

    Inspiration

    Many of the same questions about signatures of software can be tested or generally relevant for this dataset. Additionally, we might ask the following:

    • How do containers relate (or inherit) from one another? For example, if we use the FROM statements to build a graph, what interesting things do we find?
    • What are signatures (of installation routines?) common across different containers?
    • Can we classify different operating systems, domains of science, or package managers?
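    The first question, inheritance via FROM statements, can be prototyped by parsing each Dockerfile's FROM lines into parent-to-child graph edges. A sketch of that parse (deliberately simplified: multi-stage builds, ARG substitution, and platform flags would need more care):

```python
import re

# Capture the image reference that follows FROM; case-insensitive,
# one match per line.
FROM_RE = re.compile(r"^\s*FROM\s+(\S+)", re.IGNORECASE | re.MULTILINE)

def from_edges(image_name, dockerfile_text):
    """Yield (base_image, image_name) edges, one per FROM statement."""
    for match in FROM_RE.finditer(dockerfile_text):
        yield (match.group(1), image_name)

edges = list(from_edges("a13r/backend", "FROM python:3.6\nRUN pip install flask\n"))
```

    Accumulating these edges over the whole dataset yields a directed graph whose connected components reveal families of related containers.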

    Resources

  6. US ZIP codes to CBSA

    • redivis.com
    application/jsonl +7
    Updated Dec 2, 2019
    Cite
    Stanford Center for Population Health Sciences (2019). US ZIP codes to CBSA [Dataset]. http://doi.org/10.57761/mk9y-ty94
    Explore at:
    Available download formats: arrow, application/jsonl, stata, parquet, avro, spss, csv, sas
    Dataset updated
    Dec 2, 2019
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 1, 2010 - Apr 1, 2019
    Description

    Abstract

    A crosswalk matching US ZIP codes to corresponding CBSAs (core-based statistical areas).

    Documentation

    The denominators used to calculate the address ratios are the ZIP code totals. When a ZIP is split by any of the other geographies, that ZIP code is duplicated in the crosswalk file.

    **Example:** ZIP code 03870 is split by two different Census tracts, 33015066000 and 33015071000, which appear in the tract column. The ratio of residential addresses in the first ZIP-Tract record to the total number of residential addresses in the ZIP code is .0042 (0.42%). The remaining residential addresses in that ZIP (99.58%) fall into the second ZIP-Tract record.

    So, for example, if one wanted to allocate data from ZIP code 03870 to each Census tract located in that ZIP code, one would multiply the number of observations in the ZIP code by the residential ratio for each tract associated with that ZIP code.
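    The allocation rule described above is a ratio-weighted multiply. A minimal sketch in Python, using the 03870 example (the 1,000 observations are a hypothetical count; the ratios are from the crosswalk example above):

```python
def allocate_by_ratio(zip_total, tract_ratios):
    """Split a ZIP-level observation count across tracts by residential-address ratio."""
    return {tract: zip_total * ratio for tract, ratio in tract_ratios.items()}

# ZIP 03870 split across its two tracts
ratios = {"33015066000": 0.0042, "33015071000": 0.9958}
allocated = allocate_by_ratio(1000, ratios)
```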

    (Note that the sum of each ratio column for each distinct ZIP code may not always equal 1.00 (or 100%) due to rounding.)

    CBSA definition

    A core-based statistical area (CBSA) is a U.S. geographic area defined by the Office of Management and Budget (OMB) that consists of one or more counties (or equivalents) anchored by an urban center of at least 10,000 people, plus adjacent counties that are socioeconomically tied to the urban center by commuting. The areas defined by applying these standards to Census 2000 data were announced by OMB in June 2003, replacing the metropolitan-area definitions issued in 1990. OMB released new standards based on the 2010 Census on July 15, 2015.

    Further reading

    The following article demonstrates how to more effectively use the U.S. Department of Housing and Urban Development (HUD) United States Postal Service ZIP Code Crosswalk Files when working with disparate geographies.

    Wilson, Ron and Din, Alexander, 2018. “Understanding and Enhancing the U.S. Department of Housing and Urban Development’s ZIP Code Crosswalk Files,” Cityscape: A Journal of Policy Development and Research, Volume 20 Number 2, 277 – 294. URL: https://www.huduser.gov/portal/periodicals/cityscpe/vol20num2/ch16.pdf

    Contact authors

    Questions regarding these crosswalk files can be directed to Alex Din with the subject line HUD-Crosswalks.

    Acknowledgement

    This dataset is taken from the U.S. Department of Housing and Urban Development (HUD) office: https://www.huduser.gov/portal/datasets/usps_crosswalk.html#codebook

