A catalog of high-value public science and research data sets from across the Federal Government.
https://datacatalog.worldbank.org/public-licenses?fragment=cc
The comprehensive wealth accounts database provides stock measures of wealth, both in total and per capita, in real and nominal terms. The wealth accounts were updated in 2024 using a new methodology described in The Changing Wealth of Nations 2024.
A catalog of high-value data inventories produced by Connecticut executive branch agencies and compiled by the Office of Policy and Management (OPM). This catalog covers high-value GIS data only; a catalog of high-value non-GIS data is available at https://data.ct.gov/Government/CT-Data-Catalog-Non-GIS-/ghmx-93jn
As required by Public Act 18-175, executive branch agencies must conduct an annual high-value data inventory to capture information about the high-value data they collect.
High-value data is defined as any data that the department head determines (A) is critical to the operation of an executive branch agency; (B) can increase executive branch agency accountability and responsiveness; (C) can improve public knowledge of the executive branch agency and its operations; (D) can further the core mission of the executive branch agency; (E) can create economic opportunity; (F) is frequently requested by the public; (G) responds to a need and demand as identified by the agency through public consultation; or (H) is used to satisfy any legislative or other reporting requirements.
This dataset was last updated 1/2/2019 and will continue to be updated as high value data inventories are submitted to OPM.
The Virginia Open Data Portal provides more than just data access. Within the portal, you can view stories and dashboards, create visualizations, filter data, and access data via APIs (application programming interfaces) to build web and mobile applications. The Commonwealth of Virginia is committed to continuing to grow the number of open datasets available through the portal, facilitating public participation and engagement. We hope you enjoy using the portal and invite you to share your thoughts with us by clicking the 'Contact' link at the top of the page.
A data catalog site is a portal site that provides a data catalog (a directory or index of data). It supports searching by metadata (data attributes and descriptive information, such as title, URL, data format, and creator). The data currently published consists mainly of statistical and geospatial information, and the number of datasets is still small, but we plan to expand it progressively. (Translated from Japanese.)
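The following minimal Python sketch makes metadata search concrete. It filters catalog records on the fields named in the description (title, URL, data format, creator). The record structure, the `search` function, and the sample entries are hypothetical illustrations. They are not the site's actual search interface.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    # Metadata fields named in the description (structure is hypothetical).
    title: str
    url: str
    data_format: str
    creator: str


def search(records, keyword=None, data_format=None, creator=None):
    """Return records matching every supplied criterion (None means 'any')."""
    results = []
    for rec in records:
        if keyword and keyword.lower() not in rec.title.lower():
            continue  # keyword must appear in the title
        if data_format and rec.data_format.lower() != data_format.lower():
            continue  # format must match exactly (case-insensitive)
        if creator and creator.lower() not in rec.creator.lower():
            continue  # creator matched as a substring
        results.append(rec)
    return results


# Invented sample records, for illustration only.
catalog = [
    CatalogRecord("Population census 2020", "https://example.org/census.csv", "CSV", "Example Statistics Office"),
    CatalogRecord("Land use map", "https://example.org/landuse.zip", "SHP", "Example Mapping Agency"),
]

for rec in search(catalog, keyword="census", data_format="csv"):
    print(rec.title, rec.url)
```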
National statistical systems are facing significant challenges. These arise from increasing demand for high-quality, trustworthy data to guide decision making, coupled with the rapidly changing landscape of the data revolution. To help create a mechanism for learning among national statistical systems, the World Bank has developed improved Statistical Performance Indicators (SPI) to monitor the statistical performance of countries. The SPI replace the Statistical Capacity Index (SCI), which the World Bank has regularly published since 2004. The SPI focus on five key pillars of a country's statistical performance: (i) data use, (ii) data services, (iii) data products, (iv) data sources, and (v) data infrastructure. They comprise more than 50 indicators and contain data for 174 countries, which together account for 99.2 percent of the world population. The data extend from 2016 to 2019, with some indicators going back to 2004.
The Global Data Regulation Diagnostic provides a comprehensive assessment of the quality of the data governance environment. Diagnostic results show that countries have put greater effort into adopting enabler regulatory practices than safeguard regulatory practices. However, the regulatory development of both enablers and safeguards remains at an intermediate stage: 47 percent of enabler good practices and 41 percent of safeguard good practices are adopted across countries. Under the enabler and safeguard pillars, the diagnostic covers the dimensions of e-commerce/e-transactions, enablers for public intent data, enablers for private intent data, safeguards for personal and nonpersonal data, cybersecurity and cybercrime, and cross-border data flows. No income group demonstrates advanced regulatory frameworks across all of these dimensions, indicating significant room for further improvement of the data governance environment.
The Global Data Regulation Diagnostic is the first comprehensive assessment of laws and regulations on data governance. It covers enabler and safeguard regulatory practices in 80 countries, providing indicators to assess and compare their performance. The diagnostic develops objective, standardized indicators to measure the regulatory environment for the data economy across countries. These indicators are intended to serve as a diagnostic tool so that countries can assess and compare their performance vis-à-vis other countries. Understanding the gap with global regulatory good practices is a necessary first step for governments when identifying and prioritizing reforms.
Coverage: 80 countries
Unit of analysis: Country
Kind of data: Observation data/ratings [obs]
The diagnostic is based on a detailed assessment of domestic laws, regulations, and administrative requirements in 80 countries, selected to ensure balanced coverage across income groups, regions, and levels of digital technology development. Data are further verified through detailed desk research on legal texts and reflect the regulatory status of each country as of June 1, 2020.
Mode of data collection: Mail Questionnaire [mail]
The questionnaire comprises 37 questions designed to determine whether a country has adopted good regulatory practices on data governance. The responses are scored and assigned a normative interpretation. Related questions are grouped into seven clusters, so that when the scores within a cluster are averaged, the result gives an overall sense of how a country performs in the corresponding regulatory and legal dimension. The seven dimensions are: (1) E-commerce/e-transaction; (2) Enablers for public intent data; (3) Enablers for private intent data; (4) Safeguards for personal data; (5) Safeguards for nonpersonal data; (6) Cybersecurity and cybercrime; (7) Cross-border data transfers.
100%
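The questionnaire's scoring scheme averages question-level scores within each of the seven clusters. It can be sketched in Python as follows. The 1/0 scoring of each answer, the question IDs, and the question-to-dimension mapping are assumptions made for illustration. The diagnostic's actual scoring rules and its assignment of the 37 questions are not reproduced here.

```python
from statistics import mean

# The seven dimensions listed in the questionnaire description.
DIMENSIONS = (
    "E-commerce/e-transaction",
    "Enablers for public intent data",
    "Enablers for private intent data",
    "Safeguards for personal data",
    "Safeguards for nonpersonal data",
    "Cybersecurity and cybercrime",
    "Cross-border data transfers",
)


def cluster_scores(answers, question_dimension):
    """Average question scores within each dimension.

    answers: question ID -> score (assumed 1 if the good practice is adopted, else 0).
    question_dimension: question ID -> dimension name (hypothetical mapping).
    """
    grouped = {}
    for qid, score in answers.items():
        dim = question_dimension[qid]
        if dim not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
        grouped.setdefault(dim, []).append(score)
    # Each cluster's score is the plain mean of its question scores.
    return {dim: mean(scores) for dim, scores in grouped.items()}


# Hypothetical four-question example.
mapping = {
    "q1": "Cybersecurity and cybercrime",
    "q2": "Cybersecurity and cybercrime",
    "q3": "Safeguards for personal data",
    "q4": "Safeguards for personal data",
}
scores = cluster_scores({"q1": 1, "q2": 0, "q3": 1, "q4": 1}, mapping)
print(scores)
```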
A searchable data catalog that facilitates researchers' access to large datasets available either publicly or through institutional or individual licensing. Dataset records include information about the content of each dataset, how to access it, and local experts within NYULMC and NYU who can assist in using these datasets. The data catalog will expand to include internally generated datasets from NYULMC and NYU in the near future. Use the contact form if you are interested in submitting a dataset to the data catalog.
TCGA Acute Myeloid Leukemia. Source data from GDAC Firehose; previously known as TCGA Provisional. This dataset contains summary data visualizations and clinical data for 200 samples from 200 patients. The data were gathered as part of the Broad Institute of MIT and Harvard's Firehose initiative, a cancer analysis pipeline. The clinical data include mutation count, information about mutated genes, patient demographics, sample type, disease code, Abnormal Lymphocyte Percent, Atra Exposure, Basophils Cell Count, Blast Count, Cytogenetic abnormality type, and FAB (French-American-British) classification. The dataset includes Next-Generation Clustered Heat Maps (NG-CHM), viewable in an embedded NG-CHM Heat Map Viewer provided by MD Anderson Cancer Center, which offers a graphical environment for exploring clustered or non-clustered heat map data. The dataset also includes copy-number segment data, downloadable as .seg files and viewable in the Integrative Genomics Viewer (IGV).
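The copy-number segment (.seg) files are tab-delimited text. A minimal Python reader might look like the following. It assumes the layout IGV documents for SEG files: a header row, then sample, chromosome, start, and end in the first four columns, optional extra columns such as a marker count, and the numeric value (for example, a segment mean) in the last column. The sample rows, column names, and the gain threshold are invented for illustration.

```python
import csv
import io
from typing import NamedTuple


class Segment(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int
    value: float


def parse_seg(text):
    """Parse .seg text: header row, then sample, chrom, start, end, [extra columns...], value."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    segments = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # The numeric value is taken from the last column, whatever precedes it.
        segments.append(Segment(row[0], row[1], int(row[2]), int(row[3]), float(row[-1])))
    return segments


# Hypothetical example rows (sample IDs and values are invented).
example = (
    "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n"
    "SAMPLE-A\t1\t3218610\t4735401\t850\t-0.1234\n"
    "SAMPLE-A\t1\t4735402\t9000000\t1200\t0.4521\n"
)
segs = parse_seg(example)
gains = [s for s in segs if s.value > 0.3]  # 0.3 is an arbitrary illustrative threshold
```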
https://catalog.dvrpc.org/dvrpc_data_license.html
Commute mode is tracked by the American Community Survey (ACS), which asks respondents for the means of transportation they usually used for the longest distance of their trip to work during the prior week. A follow-up question asks about vehicle occupancy when "car, truck, van" is selected. Transportation professionals often group travel modes into "single-occupancy vehicles" (SOV) and "non-single-occupancy vehicles" (non-SOV) because SOVs are a less efficient use of roadway and environmental resources. This dataset tracks the number of individuals who did not report commuting by "car, truck, van" with one person in it (that is, non-SOV commuters), and it also shows the non-SOV share of all commuters.
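The non-SOV count covers everyone who commutes except those who drove alone. The share divides that count by all commuters. Below is a minimal Python sketch with invented counts. The category names are hypothetical and are not actual ACS table fields.

```python
def non_sov_summary(mode_counts, sov_key="drove_alone"):
    """Return (non-SOV commuter count, non-SOV share of all commuters).

    mode_counts: commuter counts keyed by mode; sov_key names the
    single-occupancy "car, truck, van" category (key names are hypothetical).
    """
    total = sum(mode_counts.values())
    if total == 0:
        return 0, 0.0  # avoid dividing by zero for an empty geography
    non_sov = total - mode_counts.get(sov_key, 0)
    return non_sov, non_sov / total


# Invented counts for one geography.
counts = {"drove_alone": 600, "carpooled": 100, "public_transit": 200, "walked": 60, "bicycle": 40}
non_sov, share = non_sov_summary(counts)
```

With these invented counts, 400 of 1,000 commuters are non-SOV, a share of 40 percent.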
This data catalog is provided by the Northern Gulf of Alaska Long Term Ecological Research (NGA LTER) project.
About data.overheid.nl - National Data Portal
Welcome to the National Data Portal of the Dutch government, data.overheid.nl. On this portal you will find data made available by the Dutch government.
Facts and figures: More than 180 government organizations have published data on data.overheid.nl, and almost all datasets are updated every night. The DCAT standard is used for exchanging descriptions of datasets and data services (metadata). Datasets are published via CKAN. The portal is maintained by the Knowledge and Exploitation Centre for Official Government Publications (KOOP) on behalf of the Ministry of the Interior and Kingdom Relations.
Data.overheid.nl is a register and offers assistance with opening up and reusing government data. It also makes it possible to discover which closed data exist. More information about open data policy in the Netherlands is also available. Data.overheid.nl guides government organizations in opening up available data and supports re-users in finding specific open datasets. Data that are not (yet) available as open data can be made available as open data through a data request. (Translated from Dutch.)
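Because datasets on data.overheid.nl are published via CKAN, they can in principle be queried through CKAN's standard Action API (`/api/3/action/package_search`). The Python sketch below builds such a query URL and extracts dataset titles from a response. The base URL is a placeholder, since the portal's actual API root is not given here. The sample response is invented; only its general `success`/`result`/`count`/`results` layout follows CKAN's documented response format.

```python
import json
from urllib.parse import urlencode

# Placeholder API root: check the portal's own documentation for the real one.
CKAN_BASE = "https://ckan.example.org"


def package_search_url(query, rows=10, base=CKAN_BASE):
    """Build a CKAN package_search URL for a free-text query."""
    return f"{base}/api/3/action/package_search?{urlencode({'q': query, 'rows': rows})}"


def dataset_titles(response_text):
    """Return (total match count, titles on this page) from a package_search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    # Fall back to the dataset's URL name if it has no title.
    titles = [pkg.get("title") or pkg["name"] for pkg in result["results"]]
    return result["count"], titles


# Invented response in CKAN's general shape, for illustration only.
sample = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"name": "waterstanden", "title": "Waterstanden"}]},
})
count, titles = dataset_titles(sample)

# A live request could use: urllib.request.urlopen(package_search_url("water")).read()
```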
USGS data and tools are digital information in formats suitable for direct input to software that can analyze them in the scientific, engineering, or business context for which the data were collected.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The New Mexico Bureau of Geology and Mineral Resources (NMBGMR) Pecos Valley Pilot - WaterSMART data inventory provides a comprehensive catalog of 6,911 current and historical groundwater level, precipitation, stream gage, and water quality monitoring locations from a variety of organizations throughout the Pecos Valley region in New Mexico. Compiled by the ISC - Pecos Bureau, this inventory documents precise geographic coordinates, monitoring types, and data access URLs for groundwater level stations linked to the New Mexico Office of the State Engineer's database. Each entry includes location-specific metadata and notes referencing the Seven Rivers area, serving as an essential reference tool for researchers, water managers, and policymakers working on groundwater assessment and management in the Pecos Valley watershed.
https://datacatalog.worldbank.org/public-licenses?fragment=cc
The Global Database on Intergenerational Mobility (GDIM) contains estimates of intergenerational mobility (IGM) in education by 10-year cohorts, covering individuals born between 1940 and 1989. IGM refers either to the extent to which a generation's living standards are higher than those of their parents (absolute mobility) or to the extent to which an individual's position on the socio-economic scale is independent of his or her parents' position (relative mobility).
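One common way to quantify relative mobility in education is to regress children's years of schooling on their parents' years of schooling. A larger slope (intergenerational persistence) means lower mobility. An absolute measure can instead compare each child with his or her parents directly. The Python sketch below computes both on invented data. It illustrates the two concepts and is not claimed to reproduce the GDIM's exact estimation procedure.

```python
def persistence_beta(parent_years, child_years):
    """OLS slope of child schooling on parent schooling (higher = less relative mobility)."""
    n = len(parent_years)
    if n != len(child_years) or n < 2:
        raise ValueError("need two equal-length samples with at least two observations")
    mean_p = sum(parent_years) / n
    mean_c = sum(child_years) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(parent_years, child_years))
    var = sum((p - mean_p) ** 2 for p in parent_years)
    return cov / var  # slope = covariance / variance of parent schooling


def upward_share(parent_years, child_years):
    """Share of children with strictly more schooling than their parents (absolute mobility)."""
    pairs = list(zip(parent_years, child_years))
    return sum(c > p for p, c in pairs) / len(pairs)


# Invented example: four parent-child pairs, years of schooling.
parents = [4, 8, 12, 16]
children = [8, 10, 12, 14]
beta = persistence_beta(parents, children)
share = upward_share(parents, children)
```

In this toy example the slope is 0.5, and half of the children have more schooling than their parents.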
https://www.caida.org/about/legal/aua/
Packet headers (up to and including the transport layer) from the Anonymized Internet Traces 2016 dataset, derived from OC192 traces captured at the Equinix San Jose and Chicago monitors.
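The following Python sketch shows one way to read such packet headers using only the standard library. It parses a classic pcap byte string and extracts IPv4 addresses, the protocol number, and TCP/UDP ports. It assumes the trace is classic pcap (not pcapng) whose records begin directly with an IPv4 header, that is, a raw-IP link type with no Ethernet header. Verify this against CAIDA's documentation for the specific trace files before relying on it. Because the traces are anonymized, the extracted addresses are not real host addresses.

```python
import socket
import struct

# Classic pcap magic numbers as read little-endian (microsecond and nanosecond variants).
LITTLE_ENDIAN_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}
BIG_ENDIAN_MAGICS = {0xD4C3B2A1, 0x4D3CB2A1}


def iter_packet_headers(data):
    """Yield (ts_sec, src, dst, proto, sport, dport) for each IPv4 packet.

    Assumes classic pcap whose records start directly with an IPv4 header
    (raw-IP link type); this is an assumption, not checked against the file.
    """
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif magic in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(data):
        ts_sec, _frac, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        pkt = data[offset:offset + incl_len]
        offset += incl_len
        if len(pkt) < 20 or pkt[0] >> 4 != 4:
            continue  # skip truncated or non-IPv4 records
        ihl = (pkt[0] & 0x0F) * 4  # IPv4 header length in bytes
        proto = pkt[9]
        src = socket.inet_ntoa(pkt[12:16])
        dst = socket.inet_ntoa(pkt[16:20])
        sport = dport = None
        if proto in (6, 17) and len(pkt) >= ihl + 4:  # TCP or UDP: ports come first
            sport, dport = struct.unpack_from("!HH", pkt, ihl)
        yield ts_sec, src, dst, proto, sport, dport
```

Reading a whole multi-gigabyte trace into memory is impractical. A streaming reader, or an established library such as dpkt or Scapy, would be a better fit for full traces. For small gzip-compressed samples, `gzip.open(path, "rb").read()` can supply the bytes.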
Description from EGA:
"The dataset for Genome-wide cell-free DNA fragmentation in patients with cancer includes 538 bam files from whole genome next-generation sequencing on the Illumina HiSeq2500. The samples analyzed include plasma samples from healthy individuals and patients with cancer."
https://catalog.dvrpc.org/dvrpc_data_license.html
The federal Clean Water Act was established to restore and maintain the chemical, physical, and biological integrity of the nation's waters. Water quality standards have been established by federal and state governments to ensure that waterbodies attain their designated uses. Designated uses include human uses and ecological conditions: general aquatic life, trout, recreation, drinking water supply, industrial water supply, agricultural water supply, shellfish harvesting, and fish consumption.
As mandated by the Clean Water Act, surface water quality in all states is monitored and assessed every two years. During each assessment cycle, government-employed scientists sample water at various waterbody sites and test it to determine whether the waterbody has attained its designated use(s). Because general aquatic life is the designated use most indicative of overall surface water quality, and the one most comprehensively monitored across the region, it is used as the indicator of regional water quality.
In Pennsylvania, water quality is assessed by stream segment. Attainment (or non-attainment) is determined by analyzing the health of aquatic macroinvertebrates (e.g., insect larvae, crayfish, clams, snails, worms) present in the stream. The Pennsylvania Department of Environmental Protection (PADEP) assessment plan covers the entire state in 10-year cycles, with interim evaluations performed every two years through targeted sampling in each of the state's major subwatersheds. The New Jersey Department of Environmental Protection (NJDEP), on the other hand, assigns attainment or non-attainment to entire subwatersheds (land area). As in Pennsylvania, this determination is based on in-stream sampling of macroinvertebrates. New Jersey's most recent report, for 2014, is based on data collected between 2008 and 2012.
Because the two states report water quality in different units (stream miles in Pennsylvania versus acres of subwatershed in New Jersey), the percentage of non-attaining waters in each state is calculated in that state's own unit, and the two percentages are then averaged to obtain a regional value.
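The regional calculation is a simple, unweighted average of two percentages computed in different units. The Python sketch below uses invented numbers. The function name and inputs are hypothetical.

```python
def regional_nonattainment(pa_nonattaining_miles, pa_assessed_miles,
                           nj_nonattaining_acres, nj_assessed_acres):
    """Return (PA percent, NJ percent, regional percent) of non-attaining waters."""
    pa_pct = 100.0 * pa_nonattaining_miles / pa_assessed_miles  # share of stream miles
    nj_pct = 100.0 * nj_nonattaining_acres / nj_assessed_acres  # share of subwatershed acres
    return pa_pct, nj_pct, (pa_pct + nj_pct) / 2  # unweighted mean of the two states


# Invented inputs: 1,200 of 4,000 PA stream miles; 450,000 of 900,000 NJ acres.
pa_pct, nj_pct, regional = regional_nonattainment(1200, 4000, 450000, 900000)
```

The two states are weighted equally, so the regional value does not reflect differences in how much stream length or land area each state contributes.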
Polygon coverage and shapefile depicting the outline of the Sufco mine located in the southern Wasatch Plateau.
Description from GitHub:
"GenomicRanges-based classes for representing, querying, and manipulating matrices of genomic data (ie a map of pairs of genomic coordinates to a numeric value). Applications include visualization and analysis of Hi-C contact maps, barcode overlap in 10X, microhomology heatmaps, pairwise LD, and epistasis."
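The quoted description is of a GenomicRanges-based package for R/Bioconductor. The small Python sketch below illustrates the underlying idea: a map from pairs of genomic intervals to a numeric value, with a query over two regions. The class and method names are hypothetical and do not mirror the package's API. Coordinates are treated as 1-based and inclusive, following the GenomicRanges convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Genomic interval; 1-based, inclusive coordinates (GenomicRanges convention)."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other):
        return self.chrom == other.chrom and self.start <= other.end and other.start <= self.end


class PairMatrix:
    """Sparse, symmetric map from pairs of genomic bins to a numeric value."""

    def __init__(self, bins):
        self.bins = list(bins)
        self.values = {}  # (i, j) with i <= j -> value; missing pairs read as 0

    @staticmethod
    def _key(i, j):
        return (i, j) if i <= j else (j, i)  # store each unordered pair once

    def set(self, i, j, value):
        self.values[self._key(i, j)] = value

    def get(self, i, j):
        return self.values.get(self._key(i, j), 0.0)

    def query(self, region_a, region_b):
        """Sum values over bin pairs with one bin in region_a and the other in region_b."""
        in_a = [i for i, b in enumerate(self.bins) if b.overlaps(region_a)]
        in_b = [j for j, b in enumerate(self.bins) if b.overlaps(region_b)]
        keys = {self._key(i, j) for i in in_a for j in in_b}  # count each pair once
        return sum(self.values.get(k, 0.0) for k in keys)


# Hypothetical 100-bp bins on chr1 with invented contact values.
bins = [Interval("chr1", 1, 100), Interval("chr1", 101, 200), Interval("chr1", 201, 300)]
m = PairMatrix(bins)
m.set(0, 2, 5.0)
m.set(1, 1, 3.0)
m.set(1, 0, 2.0)
```

Storing only the upper triangle keeps symmetric data such as Hi-C contact maps compact. Deduplicating keys in `query` stops a pair from being counted twice when the two regions overlap.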