The State Child Abuse and Neglect (SCAN) Policies Database, supported by the Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services, compiles data on state definitions and policies related to the surveillance of child maltreatment incidence and associated risk and protective factors. The SCAN Policies Database is a resource for researchers, analysts, and others who are interested in examining differences in definitions and policies on child maltreatment across states. A primary use of these data is to allow researchers to link the analytic files to other data sources to address important questions about how variations in states’ definitions and policies are associated with the incidence of child maltreatment, the child welfare system response, and ultimately child safety and well-being. Other data sources that can be linked with the SCAN Policies Database include data from the National Child Abuse and Neglect Data System (NCANDS), the Adoption and Foster Care Analysis and Reporting System (AFCARS), state administrative data, and survey data. When data from the SCAN Policies Database are linked with other data sources, they can be used to answer key research questions about how variations in definitions and policies are associated with the incidence of child abuse and neglect. The SCAN Policies Database includes state definitions and policies from the 50 states, the District of Columbia, and the Commonwealth of Puerto Rico. The data were collected from a review of statutes and state documentation between May 2019 and June 2020. Investigators: Elizabeth C. Weigensberg, PhD - Mathematica; Nuzhat Islam, MS - Mathematica; Jean Knab, PhD - Mathematica; Mary A. Grider, MBA - Mathematica; Jeremy Page, MA - Mathematica; Sarah Bardin, BA - Mathematica
The SCAN Policies Database includes state definitions and policies from the 50 states, the District of Columbia, and the Commonwealth of Puerto Rico. The SCAN Policies Database 2019 represents data collected, reviewed, and verified between May 2019 and July 2020, and the data reflect the state definitions and policies for the calendar year 2019. The SCAN Policies Database 2021 represents data collected, reviewed, and verified between July 2021 and January 2022, and the data reflect the state definitions and policies for the calendar year 2021. The SCAN Policies Database 2023 represents data collected, reviewed, and verified between May 2023 and July 2024, and the data reflect the state definitions and policies for the calendar year 2023. Investigators: Elizabeth C. Weigensberg, PhD - Mathematica; Nuzhat Islam, MS - Mathematica; Jean Knab, PhD - Mathematica; Mary A. Grider, MBA - Mathematica; Jeremy Page, MA - Mathematica; Sarah Bardin, BA - Mathematica; Addison Larson, MS - Mathematica; Milena Raketic, M.Ed - Mathematica
Other data sources that can be linked with the SCAN Policies Database include data from the National Child Abuse and Neglect Data System (NCANDS), the Adoption and Foster Care Analysis and Reporting System (AFCARS), state administrative data, and survey data. When data from the SCAN Policies Database are linked with other data sources, they can be used to answer key research questions about how variations in definitions and policies are associated with the incidence of child abuse and neglect. Investigators: Elizabeth C. Weigensberg, PhD - Mathematica; Nuzhat Islam, MS - Mathematica; Milena Raketic, M.Ed - Mathematica; Mary A. Grider, MBA - Mathematica; Jeremy Page, MA - Mathematica
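As an illustration only, a minimal sketch of such a state-level linkage in Python/pandas might look like the following; the file names, the shared state identifier, and the policy and outcome columns are hypothetical placeholders, since the actual SCAN and NCANDS extracts define their own layouts.
import pandas as pd

# Hypothetical extracts: one row per state in each file, joined on a shared state code.
scan_policies = pd.read_csv("scan_policies_2019.csv")          # state-level policy indicators
ncands_counts = pd.read_csv("ncands_state_counts_2019.csv")    # state-level maltreatment report counts

linked = scan_policies.merge(ncands_counts, on="state", how="inner")

# Example question: do report rates differ across states grouped by a policy indicator?
print(linked.groupby("mandatory_reporter_policy")["reports_per_1000_children"].mean())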
The SCAN Policies Database includes state definitions and policies from the 50 states, the District of Columbia, and the Commonwealth of Puerto Rico. The SCAN Policies Database 2019 represents data collected, reviewed, and verified between May 2019 and July 2020, and the data reflect the state definitions and policies for the calendar year 2019. The SCAN Policies Database 2021 represents data collected, reviewed, and verified between July 2021 and January 2022, and the data reflect the state definitions and policies for the calendar year 2021. Investigators: Elizabeth C. Weigensberg, PhD - Mathematica; Nuzhat Islam, MS - Mathematica; Jean Knab, PhD - Mathematica; Mary A. Grider, MBA - Mathematica; Jeremy Page, MA - Mathematica; Addison Larson, MS - Mathematica
The MarketScan Dental Database is a standalone product that corresponds with and is linkable to a given year and version of the IBM MarketScan Commercial Claims and Encounters Database and the MarketScan Medicare Supplemental and Coordination of Benefits Database. Currently, data are available for the years 2005 - 2023. In order to view the MarketScan Dental user guide or data dictionary, you must have data access to this dataset.
In addition to what's on this page, we also have:
**Starting in 2026, there will be a data access fee for using the full dataset** (though the 1% sample will remain free to use). The pricing structure and other relevant information can be found in this FAQ Sheet.
All manuscripts (and other items you'd like to publish) must be submitted to support@stanfordphs.freshdesk.com for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Collection of neuroanatomically labeled MRI brain scans, created by neuroanatomical experts. Regions of interest include the sub-cortical structures (thalamus, caudate, putamen, hippocampus, etc.), along with the ventricles, brain stem, cerebellum, and gray and white matter, with the cortex subdivided into parcellation units defined by gyral and sulcal landmarks.
The MarketScan Commercial Database (previously called the 'MarketScan Database') contains real-world data for healthcare research and analytics to examine health economics and treatment outcomes.
This page also contains the MarketScan Commercial Lab Database starting in 2018.
Starting in 2026, there will be a data access fee for using the full dataset. Please refer to the 'Usage Notes' section of this page for more information.
MarketScan Research Databases are a family of data sets that fully integrate many types of data for healthcare research, including:
The MarketScan Databases track millions of patients throughout the healthcare system. The data are contributed by large employers, managed care organizations, hospitals, EMR providers, and Medicare.
This page contains the MarketScan Commercial Database.
We also have the following on other pages:
**Starting in 2026, there will be a data access fee for using the full dataset** (though the 1% sample will remain free to use). The pricing structure and other relevant information can be found in this FAQ Sheet.
All manuscripts (and other items you'd like to publish) must be submitted to support@stanfordphs.freshdesk.com for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
The MarketScan Medicare Supplemental Database provides detailed cost, use, and outcomes data for healthcare services performed in both inpatient and outpatient settings.
It includes Medicare Supplemental records for all years and Medicare Advantage records starting in 2020. This page also contains the MarketScan Medicare Lab Database starting in 2018.
Starting in 2026, there will be a data access fee for using the full dataset. Please refer to the 'Usage Notes' section of this page for more information.
MarketScan Research Databases are a family of data sets that fully integrate many types of data for healthcare research, including:
The MarketScan Databases track millions of patients throughout the healthcare system. The data are contributed by large employers, managed care organizations, hospitals, EMR providers and Medicare.
This page contains the MarketScan Medicare Database.
We also have the following on other pages:
**Starting in 2026, there will be a data access fee for using the full dataset** (though the 1% sample will remain free to use). The pricing structure and other relevant information can be found in this FAQ Sheet.
All manuscripts (and other items you'd like to publish) must be submitted to support@stanfordphs.freshdesk.com for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on March 17, 2022. A large-scale database of genetics and genomics data associated with a web interface and a set of methods and algorithms that can be used for mining the data in it. The database contains two categories of single nucleotide polymorphism (SNP) annotations: (1) physical-based annotation, where SNPs are categorized according to their position relative to genes (intronic, inter-genic, etc.) and according to linkage disequilibrium (LD) patterns (an inter-genic SNP can be annotated to a gene if it is in LD with variation in the gene); and (2) functional annotation, where SNPs are classified according to their effects on expression levels, i.e., whether they are expression quantitative trait loci (eQTLs) for that gene. SCAN can be utilized in several ways, including: (i) queries of the SNP and gene databases; (ii) analysis using the attached tools and algorithms; and (iii) downloading files with SNP annotation for various GWA platforms. eQTL files and reported GWAS from NHGRI may be downloaded.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.
Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥3 mm," "nodule <3 mm," and "non-nodule ≥3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.
Note: The TCIA team strongly encourages users to review pylidc and the standardized DICOM representation of the TCIA LIDC-IDRI annotations (DICOM-LIDC-IDRI-Nodules) before developing custom tools to analyze the XML version of the annotations/segmentations included in this dataset.
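For orientation, a minimal pylidc sketch along these lines (an illustration of the library's query interface, assuming pylidc is installed and configured, not an official TCIA workflow) might look like:
import pylidc as pl

# Query the annotation database for one patient's scan and cluster the
# per-radiologist annotations into nodule groups (up to four marks per nodule).
scan = pl.query(pl.Scan).filter(pl.Scan.patient_id == 'LIDC-IDRI-0001').first()
nodule_groups = scan.cluster_annotations()
print(scan.patient_id, "has", len(nodule_groups), "annotated nodule(s)")
for anns in nodule_groups:
    print("  nodule marked by", len(anns), "radiologist(s)")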
Paper logs are the primary data collection tool used by observers of the Northeast Fisheries Observer Program deployed on commercial fishing vessels. After the data collected on the paper are entered into a database, the paper logs are scanned for each trip. After all trips for a calendar year are scanned, they are archived at the National Archives and Records Administration.
The side-scan sonar data within the Marine Trackline Geophysical data collection come from towed instruments, operated close to the seafloor, that use sound to image features on the ocean floor. Like shining a flashlight, this technique creates acoustic shadows that help determine the size and shape of seafloor features. This system is often used to map cultural heritage sites like shipwrecks, to characterize the makeup of the seafloor, and can even be used to help biologists identify habitats of marine animals.
PROSITE consists of documentation entries describing protein domains, families, and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
SCAN tasks with various splits.
SCAN is a set of simple language-driven navigation tasks for studying compositional learning and zero-shot generalization.
Most splits are described at https://github.com/brendenlake/SCAN. For the MCD splits please see https://arxiv.org/abs/1912.09713.
Basic usage:
data = tfds.load('scan/length')
More advanced example:
import tensorflow_datasets as tfds
from tensorflow_datasets.datasets.scan import scan_dataset_builder
data = tfds.load(
    'scan',
    builder_kwargs=dict(
        config=scan_dataset_builder.ScanConfig(
            name='simple_p8', directory='simple_split/size_variations')))
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('scan', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
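As a further reference, each SCAN example pairs a natural-language command with its target action sequence; the short sketch below decodes a few of them to Python strings (the feature names 'commands' and 'actions' follow the TFDS catalog and should be verified against your installed version):
import tensorflow_datasets as tfds

# Decode a couple of (command, action) pairs from the length split.
ds = tfds.load('scan/length', split='train')
for ex in ds.take(2):
  command = ex['commands'].numpy().decode('utf-8')
  actions = ex['actions'].numpy().decode('utf-8')
  print(command, '->', actions)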
Popular DBMSs, including MySQL, Postgres, MSSQL, Redis, Mongo, Oracle, Elasticsearch, and Memcached, and database managers like phpMyAdmin.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 23, 2022. Database/display tool of genome scans, with a web interface that lets the user view the data. It does not perform any analyses - these must be done by other software, and the results uploaded into it. The basic features of GSCANDB are:
* Parallel viewing of scans for multiple phenotypes.
* Parallel analyses of the same scan data.
* Genome-wide views of genome scans.
* Chromosomal region views, with zooming.
* Gene and SNP annotation shown at high zoom levels.
* Haplotype block structure viewing.
* The positions of known trait loci can be overlaid and queried.
* Links to Ensembl, MGI, NCBI, UCSC, and other genome data browsers.
In GSCANDB, a genome scan has a wide definition, including not only the usual statistical genetic measures of association between genetic variation at a series of loci and variation in a phenotype, but any quantitative measure that varies along the genome. This includes, for example, competitive genome hybridization data and some kinds of gene expression measurements.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Processed side scan data from Anton Dohrn and East Rockall Bank (2009_07-RVFranklin-AntonDhorn-RockallBank). The cruise 2009_03_MV_Franklin surveyed two Areas of Search (AoS) for offshore SACs, Anton Dohrn and East Rockall Bank. The main aims of the survey were to acquire acoustic and photographic ground-truthing data to enable geological, geomorphological, and biological characterisation of the Anton Dohrn Seamount and East Rockall Bank AoS.
huggingface/DEH-image-scan-data dataset hosted on Hugging Face and contributed by the HF Datasets community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Security Scan is a dataset for object detection tasks - it contains Objects annotations for 9,468 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
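As a rough illustration of that download workflow with the Roboflow Python package (the API key, workspace, project, and version identifiers below are hypothetical placeholders, and the export format is just one of several Roboflow supports):
from roboflow import Roboflow

# Hypothetical identifiers -- substitute your own API key and the workspace/project
# slug shown on the Roboflow page for the Security Scan dataset.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("security-scan")
dataset = project.version(1).download("coco")  # downloads images and annotations locally
print(dataset.location)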
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Crab is a command line tool for Mac and Windows that scans file data into a SQLite database, so you can run SQL queries over it.
e.g. (Win) C:> crab C:\some\path\MyProject
or (Mac) $ crab /some/path/MyProject
You get a CRAB> prompt where you can enter SQL queries on the data, e.g. Count files by extension
SELECT extension, count(*)
FROM files
GROUP BY extension;
e.g. List the 5 biggest directories
SELECT parentpath, sum(bytes)/1e9 as GB
FROM files
GROUP BY parentpath
ORDER BY sum(bytes) DESC LIMIT 5;
Crab provides a virtual table, fileslines, which exposes file contents to SQL
e.g. Count TODO and FIXME entries in any .c files, recursively
SELECT fullpath, count(*) FROM fileslines
WHERE parentpath like '/Users/GN/HL3/%' and extension = '.c'
and (data like '%TODO%' or data like '%FIXME%')
GROUP BY fullpath;
There are also functions to run programs or shell commands on any subset of files, or lines within files, e.g. (Mac) unzip all the .zip files, recursively
SELECT exec('unzip', '-n', fullpath, '-d', '/Users/johnsmith/Target Dir/')
FROM files
WHERE parentpath like '/Users/johnsmith/Source Dir/%' and extension = '.zip';
(Here -n tells unzip not to overwrite anything, and -d specifies target directory)
There is also a function to write query output to file, e.g. (Win) Sort the lines of all the .txt files in a directory and write them to a new file
SELECT writeln('C:\Users\SJohnson\dictionary2.txt', data)
FROM fileslines
WHERE parentpath = 'C:\Users\SJohnson\' and extension = '.txt'
ORDER BY data;
In place of the interactive prompt you can run queries in batch mode. E.g. Here is a one-liner that returns the full path of all the files in the current directory
C:> crab -batch -maxdepth 1 . "SELECT fullpath FROM files"
Crab SQL can also be used in Windows batch files, or Bash scripts, e.g. for ETL processing.
Crab is free for personal use, $5/mo for commercial use
See more details here (Mac): http://etia.co.uk/ or here (Win): http://etia.co.uk/win/about/
An example SQLite database (Mac data) has been uploaded for you to play with. It includes an example files table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.
To scan your own files, and get access to the virtual tables and support functions, you have to use the Crab SQLite shell, available for download from this page (Mac): http://etia.co.uk/download/ or this page (Win): http://etia.co.uk/win/download/
The FILES table contains details of every item scanned, file or directory. All columns are indexed except 'mode'
COLUMNS
fileid (int) primary key -- files table row number, a unique id for each item
name (text) -- item name e.g. 'Hei.ttf'
bytes (int) -- item size in bytes e.g. 7502752
depth (int) -- how far scan recursed to find the item, starts at 0
accessed (text) -- datetime item was accessed
modified (text) -- datetime item was modified
basename (text) -- item name without path or extension, e.g. 'Hei'
extension (text) -- item extension including the dot, e.g. '.ttf'
type (text) -- item type, 'f' for file or 'd' for directory
mode (text) -- further type info and permissions, e.g. 'drwxr-xr-x'
parentpath (text) -- absolute path of directory containing the item, e.g. '/Library/Fonts/'
fullpath (text) unique -- parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'
PATHS
1) parentpath and fullpath don't support abbreviations such as ~ . or .. They're just strings.
2) Directory paths all have a '/' on the end.
The FILESLINES table is for querying data content of files. It has line number and data columns, with one row for each line of data in each file scanned by Crab.
This table isn't available in the example dataset, because it's a virtual table and doesn't physically contain data.
COLUMNS
linenumber (int) -- line number within file, restarts count from 1 at the first line of each file
data (text) -- data content of the files, one entry for each line
FILESLINES also duplicates the columns of the FILES table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This way you can restrict which files are searched without having to join tables.
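Because the uploaded example database (database.sqlite) is ordinary SQLite, the FILES table can also be queried outside the Crab shell; here is a minimal sketch using Python's built-in sqlite3 module, assuming the table and column names match the schema described above (FILESLINES is a virtual table and is only available inside the Crab shell itself):
import sqlite3

# Total size per extension for the five extensions using the most space,
# counting files only (type = 'f'), against the example files table.
conn = sqlite3.connect("database.sqlite")
rows = conn.execute("""
    SELECT extension, COUNT(*) AS n, SUM(bytes)/1e9 AS GB
    FROM files
    WHERE type = 'f'
    GROUP BY extension
    ORDER BY SUM(bytes) DESC
    LIMIT 5
""").fetchall()
for extension, n, gb in rows:
    print(extension, n, round(gb, 2))
conn.close()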