42 datasets found
  1. SAS-2 Map Product Catalog - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). SAS-2 Map Product Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/sas-2-map-product-catalog
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This database is a collection of maps created from the 28 SAS-2 observation files. The original observation files can be accessed within BROWSE by changing to the SAS2RAW database. For each of the SAS-2 observation files, the analysis package FADMAP was run, and the resulting maps, plus GIF images created from these maps, were collected into this database. Each map is a 60 x 60 pixel FITS-format image with 1-degree pixels. The user may reconstruct any of these maps within the captive account by running FADMAP from the command line after extracting a file from the SAS2RAW database. The parameters used for selecting data for these product map files are embedded as keywords in the FITS maps themselves. These parameters are set in FADMAP and, for the maps in this database, were set as 'wide open' as possible: that is, except for selecting on each of 3 energy ranges, all other FADMAP parameters were set using broad criteria. To find more information about how to run FADMAP on the raw events file, the user can access help files within the SAS2RAW database or use the 'fhelp' facility from the command line. This is a service provided by NASA HEASARC.
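    The description notes that the FADMAP selection parameters are embedded as keywords in the FITS maps themselves. As a rough illustration of what that means at the byte level, here is a minimal sketch of pulling keyword/value pairs out of a FITS header. FITS headers really are sequences of 80-character ASCII "cards", but the keyword names below (EMIN) are hypothetical stand-ins, not the actual FADMAP parameter names; in practice you would read these maps with a FITS library such as astropy.io.fits.

    ```python
    # Minimal FITS header-card parser (sketch). A card is 80 ASCII characters:
    # keyword in columns 1-8, "= " in columns 9-10 for value cards, and "/"
    # starting an inline comment. Keyword names here are hypothetical.
    def parse_fits_cards(header_bytes: bytes) -> dict:
        cards = [header_bytes[i:i + 80].decode("ascii")
                 for i in range(0, len(header_bytes), 80)]
        keywords = {}
        for card in cards:
            name = card[:8].strip()
            if name == "END":                     # END card terminates the header
                break
            if card[8:10] != "= ":                # skip COMMENT/HISTORY/blank cards
                continue
            value = card[10:].split("/")[0].strip()  # drop inline comment
            keywords[name] = value.strip("' ")       # unquote string values
        return keywords

    # synthetic 3-card header, each card padded to 80 columns
    header = (
        "SIMPLE  =                    T / conforms to FITS".ljust(80)
        + "EMIN    =                 35.0 / lower energy bound, hypothetical".ljust(80)
        + "END".ljust(80)
    ).encode("ascii")

    params = parse_fits_cards(header)
    ```

    Running FADMAP is not needed just to inspect the selection criteria; reading the embedded keywords from a downloaded map suffices.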

  2. DHS data extractors for Stata

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Emily Oster (2023). DHS data extractors for Stata [Dataset]. http://doi.org/10.7910/DVN/RRX3QD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Emily Oster
    Description

    This package contains two files designed to help read individual-level DHS data into Stata. The first file addresses the problem that versions of Stata before Version 7/SE read in only up to 2,047 variables, and most of the individual files have more variables than that. The file reads in the .do, .dct and .dat files and outputs new .do and .dct files with only a subset of the variables specified by the user. The second file deals with earlier DHS surveys for which .do and .dct files do not exist and only .sps and .sas files are provided. It reads in the .sas and .sps files and outputs a .dct and .do file. If necessary, the first file can then be run again to select a subset of variables.
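    The core idea of the first file (rewrite a Stata dictionary so it defines only a user-chosen subset of variables) can be sketched as below. This is an illustration only: the .dct layout is simplified to one "_column(N) type name %fmt" line per variable, and the variable names are hypothetical, not taken from any actual DHS file.

    ```python
    # Sketch: keep only the dictionary rows for variables the user asked for,
    # passing structural lines (braces, headers) through unchanged.
    def subset_dct(dct_lines, keep):
        out = []
        for line in dct_lines:
            stripped = line.strip()
            if not stripped.startswith("_column("):
                out.append(line)                # "dictionary {" / "}" pass through
                continue
            name = stripped.split()[2]          # _column(N) <type> <name> <%fmt>
            if name in keep:
                out.append(line)
        return out

    # hypothetical miniature dictionary
    dct = [
        "dictionary {",
        "  _column(1)  str12 caseid %12s",
        "  _column(13) int   v012   %4f",
        "  _column(17) int   v201   %4f",
        "}",
    ]
    small = subset_dct(dct, keep={"caseid", "v012"})
    ```

    A matching .do file would then be reduced the same way, so Stata never sees more than its variable limit.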

  3. SAS-2 Map Product Catalog

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 19, 2025
    + more versions
    Cite
    High Energy Astrophysics Science Archive Research Center (2025). SAS-2 Map Product Catalog [Dataset]. https://catalog.data.gov/dataset/sas-2-map-product-catalog
    Explore at:
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    High Energy Astrophysics Science Archive Research Center
    Description

    This database is a collection of maps created from the 28 SAS-2 observation files. The original observation files can be accessed within BROWSE by changing to the SAS2RAW database. For each of the SAS-2 observation files, the analysis package FADMAP was run, and the resulting maps, plus GIF images created from these maps, were collected into this database. Each map is a 60 x 60 pixel FITS-format image with 1-degree pixels. The user may reconstruct any of these maps within the captive account by running FADMAP from the command line after extracting a file from the SAS2RAW database. The parameters used for selecting data for these product map files are embedded as keywords in the FITS maps themselves. These parameters are set in FADMAP and, for the maps in this database, were set as 'wide open' as possible: that is, except for selecting on each of 3 energy ranges, all other FADMAP parameters were set using broad criteria. To find more information about how to run FADMAP on the raw events file, the user can access help files within the SAS2RAW database or use the 'fhelp' facility from the command line. This is a service provided by NASA HEASARC.

  4. SAS-2 Photon Events Catalog - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). SAS-2 Photon Events Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/sas-2-photon-events-catalog
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The SAS2RAW database is a log of the 28 SAS-2 observation intervals and contains target names, sky coordinates, start times, and other information for all 13056 photons detected by SAS-2. The original data came from 2 sources: the photon information was obtained from the Event Encyclopedia, and the exposures were derived from the original "Orbit Attitude Live Time" (OALT) tapes stored at NASA/GSFC. These data sets were combined into FITS-format images at HEASARC. The images were formed by making the center pixel of a 512 x 512 pixel image correspond to the RA and DEC given in the event file. Each photon's RA and DEC were converted to a relative pixel in the image using Aitoff projections. All the raw data from the original SAS-2 binary data files are now stored in 28 FITS files. These images can be accessed and plotted using XIMAGE, and other columns of the FITS file extensions can be plotted with the FTOOL FPLOT. This is a service provided by NASA HEASARC.
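    The RA/DEC-to-pixel step described above can be sketched with the standard Aitoff projection formulas. The projection math below is the textbook Aitoff form; the pixel scale (deg_per_pix) is an assumption for illustration, not the plate scale documented for these 512 x 512 images.

    ```python
    import math

    def aitoff(lon_deg, lat_deg):
        """Standard Aitoff projection of offsets from the field center.
        Input in degrees; output in projected radians, x in [-pi, pi]."""
        lam = math.radians(lon_deg)
        phi = math.radians(lat_deg)
        alpha = math.acos(math.cos(phi) * math.cos(lam / 2.0))
        sinc = math.sin(alpha) / alpha if alpha else 1.0   # unnormalized sinc
        return (2.0 * math.cos(phi) * math.sin(lam / 2.0) / sinc,
                math.sin(phi) / sinc)

    def to_pixel(lon_deg, lat_deg, npix=512, deg_per_pix=0.5):
        """Place a photon into an npix x npix image whose center pixel is the
        event-file RA/DEC. deg_per_pix is a hypothetical plate scale."""
        x, y = aitoff(lon_deg, lat_deg)
        col = npix // 2 + int(round(math.degrees(x) / deg_per_pix))
        row = npix // 2 + int(round(math.degrees(y) / deg_per_pix))
        return row, col
    ```

    A photon exactly at the event-file RA/DEC lands on the center pixel, matching the description of how the images were constructed.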

  5. Data from: Delta Neighborhood Physical Activity Study

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms, and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note: title changed 9/4/2020 to reflect study name]

    Resources in this dataset (recommended software for all files: Microsoft Excel, https://products.office.com/en-us/excel):
    • Dataset One RALA PPA Data Dictionary (RALA PPA Data Dictionary.csv): data dictionary for dataset one, collected using the RALA PPA tool.
    • Dataset Two RALA TWA Data Dictionary (RALA TWA Data Dictionary.csv): data dictionary for dataset two, collected using the RALA TWA tool.
    • Dataset Three RALA SSA Data Dictionary (RALA SSA Data Dictionary.csv): data dictionary for dataset three, collected using the RALA SSA tool.
    • Dataset Four CPAT Data Dictionary (CPAT Data Dictionary.csv): data dictionary for dataset four, collected using the CPAT.
    • Dataset One RALA PPA (RALA PPA Data.csv): data collected using the RALA PPA tool.
    • Dataset Two RALA TWA (RALA TWA Data.csv): data collected using the RALA TWA tool.
    • Dataset Three RALA SSA (RALA SSA Data.csv): data collected using the RALA SSA tool.
    • Dataset Four CPAT (CPAT Data.csv): data collected using the CPAT.
    • Data Dictionary (DataDictionary_RALA_PPA_SSA_TWA_CPAT.csv): combined data dictionary from each of the 4 dataset files in this set.

  6. Current Population Survey (CPS)

    • dataverse.harvard.edu
    • search.dataone.org
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
    this new github repository contains three scripts:
    • 2005-2012 asec - download all microdata.R: download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person-level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table
    • 2012 asec - analysis examples.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples
    • replicate census estimates - 2011.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the png file 2011 asec replicate weight sas output.png (statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document)
    click here to view these three scripts. for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page; the bureau of labor statistics' current population survey page; the current population survey's wikipedia article. notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
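    The workflow above (parse a fixed-width microdata file via column specifications, then store it in a SQL database) can be sketched in a few lines of stdlib Python. The column layout and variable names below are hypothetical stand-ins for what parse.SAScii would extract from the NBER sas importation script; the original scripts do this in R with RSQLite.

    ```python
    import sqlite3

    # Hypothetical (start, width, name) column layout, 1-based starts,
    # standing in for the real NBER-derived specification.
    LAYOUT = [(1, 2, "hrecord"), (3, 5, "hhid"), (8, 2, "statefip")]

    def load_fixed_width(lines, layout, db=":memory:"):
        """Slice each fixed-width record into fields and bulk-insert into SQLite."""
        con = sqlite3.connect(db)
        cols = ", ".join(name for _, _, name in layout)
        con.execute(f"CREATE TABLE asec ({cols})")
        rows = [tuple(line[start - 1:start - 1 + width].strip()
                      for start, width, _ in layout)
                for line in lines]
        con.executemany(f"INSERT INTO asec VALUES ({','.join('?' * len(layout))})",
                        rows)
        con.commit()
        return con

    # two hypothetical 9-character records
    con = load_fixed_width(["011234506", "016789012"], LAYOUT)
    n, = con.execute("SELECT COUNT(*) FROM asec").fetchone()
    ```

    Once the records are in the database, subgroup queries run without ever holding the full file in memory, which is the point of the database-backed design the description mentions.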

  7. Patient-reported outcomes via electronic health record portal vs. telephone:...

    • datadryad.org
    • search.dataone.org
    • +2more
    zip
    Updated Oct 23, 2022
    + more versions
    Cite
    Heidi Munger Clary; Beverly Snively (2022). Patient-reported outcomes via electronic health record portal vs. telephone: process and retention data in a pilot trial of anxiety or depression symptoms in epilepsy [Dataset]. http://doi.org/10.5061/dryad.qz612jmk3
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Dryad
    Authors
    Heidi Munger Clary; Beverly Snively
    Time period covered
    Oct 5, 2022
    Description

    The dataset was collected via a combination of the following: (1) manual extraction of EHR-based data followed by entry into REDCap and then analysis and further processing in SAS 9.4; (2) data pull of Epic EHR-based data from the Clarity database using standard programming techniques, followed by processing in SAS 9.4 and merging with data from REDCap; (3) collection of data directly from participants via telephone with entry into REDCap and further processing in SAS 9.4; (4) collection of process measures from study team tracking records followed by entry into REDCap and further processing in SAS 9.4. One file in the dataset contains aggregate data generated following merging of the Clarity data pull-origin dataset with a REDCap dataset and further manual processing. Recruitment for the randomized trial began at an epilepsy clinic visit, with EHR-embedded validated anxiety and depression instruments, followed by automated EHR-based research screening consent and eligibility assessment. Full...

  8. Editing EU-SILC UDB Longitudinal Data for Differential Mortality Analyses....

    • demo-b2find.dkrz.de
    Updated Sep 22, 2025
    + more versions
    Cite
    (2025). Editing EU-SILC UDB Longitudinal Data for Differential Mortality Analyses. SAS code and documentation. - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/da423f51-0a3c-540f-8ee8-830d0c9e9ef0
    Explore at:
    Dataset updated
    Sep 22, 2025
    Description

    This SAS code extracts data from EU-SILC User Database (UDB) longitudinal files and edits it such that a file is produced that can be further used for differential mortality analyses. Information from the original D, R, H and P files is merged per person and possibly pooled over several longitudinal data releases. Vital status information is extracted from target variables DB110 and RB110, and time at risk between the first interview and either death or censoring is estimated based on quarterly date information. Apart from path specifications, the SAS code consists of several SAS macros. Two of them require parameter specification from the user. The other ones are just executed. The code was written in Base SAS, Version 9.4. By default, the output file contains several variables which are necessary for differential mortality analyses, such as sex, age, country, year of first interview, and vital status information. In addition, the user may specify the analytical variables by which mortality risk should be compared later, for example educational level or occupational class. These analytical variables may be measured either at the first interview (the baseline) or at the last interview of a respondent. The output file is available in SAS format and by default also in csv format.
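    The description says time at risk between the first interview and death or censoring is estimated from quarterly date information. One plausible way to turn (year, quarter) pairs into approximate person-years is sketched below; the convention of simply counting elapsed quarters and dividing by 4 is an assumption for illustration, not necessarily what the actual SAS macros do.

    ```python
    # Sketch: elapsed quarters between two (year, quarter) dates, expressed
    # as person-years. Quarter values run 1..4.
    def quarters_between(start, end):
        (y0, q0), (y1, q1) = start, end
        return (y1 - y0) * 4 + (q1 - q0)

    def person_years(first_interview, exit_quarter):
        """Approximate time at risk from baseline to death/censoring."""
        return quarters_between(first_interview, exit_quarter) / 4.0

    # a respondent first interviewed in 2010 Q2, dying or censored in 2013 Q1
    risk_time = person_years((2010, 2), (2013, 1))
    ```

    With EU-SILC target variables DB110/RB110 indicating vital status, each person record would contribute such a risk time plus the analytical variables measured at baseline or last interview.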

  9. Health and Retirement Study (HRS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r. the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you.
    the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
    this new github repository contains five scripts:
    • 1992 - 2010 download HRS microdata.R: loop through every year and every file, download, then unzip everything in one big party
    • import longitudinal RAND contributed files.R: create a SQLite database (.db) on the local disk; load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
    • longitudinal RAND - analysis examples.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program; create two database-backed complex sample survey objects, using a taylor-series linearization design; perform a mountain of analysis examples with wave weights from two different points in the panel
    • import example HRS file.R: load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html); parse through the IF block at the bottom of the sas importation script, blank out a number of variables; save the file as an R data file (.rda) for fast loading later
    • replicate 2002 regression.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program; create a database-backed complex sample survey object, using a taylor-series linearization design; exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document
    click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage; rand's hrs homepage; the hrs wikipedia page; a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
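    The chunked loading trick mentioned above (stream the big RAND files into a SQLite database piece by piece to avoid overloading RAM) looks roughly like this in stdlib Python; the original scripts do the same in R with RSQLite. The file contents and column names here are hypothetical.

    ```python
    import csv, io, sqlite3
    from itertools import islice

    def load_in_chunks(csv_file, con, table, chunk_size=2):
        """Stream a csv into SQLite chunk_size rows at a time, so the whole
        file never sits in memory at once. chunk_size is tiny here purely
        for illustration; real loads would use thousands of rows per chunk."""
        reader = csv.reader(csv_file)
        header = next(reader)
        con.execute(f"CREATE TABLE {table} ({', '.join(header)})")
        placeholders = ",".join("?" * len(header))
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            con.executemany(f"INSERT INTO {table} VALUES ({placeholders})", chunk)
        con.commit()

    # hypothetical mini extract of a consolidated person-wave file
    data = io.StringIO("hhidpn,wave,weight\n1,1,0.9\n1,2,1.1\n2,1,1.0\n")
    con = sqlite3.connect(":memory:")
    load_in_chunks(data, con, "rand_hrs")
    total, = con.execute("SELECT COUNT(*) FROM rand_hrs").fetchone()
    ```

    The database-backed survey objects in the analysis scripts then query this table wave by wave instead of loading the full panel into RAM.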

  10. Kaggle Datasets - Summary, Topics, Classification

    • kaggle.com
    zip
    Updated Nov 16, 2020
    Cite
    Katherine Marsh (2020). Kaggle Datasets - Summary, Topics, Classification [Dataset]. https://www.kaggle.com/datasets/katherinemarsh/kaggle-datasets-summary-topics-classification
    Explore at:
    Available download formats: zip (273449 bytes)
    Dataset updated
    Nov 16, 2020
    Authors
    Katherine Marsh
    Description

    Context

    Companies and individuals are storing increasingly more data digitally; however, much of it goes unused because it is unclassified. How many times have you opened your downloads folder, found a file you downloaded a year ago, and had no idea what its contents were? You can read through those files individually, but imagine doing that for thousands of files. All that raw data in storage facilities creates data lakes. As the amount of data grows and the complexity rises, data lakes become data swamps. The potentially valuable and interesting datasets will likely remain unused. Our tool addresses the need to classify these large pools of data in a visually effective and succinct manner by identifying keywords in datasets and classifying datasets into a consistent taxonomy.

    The files listed within kaggleDatasetSummaryTopicsClassification.csv have been processed with our tool to generate the keywords and taxonomic classification as seen below. The summaries are not generated by our system; instead, they were retrieved from user input as the files were uploaded to Kaggle. We planned to utilize these summaries to create an NLG model that generates summaries from any input file. Unfortunately, we were not able to collect enough data to build a good model. Hopefully the data within this set might help future users achieve that goal.

    Acknowledgements

    Developed with the Senior Design Center at NC State in collaboration with SAS. Senior Design Team: Tanya Chu, Katherine Marsh, Nikhil Milind, Anna Owens. SAS Representatives: Nancy Rausch, Marty Warner, Brant Kay, Tyler Wendell, JP Trawinski.

  11. Emerging Pathogens Initiative (EPI)

    • catalog.data.gov
    • data.va.gov
    • +1more
    Updated Aug 2, 2025
    + more versions
    Cite
    Department of Veterans Affairs (2025). Emerging Pathogens Initiative (EPI) [Dataset]. https://catalog.data.gov/dataset/emerging-pathogens-initiative-epi
    Explore at:
    Dataset updated
    Aug 2, 2025
    Dataset provided by
    United States Department of Veterans Affairs (http://va.gov/)
    Description

    The Emerging Pathogens Initiative (EPI) database contains emerging pathogens information from the local Veterans Affairs Medical Centers (VAMCs). The EPI software package allows the VA to track emerging pathogens on the national level without additional data entry at the local level. The results from aggregation of data can be shared with the appropriate public health authorities including non-VA and the private health care sector allowing national planning, formulation of intervention strategies, and resource allocations. EPI is designed to automatically collect data on emerging diseases for Veterans Affairs Central Office (VACO) to analyze. The data is sent to the Austin Information Technology Center (AITC) from all Veterans Health Information Systems and Technology Architecture (VistA) systems for initial processing and combination with related workload data. VACO data retrieval and analysis is then carried out. The AITC creates two file structures both in Statistical Analysis Software (SAS) file format, which are used as a source of data for the Veterans Affairs Headquarters (VAHQ) Infectious Diseases Program Office. These files are manipulated and used for analysis and reporting by the National Infectious Diseases Service. Emerging Pathogens (as characterized by VACO) act as triggers for data acquisition activities in the automated program. The system retrieves relevant, predetermined, patient-specific information in the form of a Health Level Seven (HL7) message that is transmitted to the central data repository at the AITC. Once at that location, the data is converted to a SAS dataset for analysis by the VACO National Infectious Diseases Service. Before data transmission an Emerging Pathogens Verification Report is produced for the local sites to review, verify, and make corrections as needed. After data transmission to the AITC it is added to the EPI database.

  12. CDC Dataset

    • kaggle.com
    zip
    Updated Jun 22, 2024
    Cite
    Mayur Bhorkar (2024). CDC Dataset [Dataset]. https://www.kaggle.com/datasets/mayurbhorkar/cdc-dataset/data
    Explore at:
    Available download formats: zip (1218211538 bytes)
    Dataset updated
    Jun 22, 2024
    Authors
    Mayur Bhorkar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The data source for this project work is a large collection of raw data publicly available on CDC website. “CDC is the nation’s leading science-based, data-driven, service organisation that protects the public’s health. In 1984, the CDC established the Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviours, chronic health conditions, and use of preventive services.” (CDC - BRFSS Annual Survey Data, 2020).

    I have referred to a set of data collected between the years 2005 and 2021; it contains more than 7 million records in total (7,143,987 to be exact). For each year there are around 300 to 400 features available in the dataset, but not all of them are needed for this project, as some are irrelevant to my work. I have shortlisted a total of 22 features which are relevant for designing and developing my ML models, and I have explained them in detail in the table below.

    The codebook link (of the year 2021) explains below columns in more details - https://www.cdc.gov/brfss/annual_data/2021/pdf/codebook21_llcp-v2-508.pdf

    All datasets are obtained from the CDC website, where they are available in zip format containing a SAS transport file with the .xpt extension. So, I downloaded all the zip files, extracted them, and then converted each one into .csv format so I could easily fetch the records in my project code. I used the command below in Anaconda Prompt to convert an .xpt file into a .csv file:

    C:\users\mayur\Downloads> python -m xport LLCP2020.xpt > LLCP2020.csv
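    Repeating that conversion for every survey year can be scripted. The sketch below assumes the third-party xport package is installed (as the author's command implies) and that all downloaded zips sit in one directory; the directory layout and filenames are hypothetical.

    ```python
    import subprocess
    import zipfile
    from pathlib import Path

    def csv_name(xpt_path: str) -> str:
        """Map an .xpt filename to its .csv counterpart, e.g.
        LLCP2020.xpt -> LLCP2020.csv."""
        return str(Path(xpt_path).with_suffix(".csv"))

    def convert_all(download_dir: str = ".") -> None:
        # extract every downloaded BRFSS zip in place
        for z in Path(download_dir).glob("*.zip"):
            with zipfile.ZipFile(z) as zf:
                zf.extractall(download_dir)
        # convert each SAS transport file with the xport module, exactly as
        # the single-file command above does (requires `pip install xport`)
        for xpt in Path(download_dir).glob("*.xpt"):
            with open(csv_name(str(xpt)), "w") as out:
                subprocess.run(["python", "-m", "xport", str(xpt)],
                               stdout=out, check=True)
    ```

    This reproduces the author's one-file command in a loop rather than introducing a different conversion path.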

  13. Data from: Metadata and data files supporting the related article:...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Nov 19, 2019
    Cite
    Palakal, Maya; Vacek, Pamela M.; Malkov, Serghei; Sherman, Mark E.; Wang, Jeff; Mullooly, Maeve; Fan, Bo; Weaver, Donald L.; Hewitt, Stephen; Bejnordi, Babak Ehteshami; Gierach, Gretchen L.; Karssemeijer, Nico; van der Laak, Jeroen; Pfeiffer, Ruth M.; Shepherd, John A.; Sprague, Brian L.; Fan, Shaoqi; Beck, Andrew; Johnson, Jason M.; Hada, Manila; Mahmoudzadeh, Amir Pasha; Brinton, Louise A.; Herschorn, Sally D. (2019). Metadata and data files supporting the related article: Application of convolutional neural networks to breast biopsies to delineate tissue correlates of mammographic breast density [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000171150
    Explore at:
    Dataset updated
    Nov 19, 2019
    Authors
    Palakal, Maya; Vacek, Pamela M.; Malkov, Serghei; Sherman, Mark E.; Wang, Jeff; Mullooly, Maeve; Fan, Bo; Weaver, Donald L.; Hewitt, Stephen; Bejnordi, Babak Ehteshami; Gierach, Gretchen L.; Karssemeijer, Nico; van der Laak, Jeroen; Pfeiffer, Ruth M.; Shepherd, John A.; Sprague, Brian L.; Fan, Shaoqi; Beck, Andrew; Johnson, Jason M.; Hada, Manila; Mahmoudzadeh, Amir Pasha; Brinton, Louise A.; Herschorn, Sally D.
    Description

    Breast density is a radiologic feature that reflects fibroglandular tissue content relative to breast area or volume, and it is a breast cancer risk factor. This study employed deep learning approaches to identify histologic correlates in radiologically-guided biopsies that may underlie breast density and distinguish cancer among women with elevated and low density.

    Data access: Datasets supporting figure 2, tables 2 and 3 and supplementary table 2 of the published article are publicly available in the figshare repository, as part of this data record (https://doi.org/10.6084/m9.figshare.9786152). These datasets are contained in the zip file NPJ FigShare.zip. Datasets supporting figure 3, table 1 and supplementary table 1 of the published article are not publicly available to protect patient privacy, but can be made available on request from Dr. Gretchen L. Gierach, Senior Investigator, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA, email address: gierachg@mail.nih.gov.

    Study description and aims: The study aimed to identify tissue correlates of breast density that may be important for distinguishing malignant from benign biopsy diagnoses separately among women with high and low breast density, to help inform cancer risk stratification among women undergoing a biopsy following an abnormal mammogram. Haematoxylin and eosin (H&E)-stained digitized images from image-guided breast biopsies (n=852 patients) were evaluated. Breast density was assessed as global and localized fibroglandular volume (%). A convolutional neural network characterized H&E composition. 37 features were extracted from the network output, describing tissue quantities and morphological structure. A random forest regression model was trained to identify correlates most predictive of fibroglandular volume (n=588). Correlations between predicted and radiologically quantified fibroglandular volume were assessed in 264 independent patients. A second random forest classifier was trained to predict diagnosis (invasive vs. benign); performance was assessed using area under receiver-operating characteristic curves (AUC). For more details on the methodology please see the published article.

    Study approval: The Institutional Review Boards at the NCI and the University of Vermont approved the protocol for this project for either active consenting or a waiver of consent to enrol participants, link data and perform analytical studies.

    Dataset descriptions:

    Data supporting figure 2: Datasets Figure 2A H&E.jpg, Figure 2A Mammogram.jpg, Figure 2B H&E.jpg and Figure 2B Mammogram.jpg are in .jpg file format and consist of histological whole slide H&E images and corresponding full-field digital mammograms from patients whose biopsies yielded diagnoses of atypical ductal hyperplasia and invasive carcinoma.

    Data supporting figure 3: Dataset Figure 3.xls is in .xls file format and contains raw data used to generate the Receiver Operating Characteristic (ROC) curves for the prediction of invasive cancer among women with high percent global fibroglandular volume, low percent global fibroglandular volume, high percent localized fibroglandular volume and low percent localized fibroglandular volume.

    Data supporting table 1: Dataset Table1_analysis.sas7bdat is in SAS file format and contains the characteristics of study participants in the BREAST Stamp Project, who were referred for an image-guided breast biopsy, stratified by the training and testing sets (n = 852).

    Data supporting table 2: Datasets Global FGV.xls (accompanying Global FGV.png file) and Localized FGV.xls (accompanying Localized FGV.png file) are in .xls file format and the accompanying files are in .png file format. The data contain histologic features identified in the random forest model for the prediction of global and localized % fibroglandular volume.

    Data supporting table 3: Datasets HighGlobal_feature_importance.xls, HighGlobal_feature_importance.pdf, HighLocal_feature_importance.xls, HighLocal_feature_importance.pdf, LowGlobal_feature_importance.xls, LowGlobal_feature_importance.pdf, LowLocal_feature_importance.xls, LowLocal_feature_importance.pdf are in .xls file format. The accompanying figures generated from the data in the .xls files are in .pdf file format. These files contain histologic features identified in the random forest model for the prediction of invasive cancer status among women with high vs. low % fibroglandular volume.

    Data supporting supplementary table 1: Datasets testfeatures.xls and trainfeatures.xls are in .xls file format and include the distribution and description of the 37 histologic features extracted from the convolutional neural network deep learning output in the H&E stained whole slide images from the training and testing sets.

    Data supporting supplementary table 2: Datasets All_samples_global.xls, All_samples_global.png, All_samples_local.xls, All_samples_local.png, PostMeno_global.xls, PostMeno_global.png, PostMeno_local.xls, PostMeno_local.png, PreMeno_global.xls, PreMeno_global.png, PreMeno_local.xls, PreMeno_local.png are in .xls file format. The accompanying figures generated from the data in the .xls files are in .png file format. These data include the histologic features identified in the random forest model that included BMI for the prediction of global and localized % fibroglandular volume.

    Software needed to access the data: Data files in SAS file format require the SAS software to be accessed.
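    As an illustration only, the modeling pattern described above (a random forest classifier evaluated by AUC) can be sketched with scikit-learn. The data below are synthetic stand-ins, not the study's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 37 extracted histologic features (NOT study data).
X = rng.normal(size=(500, 37))
# Hypothetical binary outcome (invasive vs. benign) driven by two features.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Area under the ROC curve on held-out data.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.2f}")
```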

  14. Code for merging National Neighborhood Data Archive ZCTA level datasets with...

    • linkagelibrary.icpsr.umich.edu
    Updated Oct 15, 2020
    + more versions
    Cite
    Megan Chenoweth; Anam Khan (2020). Code for merging National Neighborhood Data Archive ZCTA level datasets with the UDS Mapper ZIP code to ZCTA crosswalk [Dataset]. http://doi.org/10.3886/E124461V4
    Explore at:
    Dataset updated
    Oct 15, 2020
    Dataset provided by
    University of Michigan. Institute for Social Research
    Authors
    Megan Chenoweth; Anam Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The sample SAS and Stata code provided here is intended for use with certain datasets in the National Neighborhood Data Archive (NaNDA). NaNDA (https://www.openicpsr.org/openicpsr/nanda) contains some datasets that measure neighborhood context at the ZIP Code Tabulation Area (ZCTA) level. They are intended for use with survey or other individual-level data containing ZIP codes. Because ZIP codes do not exactly match ZIP code tabulation areas, a crosswalk is required to use ZIP-code-level geocoded datasets with ZCTA-level datasets from NaNDA. A ZIP-code-to-ZCTA crosswalk was previously available on the UDS Mapper website, which is no longer active. An archived copy of the ZIP-code-to-ZCTA crosswalk file has been included here. Sample SAS and Stata code are provided for merging the UDS mapper crosswalk with NaNDA datasets.
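    The two-step merge the sample code performs can be sketched in pandas as well. All column names and values below are hypothetical placeholders; consult the archived UDS Mapper crosswalk and the NaNDA codebooks for the real names:

```python
import pandas as pd

# Hypothetical miniature versions of the three inputs.
crosswalk = pd.DataFrame({"zip": ["48104", "48105"], "zcta": ["48104", "48105"]})
survey = pd.DataFrame({"respondent": [1, 2], "zip": ["48104", "48105"]})
nanda = pd.DataFrame({"zcta": ["48104", "48105"], "disadvantage": [0.2, 0.4]})

# Step 1: attach a ZCTA to each respondent's ZIP code via the crosswalk.
# Step 2: attach the ZCTA-level NaNDA measures.
merged = (survey.merge(crosswalk, on="zip", how="left")
                .merge(nanda, on="zcta", how="left"))
print(merged)
```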

  15. 2019 Farm to School Census v2

    • agdatacommons.nal.usda.gov
    xlsx
    Updated Nov 21, 2025
    Cite
    USDA Food and Nutrition Service, Office of Policy Support (2025). 2019 Farm to School Census v2 [Dataset]. http://doi.org/10.15482/USDA.ADC/1523106
    Explore at:
    xlsx (available download format)
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Food and Nutrition Service: https://www.fns.usda.gov/
    United States Department of Agriculture: http://usda.gov/
    Authors
    USDA Food and Nutrition Service, Office of Policy Support
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Note: This version supersedes version 1: https://doi.org/10.15482/USDA.ADC/1522654.

    In Fall of 2019 the USDA Food and Nutrition Service (FNS) conducted the third Farm to School Census. The 2019 Census was sent via email to 18,832 school food authorities (SFAs), including all public, private, and charter SFAs, as well as residential care institutions, participating in the National School Lunch Program. The questionnaire collected data on local food purchasing, edible school gardens, other farm to school activities and policies, and evidence of economic and nutritional impacts of participating in farm to school activities. A total of 12,634 SFAs completed usable responses to the 2019 Census. Version 2 adds the weight variable, "nrweight", which is the non-response weight.

    Processing methods and equipment used: The 2019 Census was administered solely via the web. The study team cleaned the raw data to ensure the data were as correct, complete, and consistent as possible. This process involved examining the data for logical errors, contacting SFAs and consulting official records to update some implausible values, and setting the remaining implausible values to missing. The study team linked the 2019 Census data to information from the National Center for Education Statistics (NCES) Common Core of Data (CCD). Records from the CCD were used to construct a measure of urbanicity, which classifies the area in which schools are located.

    Study date(s) and duration: Data collection occurred from September 9 to December 31, 2019. Questions asked about activities prior to, during and after SY 2018-19. The 2019 Census asked SFAs whether they currently participated in, had ever participated in or planned to participate in any of 30 farm to school activities. An SFA that participated in any of the defined activities in the 2018-19 school year received further questions.

    Study spatial scale (size of replicates and spatial scale of study area): Respondents to the survey included SFAs from all 50 States as well as American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and Washington, DC.

    Level of true replication: Unknown

    Sampling precision (within-replicate sampling or pseudoreplication): No sampling was involved in the collection of this data.

    Level of subsampling (number and repeat or within-replicate sampling): No sampling was involved in the collection of this data.

    Study design (before–after, control–impacts, time series, before–after-control–impacts): None – Non-experimental

    Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains SFA-level responses to the Census questionnaire for SFAs that responded. This file includes information from only SFAs that clicked "Submit" on the questionnaire. (The dataset used to create the 2019 Farm to School Census Report includes additional SFAs that answered enough questions for their response to be considered usable.) In addition, the file contains constructed variables used for analytic purposes. The file does not include weights created to produce national estimates for the 2019 Farm to School Census Report. The dataset identified SFAs, but to protect individual privacy the file does not include any information for the individual who completed the questionnaire.

    Description of any gaps in the data or other limiting factors: See the full 2019 Farm to School Census Report [https://www.fns.usda.gov/cfs/farm-school-census-and-comprehensive-review] for a detailed explanation of the study's limitations.

    Outcome measurement methods and equipment used: None

    Resources in this dataset:

    Resource Title: 2019 Farm to School Codebook with Weights. File Name: Codebook_Update_02SEP21.xlsx. Resource Description: 2019 Farm to School Codebook with Weights.

    Resource Title: 2019 Farm to School Data with Weights CSV. File Name: census2019_public_use_with_weight.csv. Resource Description: 2019 Farm to School Data with Weights CSV.

    Resource Title: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets. File Name: Farm_to_School_Data_AgDataCommons_SAS_SPSS_R_STATA_with_weight.zip. Resource Description: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets.
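    Using the non-response weight amounts to a standard weighted estimate. A minimal sketch, where "nrweight" is the weight variable named above and "participates" is a hypothetical 0/1 farm-to-school indicator with toy values:

```python
import pandas as pd

# Toy stand-in rows; real data come from census2019_public_use_with_weight.csv.
df = pd.DataFrame({
    "participates": [1, 0, 1, 1],
    "nrweight": [1.5, 2.0, 1.0, 0.5],
})

# Weighted participation rate: sum(w * y) / sum(w)
rate = (df["participates"] * df["nrweight"]).sum() / df["nrweight"].sum()
print(round(rate, 3))
```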

  16. Core Service Review - Quantitative Data - Dataset - CKAN

    • ckan0.cf.opendata.inter.prod-toronto.ca
    Updated Jul 25, 2011
    + more versions
    Cite
    (2011). Core Service Review - Quantitative Data - Dataset - CKAN [Dataset]. https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/core-service-review-quantitative-data
    Explore at:
    Dataset updated
    Jul 25, 2011
    Description

    To address Toronto's 2012 budget gap of $774 million, City Council has launched a review of all of its services and implemented a multi-year financial planning process. This data set contains the responses to the multiple-choice questions on the Core Services Review Public Consultation Feedback Form from members of the public. Approximately 13,000 responses were received (full and partial). The consultation was held between May 11 and June 17, 2011. As a public consultation, respondents chose to participate, and chose which questions to answer. This produced a self-selected sample of respondents. The majority of the responses were from City of Toronto residents. There were some responses from GTA residents. City staff reviewed the data and removed personal information and input violating city policies (for example, content contravening the City's current anti-discrimination policy or confidentiality policy). The .SAV file may be viewed with statistical software such as SPSS or SAS.

  17. SAS-2 Photon Events Catalog | gimi9.com

    • gimi9.com
    Updated Feb 1, 2001
    Cite
    (2001). SAS-2 Photon Events Catalog | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_sas-2-photon-events-catalog/
    Explore at:
    Dataset updated
    Feb 1, 2001
    Description

    The SAS2RAW database is a log of the 28 SAS-2 observation intervals and contains target names, sky coordinates, start times and other information for all 13056 photons detected by SAS-2. The original data came from 2 sources: the photon information was obtained from the Event Encyclopedia, and the exposures were derived from the original "Orbit Attitude Live Time" (OALT) tapes stored at NASA/GSFC. These data sets were combined into FITS format images at HEASARC. The images were formed by making the center pixel of a 512 x 512 pixel image correspond to the RA and DEC given in the event file. Each photon's RA and DEC were converted to a relative pixel in the image using Aitoff projections. All the raw data from the original SAS-2 binary data files are now stored in 28 FITS files. These images can be accessed and plotted using XIMAGE, and other columns of the FITS file extensions can be plotted with the FTOOL FPLOT. This is a service provided by NASA HEASARC.
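    The RA/DEC-to-pixel mapping described above can be sketched as follows. The Aitoff projection formulas are standard, but the pixel scale and the row/column orientation here are illustrative assumptions, not the exact HEASARC procedure:

```python
import numpy as np


def aitoff_xy(dlon_deg, lat_deg):
    """Aitoff projection: longitude offset from the image center and latitude,
    both in degrees, mapped to projected (x, y) in degrees."""
    lam = np.radians(dlon_deg)
    phi = np.radians(lat_deg)
    alpha = np.arccos(np.cos(phi) * np.cos(lam / 2.0))
    sinc_a = np.sinc(alpha / np.pi)  # unnormalized sinc sin(a)/a, sinc(0) = 1
    x = 2.0 * np.cos(phi) * np.sin(lam / 2.0) / sinc_a
    y = np.sin(phi) / sinc_a
    return np.degrees(x), np.degrees(y)


def to_pixel(dlon_deg, lat_deg, scale_deg_per_pix=0.25):
    """Place a photon in a 512 x 512 image centered on the pointing direction.
    The pixel scale and orientation are assumptions for illustration."""
    x, y = aitoff_xy(dlon_deg, lat_deg)
    col = 256 + int(round(x / scale_deg_per_pix))
    row = 256 + int(round(y / scale_deg_per_pix))
    return col, row
```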

  18. Data from: The Bronson Files, Dataset 4, Field 105, 2013

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). The Bronson Files, Dataset 4, Field 105, 2013 [Dataset]. https://catalog.data.gov/dataset/the-bronson-files-dataset-4-field-105-2013-a71c6
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Dr. Kevin Bronson provides this unique nitrogen and water management in wheat agricultural research dataset for computational use. Ten irrigation treatments from a linear sprinkler were combined with nitrogen treatments. This dataset includes notation of field events and operations, an intermediate analysis mega-table of correlated and calculated parameters, including laboratory analysis results generated during the experimentation, plus high resolution plot level intermediate data tables of SAS process output, as well as the complete raw sensor records and logger outputs. This data was collected during the beginning of our USDA Maricopa terrestrial proximal high-throughput plant phenotyping tri-metric method generation, in which 5 Hz crop canopy height, temperature and spectral signatures are recorded coincidently to indicate plant health status. In this early development period, our Proximal Sensing Cart Mark 1 (PSCM1) platform supplanted people carrying the CropCircle (CC) sensors, with improved viewing and mechanical performance. See the included README file for operational details and further description of the measured data signals.

    Summary: Active optical proximal wheat canopy sensing spatial data, including additional related metrics such as thermal, are presented. Agronomic nitrogen and irrigation management related field operations are listed. A unique research experimentation intermediate analysis table is made available, along with the raw data. The raw data recordings and annotated table outputs with calculated VIs are made available. Plot polygon coordinate designations allow a re-intersection spatial analysis. Data was collected in the 2013 season at Maricopa Agricultural Center, Arizona, USA. A high-throughput proximal plant phenotyping approach via electronic sampling and data processing is exemplified. Data were acquired using USDA Maricopa's first mobile platforms, such as the Proximal Sensing Cart Mark 1, where the first cluster sensor bracket design and rickshaw-inspired operator's handle were successfully employed. SAS and GIS compute processing output tables, including Excel-formatted examples, are presented, where intermediate data tabulation and analysis are available. The weekly proximal sensing data collected include canopy reflectance at six wavelengths, ultrasonic distance sensing of canopy height, and infrared thermometry. Ten gradient levels of irrigation application from a linear-move sprinkler system were applied. Soil physical texture and fertility chemistry results are available. Yield and seed information is presented.
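    A vegetation index such as NDVI can be computed from two of the six reflectance bands. The specific band choice below (red near 670 nm, NIR near 800 nm) is an assumption for illustration, not confirmed from the README:

```python
import numpy as np

# Example reflectance values for three plots (toy numbers, not dataset values).
red = np.array([0.08, 0.10, 0.06])   # red band, ~670 nm (assumed)
nir = np.array([0.45, 0.40, 0.50])   # near-infrared band, ~800 nm (assumed)

# NDVI = (NIR - Red) / (NIR + Red)
ndvi = (nir - red) / (nir + red)
print(ndvi.round(3))
```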

  19. Documentation of Minnesota farmland value calculations for 2021

    • dataverse.harvard.edu
    • dataone.org
    Updated Feb 24, 2022
    + more versions
    Cite
    William F Lazarus (2022). Documentation of Minnesota farmland value calculations for 2021 [Dataset]. http://doi.org/10.7910/DVN/GBQINF
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 24, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    William F Lazarus
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/GBQINF

    Area covered
    Minnesota
    Description

    Documentation (Word file), SAS 9.4 program files, Excel spreadsheets, HTML, GIF, and PDFs used in generating a staff paper and a web-based database of Minnesota farmland sales prices and acreages by township for 2021. If you don't have SAS and would like to view the .sas program files, one approach is to make a copy of the file, rename it with a .txt extension, and open it in Notepad. The SAS database files can also be exported using R if you don't have SAS.
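    The .sas7bdat data files can also be read without SAS or R using pandas, which parses SAS datasets directly. The file name below is hypothetical, standing in for one of the files in this record:

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name; substitute an actual .sas7bdat file from the record.
path = Path("land_sales_2021.sas7bdat")
if path.exists():
    df = pd.read_sas(path, encoding="latin-1")
    df.to_csv(path.with_suffix(".csv"), index=False)
    print(df.head())
```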

  20. Dataset used in the paper "Merge-and-Shrink Heuristics for Classical...

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Mar 12, 2021
    Cite
    Sievers, Silvan (2021). Dataset used in the paper "Merge-and-Shrink Heuristics for Classical Planning: Efficient Implementation and Partial Abstractions" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_1290643
    Explore at:
    Dataset updated
    Mar 12, 2021
    Dataset provided by
    University of Basel
    Authors
    Sievers, Silvan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all raw and processed data used in the paper. It has been generated using Downward-Lab (see https://doi.org/10.5281/zenodo.399255).

    Directories without the "-eval" ending contain raw data, distributed over a subdirectory for each experiment. Each of these contains a subdirectory tree structure "runs-*" where each planner run has its own directory. For each run, there are the input PDDL files, domain.pddl and problem.pddl, the compressed output as generated by the translator component of Fast Downward (output.sas.xz), the run log file "run.log" (stdout), possibly also a run error file "run.err" (stderr), the run script "run" used to start the experiment, and a "properties" file that contains data parsed from the log file(s).

    Directories with the "-eval" ending contain a "properties" file, which contains a JSON dictionary with combined data from all runs of the corresponding experiment. In essence, the properties file is the union over all properties files generated for each individual planner run.

    To process the data further, we used the scripts available in the software bundle of the paper: https://doi.org/10.5281/zenodo.1290524
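    A sketch of how such a combined properties file could be rebuilt from the per-run files, following the directory layout described above (this is an illustration, not Downward-Lab's actual implementation):

```python
import json
from pathlib import Path


def combine_properties(exp_dir):
    """Union the per-run 'properties' files of one experiment into a single
    dict keyed by run directory, mirroring the combined file found in the
    '-eval' directories."""
    combined = {}
    for props_file in Path(exp_dir).glob("runs-*/*/properties"):
        combined[props_file.parent.name] = json.loads(props_file.read_text())
    return combined
```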
