100+ datasets found
  1. Incomplete Published Data Assets Report

    • datasets.ai
    • s.cnmilf.com
    • +1more
    Updated Aug 26, 2024
    Cite
    Department of Homeland Security (2024). Incomplete Published Data Assets Report [Dataset]. https://datasets.ai/datasets/incomplete-published-data-assets-report
    Dataset updated
    Aug 26, 2024
    Dataset authored and provided by
    Department of Homeland Security
    Description

    Displays incomplete Published data assets. This report can be used to help improve the Data Asset Completeness score from the Enterprise Data Management (EDM) Scorecard by identifying which missing fields are required for completeness.
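    The kind of check this report describes can be sketched as follows. This is a minimal illustration only: the required field names in `REQUIRED_FIELDS` and the record layout are hypothetical, since the source does not list the fields the EDM Scorecard actually requires.

```python
# Hypothetical required fields; the real EDM Scorecard list is not given in the source.
REQUIRED_FIELDS = ("title", "description", "owner", "update_frequency")


def missing_required_fields(asset, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in one asset record."""
    missing = []
    for field in required:
        value = asset.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def incomplete_assets(assets, required=REQUIRED_FIELDS):
    """Map each incomplete asset's name to its list of missing required fields."""
    report = {}
    for asset in assets:
        missing = missing_required_fields(asset, required)
        if missing:
            report[asset.get("name", "<unnamed>")] = missing
    return report


# Hypothetical sample records.
assets = [
    {"name": "asset_a", "title": "A", "description": "x",
     "owner": "o", "update_frequency": "daily"},
    {"name": "asset_b", "title": "B", "description": "", "owner": None},
]
report = incomplete_assets(assets)
print(report)
```

    Listing the missing fields per asset, rather than only flagging the asset, mirrors the stated purpose of identifying which fields need to be filled in.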

  2. Data from: Robust multivariate mixture regression models with incomplete...

    • tandf.figshare.com
    • datasetcatalog.nlm.nih.gov
    text/x-tex
    Updated Jun 2, 2023
    Cite
    Hwa Kyung Lim; Naveen N. Narisetty; Sooyoung Cheon (2023). Robust multivariate mixture regression models with incomplete data [Dataset]. http://doi.org/10.6084/m9.figshare.3491345.v1
    Available download formats: text/x-tex
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Hwa Kyung Lim; Naveen N. Narisetty; Sooyoung Cheon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

  3. A dataset from a survey investigating disciplinary differences in data...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, pdf, txt
    Updated Jul 12, 2024
    Cite
    Anton Boudreau Ninkov; Chantal Ripp; Kathleen Gregory; Isabella Peters; Stefanie Haustein (2024). A dataset from a survey investigating disciplinary differences in data citation [Dataset]. http://doi.org/10.5281/zenodo.7555363
    Available download formats: csv, txt, pdf, bin
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anton Boudreau Ninkov; Chantal Ripp; Kathleen Gregory; Isabella Peters; Stefanie Haustein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GENERAL INFORMATION

    Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation

    Date of data collection: January to March 2022

    Collection instrument: SurveyMonkey

    Funding: Alfred P. Sloan Foundation


    SHARING/ACCESS INFORMATION

    Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license

    Links to publications that cite or use the data:

    Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437

    Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266


    DATA & FILE OVERVIEW

    File List

    • Filename: MDCDatacitationReuse2021Codebook.pdf
      Codebook
    • Filename: MDCDataCitationReuse2021surveydata.csv
      Dataset format in csv
    • Filename: MDCDataCitationReuse2021surveydata.sav
      Dataset format in SPSS
    • Filename: MDCDataCitationReuseSurvey2021QNR.pdf
      Questionnaire

    Additional related data collected that was not included in the current data package: Open ended questions asked to respondents


    METHODOLOGICAL INFORMATION

    Description of methods used for collection/generation of data:

    The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.

    The survey received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).

    Methods for processing the data:

    Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.

    Instrument- or software-specific information needed to interpret the data:

    The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.


    DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

    Number of variables: 94

    Number of cases/rows: 2,492

    Missing data codes: 999 Not asked

    Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
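    When reading the CSV version, the missing-data code 999 ("Not asked") should not be treated as an ordinary answer. The sketch below shows one way to map it to `None` using only the Python standard library. The column names in the sample are hypothetical; the real variable names are defined in the codebook.

```python
import csv
import io

NOT_ASKED = "999"  # missing-data code stated in the dataset description


def load_rows(csv_file, missing_code=NOT_ASKED):
    """Read survey rows, replacing the 'Not asked' code with None."""
    reader = csv.DictReader(csv_file)
    return [
        {col: (None if val == missing_code else val) for col, val in row.items()}
        for row in reader
    ]


# Hypothetical excerpt standing in for MDCDataCitationReuse2021surveydata.csv.
sample = io.StringIO("respondent_id,q1,q2\n1,3,999\n2,999,5\n")
rows = load_rows(sample)
print(rows)
```

    This sketch applies the replacement to every column. If an identifier or numeric column could legitimately contain 999, restrict the replacement to the question columns listed in the codebook.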

  4. Data from: Archetypal Analysis With Missing Data: See All Samples by Looking...

    • tandf.figshare.com
    application/x-rar
    Updated Jun 1, 2023
    Cite
    Irene Epifanio; M. Victoria Ibáñez; Amelia Simó (2023). Archetypal Analysis With Missing Data: See All Samples by Looking at a Few Based on Extreme Profiles [Dataset]. http://doi.org/10.6084/m9.figshare.7445378.v2
    Available download formats: application/x-rar
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Irene Epifanio; M. Victoria Ibáñez; Amelia Simó
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this article, we propose several methodologies for handling missing or incomplete data in archetype analysis (AA) and archetypoid analysis (ADA). AA seeks to find archetypes, which are convex combinations of data points, and to approximate the samples as mixtures of those archetypes. In ADA, the representative archetypal data belong to the sample, that is, they are actual data points. With the proposed procedures, missing data are not discarded or previously filled by imputation and the theoretical properties regarding location of archetypes are guaranteed, unlike the previous approaches. The new procedures adapt the AA algorithm either by considering the missing values in the computation of the solution or by skipping them. In the first case, the solutions of previous approaches are modified to fulfill the theory and a new procedure is proposed, where the missing values are updated by the fitted values. In this second case, the procedure is based on the estimation of dissimilarities between samples and the projection of these dissimilarities in a new space, where AA or ADA is applied, and those results are used to provide a solution in the original space. A comparative analysis is carried out in a simulation study, with favorable results. The methodology is also applied to two real datasets: a well-known climate dataset and a global development dataset. We illustrate how these unsupervised methodologies allow complex data to be understood, even by nonexperts. Supplementary materials for this article are available online.

  5. ARCHIVED: COVID-19 Testing by Race/Ethnicity Over Time

    • healthdata.gov
    • data.sfgov.org
    • +1more
    application/rdfxml +5
    Updated Apr 8, 2025
    Cite
    data.sfgov.org (2025). ARCHIVED: COVID-19 Testing by Race/Ethnicity Over Time [Dataset]. https://healthdata.gov/dataset/ARCHIVED-COVID-19-Testing-by-Race-Ethnicity-Over-T/ntmc-mxb8
    Available download formats: tsv, csv, json, application/rssxml, application/rdfxml, xml
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    data.sfgov.org
    Description

    A. SUMMARY This dataset includes San Francisco COVID-19 tests by race/ethnicity and by date. This dataset represents the daily count of tests collected, and the breakdown of test results (positive, negative, or indeterminate). Tests in this dataset include all those collected from persons who listed San Francisco as their home address at the time of testing. It also includes tests that were collected by San Francisco providers for persons who were missing a locating address. This dataset does not include tests for residents listing a locating address outside of San Francisco, even if they were tested in San Francisco.

    The data were de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected). If a person tested multiple times on the same date, only one test is included from that date. When there are multiple tests on the same date, a positive result, if one exists, will always be selected as the record for the person. If a PCR and antigen test are taken on the same day, the PCR test will supersede. If a person tests multiple times on the same day and the results are all the same (e.g. all negative or all positive) then the first test done is selected as the record for the person.
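    The de-duplication rules above can be sketched as a priority ordering within each person and date. This is an illustration, not the publisher's actual pipeline. The field names (`person_id`, `collected_at`, `test_type`, `result`) are hypothetical, and the sketch assumes that the positive-result rule takes precedence over the PCR rule, which the description does not state explicitly.

```python
from datetime import datetime


def select_daily_records(tests):
    """Keep one test per (person, collection date), following the stated rules."""

    def priority(t):
        # Lower tuples win: positive first, then PCR over antigen,
        # then the earliest test of the day.
        return (
            t["result"] != "positive",
            t["test_type"] != "PCR",
            t["collected_at"],
        )

    chosen = {}
    for t in tests:
        key = (t["person_id"], t["collected_at"].date())
        if key not in chosen or priority(t) < priority(chosen[key]):
            chosen[key] = t
    return list(chosen.values())


# Hypothetical sample records.
tests = [
    {"person_id": 1, "collected_at": datetime(2021, 1, 5, 9, 0),
     "test_type": "PCR", "result": "negative"},
    {"person_id": 1, "collected_at": datetime(2021, 1, 5, 11, 0),
     "test_type": "antigen", "result": "positive"},
    {"person_id": 1, "collected_at": datetime(2021, 1, 6, 8, 0),
     "test_type": "PCR", "result": "negative"},
    {"person_id": 1, "collected_at": datetime(2021, 1, 6, 10, 0),
     "test_type": "PCR", "result": "negative"},
    {"person_id": 2, "collected_at": datetime(2021, 1, 5, 9, 0),
     "test_type": "antigen", "result": "negative"},
    {"person_id": 2, "collected_at": datetime(2021, 1, 5, 10, 0),
     "test_type": "PCR", "result": "negative"},
]
kept = select_daily_records(tests)
for record in kept:
    print(record["person_id"], record["collected_at"], record["test_type"], record["result"])
```

    Tests on different dates are never merged, so a person tested on two days contributes two records, as the description requires.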

    The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco.

    When a person gets tested for COVID-19, they may be asked to report information about themselves. One piece of information that might be requested is a person's race and ethnicity. These data are often incomplete in the laboratory and provider reports of the test results sent to the health department. The data can be missing or incomplete for several possible reasons:

    • The person was not asked about their race and ethnicity.
    • The person was asked, but refused to answer.
    • The person answered, but the testing provider did not include the person's answers in the reports.
    • The testing provider reported the person's answers in a format that could not be used by the health department.
    

    For any of these reasons, a person's race/ethnicity will be recorded in the dataset as “Unknown.”

    B. NOTE ON RACE/ETHNICITY The different values for Race/Ethnicity in this dataset are "Asian;" "Black or African American;" "Hispanic or Latino/a, all races;" "American Indian or Alaska Native;" "Native Hawaiian or Other Pacific Islander;" "White;" "Multi-racial;" "Other;" and "Unknown."

    The Race/Ethnicity categorization increases data clarity by emulating the methodology used by the U.S. Census in the American Community Survey. Specifically, persons who identify as "Asian," "Black or African American," "American Indian or Alaska Native," "Native Hawaiian or Other Pacific Islander," "White," "Multi-racial," or "Other" do NOT include any person who identified as Hispanic/Latino at any time in their testing reports that either (1) identified them as SF residents or (2) as someone who tested without a locating address by an SF provider. All persons across all races who identify as Hispanic/Latino are recorded as "Hispanic or Latino/a, all races." This categorization increases data accuracy by correcting the way “Other” persons were counted. Previously, when a person reported “Other” for Race/Ethnicity, they would be recorded as “Unknown.” Under the new categorization, they are counted as “Other” and are distinct from “Unknown.”

    If a person records their race/ethnicity as “Asian,” “Black or African American,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” “White,” or “Other” for their first COVID-19 test, then this data will not change—even if a different race/ethnicity is reported for this person for any future COVID-19 test. There are two exceptions to this rule. The first exception is if a person’s race/ethnicity value i

  6. AI-Generated Synthetic Tabular Dataset Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 4, 2025
    Cite
    Growth Market Reports (2025). AI-Generated Synthetic Tabular Dataset Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-generated-synthetic-tabular-dataset-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Aug 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI-Generated Synthetic Tabular Dataset Market Outlook



    According to our latest research, the AI-Generated Synthetic Tabular Dataset market size reached USD 1.42 billion in 2024 globally, reflecting the rapid adoption of artificial intelligence-driven data generation solutions across numerous industries. The market is expected to expand at a robust CAGR of 34.7% from 2025 to 2033, reaching a forecasted value of USD 19.17 billion by 2033. This exceptional growth is primarily driven by the increasing need for high-quality, privacy-preserving datasets for analytics, model training, and regulatory compliance, particularly in sectors with stringent data privacy requirements.




    One of the principal growth factors propelling the AI-Generated Synthetic Tabular Dataset market is the escalating demand for data-driven innovation amidst tightening data privacy regulations. Organizations across healthcare, finance, and government sectors are facing mounting challenges in accessing and sharing real-world data due to GDPR, HIPAA, and other global privacy laws. Synthetic data, generated by advanced AI algorithms, offers a solution by mimicking the statistical properties of real datasets without exposing sensitive information. This enables organizations to accelerate AI and machine learning development, conduct robust analytics, and facilitate collaborative research without risking data breaches or non-compliance. The growing sophistication of generative models, such as GANs and VAEs, has further increased confidence in the utility and realism of synthetic tabular data, fueling adoption across both large enterprises and research institutions.




    Another significant driver is the surge in digital transformation initiatives and the proliferation of AI and machine learning applications across industries. As businesses strive to leverage predictive analytics, automation, and intelligent decision-making, the need for large, diverse, and high-quality datasets has become paramount. However, real-world data is often siloed, incomplete, or inaccessible due to privacy concerns. AI-generated synthetic tabular datasets bridge this gap by providing scalable, customizable, and bias-mitigated data for model training and validation. This not only accelerates AI deployment but also enhances model robustness and generalizability. The flexibility of synthetic data generation platforms, which can simulate rare events and edge cases, is particularly valuable in sectors like finance and healthcare, where such scenarios are underrepresented in real datasets but critical for risk assessment and decision support.




    The rapid evolution of the AI-Generated Synthetic Tabular Dataset market is also underpinned by technological advancements and growing investments in AI infrastructure. The availability of cloud-based synthetic data generation platforms, coupled with advancements in natural language processing and tabular data modeling, has democratized access to synthetic datasets for organizations of all sizes. Strategic partnerships between technology providers, research institutions, and regulatory bodies are fostering innovation and establishing best practices for synthetic data quality, utility, and governance. Furthermore, the integration of synthetic data solutions with existing data management and analytics ecosystems is streamlining workflows and reducing barriers to adoption, thereby accelerating market growth.




    Regionally, North America dominates the AI-Generated Synthetic Tabular Dataset market, accounting for the largest share in 2024 due to the presence of leading AI technology firms, strong regulatory frameworks, and early adoption across industries. Europe follows closely, driven by stringent data protection laws and a vibrant research ecosystem. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digitalization, government initiatives, and increasing investments in AI research and development. Latin America and the Middle East & Africa are also witnessing growing interest, particularly in sectors like finance and government, though market maturity varies across countries. The regional landscape is expected to evolve dynamically as regulatory harmonization, cross-border data collaboration, and technological advancements continue to shape market trajectories globally.



  7. Data from: Incomplete specimens in geometric morphometric analyses

    • zenodo.org
    • explore.openaire.eu
    • +1more
    Updated Oct 11, 2014
    Cite
    Arbour, Jessica H.; Brown, Caleb M. (2014). Data from: Incomplete specimens in geometric morphometric analyses [Dataset]. http://doi.org/10.5061/dryad.mp713
    Dataset updated
    Oct 11, 2014
    Dataset provided by
    University of Toronto
    Authors
    Arbour, Jessica H.; Brown, Caleb M.
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    1. The analysis of morphological diversity frequently relies on the use of multivariate methods for characterizing biological shape. However, many of these methods are intolerant of missing data, which can limit the use of rare taxa and hinder the study of broad patterns of ecological diversity and morphological evolution. This study applied a multi-dataset approach to compare variation in missing data estimation and its effect on geometric morphometric analysis across taxonomically-variable groups, landmark position and sample sizes. 2. Missing morphometric landmark data was simulated from five real, complete datasets, including modern fish, primates and extinct theropod dinosaurs. Missing landmarks were then estimated using several standard approaches and a geometric-morphometric-specific method. The accuracy of missing data estimation was determined for each estimation method, landmark position, and morphological dataset. Procrustes superimposition was used to compare the eigenvectors and principal component scores of a geometric morphometric analysis of the original landmark data, to datasets with A) missing values estimated, or B) simulated incomplete specimens excluded, for varying levels of specimen incompleteness and sample sizes. 3. Standard estimation techniques were more reliable estimators and had lower impacts on morphometric analysis compared to a geometric-morphometric-specific estimator. For most datasets and estimation techniques, estimating missing data produced a better fit to the structure of the original data than exclusion of incomplete specimens, and this was maintained even at considerably reduced sample sizes. The impact of missing data on geometric morphometric analysis was disproportionately affected by the most fragmentary specimens. 4. Missing data estimation was influenced by variability of specific anatomical features, and may be improved by a better understanding of shape variation present in a dataset. Our results suggest that the inclusion of incomplete specimens through the use of effective missing data estimators better reflects the patterns of shape variation within a dataset than using only complete specimens; however, the effectiveness of missing data estimation can be maximized by excluding only the most incomplete specimens. It is advised that missing data estimators be evaluated for each dataset and landmark independently, as the effectiveness of estimators can vary strongly and unpredictably between different taxa and structures.

  8. Campaign Finance - Transactions

    • data.sfgov.org
    • s.cnmilf.com
    • +1more
    csv, xlsx, xml
    Updated Oct 8, 2025
    Cite
    (2025). Campaign Finance - Transactions [Dataset]. https://data.sfgov.org/City-Management-and-Ethics/Campaign-Finance-Transactions/pitq-e56w
    Available download formats: csv, xml, xlsx
    Dataset updated
    Oct 8, 2025
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    A. SUMMARY Transactions from FPPC Forms 460, 461, 496, 497, and 450. This dataset combines all schedules, pages, and includes unitemized totals. Only transactions from the "most recent" version of a filing (original/amendment) appear here.

    B. HOW THE DATASET IS CREATED Committees file campaign statements with the Ethics Commission on a periodic basis. Those statements are stored with the Commission's data provider. Data is generally presented as-filed by committees.

    If a committee files an amendment, the data from that filing completely replaces the original and any prior amendments in the filing sequence.

    C. UPDATE PROCESS Each night starting at midnight Pacific time, a script runs to check for new filings with the Commission's database and updates this dataset with transactions from new filings. The update process can take a variable amount of time to complete. Viewing or downloading this dataset while the update is running may result in incomplete data; it is therefore highly recommended to view or download this data before midnight or after 8am.

    During the update, some fields are copied from the Filings dataset into this dataset for viewing convenience. The copy process may occasionally fail for some transactions due to timing issues but should self-correct the following day. Transactions with a blank 'Filing Id Number' or 'Filing Date' field are such transactions, but can be joined with the appropriate record using the 'Filing Activity Nid' field shared between Filing and Transaction datasets.
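    The repair described above can be sketched as a lookup keyed on 'Filing Activity Nid'. This is a minimal illustration with made-up records. It assumes the Filings dataset exposes 'Filing Id Number' and 'Filing Date' under the same column names as the Transactions dataset, which the description implies but does not confirm.

```python
def fill_filing_fields(transactions, filings):
    """Fill blank filing fields on transactions from the matching filing record."""
    filings_by_nid = {f["Filing Activity Nid"]: f for f in filings}
    filled = []
    for tx in transactions:
        tx = dict(tx)  # copy so the input rows are left untouched
        filing = filings_by_nid.get(tx.get("Filing Activity Nid"))
        if filing is not None:
            for field in ("Filing Id Number", "Filing Date"):
                if not tx.get(field):
                    tx[field] = filing.get(field)
        filled.append(tx)
    return filled


# Hypothetical sample rows.
filings = [
    {"Filing Activity Nid": "f-1", "Filing Id Number": "1001", "Filing Date": "2025-01-31"},
]
transactions = [
    {"Filing Activity Nid": "f-1", "Filing Id Number": "", "Filing Date": "", "Form Type": "A"},
    {"Filing Activity Nid": "f-2", "Filing Id Number": "", "Filing Date": "", "Form Type": "E"},
]
repaired = fill_filing_fields(transactions, filings)
print(repaired)
```

    Transactions with no matching filing are passed through unchanged, so they remain identifiable by their blank fields.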

    D. HOW TO USE THIS DATASET
    Transactions from rejected filings are not included in this dataset. Transactions from many different FPPC forms and schedules are combined in this dataset; refer to the column "Form Type" to differentiate transaction types. Properties suffixed with "-nid" can be used to join the data between Filers, Filings, and Transaction datasets. Refer to the Ethics Commission's webpage for more information. FPPC Form 460 is organized into Schedules as follows:

    • A: Monetary Contributions Received
    • B1: Loans Received
    • B2: Loan Guarantors
    • C: Nonmonetary Contributions Received
    • D: Summary of Expenditures Supporting/Opposing Other Candidates, Measures and Committees
    • E: Payments Made
    • F: Accrued Expenses (Unpaid Bills)
    • G: Payments Made by an Agent or Independent Contractor (on Behalf of This Committee)
    • H: Loans Made to Others
    • I: Miscellaneous Increases to Cash

    RELATED DATASETS

  9. Global scientific academies Dataset

    • scidb.cn
    Updated Nov 18, 2024
    Cite
    chen xiaoli (2024). Global scientific academies Dataset [Dataset]. http://doi.org/10.57760/sciencedb.14674
    Dataset updated
    Nov 18, 2024
    Dataset provided by
    Science Data Bank
    Authors
    chen xiaoli
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset was generated as part of a study aimed at profiling global scientific academies, which play a significant role in promoting scholarly communication and scientific progress. Below is a detailed description of the dataset.

    Data Generation Procedures and Tools: The dataset was compiled using a combination of web scraping, manual verification, and data integration from multiple sources, including Wikipedia categories, members of unions of scientific organizations, and web searches using specific query phrases (e.g., "country name + (academy OR society) AND site:.country code"). The records were enriched by cross-referencing data from the Wikidata API, the VIAF API, and the Research Organisation Registry (ROR). Additional manual curation ensured accuracy and consistency.

    Temporal and Geographical Scopes: The dataset covers scientific academies from a wide temporal scope, ranging from the 15th century to the present. The geographical scope includes academies from all continents, with emphasis on both developed and post-developing countries. The dataset aims to capture the full spectrum of scientific academies across different periods of historical development.

    Tabular Data Description: The dataset comprises a total of 301 academy records and 14,008 website navigation sections. Each row in the dataset represents a single scientific academy, while the columns describe attributes such as the academy’s name, founding date, location (city and country), website URL, email, and address.

    Missing Data: Although the dataset offers comprehensive coverage, some entries may have missing or incomplete fields. For instance, section data were not available for all records.

    Data Errors and Error Ranges: The data has been verified through manual curation, reducing the likelihood of errors. However, the use of crowd-sourced data from platforms like Wikipedia introduces potential risks of outdated or incomplete information. Any errors are likely minor and confined to fields such as navigation menu classifications, which may not fully reflect the breadth of an academy's activities.

    Data Files, Formats, and Sizes: The dataset is provided in CSV format and JSON format, ensuring compatibility with a wide range of software applications, including Microsoft Excel, Google Sheets, and programming languages such as Python (via libraries like pandas).

    This dataset provides a valuable resource for further research into the organizational behaviors, geographic distribution, and historical significance of scientific academies across the globe. It can be used for large-scale analyses, including comparative studies across different regions or time periods.

    Any feedback on the data is welcome! Please contact the maintainer of the dataset.

    If you use the data, please cite the following paper: Xiaoli Chen and Xuezhao Wang. 2024. Profiling Global Scientific Academies. In The 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’24), December 16–20, 2024, Hong Kong, China. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3677389.3702582

  10. ‘Store Transaction data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Store Transaction data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-store-transaction-data-ccd5/dd67e58b/?iid=035-749&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Store Transaction data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamprateek/store-transaction-data on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    Nielsen receives transaction level scanning data (POS Data) from its partner stores on a regular basis. Stores sharing POS data include bigger format store types such as supermarkets, hypermarkets as well as smaller traditional trade grocery stores (Kirana stores), medical stores etc. using a POS machine.

    While in a bigger format store, all items for all transactions are scanned using a POS machine, smaller and more localized shops do not have a 100% compliance rate in terms of scanning and inputting information into the POS machine for all transactions.

    A transaction involving a single packet of chips or a single piece of candy may not be scanned and recorded, either to spare the customer the inconvenience or because the store is crowded during rush hours.

    Thus, the data received from such stores is often incomplete and lacks complete information of all transactions completed within a day.

    Additionally, apart from incomplete transaction data in a day, it is observed that certain stores do not share data for all active days. Stores share data ranging from 2 to 28 days in a month. While it is possible to impute/extrapolate data for 2 days of a month using 28 days of actual historical data, the vice versa is not recommended.

    Nielsen encourages you to create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.

    Content

    You are provided with the dataset that contains store level data by brands and categories for select stores-

    Hackathon_ Ideal_Data - The file contains brand level data for 10 stores for the last 3 months. This can be referred to as the ideal data.

    Hackathon_Working_Data - This contains data for selected stores which are missing and/or incomplete.

    Hackathon_Mapping_File - This file is provided to help understand the column names in the data set.

    Hackathon_Validation_Data - This file contains the data stores and product groups for which you have to predict the Total_VALUE.

    Sample Submission - This file represents what needs to be uploaded as output by candidate in the same format. The sample data is provided in the file to help understand the columns and values required.

    Acknowledgements

    Nielsen Holdings plc (NYSE: NLSN) is a global measurement and data analytics company that provides the most complete and trusted view available of consumers and markets worldwide. Nielsen is divided into two business units. Nielsen Global Media, the arbiter of truth for media markets, provides media and advertising industries with unbiased and reliable metrics that create a shared understanding of the industry required for markets to function. Nielsen Global Connect provides consumer packaged goods manufacturers and retailers with accurate, actionable information and insights and a complete picture of the complex and changing marketplace that companies need to innovate and grow. Our approach marries proprietary Nielsen data with other data sources to help clients around the world understand what’s happening now, what’s happening next, and how to best act on this knowledge. An S&P 500 company, Nielsen has operations in over 100 countries, covering more than 90% of the world’s population.

    Know more: https://www.nielsen.com/us/en/

    Inspiration

    Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determine which factors/variables/features can help best predict the store sales.

    --- Original source retains full ownership of the source dataset ---

  11. Store Transaction data

    • kaggle.com
    Updated Mar 18, 2020
    Cite
    Prateek Gupta (2020). Store Transaction data [Dataset]. https://www.kaggle.com/iamprateek/store-transaction-data/code
    Explore at:
    Dataset updated
    Mar 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Prateek Gupta
    Description

    Context

    Nielsen receives transaction level scanning data (POS Data) from its partner stores on a regular basis. Stores sharing POS data include bigger format store types such as supermarkets, hypermarkets as well as smaller traditional trade grocery stores (Kirana stores), medical stores etc. using a POS machine.

    While in a bigger format store, all items for all transactions are scanned using a POS machine, smaller and more localized shops do not have a 100% compliance rate in terms of scanning and inputting information into the POS machine for all transactions.

    A transaction involving a single packet of chips or a single piece of candy may not be scanned and recorded to spare customer the inconvenience or during rush hours when the store is crowded with customers.

    Thus, the data received from such stores is often incomplete and lacks complete information of all transactions completed within a day.

    Additionally, apart from incomplete transaction data in a day, it is observed that certain stores do not share data for all active days. Stores share data ranging from 2 to 28 days in a month. While it is possible to impute/extrapolate data for 2 days of a month using 28 days of actual historical data, the vice versa is not recommended.

    Nielsen encourages you to create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.

    Content

    You are provided with the dataset that contains store level data by brands and categories for select stores-

    Hackathon_ Ideal_Data - The file contains brand level data for 10 stores for the last 3 months. This can be referred to as the ideal data.

    Hackathon_Working_Data - This contains data for selected stores which are missing and/or incomplete.

    Hackathon_Mapping_File - This file is provided to help understand the column names in the data set.

    Hackathon_Validation_Data - This file contains the data stores and product groups for which you have to predict the Total_VALUE.

    Sample Submission - This file represents what needs to be uploaded as output by candidate in the same format. The sample data is provided in the file to help understand the columns and values required.

    Acknowledgements

    Nielsen Holdings plc (NYSE: NLSN) is a global measurement and data analytics company that provides the most complete and trusted view available of consumers and markets worldwide. Nielsen is divided into two business units. Nielsen Global Media, the arbiter of truth for media markets, provides media and advertising industries with unbiased and reliable metrics that create a shared understanding of the industry required for markets to function. Nielsen Global Connect provides consumer packaged goods manufacturers and retailers with accurate, actionable information and insights and a complete picture of the complex and changing marketplace that companies need to innovate and grow. Our approach marries proprietary Nielsen data with other data sources to help clients around the world understand what’s happening now, what’s happening next, and how to best act on this knowledge. An S&P 500 company, Nielsen has operations in over 100 countries, covering more than 90% of the world’s population.

    Know more: https://www.nielsen.com/us/en/

    Inspiration

    Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determine which factors/variables/features can help best predict the store sales.
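As a rough illustration of the extrapolation task described above, the sketch below scales the mean of a store's reported days up to a full month. It is only a naive baseline, not the method the dataset authors have in mind. The function name, the input layout (a dict of day-of-month to sales value), and the `min_days` threshold are all assumptions made for this example; the threshold simply reflects the source's warning against extrapolating a whole month from very few reported days.

```python
import calendar


def extrapolate_monthly_total(daily_sales, year, month, min_days=15):
    """Estimate a full-month sales total from the days a store reported.

    daily_sales: dict mapping day-of-month -> sales value (hypothetical layout).
    min_days: assumed cut-off; below it the estimate is refused.
    Returns None when too few days were reported.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    observed_days = len(daily_sales)
    if observed_days < min_days:
        # Too little history: extrapolating here is what the source advises against.
        return None
    mean_daily = sum(daily_sales.values()) / observed_days
    return mean_daily * days_in_month


# A store that reported 28 of the 31 days in March 2020.
well_covered = {day: 100.0 for day in range(1, 29)}
# A store that reported only 2 days.
sparse = {1: 50.0, 2: 70.0}

march_estimate = extrapolate_monthly_total(well_covered, 2020, 3)
sparse_estimate = extrapolate_monthly_total(sparse, 2020, 3)
```

A real model would likely use store, brand, and category features instead of a flat daily mean; this baseline is mainly useful as a sanity check for such models.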

  12. GDM register with risk factors for screening selection criteria pilot...

    • data.mendeley.com
    • researchdata.edu.au
    Updated Apr 24, 2023
    Cite
    Ezekiel Uba Nwose (2023). GDM register with risk factors for screening selection criteria pilot dataset [Dataset]. http://doi.org/10.17632/r8s3j8hfdb.1
    Explore at:
    Dataset updated
    Apr 24, 2023
    Authors
    Ezekiel Uba Nwose
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Gestational diabetes mellitus (GDM), if unmanaged, can complicate pregnancy outcomes. Selective screening of GDM is a common policy; hence the need for complete medical records of patients. The extent and pattern to which incomplete documentation of patients’ records can prevent recall of antenatal patients requires elucidation.

    Aim: To describe the effectiveness of phone contacts on medical records and GDM risk factors among those reachable by telehealth.

    Data: Initial data were collected in 2018, and collection continued in 2019 at Eku Baptist Government Hospital (EBGH). Demographic data were complete in all patients, but incomplete documentation was observed with as much as 98%. 301/391 lacked complete data; in about 95% of the cases, this was solely due to missing height measurements. In 2020, records of 123 case files were reviewed for effectiveness of phone contacts to do telehealth, with simultaneous GDM risk assessment. 98/123 have phone details on medical records, of which 41/98 cases followed up were reached and hence constituted the pilot dataset.

    Analysis performed: Descriptive frequency analysis.

    Reuse potentials: This dataset is reusable and useful in the future for potential systematic review and/or meta-analysis. Also, for GDM screening selection criteria, medical records and telehealth services, this pilot dataset is useful for identifying health service limitations and areas for improvement.

  13. Evaluating Functional Diversity: Missing Trait Data and the Importance of...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated May 30, 2023
    Cite
    Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello (2023). Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation [Dataset]. http://doi.org/10.1371/journal.pone.0149270
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.

  14. NNDSS - TABLE 1V. Malaria to Measles, Indigenous

    • catalog.data.gov
    • healthdata.gov
    • +4more
    Updated Jul 9, 2025
    + more versions
    Cite
    Centers for Disease Control and Prevention (2025). NNDSS - TABLE 1V. Malaria to Measles, Indigenous [Dataset]. https://catalog.data.gov/dataset/nndss-table-1v-malaria-to-measles-indigenous
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    NNDSS - TABLE 1V. Malaria to Measles, Indigenous - 2020. In this Table, provisional cases* of notifiable diseases are displayed for United States, U.S. territories, and Non-U.S. residents.

    Notice: Data from California published in week 29 for years 2019 and 2020 were incomplete when originally published on July 24, 2020. On August 4, 2020, incomplete case counts were replaced with a "U" indicating case counts are not available for specified time period.

    Notice: Measles data for weeks 1-4 (in Table 1v) were updated on 02-28-2020 to correct the classification of imported and indigenous. For all weeks, measles is considered imported if the disease was acquired outside of the United States and is considered indigenous if the disease was acquired anywhere within the United States or it is not known where the disease was acquired.

    Note: This table contains provisional cases of national notifiable diseases from the National Notifiable Diseases Surveillance System (NNDSS). NNDSS data from the 50 states, New York City, the District of Columbia and the U.S. territories are collated and published weekly on the NNDSS Data and Statistics web page (https://wwwn.cdc.gov/nndss/data-and-statistics.html). Cases reported by state health departments to CDC for weekly publication are provisional because of the time needed to complete case follow-up. Therefore, numbers presented in later weeks may reflect changes made to these counts as additional information becomes available. The national surveillance case definitions used to define a case are available on the NNDSS web site at https://wwwn.cdc.gov/nndss/. Information about the weekly provisional data and guides to interpreting data are available at: https://wwwn.cdc.gov/nndss/infectious-tables.html.

    Footnotes:
    U: Unavailable — The reporting jurisdiction was unable to send the data to CDC or CDC was unable to process the data.
    -: No reported cases — The reporting jurisdiction did not submit any cases to CDC.
    N: Not reportable — The disease or condition was not reportable by law, statute, or regulation in the reporting jurisdiction.
    NN: Not nationally notifiable — This condition was not designated as being nationally notifiable.
    NP: Nationally notifiable but not published.
    NC: Not calculated — There is insufficient data available to support the calculation of this statistic.
    Cum: Cumulative year-to-date counts.
    Max: Maximum — Maximum case count during the previous 52 weeks.
    * Case counts for reporting years 2019 and 2020 are provisional and subject to change. Cases are assigned to the reporting jurisdiction submitting the case to NNDSS, if the case's country of usual residence is the U.S., a U.S. territory, unknown, or null (i.e. country not reported); otherwise, the case is assigned to the 'Non-U.S. Residents' category. Country of usual residence is currently not reported by all jurisdictions or for all conditions. For further information on interpretation of these data, see https://wwwn.cdc.gov/nndss/document/Users_guide_WONDER_tables_cleared_final.pdf.
    † Previous 52 week maximum and cumulative YTD are determined from periods of time when the condition was reportable in the jurisdiction (i.e., may be less than 52 weeks of data or incomplete YTD data).
    § Measles is considered imported if the disease was acquired outside of the United States and is considered indigenous if the disease was acquired anywhere within the United States or it is not known where the disease was acquired.
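Because table cells can hold either a case count or one of the footnote codes listed above, a reader loading the table may want to separate the two before doing arithmetic. The following is a minimal sketch under assumptions: the function name and return shape are invented for this example, and the handling of thousands separators and blank cells is a guess about how an export might look, not something the source documents.

```python
# Footnote codes and meanings as given in the table's footnotes.
FOOTNOTE_CODES = {
    "U": "Unavailable",
    "-": "No reported cases",
    "N": "Not reportable",
    "NN": "Not nationally notifiable",
    "NP": "Nationally notifiable but not published",
    "NC": "Not calculated",
}


def parse_cell(raw):
    """Split a raw table cell into (count, footnote_meaning).

    Returns (int, None) for a numeric count, (None, meaning) for a footnote
    code, and (None, None) for a blank cell (assumed possible in exports).
    """
    text = raw.strip()
    if text == "":
        return None, None
    if text in FOOTNOTE_CODES:
        return None, FOOTNOTE_CODES[text]
    # Assumption: counts may carry thousands separators such as "1,234".
    return int(text.replace(",", "")), None


count, flag = parse_cell(" 1,234 ")
missing_count, missing_flag = parse_cell("U")
```

Keeping "-" as a flag instead of converting it to zero is a deliberate choice here: the footnote says the jurisdiction submitted no cases, and whether that should count as zero is left to the analyst.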

  15. Percentage of P1 Cohort who Did Not Complete Secondary Education

    • data.gov.sg
    Updated Dec 19, 2024
    Cite
    Ministry of Education (2024). Percentage of P1 Cohort who Did Not Complete Secondary Education [Dataset]. https://data.gov.sg/dataset/percentage-of-p1-cohort-who-did-not-complete-secondary-education
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset authored and provided by
    Ministry of Education
    License

    https://data.gov.sg/open-data-licence

    Time period covered
    Jan 2003 - Jan 2023
    Description

    Dataset from Ministry of Education. For more information, visit https://data.gov.sg/datasets/d_eb818056f2f6b839256edfd2abb86d7c/view

  16. Replication Data for: Sparse multi-trait genomic prediction under incomplete...

    • data.moa.gov.et
    html
    Updated Jan 20, 2025
    Cite
    CIMMYT Ethiopia (2025). Replication Data for: Sparse multi-trait genomic prediction under incomplete block designs [Dataset]. https://data.moa.gov.et/dataset/hdl-11529-10548787
    Explore at:
    Available download formats: html
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    CIMMYT Ethiopia
    Description

    The efficiency of genomic selection methodologies can be increased by sparse testing where a subset of materials are evaluated in different environments. Seven different multi-environment plant breeding datasets were used to evaluate four different methods for allocating lines to environments in a multi-trait genomic prediction problem. The results of the analysis are presented in the accompanying article.

  17. Campaign Finance - Summary Totals

    • s.cnmilf.com
    • data.sfgov.org
    • +1more
    Updated Oct 4, 2025
    Cite
    data.sfgov.org (2025). Campaign Finance - Summary Totals [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/campaign-finance-summary-totals
    Explore at:
    Dataset updated
    Oct 4, 2025
    Dataset provided by
    data.sfgov.org
    Description

    A. SUMMARY
    This dataset contains current summary information for electronically filed FPPC campaign forms. The columns in this dataset correspond to the figures reported on the summary page of FPPC forms 450, 460, 461, and 465. Refer to the FPPC Forms represented in this dataset.

    B. HOW THE DATASET IS CREATED
    Committees file campaign statements with the Ethics Commission on a periodic basis. Those statements are stored with the Commission's provider. Data is generally presented as-filed by committees. If a committee files an amendment, the data from that filing completely replaces the original and any prior amendments in the filing sequence.

    C. UPDATE PROCESS
    Each night starting at midnight Pacific time a script runs to check for new filings with the Commission's database, and updates this dataset with transactions from new filings. The update process can take a variable amount of time to complete. Viewing or downloading this dataset while the update is running may result in incomplete data; therefore it is highly recommended to view or download this data before midnight or after 8am.

    D. HOW TO USE THIS DATASET
    Transactions from rejected and superseded filings are not included in this dataset. Transactions from many different FPPC forms are combined in this dataset; refer to the column "Form Type" to differentiate transaction types. A row with no value in the SyncFlag column indicates a paper filing amended an electronic filing. The SFEC is working on how to automatically deal with these cases. Properties suffixed with "-nid" can be used to join the data between Filers, Filings, and Transaction datasets. Refer to the Ethics Commission's webpage for more information.

    RELATED DATASETS
    San Francisco Campaign Filers Filings Received by SFEC Summary Totals Transactions
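The advice about avoiding the nightly update window can be turned into a small guard for automated downloads. The sketch below assumes the window runs from midnight to 8am Pacific time, as the description recommends; the function name and constants are hypothetical, and the actual update may finish earlier or later on a given night.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")
UPDATE_START = time(0, 0)  # update script starts at midnight Pacific
UPDATE_END = time(8, 0)    # description recommends waiting until after 8am


def is_safe_to_download(moment):
    """Return True if a timezone-aware datetime falls outside the update window."""
    local_time = moment.astimezone(PACIFIC).time()
    return not (UPDATE_START <= local_time < UPDATE_END)


noon_pacific = datetime(2025, 10, 4, 12, 0, tzinfo=PACIFIC)
early_morning_utc = datetime(2025, 10, 4, 10, 0, tzinfo=ZoneInfo("UTC"))  # 03:00 Pacific

safe_at_noon = is_safe_to_download(noon_pacific)
safe_early = is_safe_to_download(early_morning_utc)
```

Converting to Pacific time before comparing means the check behaves the same regardless of where the calling machine is located.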

  18. Comprehensive battery aging dataset: capacity and impedance fade...

    • radar.kit.edu
    • service.tib.eu
    • +1more
    tar
    Updated Mar 7, 2024
    + more versions
    Cite
    Matthias Luh; Thomas Blank (2024). Comprehensive battery aging dataset: capacity and impedance fade measurements of a lithium-ion NMC/C-SiO cell [dataset] [Dataset]. http://doi.org/10.35097/1947
    Explore at:
    Available download formats: tar (69375563264 bytes)
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Karlsruhe Institute of Technology
    Authors
    Matthias Luh; Thomas Blank
    Description

    The data is described in detail in the open-access publication "Comprehensive battery aging dataset: capacity and impedance fade measurements of a lithium-ion NMC/C-SiO cell" published in Nature Scientific Data under the DOI: 10.1038/s41597-024-03831-x, also see “Related identifier”. An updated dataset is published under the DOI 10.35097/1969 (result data, e.g., capacity fade and impedance increase) and 10.35097/kww7jv8ajuvchcah (log data), also see “Related identifier”. Python example code to read, process, and visualize the data is provided in the GitHub repository: https://github.com/energystatusdata/bat-age-data-scripts/ Note: The "cell_eisv2.zip" file in this dataset is incomplete and only contains data for cells P001_1 to P044_2. The corrected file "cell_eisv2_fixed.zip" containing data for all 228 cells P001_1 to P076_3 can be found in the dataset “Addendum to "Comprehensive battery aging dataset: capacity and impedance fade measurements of a lithium-ion NMC/C-SiO cell [dataset]"” with the DOI 10.35097/krk531nmj4bsshha (see “Related identifier”).
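To see which cells the incomplete "cell_eisv2.zip" file leaves out, the ID ranges in the note can be enumerated. This sketch assumes that IDs follow the pattern P<three-digit set>_<cell number> with exactly three cells per set, which is consistent with the stated total of 228 cells for P001_1 to P076_3 but is not spelled out in the description. The function name is hypothetical.

```python
def cell_ids(first_set=1, last_set=76, cells_per_set=3):
    """Enumerate cell IDs such as 'P001_1' in order (assumed naming scheme)."""
    return [
        f"P{set_no:03d}_{cell_no}"
        for set_no in range(first_set, last_set + 1)
        for cell_no in range(1, cells_per_set + 1)
    ]


all_cells = cell_ids()

# The incomplete archive covers P001_1 through P044_2 inclusive.
covered = all_cells[: all_cells.index("P044_2") + 1]
# Everything after that is only available in the corrected archive.
missing = all_cells[len(covered):]

print(len(all_cells), len(covered), len(missing))
```

Under these assumptions the corrected file adds the cells from P044_3 through P076_3.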

  19. Data from: Main Effects and Interactions in Mixed and Incomplete Data Frames...

    • tandf.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Geneviève Robin; Olga Klopp; Julie Josse; Éric Moulines; Robert Tibshirani (2023). Main Effects and Interactions in Mixed and Incomplete Data Frames [Dataset]. http://doi.org/10.6084/m9.figshare.8191850.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Geneviève Robin; Olga Klopp; Julie Josse; Éric Moulines; Robert Tibshirani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A mixed data frame (MDF) is a table collecting categorical, numerical, and count observations. The use of MDF is widespread in statistics and the applications are numerous from abundance data in ecology to recommender systems. In many cases, an MDF exhibits simultaneously main effects, such as row, column, or group effects and interactions, for which a low-rank model has often been suggested. Although the literature on low-rank approximations is very substantial, with few exceptions, existing methods do not allow to incorporate main effects and interactions while providing statistical guarantees. The present work fills this gap. We propose an estimation method which allows to recover simultaneously the main effects and the interactions. We show that our method is near optimal under conditions which are met in our targeted applications. We also propose an optimization algorithm which provably converges to an optimal solution. Numerical experiments reveal that our method, mimi, performs well when the main effects are sparse and the interaction matrix has low-rank. We also show that mimi compares favorably to existing methods, in particular when the main effects are significantly large compared to the interactions, and when the proportion of missing entries is large. The method is available as an R package on the Comprehensive R Archive Network. Supplementary materials for this article are available online.

  20. CA State Lands Commission Leases

    • data.cnra.ca.gov
    • data.ca.gov
    • +7more
    Updated Jun 12, 2025
    + more versions
    Cite
    California State Lands Commission (2025). CA State Lands Commission Leases [Dataset]. https://data.cnra.ca.gov/dataset/ca-state-lands-commission-leases
    Explore at:
    Available download formats: arcgis geoservices rest api, csv, kml, zip, xlsx, txt, gdb, gpkg, geojson, html
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    California State Lands Commission (https://www.slc.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    The California State Lands Commission (CSLC) may issue leases or permits on state lands under its jurisdiction. Additional information regarding CSLC leasing can be found at: https://www.slc.ca.gov/leases-permits/. This is a point feature dataset indicating the approximate locations, often represented by a center point, of the general lease area of lands leased by the CSLC on state sovereign lands and school lands, including coastal marine areas, bays, rivers, and lakes. This dataset is to be considered incomplete as new leases are being entered into the CSLC database, and existing leases are modified or terminated, on an ongoing basis. Many marine areas, bays, rivers, and lakes are not under the leasing jurisdiction of the CSLC because they have been legislatively granted in trust to other government entities. Additionally, some leases are not shown at all for a variety of reasons. Many point features in this dataset provide links to maps and/or land descriptions used in the CSLC lease approval process. These documents are hosted on the CSLC archives website at https://www.slc.ca.gov/archives/. In some cases, these documents provide reliable, current lease boundary information, while in other cases, additional information is necessary to properly define lease boundaries. Additionally, revisions to lease boundaries may have occurred subsequent to CSLC approval, as in the case of as-built locations that differ from originally approved alignments. Further, the boundary of some leases may be the mean high tide line, which in a state of nature is both ambulatory, and in the absence of a survey conducted by a licensed land surveyor, not readily locatable.


    Disclaimer of Liability
    This dataset is not suitable for any legal purpose. The CSLC makes no warranty of any kind, express or implied, nor assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any of the information contained in the dataset. The CSLC assumes no legal liability or responsibility for anyone's use of the information.

