54 datasets found
  1. Data from: Wikipedia Category Granularity (WikiGrain) data

    • zenodo.org
    csv, txt
    Updated Jan 24, 2020
    Cite
    Jürgen Lerner (2020). Wikipedia Category Granularity (WikiGrain) data [Dataset]. http://doi.org/10.5281/zenodo.1005175
    Available download formats: txt, csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jürgen Lerner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The "Wikipedia Category Granularity (WikiGrain)" data consists of three files that contain information about articles of the English-language version of Wikipedia (https://en.wikipedia.org).

    The data has been generated from the database dump dated 20 October 2016 provided by the Wikimedia Foundation, licensed under the GNU Free Documentation License (GFDL) and the Creative Commons Attribution-Share-Alike 3.0 License.

    WikiGrain provides information on all 5,006,601 Wikipedia articles (that is, pages in Namespace 0 that are not redirects) that are assigned to at least one category.

    The WikiGrain Data is analyzed in the paper

    Jürgen Lerner and Alessandro Lomi: Knowledge categorization affects popularity and quality of Wikipedia articles. PLoS ONE, 13(1):e0190674, 2018.

    ===============================================================
    Individual files (tables in comma-separated-values-format):

    ---------------------------------------------------------------
    * article_info.csv contains the following variables:

    - "id"
    (integer) Unique identifier for articles; identical with the page_id in the Wikipedia database.

    - "granularity"
    (decimal) The granularity of an article A is defined to be the average (mean) granularity of the categories of A, where the granularity of a category C is the shortest path distance in the parent-child subcategory network from the root category (Category:Articles) to C. Higher granularity values indicate articles whose topics are less general, narrower, more specific.

    - "is.FA"
    (boolean) True ('1') if the article is a featured article; false ('0') otherwise.

    - "is.FA.or.GA"
    (boolean) True ('1') if the article is a featured article or a good article; false ('0') otherwise.

    - "is.top.importance"
    (boolean) True ('1') if the article is listed as a top importance article by at least one WikiProject; false ('0') otherwise.

    - "number.of.revisions"
    (integer) Number of times a new version of the article has been uploaded.
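
    The "granularity" definition above can be sketched as a breadth-first search followed by an average. This is only an illustration under stated assumptions, not the authors' code: the toy category network, the function names, and the choice to skip categories that are unreachable from the root are all hypothetical.

```python
from collections import deque
from statistics import mean


def category_depths(children, root):
    """Shortest path distance (in parent-child links) from root to each reachable category."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for child in children.get(cat, ()):
            if child not in depth:  # first visit in BFS is the shortest path
                depth[child] = depth[cat] + 1
                queue.append(child)
    return depth


def article_granularity(article_categories, depth):
    """Mean depth of the article's categories.

    Assumption: categories not reachable from the root are ignored.
    """
    depths = [depth[c] for c in article_categories if c in depth]
    return mean(depths) if depths else None


# Hypothetical toy network: parent category -> list of subcategories.
children = {
    "Articles": ["Science", "Arts"],
    "Science": ["Physics"],
    "Physics": ["Optics"],
}
depth = category_depths(children, "Articles")
# An article in "Science" (depth 1) and "Optics" (depth 3): (1 + 3) / 2
print(article_granularity(["Science", "Optics"], depth))
```

    Because breadth-first search visits categories in order of distance, a category that has several parents is assigned its shortest distance from the root.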


    ---------------------------------------------------------------
    * article_to_tlc.csv
    is a list of links from articles to the closest top-level categories (TLCs) they are contained in. We say that an article A is a member of a TLC C if A is in a category that is a descendant of C and the distance from C to A (measured by the number of parent-child category links) is minimal over all TLCs. An article can thus be a member of several TLCs.
    The file contains the following variables:

    - "id"
    (integer) Unique identifier for articles; identical with the page_id in the Wikipedia database.

    - "id.of.tlc"
    (integer) Unique identifier for TLC in which the article is contained; identical with the page_id in the Wikipedia database.

    - "title.of.tlc"
    (string) Title of the TLC in which the article is contained.
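
    The membership rule for article_to_tlc.csv can be illustrated as follows. This is a sketch, not the authors' implementation: the toy network and function names are hypothetical, and the distance is counted here in category links only, from the TLC to the nearest category that contains the article.

```python
from collections import deque


def depths_from(children, start):
    """BFS distances (in parent-child links) from start to its descendants."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        cat = queue.popleft()
        for child in children.get(cat, ()):
            if child not in depth:
                depth[child] = depth[cat] + 1
                queue.append(child)
    return depth


def closest_tlcs(article_categories, children, tlcs):
    """Return every TLC whose distance to the article is minimal over all TLCs."""
    distance = {}
    for tlc in tlcs:
        depth = depths_from(children, tlc)
        reachable = [depth[c] for c in article_categories if c in depth]
        if reachable:
            distance[tlc] = min(reachable)
    if not distance:
        return []
    best = min(distance.values())
    return sorted(t for t, d in distance.items() if d == best)


# Hypothetical toy network with two top-level categories.
children = {
    "Science": ["Physics"],
    "Technology": ["Engineering"],
    "Physics": ["Optics"],
    "Engineering": ["Optics"],
}
tlcs = ["Science", "Technology"]
print(closest_tlcs(["Optics"], children, tlcs))             # tie: both TLCs
print(closest_tlcs(["Physics", "Optics"], children, tlcs))  # Science is closer
```

    The tie in the first call shows why one article can appear in several rows of the file.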

    ---------------------------------------------------------------
    * article_info_normalized.csv
    contains more variables associated with articles than article_info.csv. All variables except "id" and "is.FA" are normalized to a standard deviation of one. Variables whose names have the prefix "log1p." have been transformed by the mapping x --> log(1+x) to make right-skewed distributions 'more normal'.
    The file contains the following variables:

    - "id"
    Article id.

    - "is.FA"
    Boolean indicator for whether the article is featured.

    - "log1p.length"
    Length measured by the number of bytes.

    - "age"
    Age measured by the time since the first edit.

    - "log1p.number.of.edits"
    Number of times a new version of the article has been uploaded.

    - "log1p.number.of.reverts"
    Number of times a revision has been reverted to a previous one.

    - "log1p.number.of.contributors"
    Number of unique contributors to the article.

    - "number.of.characters.per.word"
    Average number of characters per word (one component of 'reading complexity').

    - "number.of.words.per.sentence"
    Average number of words per sentence (second component of 'reading complexity').

    - "number.of.level.1.sections"
    Number of first level sections in the article.

    - "number.of.level.2.sections"
    Number of second level sections in the article.

    - "number.of.categories"
    Number of categories the article is in.

    - "log1p.average.size.of.categories"
    Average size of the categories the article is in.

    - "log1p.number.of.intra.wiki.links"
    Number of links to pages in the English-language version of Wikipedia.

    - "log1p.number.of.external.references"
    Number of external references given in the article.

    - "log1p.number.of.images"
    Number of images in the article.

    - "log1p.number.of.templates"
    Number of templates that the article uses.

    - "log1p.number.of.inter.language.links"
    Number of links to articles in other language editions of Wikipedia.

    - "granularity"
    As in article_info.csv (but normalized to standard deviation one).
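
    The two transformations described for article_info_normalized.csv (the mapping x --> log(1+x), then scaling to standard deviation one) might look like this. The description does not say whether the population or the sample standard deviation was used, or whether values were also mean-centered; this sketch assumes the population standard deviation and no centering. The function name and sample values are hypothetical.

```python
import math
from statistics import pstdev


def log1p_unit_sd(values):
    """Apply x -> log(1 + x), then divide by the standard deviation.

    Assumptions: population standard deviation, no mean-centering.
    """
    transformed = [math.log1p(v) for v in values]
    sd = pstdev(transformed)
    if sd == 0:
        raise ValueError("a constant column cannot be scaled to unit standard deviation")
    return [t / sd for t in transformed]


# Hypothetical right-skewed counts, e.g. numbers of edits per article.
edits = [0, 1, 3, 10, 250, 4000]
scaled = log1p_unit_sd(edits)
print(pstdev(scaled))  # approximately 1
```

    math.log1p is used instead of math.log(1 + x) because it stays accurate for values of x close to zero.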

  2. Data source of Integrated perspective on granular characteristics of Martian...

    • data.mendeley.com
    Updated Sep 5, 2025
    Cite
    Jun Zhang (2025). Data source of Integrated perspective on granular characteristics of Martian soils (including GSD data from Gale, Jezero, Gusev and Meridiani Planum) [Dataset]. http://doi.org/10.17632/kr8bndwzrs.1
    Dataset updated
    Sep 5, 2025
    Authors
    Jun Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    S1: Detailed GSD data (soil weight percentage within each grain size category) from Gale Crater and the corresponding GSD parameters.
    S2: Detailed GSD data (soil weight percentage within each grain size category) from Jezero Crater and the corresponding GSD parameters.
    S3: Detailed GSD data (soil weight percentage within each grain size category) from Gusev Crater and the corresponding GSD parameters.
    S4: Detailed GSD data (soil weight percentage within each grain size category) from Meridiani Planum and the corresponding GSD parameters.

  3. Capital Flow Management Measures Database

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Dec 16, 2023
    Cite
    Binici, Mahir (2023). Capital Flow Management Measures Database [Dataset]. http://doi.org/10.7910/DVN/SABBBB
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Binici, Mahir
    Description

    A highly granular database of nearly 500 capital flow management measures that cover 14 instruments and 49 countries at monthly frequency between 2008 and 2021.

  4. Time‑Series Database For Network Telemetry Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Time‑Series Database For Network Telemetry Market Research Report 2033 [Dataset]. https://dataintelo.com/report/timeseries-database-for-network-telemetry-market
    Available download formats: csv, pptx, pdf
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Time‑Series Database for Network Telemetry Market Outlook



    According to our latest research, the global Time-Series Database for Network Telemetry market size in 2024 reached USD 1.23 billion, reflecting the rapid adoption of advanced database solutions for real-time network management. The market is experiencing robust expansion, with a CAGR of 19.7% projected over the forecast period. By 2033, the market is expected to attain a value of USD 5.94 billion, driven by the imperative need for scalable, high-performance data management platforms to support increasingly complex network infrastructures. The primary growth factors are the surge in network traffic, the proliferation of IoT devices, and the escalating demand for actionable network insights in real time.
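
    Market figures of this kind (a base-year size, a CAGR, and a forecast value) can be cross-checked with the standard compound-growth formula. The sketch below assumes nine compounding years between 2024 and 2033, which the report does not state explicitly; the function names are hypothetical.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding start_value at `rate` for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the paragraph above (USD billions).
size_2024, size_2033, stated_cagr = 1.23, 5.94, 0.197
years = 2033 - 2024  # assumption: nine compounding periods

print(f"CAGR implied by the two sizes: {implied_cagr(size_2024, size_2033, years):.1%}")
print(f"2024 size compounded at the stated CAGR: {project(size_2024, stated_cagr, years):.2f}")
```

    A small gap between the stated and implied rates usually comes from rounding or from a different base year, so the comparison is a sanity check rather than a correction.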




    A key driver behind the exponential growth of the Time-Series Database for Network Telemetry market is the unprecedented expansion of digital transformation initiatives across industries. Enterprises and service providers are generating massive volumes of telemetry data from network devices, applications, and endpoints. Traditional relational databases are ill-equipped to handle the high velocity and granularity of time-stamped data required for effective network telemetry. Time-series databases, purpose-built for this data type, enable organizations to ingest, process, and analyze millions of data points per second, facilitating proactive network management. The shift towards cloud-native architectures, edge computing, and the adoption of 5G networks further amplify the need for efficient telemetry data storage and analytics, reinforcing the critical role of time-series databases in modern network operations.




    Another significant growth factor is the rising complexity of network environments, spurred by the advent of hybrid and multi-cloud deployments. As organizations embrace distributed infrastructures and software-defined networking, the challenge of monitoring, diagnosing, and optimizing network performance becomes more acute. Time-series databases for network telemetry empower IT teams with the ability to correlate historical and real-time data, detect anomalies, and automate fault management. This capability is particularly vital for sectors such as telecommunications, IT service providers, and large enterprises, where network downtime or performance degradation can have substantial financial and reputational repercussions. The integration of artificial intelligence and machine learning with time-series databases is also enabling advanced predictive analytics, further enhancing operational efficiency and network reliability.




    The growing emphasis on network security and compliance is another pivotal factor fueling the adoption of time-series databases for network telemetry. With cyber threats becoming more sophisticated and regulatory requirements tightening, organizations must maintain comprehensive visibility into network activities and ensure rapid incident detection and response. Time-series databases provide the high-resolution data capture and retention necessary for security analytics, forensic investigations, and regulatory audits. As network telemetry evolves to encompass not only performance metrics but also security events and policy violations, the demand for scalable and secure time-series database solutions is expected to surge across both public and private sectors.




    From a regional perspective, North America currently dominates the Time-Series Database for Network Telemetry market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of major technology vendors, early adoption of advanced network management solutions, and substantial investments in digital infrastructure. However, the Asia Pacific region is poised for the fastest growth, with a projected CAGR of 22.4% through 2033, driven by rapid urbanization, expanding telecommunications networks, and increasing enterprise digitization. Europe and the Middle East & Africa are also witnessing steady growth, supported by government initiatives to modernize network infrastructure and enhance cybersecurity capabilities.



    Database Type Analysis



    The Database Type segment of the Time-Series Database for Network Telemetry market is bifurcated into Open Source and Commercial solutions, each catering to distinct

  5. National Youth in Transition Database - Outcomes Survey

    • catalog.data.gov
    • data.virginia.gov
    • +1 more
    Updated Mar 26, 2025
    Cite
    ACF (2025). National Youth in Transition Database - Outcomes Survey [Dataset]. https://catalog.data.gov/dataset/national-youth-in-transition-database-outcomes-survey
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    ACF
    Description

    States report information from two reporting populations: (1) the Served Population, which is information on all youth receiving at least one independent living service paid for or provided by the Chafee Program agency, and (2) youth completing the NYTD Survey. States survey youth regarding six outcomes: financial self-sufficiency, experience with homelessness, educational attainment, positive connections with adults, high-risk behaviors, and access to health insurance. States collect outcomes information by conducting a survey of youth in foster care on or around their 17th birthday, also referred to as the baseline population. States will track these youth as they age and conduct a new outcome survey on or around the youth's 19th birthday, and again on or around the youth's 21st birthday, also referred to as the follow-up population. States will collect outcomes information on these older youth at ages 19 or 21 regardless of their foster care status or whether they are still receiving independent living services from the State. Depending on the size of the State's foster care youth population, some States may draw a random sample of the baseline population of 17-year-olds that participate in the outcomes survey so that they can follow a smaller group of youth as they age. All States will collect and report outcome information on a new baseline population cohort every three years.

    Units of Response: Current and former youth in foster care
    Type of Data: Survey
    Tribal Data: No
    Periodicity: Annual
    Demographic Indicators: Ethnicity; Race; Sex
    SORN: Not Applicable
    Data Use Agreement: https://www.ndacan.acf.hhs.gov/datasets/request-dataset.cfm
    Data Use Agreement Location: https://www.ndacan.acf.hhs.gov/datasets/order_forms/termsofuseagreement.pdf
    Granularity: Individual
    Spatial: United States
    Geocoding: State

  6. Digital Terrain Database Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 6, 2025
    Cite
    Growth Market Reports (2025). Digital Terrain Database Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/digital-terrain-database-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Digital Terrain Database Market Outlook



    According to our latest research, the global Digital Terrain Database market size in 2024 stands at USD 2.54 billion, with a robust year-on-year growth trajectory. The market is expected to expand at a CAGR of 9.2% from 2025 to 2033, reaching a forecasted value of USD 5.67 billion by 2033. This growth is primarily driven by the increasing adoption of advanced geospatial technologies across various sectors, including defense, civil engineering, and urban planning, as organizations seek to leverage high-precision terrain data for enhanced decision-making and operational efficiency.




    The Digital Terrain Database market is experiencing significant momentum due to the rising demand for accurate topographical information in mission-critical applications. The integration of digital terrain data in aerospace and defense operations, such as flight simulation, mission planning, and navigation, is a key growth factor. These sectors require precise elevation models to ensure safety, optimize routes, and enhance situational awareness. Furthermore, the proliferation of unmanned aerial vehicles (UAVs) and autonomous systems has intensified the need for real-time, high-resolution terrain data, propelling the adoption of sophisticated digital terrain databases. As defense budgets continue to prioritize geospatial intelligence, the market is poised for sustained expansion.




    Another pivotal growth driver for the Digital Terrain Database market is the rapid urbanization and infrastructure development observed globally. Civil engineering and urban planning sectors are increasingly relying on detailed terrain models for designing resilient infrastructure, mitigating natural hazards, and optimizing land use. The surge in smart city initiatives, particularly in emerging economies, necessitates the deployment of advanced geospatial solutions. Digital terrain databases enable planners and engineers to simulate various scenarios, assess environmental impacts, and streamline construction processes. The integration of terrain data with Building Information Modeling (BIM) and Geographic Information Systems (GIS) further amplifies its value, fostering market growth across public and private sectors.




    Technological advancements and the growing accessibility of cloud-based geospatial solutions are also catalyzing market expansion. Cloud deployment models are democratizing access to high-quality terrain data, enabling organizations of all sizes to leverage these resources without significant upfront investments in hardware or infrastructure. The evolution of data acquisition methods, such as LiDAR, satellite imagery, and photogrammetry, has enhanced the accuracy and granularity of digital terrain databases. This, coupled with the increasing emphasis on environmental monitoring, disaster management, and agricultural optimization, is broadening the application landscape and stimulating demand for digital terrain databases across diverse verticals.




    From a regional perspective, North America currently dominates the Digital Terrain Database market, attributed to the presence of leading technology providers, robust defense spending, and widespread adoption of geospatial technologies. Europe follows closely, driven by stringent regulatory frameworks and substantial investments in infrastructure modernization. The Asia Pacific region is anticipated to exhibit the fastest growth during the forecast period, fueled by rapid urbanization, government-led smart city projects, and expanding applications in agriculture and environmental monitoring. Latin America and the Middle East & Africa are also witnessing increased adoption, albeit from a lower base, as digital transformation initiatives gain traction across these regions.





    Component Analysis



    The Digital Terrain Database market by component is segmented into Software, Hardware, and Services, each playing a vital role in the overall ecosystem. Software solutions form the backbone

  7. Terrain And Obstacle Database Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Terrain And Obstacle Database Market Research Report 2033 [Dataset]. https://dataintelo.com/report/terrain-and-obstacle-database-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Terrain and Obstacle Database Market Outlook



    According to our latest research, the global Terrain and Obstacle Database market size reached USD 6.8 billion in 2024, reflecting a robust surge in demand across key sectors. The market is projected to expand at a CAGR of 10.7% from 2025 to 2033, with the total market value forecasted to hit USD 17.2 billion by 2033. This impressive growth is primarily fueled by the increasing adoption of advanced navigation systems, the proliferation of autonomous vehicles, and stringent regulatory mandates for safety in aviation and defense sectors.




    The principal growth driver for the Terrain and Obstacle Database market is the rapid evolution and integration of digital mapping technologies within critical applications such as aviation, defense, and autonomous vehicles. As industries transition towards automation and real-time decision-making, the need for highly accurate, up-to-date terrain and obstacle data has become paramount. Modern aircraft, for example, require seamless access to global terrain and obstacle databases to enhance situational awareness, avoid potential hazards, and comply with international safety standards. Similarly, defense and military operations are increasingly dependent on these databases for mission planning, threat detection, and tactical navigation. The convergence of artificial intelligence, machine learning, and geospatial analytics is further accelerating the sophistication and utility of terrain and obstacle databases, making them indispensable for next-generation mobility and security solutions.




    Another significant factor propelling the expansion of the Terrain and Obstacle Database market is the escalating emphasis on public safety and urban planning. With the proliferation of smart cities and the growing complexity of urban environments, municipal authorities and infrastructure planners are leveraging detailed terrain and obstacle data to optimize land use, enhance emergency response, and mitigate risks associated with natural disasters and urban expansion. The increasing deployment of drones for commercial, delivery, and surveillance applications also necessitates comprehensive databases to ensure safe navigation through densely populated or obstacle-rich environments. These trends are encouraging both public and private entities to invest in robust data acquisition, curation, and management solutions, thereby driving sustained market growth.




    Furthermore, the surge in demand for real-time, cloud-based data solutions is reshaping the competitive dynamics of the Terrain and Obstacle Database market. Cloud deployment offers scalability, remote accessibility, and seamless updates, making it particularly attractive for global enterprises and government agencies managing large-scale operations. The integration of terrain and obstacle databases with IoT devices, 5G networks, and edge computing is enhancing the granularity and timeliness of data delivery, supporting critical applications such as autonomous vehicle navigation, disaster management, and precision agriculture. As regulatory frameworks continue to tighten and technology adoption accelerates, the market is poised for significant innovation and value creation over the next decade.




    From a regional perspective, North America currently dominates the Terrain and Obstacle Database market, accounting for the largest revenue share in 2024. The region’s leadership is attributed to the presence of major aerospace, defense, and technology firms, as well as early adoption of advanced navigation and data management solutions. Europe and Asia Pacific are also witnessing substantial growth, driven by increasing investments in smart infrastructure, autonomous mobility, and national security initiatives. The Asia Pacific region, in particular, is expected to register the highest CAGR during the forecast period, fueled by rapid urbanization, expanding aviation sectors, and government-driven digital transformation projects.



    Component Analysis



    The Component segment of the Terrain and Obstacle Database market comprises Database Software, Data Services, and Hardware, each playing a critical role in the value chain. Database software forms the backbone of the market, enabling users to store, retrieve, and analyze vast quantities of terrain and obstacle data with high precision. The demand for robust, scalable, and user-friendly da

  8. tecnalia/humanet

    • impactcybertrust.org
    Updated Jun 8, 2012
    Cite
    External Data Source (2012). tecnalia/humanet [Dataset]. http://doi.org/10.23721/100/1478897
    Dataset updated
    Jun 8, 2012
    Authors
    External Data Source
    Description

    Our study analyzes the limitations, in terms of granularity and reliability, of the Bluetooth-based trace acquisition initiatives carried out until now. We then propose an optimal configuration for the acquisition of proximity traces and movement information using a fine-tuned Bluetooth system based on custom hardware. With this system and configuration, we have carried out an intensive human trace acquisition experiment resulting in a proximity and mobility database of more than 5 million traces with a minimum granularity of 5 s. Contact: josemari.cabero@tecnalia.com

  9. Database Observability Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Database Observability Market Research Report 2033 [Dataset]. https://dataintelo.com/report/database-observability-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Database Observability Market Outlook



    According to our latest research, the global database observability market size in 2024 stands at USD 2.1 billion, with a robust compound annual growth rate (CAGR) of 18.2% projected from 2025 to 2033. By the end of the forecast period in 2033, the market is expected to reach USD 10.9 billion. The primary growth factor driving this surge is the increasing complexity of data architectures and the critical need for real-time insights into database performance, security, and compliance across enterprises worldwide.




    One of the key growth drivers for the database observability market is the exponential rise in data volume and diversity, fueled by rapid digitization, cloud migration, and the proliferation of new data sources such as IoT devices and edge computing platforms. Modern enterprises are dealing with data landscapes that are not only larger but also more heterogeneous and distributed than ever before. This complexity creates significant challenges in monitoring, diagnosing, and optimizing database performance. As a result, organizations are increasingly investing in advanced database observability tools that offer real-time visibility, proactive alerting, and automated root cause analysis. These solutions help ensure high availability, reliability, and performance of mission-critical applications, thereby driving adoption across sectors.




    Another significant factor contributing to the growth of the database observability market is the rising importance of security and regulatory compliance. With the increasing number of data breaches and stringent global regulations such as GDPR, HIPAA, and CCPA, enterprises face mounting pressure to secure sensitive information and demonstrate compliance. Database observability solutions provide granular visibility into database activities, enabling organizations to detect suspicious behavior, enforce security policies, and generate audit-ready reports. The integration of AI and machine learning into observability platforms further enhances their ability to identify anomalies and potential threats in real time, making them indispensable for organizations operating in highly regulated industries such as BFSI, healthcare, and government.




    The shift toward cloud-native architectures and hybrid environments is also accelerating the demand for database observability. As organizations migrate their workloads to public, private, and multi-cloud environments, the complexity of managing and monitoring databases increases substantially. Traditional monitoring tools often fall short in providing comprehensive visibility across diverse deployment models. Modern database observability platforms are designed to address these challenges by offering unified dashboards, automated anomaly detection, and end-to-end tracing across on-premises and cloud databases. This capability is particularly valuable for large enterprises with distributed operations, as it enables seamless monitoring, rapid troubleshooting, and optimized resource allocation, ultimately enhancing operational efficiency and reducing downtime.




    Regionally, North America currently leads the database observability market, accounting for the largest share in 2024. This dominance is attributed to the high concentration of technology-driven enterprises, early adoption of advanced IT solutions, and significant investments in cloud infrastructure. However, the Asia Pacific region is expected to witness the highest CAGR during the forecast period, driven by rapid digital transformation, expanding IT sectors, and increasing focus on data governance in emerging economies such as China, India, and Southeast Asia. Europe and Latin America are also experiencing steady growth, supported by evolving regulatory landscapes and the growing need for secure, resilient data management solutions.



    Component Analysis



    The component segment of the database observability market is bifurcated into software and services. The software sub-segment dominates the market, primarily due to the increasing demand for advanced analytics, automation, and AI-driven monitoring capabilities. Database observability software provides comprehensive tools for real-time monitoring, performance optimization, anomaly detection, and detailed reporting, which are essential for modern data-driven enterprises. As organizations strive to maintain high availability and efficiency in th

  10. Just-in-Time Database Access Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2025). Just-in-Time Database Access Market Research Report 2033 [Dataset]. https://dataintelo.com/report/just-in-time-database-access-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Just-in-Time Database Access Market Outlook



    According to our latest research, the global Just-in-Time Database Access market size reached USD 2.3 billion in 2024, reflecting its rapid adoption across diverse industries. The market is projected to grow at a robust CAGR of 15.2% from 2025 to 2033, with the total market value anticipated to reach approximately USD 7.1 billion by 2033. This exceptional growth is driven by the increasing need for secure, real-time data access solutions, especially as organizations intensify their focus on cybersecurity and regulatory compliance in the digital era.
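    The projection quoted above follows the usual compound-growth relationship, future value = base value × (1 + CAGR)^periods. Below is a minimal Python sketch of that arithmetic. The helper name `project_value` is hypothetical, and the number of compounding periods is left as a parameter because the report does not state which base year it compounds from.

```python
def project_value(base, cagr, periods):
    """Compound a base value forward at a constant annual growth rate."""
    return base * (1.0 + cagr) ** periods


# Figures taken from the paragraph above: USD 2.3 billion in 2024, CAGR of 15.2%.
base_2024 = 2.3
cagr = 0.152

# Assumption: try both eight and nine compounding periods up to 2033.
for periods in (8, 9):
    print(periods, round(project_value(base_2024, cagr, periods), 2))
```

    Under this sketch, eight periods gives roughly USD 7.1 billion, in line with the figure quoted above, while nine periods gives roughly USD 8.2 billion.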



    The primary growth factor for the Just-in-Time Database Access market is the escalating threat landscape associated with unauthorized database access and data breaches. As organizations across sectors such as banking, healthcare, and retail store ever-increasing volumes of sensitive information, the imperative for robust access control mechanisms has become paramount. Just-in-Time (JIT) database access solutions offer granular, time-bound access privileges, ensuring that users or applications can only access databases when necessary and for the shortest possible duration. This approach significantly minimizes the attack surface and reduces the risk of insider threats, making it a critical component of modern cybersecurity strategies. The surge in high-profile data breaches and the tightening of regulatory frameworks—such as GDPR, HIPAA, and PCI DSS—have further accelerated the adoption of JIT database access solutions globally.



    Another crucial driver propelling the Just-in-Time Database Access market is the ongoing digital transformation initiatives across enterprises of all sizes. With the proliferation of cloud computing, hybrid IT environments, and remote workforces, traditional perimeter-based security models are no longer sufficient. Organizations are increasingly embracing zero-trust architectures, where JIT database access plays a pivotal role by enforcing least-privilege principles and enabling dynamic, context-aware access controls. The agility and scalability offered by JIT solutions allow businesses to adapt quickly to evolving operational needs while maintaining strict security postures. Furthermore, the integration of artificial intelligence and machine learning into JIT platforms is enabling more sophisticated access policies based on real-time risk assessments, further enhancing their value proposition.



    In addition, the rising adoption of DevOps and agile development methodologies is fueling the demand for Just-in-Time Database Access solutions. Development teams often require temporary elevated privileges to perform tasks such as database migrations, testing, or troubleshooting. JIT access mechanisms provide a secure, auditable way to grant such privileges without exposing databases to persistent risks. This not only streamlines operational workflows but also ensures compliance with internal and external security standards. As organizations continue to prioritize speed and innovation, the need for seamless and secure database access solutions will remain a significant market driver.
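    The idea of a temporary, time-bound privilege described above can be sketched in a few lines of Python. This is an illustration only, not the API of any JIT product: the `Grant` class, the `issue_grant` helper, and the user, database, and role names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """A hypothetical time-bound access grant."""
    user: str
    database: str
    role: str
    expires_at: datetime

    def is_active(self, now):
        # The grant is valid only strictly before its expiry time.
        return now < self.expires_at


def issue_grant(user, database, role, minutes, now):
    """Issue a grant that expires `minutes` after `now`."""
    return Grant(user, database, role, now + timedelta(minutes=minutes))


# Example: a 30-minute elevated role for a migration task (names are made up).
start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = issue_grant("dev_alice", "orders_db", "migrator", 30, start)
print(grant.is_active(start + timedelta(minutes=10)))
print(grant.is_active(start + timedelta(minutes=45)))
```

    A real system would also record who approved the grant and revoke the underlying database role at expiry; the sketch covers only the expiry check.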



    From a regional perspective, North America currently dominates the Just-in-Time Database Access market, accounting for over 38% of the global revenue in 2024. This leadership is attributed to the region’s advanced IT infrastructure, high awareness of cybersecurity best practices, and stringent regulatory requirements. Europe follows closely, driven by robust data privacy laws and widespread cloud adoption. The Asia Pacific region, while still emerging, is expected to witness the fastest CAGR during the forecast period, fueled by rapid digitalization, expanding enterprise ecosystems, and increasing investments in cybersecurity infrastructure. Latin America and the Middle East & Africa are also showing promising growth trajectories, albeit from a smaller base, as organizations in these regions ramp up their digital transformation efforts.



    Component Analysis



    The Component segment of the Just-in-Time Database Access market is divided into Software, Hardware, and Services. Software solutions constitute the largest share, as they form the core of JIT access mechanisms, enabling organizations to enforce time-bound, role-based access controls to critical databases. The demand for advanced software platforms is being driven by the need for centralized management, real-time monitoring, and automated policy enforcement. These platform

  11. DataSheet1_Data Sources for Drug Utilization Research in Brazil—DUR-BRA...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 15, 2023
    + more versions
    Cite
    Lisiane Freitas Leal; Claudia Garcia Serpa Osorio-de-Castro; Luiz Júpiter Carneiro de Souza; Felipe Ferre; Daniel Marques Mota; Marcia Ito; Monique Elseviers; Elisangela da Costa Lima; Ivan Ricardo Zimmernan; Izabela Fulone; Monica Da Luz Carvalho-Soares; Luciane Cruz Lopes (2023). DataSheet1_Data Sources for Drug Utilization Research in Brazil—DUR-BRA Study.xlsx [Dataset]. http://doi.org/10.3389/fphar.2021.789872.s001
    Available download formats: xlsx
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Lisiane Freitas Leal; Claudia Garcia Serpa Osorio-de-Castro; Luiz Júpiter Carneiro de Souza; Felipe Ferre; Daniel Marques Mota; Marcia Ito; Monique Elseviers; Elisangela da Costa Lima; Ivan Ricardo Zimmernan; Izabela Fulone; Monica Da Luz Carvalho-Soares; Luciane Cruz Lopes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Background: In Brazil, studies that map electronic healthcare databases in order to assess their suitability for use in pharmacoepidemiologic research are lacking. We aimed to identify, catalogue, and characterize Brazilian data sources for Drug Utilization Research (DUR).

    Methods: The present study is part of the project entitled, “Publicly Available Data Sources for Drug Utilization Research in Latin American (LatAm) Countries.” A network of Brazilian health experts was assembled to map secondary administrative data from healthcare organizations that might provide information related to medication use. A multi-phase approach including internet search of institutional government websites, traditional bibliographic databases, and experts’ input was used for mapping the data sources. The reviewers searched, screened and selected the data sources independently; disagreements were resolved by consensus. Data sources were grouped into the following categories: 1) automated databases; 2) Electronic Medical Records (EMR); 3) national surveys or datasets; 4) adverse event reporting systems; and 5) others. Each data source was characterized by accessibility, geographic granularity, setting, type of data (aggregate or individual-level), and years of coverage. We also searched for publications related to each data source.

    Results: A total of 62 data sources were identified and screened; 38 met the eligibility criteria for inclusion and were fully characterized. We grouped 23 (60%) as automated databases, four (11%) as adverse event reporting systems, four (11%) as EMRs, three (8%) as national surveys or datasets, and four (11%) as other types. Eighteen (47%) were classified as publicly and conveniently accessible online; providing information at national level. Most of them offered more than 5 years of comprehensive data coverage, and presented data at both the individual and aggregated levels. No information about population coverage was found. Drug coding is not uniform; each data source has its own coding system, depending on the purpose of the data. At least one scientific publication was found for each publicly available data source.

    Conclusions: There are several types of data sources for DUR in Brazil, but a uniform system for drug classification and data quality evaluation does not exist. The extent of population covered by year is unknown. Our comprehensive and structured inventory reveals a need for full characterization of these data sources.

  12. Data extracted from the official German topographic database ATKIS at scale...

    • figshare.com
    application/x-dbf
    Updated May 31, 2023
    Cite
    Sven Gedicke (2023). Data extracted from the official German topographic database ATKIS at scale 1:25.000 [Dataset]. http://doi.org/10.6084/m9.figshare.12987710.v2
    Available download formats: application/x-dbf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sven Gedicke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data set extracted from the official German topographic database ATKIS with the highest provided granularity at scale 1:25.000. It covers a rural area in the northwest of North Rhine-Westphalia containing 1647 polygons of six different land-use classes and a road network comprising line features of different road types.

  13. Open source database for validating and falsifying discrete mechanics models...

    • data.mendeley.com
    Updated Oct 15, 2018
    Cite
    Ritesh Gupta (2018). Open source database for validating and falsifying discrete mechanics models using synthetic granular materials Part I: Experimental tests with particles manufactured by a 3D printer [Dataset]. http://doi.org/10.17632/n6t49stxrh.1
    Dataset updated
    Oct 15, 2018
    Authors
    Ritesh Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is made available to third parties as part of the effort to make verification and validation procedures transparent and reproducible for granular material research. The dataset includes the microCT images of Hostun sand and of the synthetic particles manufactured by a 3D printer, the results of the oedometric test conducted on assemblies of synthetic particles, the labelled volume, and the discrete digital correlation data that provide the trajectories of individual particles in the assemblies.

    Please refer to the 'description manual' document for content and information on shared database utilization.

  14. Database Activity Monitoring Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). Database Activity Monitoring Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/database-activity-monitoring-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Database Activity Monitoring Market Outlook



    According to our latest research, the global Database Activity Monitoring (DAM) market size reached USD 2.68 billion in 2024, with a robust year-on-year growth trajectory. The market is anticipated to expand at a CAGR of 13.7% from 2025 to 2033, ultimately reaching a forecasted value of USD 8.46 billion by 2033. This remarkable growth is primarily driven by the escalating frequency and sophistication of cyber threats targeting sensitive data, along with stringent global regulatory requirements that demand real-time database monitoring and protection.




    The primary growth factor for the Database Activity Monitoring market is the increasing prevalence of data breaches and cyber-attacks across industries. Organizations are storing vast amounts of sensitive information in databases, making these repositories prime targets for malicious actors. As cyber threats continue to evolve, businesses are compelled to deploy advanced DAM solutions to detect, analyze, and respond to suspicious activities in real-time. The growing adoption of cloud services and digital transformation initiatives further amplifies the need for robust database security, as organizations expand their digital footprints and expose themselves to new vulnerabilities. Regulatory compliance, such as GDPR, HIPAA, and PCI DSS, has also become a significant driver, with non-compliance leading to severe financial and reputational repercussions.




    Another crucial growth factor is the rapid expansion of data-driven business models and the proliferation of big data analytics. Enterprises are increasingly leveraging data to gain actionable insights and drive decision-making processes, which necessitates the storage and processing of massive volumes of information. As a result, the attack surface for potential data breaches widens, driving demand for comprehensive DAM solutions that provide granular visibility into database activities. The integration of artificial intelligence and machine learning within DAM platforms has further enhanced their capabilities, enabling organizations to proactively identify anomalous behaviors and mitigate risks before they escalate. This technological evolution is fostering greater adoption across both large enterprises and small and medium-sized businesses (SMEs), which are recognizing the value of advanced database security.




    The surge in remote work and hybrid workplace models post-pandemic has also contributed to the rising importance of database activity monitoring. With employees accessing databases from various locations and devices, the risk of unauthorized access and data exfiltration has increased significantly. DAM solutions play a pivotal role in monitoring user activities, enforcing access controls, and generating real-time alerts for suspicious actions. Moreover, the growing trend of Bring Your Own Device (BYOD) policies in organizations further complicates database security, necessitating the deployment of sophisticated monitoring tools. The convergence of these factors is expected to sustain the strong growth momentum of the DAM market through the forecast period.




    From a regional perspective, North America continues to dominate the Database Activity Monitoring market, accounting for the largest market share in 2024, primarily due to the presence of major technology players, stringent data privacy regulations, and high adoption rates of advanced security solutions. Europe follows closely, driven by robust regulatory frameworks and increasing investments in cybersecurity infrastructure. The Asia Pacific region is expected to witness the highest CAGR during the forecast period, fueled by rapid digitalization, expanding IT infrastructure, and growing awareness of data security. Meanwhile, Latin America and Middle East & Africa are gradually catching up, supported by government initiatives, rising cyber threats, and increased adoption of cloud technologies.





    Component Analysis


  15. Database Monitoring Tools Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Database Monitoring Tools Market Research Report 2033 [Dataset]. https://dataintelo.com/report/database-monitoring-tools-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Database Monitoring Tools Market Outlook



    According to our latest research, the global Database Monitoring Tools market size reached USD 2.65 billion in 2024, demonstrating robust expansion driven by the increasing complexity and scale of enterprise data environments. With a steady compound annual growth rate (CAGR) of 13.8% forecast from 2025 to 2033, the market is projected to attain a value of USD 8.15 billion by 2033. The primary growth factor is the surging demand for real-time data analytics, reliability, and performance optimization across diverse industries, as organizations increasingly rely on data-driven decision-making and digital transformation initiatives.




    The growing adoption of cloud computing and hybrid IT infrastructures is a significant driver for the Database Monitoring Tools market. As enterprises migrate their databases from traditional on-premises environments to cloud-based platforms, the need for sophisticated monitoring solutions that can ensure seamless performance, security, and availability becomes paramount. Cloud-native databases and multi-cloud strategies have introduced new layers of complexity, making it essential for organizations to deploy advanced monitoring tools capable of supporting distributed environments. Additionally, the proliferation of microservices and containerized applications is further fueling demand for comprehensive monitoring solutions that can provide granular insights across diverse database architectures.




    Another crucial growth factor for the Database Monitoring Tools market is the rise in regulatory compliance requirements and data security concerns. Industries such as BFSI, healthcare, and government are under increasing pressure to maintain stringent data governance and privacy standards. This has led to a surge in the implementation of monitoring tools that offer real-time auditing, anomaly detection, and alerting mechanisms to ensure compliance with global standards such as GDPR, HIPAA, and PCI DSS. These capabilities help organizations mitigate risks associated with data breaches, unauthorized access, and downtime, thereby safeguarding critical business operations and customer trust.




    Furthermore, the rapid advancement of artificial intelligence and machine learning technologies is reshaping the Database Monitoring Tools market. Modern monitoring solutions are incorporating AI-driven analytics to proactively identify performance bottlenecks, predict potential failures, and automate remediation processes. This not only enhances operational efficiency but also reduces the burden on IT teams, enabling them to focus on strategic initiatives. The integration of intelligent automation and predictive analytics is expected to be a key differentiator for vendors in this market, as enterprises seek to optimize their database environments for scalability, resilience, and cost-effectiveness.




    From a regional perspective, North America continues to dominate the Database Monitoring Tools market, accounting for the largest revenue share in 2024. The region’s leadership is attributed to the widespread adoption of advanced IT infrastructure, presence of major technology providers, and strong emphasis on data security and compliance. Asia Pacific, on the other hand, is emerging as the fastest-growing market, driven by the digital transformation of enterprises in countries like China, India, and Japan. Europe also remains a significant contributor, particularly in sectors such as BFSI and healthcare, where regulatory mandates and data privacy concerns are fueling demand for robust monitoring solutions.



    Component Analysis



    The Component segment in the Database Monitoring Tools market is bifurcated into software and services, each playing a pivotal role in addressing the evolving needs of enterprises. Software solutions form the backbone of this segment, offering a comprehensive suite of features such as real-time monitoring, performance analytics, automated alerts, and historical data visualization. These tools are designed to provide IT teams with actionable insights into database health, query performance, and resource utilization, enabling proactive management of potential issues. The software segment is witnessing continuous innovation, with vendors introducing AI-driven capabilities and user-friendly dashboards to enhance the monitoring experience and accelerate troubleshooting.




  16. United Utilities Domestic Drinking Water Quality 2023-2024

    • streamwaterdata.co.uk
    • hub.arcgis.com
    Updated Sep 23, 2025
    + more versions
    Cite
    UnitedUtilities3 (2025). United Utilities Domestic Drinking Water Quality 2023-2024 [Dataset]. https://www.streamwaterdata.co.uk/items/da952fcae81b4c4aa82c384f14e50dbc
    Dataset updated
    Sep 23, 2025
    Dataset authored and provided by
    UnitedUtilities3
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Data Origin: Samples were taken from customer taps. They were then analysed, and the results were uploaded to a database. This dataset is an extract from this database.

    Data Triage Considerations:

    Granularity: We decided to share as individual results at the lowest level of granularity.

    Anonymisation: It is a requirement that this data cannot be used to identify a singular person or household. We discussed many options for aggregating the data to a specific geography to ensure this requirement is met. The following geographical aggregations were discussed:

    • Water Supply Zone (WSZ) - Limits interoperability with other datasets
    • Postcode – Some postcodes contain very few households and may not offer necessary anonymisation
    • Postal Sector – Deemed not granular enough in highly populated areas
    • Rounded Co-ordinates – Not a recognised standard and may cause overlapping areas
    • MSOA – Deemed not granular enough
    • LSOA – Agreed as a recognised standard appropriate for England and Wales
    • Data Zones – Agreed as a recognised standard appropriate for Scotland

    Data Specifications:

    • Each dataset will cover a calendar year of samples
    • This dataset will be published annually
    • The Determinands included in the dataset are as per the list that is required to be reported to the Drinking Water Inspectorate

    Context: Many UK water companies provide a search tool on their websites where you can search for water quality in your area by postcode. The results of the search may identify the water supply zone that supplies the postcode searched. Water supply zones are not linked to LSOAs which means the results may differ to this dataset. Some sample results are influenced by internal plumbing and may not be representative of drinking water quality in the wider area. Some samples are tested on site and others are sent to scientific laboratories.

    Prior to undertaking analysis on any new instruments or utilising new analytical techniques, the laboratory undertakes validation of the equipment to ensure it continues to meet the regulatory requirements. This means that the limit of quantification may change for the method either increasing or decreasing from the previous value. Any results below the limit of quantification will be reported as < with a number. For example, a limit of quantification change from <0.68 mg/l to <2.4 mg/l does not mean that there has been a deterioration in the quality of the water supplied.

    Data Publishing Frequency: Annually

    Supplementary information: Below is a curated selection of links for additional reading, which provide a deeper understanding of this dataset:

    • Drinking Water Inspectorate Standards and Regulations
    • Description for LSOA boundaries by the ONS: Census 2021 geographies - Office for National Statistics
    • Postcode to LSOA lookup tables: Postcode to 2021 Census Output Area to Lower Layer Super Output Area to Middle Layer Super Output (February 2024)
    • Legislation history: Legislation - Drinking Water Inspectorate
    • Information about lead pipes: Lead pipes and lead in your water - United Utilities

    Dataset Schema:

    • SAMPLE_ID: Identity of the sample
    • SAMPLE_DATE: The date the sample was taken
    • DETERMINAND: The determinand being measured
    • DWI_CODE: The corresponding DWI code for the determinand
    • UNITS: The expression of results
    • OPERATOR: The measurement operator for limit of detection
    • RESULT: The test results
    • LSOA: Lower Super Output Area (population weighted centroids used by the Office for National Statistics (ONS) for geo-anonymisation)
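    The schema above pairs each RESULT with an OPERATOR, so a value reported as "<" should be read as below the limit of quantification rather than as a measured concentration. The following Python sketch shows one way to keep that distinction when loading an extract. The column names come from the schema; the sample rows, DWI codes, LSOA codes, and the `load_results` helper are hypothetical, and the assumption that an empty OPERATOR means a quantified result is not confirmed by the dataset description.

```python
import csv
import io

# Hypothetical rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """SAMPLE_ID,SAMPLE_DATE,DETERMINAND,DWI_CODE,UNITS,OPERATOR,RESULT,LSOA
S1,2023-01-05,Lead,XX01,ug/l,<,0.68,E01000001
S2,2023-01-06,Lead,XX01,ug/l,,1.2,E01000001
"""


def load_results(text):
    """Parse CSV text and flag results reported below the quantification limit."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "sample_id": row["SAMPLE_ID"],
            "determinand": row["DETERMINAND"],
            "value": float(row["RESULT"]),
            # Assumption: "<" marks a censored result; anything else is quantified.
            "censored": row["OPERATOR"].strip() == "<",
        })
    return rows


results = load_results(SAMPLE_CSV)
quantified = [r for r in results if not r["censored"]]
print(len(results), len(quantified))
```

    Keeping the censored flag separate from the numeric value avoids treating a change in the limit of quantification as a change in water quality.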

  17. Covid 19 urban areas Database May2020 Aug 2021

    • hub.arcgis.com
    • data.unhabitat.org
    • +1more
    Updated Aug 30, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UN-Habitat (2021). Covid 19 urban areas Database May2020 Aug 2021 [Dataset]. https://hub.arcgis.com/documents/b63e1433ba4f49bc8bd960a2160683f7
    Dataset updated
    Aug 30, 2021
    Dataset authored and provided by
    UN-Habitat
    Description

    COVID 19 data at city/urban granularity compiled on a monthly basis since May 2020. Due to changes in reporting, there are variations in the number of cities in each monthly update.

  18. Portsmouth Water Drinking Water Quality Data 2022 2023 2024

    • hub.arcgis.com
    • streamwaterdata.co.uk
    • +1more
    Updated Oct 1, 2025
    Cite
    AHughes_Portsmouth (2025). Portsmouth Water Drinking Water Quality Data 2022 2023 2024 [Dataset]. https://hub.arcgis.com/datasets/d3165fd17d624b22a9900d47677dfa45
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    AHughes_Portsmouth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    Water companies in the UK are responsible for testing the quality of drinking water. This dataset contains the results of samples taken from the taps in domestic households to make sure they meet the standards set out by UK and European legislation. This data shows the location, date, and measured levels of determinands set out by the Drinking Water Inspectorate (DWI).

    Key Definitions

    Aggregation

    Process involving summarizing or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes

    Anonymisation

    Anonymised data is a type of information sanitization in which data anonymisation tools encrypt or remove personally identifiable information from datasets for the purpose of preserving a data subject's privacy

    Dataset

    Structured and organized collection of related elements, often stored digitally, used for analysis and interpretation in various fields.

    Determinand

    A constituent or property of drinking water which can be determined or estimated.

    DWI

    Drinking Water Inspectorate, an organisation “providing independent reassurance that water supplies in England and Wales are safe and drinking water quality is acceptable to consumers.”

    DWI Determinands

    Constituents or properties that are tested for when evaluating a sample for its quality as per the guidance of the DWI. For this dataset, only determinands with “point of compliance” as “customer taps” are included.

    Granularity

    Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours

    ID

    Abbreviation for Identification that refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.

    LSOA

    Lower-Level Super Output Area is made up of small geographic areas used for statistical and administrative purposes by the Office for National Statistics. It is designed to have homogeneous populations in terms of population size, making them suitable for statistical analysis and reporting. Each LSOA is built from groups of contiguous Output Areas with an average of about 1,500 residents or 650 households allowing for granular data collection useful for analysis, planning and policy-making while ensuring privacy.

    ONS

    Office for National Statistics

    Open Data Triage

    The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.

    Sample

    A sample is a representative segment or portion of water taken from a larger whole for the purpose of analysing or testing to ensure compliance with safety and quality standards.

    Schema

    Structure for organizing and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.

    Units

    Standard measurements used to quantify and compare different physical quantities.

    Water Quality

    The chemical, physical, biological, and radiological characteristics of water, typically in relation to its suitability for a specific purpose, such as drinking, swimming, or ecological health. It is determined by assessing a variety of parameters, including but not limited to pH, turbidity, microbial content, dissolved oxygen, presence of substances and temperature.

    Data History

    Data Origin

    These samples were taken from customer taps. They were then analysed for water quality, and the results were uploaded to a database. This dataset is an extract from this database.

    Data Triage Considerations

    Granularity

    Is it useful to share results as averages or individual?

    We decided to share individual results, at the lowest level of granularity.

    Anonymisation

    It is a requirement that this data cannot be used to identify a singular person or household. We discussed many options for aggregating the data to a specific geography to ensure this requirement is met. The following geographical aggregations were discussed:

    • Water Supply Zone (WSZ) - Limits interoperability with other datasets
    • Postcode – Some postcodes contain very few households and may not offer necessary anonymisation
    • Postal Sector – Deemed not granular enough in highly populated areas
    • Rounded Co-ordinates – Not a recognised standard and may cause overlapping areas
    • MSOA – Deemed not granular enough
    • LSOA – Agreed as a recognised standard appropriate for England and Wales
    • Data Zones – Agreed as a recognised standard appropriate for Scotland

    Data Specifications

    Each dataset will cover a calendar year of samples

    This dataset will be published annually

    Historical datasets will be published as far back as 2016, from the introduction of The Water Supply (Water Quality) Regulations 2016

    The Determinands included in the dataset are as per the list that is required to be reported to the Drinking Water Inspectorate.

    Context

    Many UK water companies provide a search tool on their websites where you can search for water quality in your area by postcode. The results of the search may identify the water supply zone that supplies the postcode searched. Water supply zones are not linked to LSOAs, which means the results may differ from this dataset.

    Some sample results are influenced by internal plumbing and may not be representative of drinking water quality in the wider area.

    Some samples are tested on site and others are sent to scientific laboratories.

    Data Publish Frequency

    Annually

    Data Triage Review Frequency

    Annually unless otherwise requested

    Supplementary information

    Below is a curated selection of links for additional reading, which provide a deeper understanding of this dataset.

    1. Drinking Water Inspectorate Standards and Regulations: https://www.dwi.gov.uk/drinking-water-standards-and-regulations/

    2. LSOA (England and Wales) and Data Zone (Scotland): https://www.nrscotland.gov.uk/files/geography/2011-census/geography-bckground-info-comparison-of-thresholds.pdf

    3. Description for LSOA boundaries by the ONS: Census 2021 geographies - Office for National Statistics (ons.gov.uk)

    4. Postcode to LSOA lookup tables: Postcode to 2021 Census Output Area to Lower Layer Super Output Area to Middle Layer Super Output Area to Local Authority District (August 2023) Lookup in the UK (statistics.gov.uk)

    5. Legislation history: Legislation - Drinking Water Inspectorate (dwi.gov.uk)

  19. Public Utility Data Liberation Project (PUDL) Data Release

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 14, 2025
    Cite
    Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella (2025). Public Utility Data Liberation Project (PUDL) Data Release [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653158
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    Catalyst Cooperative
    Authors
    Selvans, Zane A.; Gosnell, Christina M.; Sharpe, Austen; Norman, Bennett; Schira, Zach; Lamb, Katherine; Xia, Dazhong; Belfer, Ella
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PUDL v2025.2.0 Data Release

    This is our regular quarterly release for 2025Q1. It includes updates to all the datasets that are published with quarterly or higher frequency, plus initial versions of a few new data sources that have been in the works for a while.

    One major change this quarter is that we are now publishing all processed PUDL data as Apache Parquet files, alongside our existing SQLite databases. See Data Access for more on how to access these outputs.

    Some potentially breaking changes to be aware of:

    In the EIA Form 930 – Hourly and Daily Balancing Authority Operations Report a number of new energy sources have been added, and some old energy sources have been split into more granular categories. See Changes in energy source granularity over time.

    We are now running the EPA’s CAMD to EIA unit crosswalk code for each individual year starting from 2018, rather than just 2018 and 2021, resulting in more connections between these two datasets and changes to some sub-plant IDs. See the note below for more details.

    Many thanks to the organizations who make these regular updates possible! Especially GridLab, RMI, and the ZERO Lab at Princeton University. If you rely on PUDL and would like to help ensure that the data keeps flowing, please consider joining them as a PUDL Sustainer, as we are still fundraising for 2025.

    New Data

    EIA 176

    Add a couple of semi-transformed interim EIA-176 (natural gas sources and dispositions) tables. They aren’t yet being written to the database, but are one step closer. See #3555 and PRs #3590, #3978. Thanks to @davidmudrauskas for moving this dataset forward.

    Extracted these interim tables up through the latest 2023 data release. See #4002 and #4004.

    EIA 860

    Added EIA 860 Multifuel table. See #3438 and #3946.

    FERC 1

    Added three new output tables containing granular utility accounting data. See #4057, #3642 and the table descriptions in the data dictionary:

    out_ferc1_yearly_detailed_income_statements

    out_ferc1_yearly_detailed_balance_sheet_assets

    out_ferc1_yearly_detailed_balance_sheet_liabilities

    SEC Form 10-K Parent-Subsidiary Ownership

    We have added some new tables describing the parent-subsidiary company ownership relationships reported in the SEC’s Form 10-K, Exhibit 21 “Subsidiaries of the Registrant”. Where possible these tables link the SEC filers or their subsidiary companies to the corresponding EIA utilities. This work was funded by a grant from the Mozilla Foundation. Most of the ML models and data preparation took place in the mozilla-sec-eia repository separate from the main PUDL ETL, as it requires processing hundreds of thousands of PDFs and the deployment of some ML experiment tracking infrastructure. The new tables are handed off as nearly finished products to the PUDL ETL pipeline. Note that these are preliminary, experimental data products and are known to be incomplete and to contain errors. Extracting data tables from unstructured PDFs and the SEC to EIA record linkage are necessarily probabilistic processes.

    See PRs #4026, #4031, #4035, #4046, #4048, #4050 and check out the table descriptions in the PUDL data dictionary:

    out_sec10k_parents_and_subsidiaries

    core_sec10k_quarterly_filings

    core_sec10k_quarterly_exhibit_21_company_ownership

    core_sec10k_quarterly_company_information

    Expanded Data Coverage

    EPA CEMS

    Added 2024 Q4 of CEMS data. See #4041 and #4052.

    EPA CAMD EIA Crosswalk

    In the past, the crosswalk in PUDL has used the EPA’s published crosswalk (run with 2018 data), and an additional crosswalk we ran with 2021 EIA 860 data. To ensure that the crosswalk reflects updates in both EIA and EPA data, we re-ran the EPA R code which generates the EPA CAMD EIA crosswalk with 4 new years of data: 2019, 2020, 2022 and 2023. Re-running the crosswalk pulls the latest data from the CAMD FACT API, which results in some changes to the generator and unit IDs reported on the EPA side of the crosswalk, which feeds into the creation of core_epa_assn_eia_epacamd.

    The changes only result in the addition of new units and generators in the EPA data, with no changes to matches at the plant level. However, the updates to generator and unit IDs have resulted in changes to the subplant IDs - some EIA boilers and generators which previously had no matches to EPA data have now been matched to EPA unit data, resulting in an overall reduction in the number of rows in the core_epa_assn_eia_epacamd_subplant_ids table. See issue #4039 and PR #4056 for a discussion of the changes observed in the course of this update.

    EIA 860M

    Added EIA 860m through December 2024. See #4038 and #4047.

    EIA 923

    Added EIA 923 monthly data through September 2024. See #4038 and #4047.

    EIA Bulk Electricity Data

    Updated the EIA Bulk Electricity data to include data published up through 2024-11-01. See #4042 and PR #4051.

    EIA 930

    Updated the EIA 930 data to include data published up through the beginning of February 2025. See #4040 and PR #4054. 10 new energy sources were added and 3 were retired; see Changes in energy source granularity over time for more information.

    Bug Fixes

    Fix an accidentally swapped set of starting balance / ending balance column rename parameters in the pre-2021 DBF derived data that feeds into core_ferc1_yearly_other_regulatory_liabilities_sched278. See issue #3952 and PRs #3969, #3979. Thanks to @yolandazzz13 for making this fix.

    Added preliminary data validation checks for several FERC 1 tables that were missing them. See #3860.

    Fix spelling of Lake Huron and Lake Saint Clair in out_vcerare_hourly_available_capacity_factor and related tables. See issue #4007 and PR #4029.

    Quality of Life Improvements

    We added a sources parameter to pudl.metadata.classes.DataSource.from_id() in order to make it possible to use the pudl-archiver repository to archive datasets that won’t necessarily be ingested into PUDL. See this PUDL archiver issue and PRs #4003 and #4013.

    Other PUDL v2025.2.0 Resources

    PUDL v2025.2.0 Data Dictionary

    PUDL v2025.2.0 Documentation

    PUDL in the AWS Open Data Registry

    PUDL v2025.2.0 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2025.2.0/

    PUDL v2025.2.0 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2025.2.0/

    Zenodo archive of the PUDL GitHub repo for this release

    PUDL v2025.2.0 release on GitHub

    PUDL v2025.2.0 package in the Python Package Index (PyPI)

    Contact Us

    If you're using PUDL, we would love to hear from you! Even if it's just a note to let us know that you exist, and how you're using the software or data. Here's a bunch of different ways to get in touch:

    Follow us on GitHub

    Use the PUDL GitHub issue tracker to let us know about any bugs or data issues you encounter

    GitHub Discussions is where we provide user support.

    Watch our GitHub Project to see what we're working on.

    Email us at hello@catalyst.coop for private communications.

    On Mastodon: @CatalystCoop@mastodon.energy

    On BlueSky: @catalyst.coop

    On Twitter: @CatalystCoop

    Connect with us on LinkedIn

    Play with our data and notebooks on Kaggle

    Combine our data with ML models on HuggingFace

    Learn more about us on our website: https://catalyst.coop

    Subscribe to our announcements list for email updates.

  20. National Domestic Violence Hotline Advocate Caller Application Database

    • healthdata.gov
    • data.virginia.gov
    csv, xlsx, xml
    Updated Nov 17, 2023
    Cite
    (2023). National Domestic Violence Hotline Advocate Caller Application Database [Dataset]. https://healthdata.gov/ACF/National-Domestic-Violence-Hotline-Advocate-Caller/3pjt-r4wi
    Explore at:
    Available download formats: csv, xlsx, xml
    Dataset updated
    Nov 17, 2023
    Description

    The Advocate Caller Application database includes information about each contact to the National Domestic Violence Hotline (The Hotline) or loveisrespect (LIR) helpline, made by telephone, chat, text, e-mail, or social media. This information is entered into the database manually by advocates at the time of contact. It is primarily used for service provision and operational purposes. It does not include any PII.

    The Advocate Caller Application database includes demographic information about the person who called, chatted, texted, etc., and his/her situation (e.g., type of abuse), and information about what happened during the call, chat, or text (e.g., topics discussed, services provided, etc.). It also includes information about caller needs and reported barriers to receiving services.

    Units of Response: Abuse Victims

    Type of Data: Administrative

    Tribal Data: Unavailable

    COVID-19 Data: Unavailable

    Periodicity: Unavailable

    SORN: https://www.federalregister.gov/documents/2015/04/02/2015-07440/privacy-act-of-1974-system-of-records-notice

    Data Use Agreement: https://www.icpsr.umich.edu/rpxlogin

    Data Use Agreement Location: Unavailable

    Equity Indicators: Unavailable

    Granularity: Individual

    Spatial: Unavailable

    Geocoding: Unavailable

Cite
Jürgen Lerner; Jürgen Lerner (2020). Wikipedia Category Granularity (WikiGrain) data [Dataset]. http://doi.org/10.5281/zenodo.1005175

Data from: Wikipedia Category Granularity (WikiGrain) data

Related Article
Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: txt, csv
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Jürgen Lerner; Jürgen Lerner
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The "Wikipedia Category Granularity (WikiGrain)" data consists of three files that contain information about articles of the English-language version of Wikipedia (https://en.wikipedia.org).

The data has been generated from the database dump dated 20 October 2016 provided by the Wikimedia Foundation, licensed under the GNU Free Documentation License (GFDL) and the Creative Commons Attribution-Share-Alike 3.0 License.

WikiGrain provides information on all 5,006,601 Wikipedia articles (that is, pages in Namespace 0 that are not redirects) that are assigned to at least one category.

The WikiGrain Data is analyzed in the paper

Jürgen Lerner and Alessandro Lomi: Knowledge categorization affects popularity and quality of Wikipedia articles. PLoS ONE, 13(1):e0190674, 2018.

===============================================================
Individual files (tables in comma-separated-values-format):

---------------------------------------------------------------
* article_info.csv contains the following variables:

- "id"
(integer) Unique identifier for articles; identical with the page_id in the Wikipedia database.

- "granularity"
(decimal) The granularity of an article A is defined to be the average (mean) granularity of the categories of A, where the granularity of a category C is the shortest path distance in the parent-child subcategory network from the root category (Category:Articles) to C. Higher granularity values indicate articles whose topics are less general, narrower, more specific.

- "is.FA"
(boolean) True ('1') if the article is a featured article; false ('0') otherwise.

- "is.FA.or.GA"
(boolean) True ('1') if the article is a featured article or a good article; false ('0') otherwise.

- "is.top.importance"
(boolean) True ('1') if the article is listed as a top importance article by at least one WikiProject; false ('0') otherwise.

- "number.of.revisions"
(integer) Number of times a new version of the article has been uploaded.
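
The granularity definition above can be sketched as a breadth-first search over the category network followed by a mean. This is only an illustration under assumptions: the toy category graph and the function names are invented here, and the authors' actual computation on the Wikipedia dump may differ in detail (for example, in how unreachable categories are handled).

```python
from collections import deque
from statistics import mean


def category_depths(children, root):
    """Shortest-path distance from the root to every reachable category.

    `children` maps a category to its direct subcategories. Breadth-first
    search visits categories in order of distance, so the first time a
    category is seen is also its shortest distance.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for child in children.get(cat, ()):
            if child not in depths:
                depths[child] = depths[cat] + 1
                queue.append(child)
    return depths


def article_granularity(article_categories, depths):
    """Mean depth of the categories an article is assigned to."""
    return mean(depths[c] for c in article_categories)


# Toy category network (hypothetical, not taken from the dataset).
children = {
    "Articles": ["Science", "Arts"],
    "Science": ["Physics"],
    "Physics": ["Optics"],
    "Arts": ["Optics"],  # a second, shorter route to "Optics"
}
depths = category_depths(children, "Articles")
granularity = article_granularity(["Science", "Optics"], depths)
```

In the toy graph "Optics" is reachable through both "Physics" and "Arts"; the shorter route through "Arts" determines its depth.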


---------------------------------------------------------------
* article_to_tlc.csv
is a list of links from articles to the closest top-level categories (TLCs) they are contained in. We say that an article A is a member of a TLC C if A is in a category that is a descendant of C and the distance from C to A (measured by the number of parent-child category links) is minimal over all TLCs. An article can thus be a member of several TLCs.
The file contains the following variables:

- "id"
(integer) Unique identifier for articles; identical with the page_id in the Wikipedia database.

- "id.of.tlc"
(integer) Unique identifier for TLC in which the article is contained; identical with the page_id in the Wikipedia database.

- "title.of.tlc"
(string) Title of the TLC in which the article is contained.
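
The membership rule for top-level categories can be sketched as follows: measure the distance from each TLC down to the article's categories and keep every TLC that attains the minimum. The graph, the names, and the decision to measure distance to the article's categories (rather than adding one more link for the article itself) are assumptions made for this illustration; the constant offset does not change which TLCs are closest.

```python
from collections import deque


def distances_from(children, start):
    """Breadth-first distances from `start` along parent -> child links."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cat = queue.popleft()
        for child in children.get(cat, ()):
            if child not in dist:
                dist[child] = dist[cat] + 1
                queue.append(child)
    return dist


def closest_tlcs(article_categories, children, tlcs):
    """Return every TLC whose distance to the article is minimal."""
    best = {}
    for tlc in tlcs:
        dist = distances_from(children, tlc)
        reachable = [dist[c] for c in article_categories if c in dist]
        if reachable:
            best[tlc] = min(reachable)
    if not best:
        return []
    shortest = min(best.values())
    return sorted(t for t, d in best.items() if d == shortest)


# Toy category network (hypothetical, not taken from the dataset).
children = {
    "Science": ["Physics"],
    "Arts": ["Music"],
    "History": ["Eras"],
    "Physics": ["Acoustics"],
    "Music": ["Acoustics"],
    "Eras": ["Centuries"],
    "Centuries": ["Acoustics"],
}
tlcs = ["Science", "Arts", "History"]
acoustics_article = closest_tlcs(["Acoustics"], children, tlcs)
physics_article = closest_tlcs(["Physics"], children, tlcs)
```

An article in "Acoustics" is equally close to "Science" and "Arts", so it belongs to both, which matches the remark that an article can be a member of several TLCs.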

---------------------------------------------------------------
* article_info_normalized.csv
contains more variables associated with articles than article_info.csv. All variables except "id" and "is.FA" are normalized to standard deviation equal to one. Variables whose name has the prefix "log1p." have been transformed by the mapping x --> log(1+x) to make distributions that are skewed to the right 'more normal'.
The file contains the following variables:

- "id"
Article id.

- "is.FA"
Boolean indicator for whether the article is featured.

- "log1p.length"
Length measured by the number of bytes.

- "age"
Age measured by the time since the first edit.

- "log1p.number.of.edits"
Number of times a new version of the article has been uploaded.

- "log1p.number.of.reverts"
Number of times a revision has been reverted to a previous one.

- "log1p.number.of.contributors"
Number of unique contributors to the article.

- "number.of.characters.per.word"
Average number of characters per word (one component of 'reading complexity').

- "number.of.words.per.sentence"
Average number of words per sentence (second component of 'reading complexity').

- "number.of.level.1.sections"
Number of first level sections in the article.

- "number.of.level.2.sections"
Number of second level sections in the article.

- "number.of.categories"
Number of categories the article is in.

- "log1p.average.size.of.categories"
Average size of the categories the article is in.

- "log1p.number.of.intra.wiki.links"
Number of links to pages in the English-language version of Wikipedia.

- "log1p.number.of.external.references"
Number of external references given in the article.

- "log1p.number.of.images"
Number of images in the article.

- "log1p.number.of.templates"
Number of templates that the article uses.

- "log1p.number.of.inter.language.links"
Number of links to articles in other language editions of Wikipedia.

- "granularity"
As in article_info.csv (but normalized to standard deviation one).
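
The two transformations described for article_info_normalized.csv can be sketched as below. The description only states that variables are scaled to standard deviation one, so this sketch divides by the standard deviation without centering; whether the original used the population or sample standard deviation, and whether it also centered the values, is not stated and is an assumption here. The input numbers are made up.

```python
import math
from statistics import pstdev


def scale_to_unit_sd(values):
    """Divide by the (population) standard deviation so the result has SD 1."""
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [v / sd for v in values]


def log1p_unit_sd(values):
    """Apply x -> log(1 + x), then scale to standard deviation one."""
    return scale_to_unit_sd([math.log1p(v) for v in values])


# Made-up, right-skewed counts (e.g. numbers of edits per article).
raw_counts = [0, 3, 10, 250, 12000]
normalized = log1p_unit_sd(raw_counts)
```

Because log(1 + 0) is 0, articles with a raw count of zero stay at zero after the transformation, and the ordering of articles is preserved.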
