20 datasets found
  1. Sales Dataset (Raw v Clean)#fic_PAK_region_data

    • kaggle.com
    zip
    Updated Aug 21, 2025
    Cite
    Fahad Rehman (2025). Sales Dataset (Raw v Clean)#fic_PAK_region_data [Dataset]. https://www.kaggle.com/datasets/fahad0rehman/sales-dataset-raw-v-cleanfic-pak-region-data
    Available download formats: zip (98904 bytes)
    Dataset updated
    Aug 21, 2025
    Authors
    Fahad Rehman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Fahad Rehman

    Released under Attribution 4.0 International (CC BY 4.0)

    Contents

  2. StackOverflow questions filtered 2009 - 2020

    • kaggle.com
    zip
    Updated Jun 25, 2021
    Cite
    Michael FUMERY (2021). StackOverflow questions filtered 2009 - 2020 [Dataset]. https://www.kaggle.com/michaelfumery/stackoverflow-questions-filtered-2011-2021
    Available download formats: zip (72786401 bytes)
    Dataset updated
    Jun 25, 2021
    Authors
    Michael FUMERY
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    This dataset comes from the Stack Overflow site and was exported through Stack Exchange. The data are intended for text processing, specifically for building a program that automatically generates tags for the questions asked.

    Content

    This set of 13 CSV files includes the following variables:

    • Id: Unique identifier of the post
    • CreationDate: Creation date of the post
    • Title: Post title
    • Body: Complete question in HTML format
    • Tags: The tags used by users for the question
    • ViewCount: Number of views
    • CommentCount: Number of comments
    • AnswerCount: Number of answers
    • Score: Upvote score of the post.

    The data was extracted using the following SQL query:

    ```SQL
    DECLARE @start_date DATE
    DECLARE @end_date DATE
    SET @start_date = '2011-01-01'
    SET @end_date = DATEADD(m, 12, @start_date)

    SELECT p.Id, p.CreationDate, p.Title, p.Body, p.Tags,
           p.ViewCount, p.CommentCount, p.AnswerCount, p.Score
    FROM Posts AS p
    LEFT JOIN PostTypes AS t ON p.PostTypeId = t.id
    WHERE p.CreationDate BETWEEN @start_date AND @end_date
      AND t.Name = 'Question'
      AND p.ViewCount > 20
      AND p.CommentCount > 5
      AND p.AnswerCount > 1
      AND p.Score > 5
      AND len(p.Tags) > 0
    ```
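The thresholds in the query's WHERE clause can also be re-checked once the CSV files are downloaded. The following is a minimal Python sketch, not the author's code: it assumes the CSV headers match the variable names listed above and that the Tags column uses the `<tag1><tag2>` form, which the listing does not confirm. The sample rows and the helper names `passes_filters`, `parse_tags`, and `load_questions` are illustrative.

```python
import csv
import io
import re


def passes_filters(row):
    """Apply the same numeric thresholds as the SQL WHERE clause."""
    return (
        int(row["ViewCount"]) > 20
        and int(row["CommentCount"]) > 5
        and int(row["AnswerCount"]) > 1
        and int(row["Score"]) > 5
        and len(row["Tags"]) > 0
    )


def parse_tags(tags):
    """Split a '<a><b>' style tag string into a list (assumed format)."""
    return re.findall(r"<([^<>]+)>", tags)


def load_questions(fileobj):
    """Read one CSV file object and keep only rows that pass the filters."""
    return [row for row in csv.DictReader(fileobj) if passes_filters(row)]


# Made-up sample standing in for one of the 13 CSV files.
sample = io.StringIO(
    "Id,CreationDate,Title,Body,Tags,ViewCount,CommentCount,AnswerCount,Score\n"
    "1,2011-02-03,Example title,<p>Body</p>,<python><pandas>,150,6,2,10\n"
    "2,2011-02-04,Low score,<p>Body</p>,<sql>,500,8,3,2\n"
)
questions = load_questions(sample)
for question in questions:
    print(question["Id"], parse_tags(question["Tags"]))
```

For a real file, pass `open(path, newline="", encoding="utf-8")` in place of the in-memory sample; the path is whatever name the downloaded archive uses.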

    Inspiration

    Data cleaning on textual data, automatic tag generator, NLP ...

  3. Data Clean Rooms Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). Data Clean Rooms Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-clean-rooms-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Clean Rooms Market Outlook



    According to our latest research, the global Data Clean Rooms market size reached USD 1.62 billion in 2024, reflecting robust adoption across multiple sectors. The market is poised for significant expansion, forecasted to reach USD 8.73 billion by 2033, growing at a remarkable CAGR of 20.6% from 2025 to 2033. This impressive growth trajectory is primarily driven by the increasing demand for privacy-compliant data collaboration, stringent regulatory requirements, and the proliferation of data-driven marketing strategies worldwide.
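The quoted market sizes and growth rate can be cross-checked with the standard compound annual growth rate formula. This is a minimal sketch that assumes the 2024 figure is the base value and that nine compounding years run to 2033; the report does not state its exact convention, and the function names are illustrative.

```python
def project(start_value, rate, years):
    """Compound a starting value forward at a constant annual rate."""
    return start_value * (1 + rate) ** years


def cagr(start_value, end_value, years):
    """Constant annual rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the paragraph above (USD billion).
projected = project(1.62, 0.206, 9)  # about 8.74
implied = cagr(1.62, 8.73, 9)        # about 0.206

print(f"Projected 2033 size: {projected:.2f}")
print(f"Implied CAGR: {implied:.1%}")
```

Under these assumptions the projected value lands within rounding distance of the quoted USD 8.73 billion.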




    The surge in demand for Data Clean Rooms is closely linked to the evolving landscape of data privacy regulations such as GDPR, CCPA, and other global standards. As organizations strive to harness the power of consumer data while remaining compliant, data clean rooms have emerged as a vital solution. These platforms enable secure, privacy-centric data collaboration between multiple parties without exposing raw, personally identifiable information. The rising adoption of advanced analytics and AI-driven marketing, especially in sectors like advertising, healthcare, and financial services, is further catalyzing the need for sophisticated data clean room solutions. The rapid shift to digital platforms and the phasing out of third-party cookies are also compelling enterprises to seek new ways to unlock value from their data assets without compromising consumer trust or regulatory compliance.




    Another crucial growth driver for the Data Clean Rooms market is the increasing complexity and volume of data being generated across industries. As organizations collect vast amounts of first-party and second-party data, the need for secure environments to analyze and share this data with partners is becoming paramount. Data clean rooms offer a secure infrastructure that supports advanced analytics, machine learning, and audience segmentation while maintaining strict data governance. The proliferation of cloud-based data ecosystems and the integration of data clean rooms with existing martech and adtech stacks are also accelerating market adoption, enabling organizations to derive actionable insights without breaching privacy norms.




    The growing emphasis on collaborative data partnerships is further fueling the expansion of the Data Clean Rooms market. Enterprises are increasingly recognizing the value of combining their data with that of trusted partners to enhance customer intelligence, optimize campaigns, and drive innovation. Data clean rooms facilitate these partnerships by providing a neutral, privacy-preserving environment for joint data analysis. This trend is particularly pronounced in sectors like retail, BFSI, and media, where the ability to share insights without exposing sensitive information is critical to maintaining competitive advantage and customer trust. The ongoing advancements in encryption, federated learning, and privacy-enhancing technologies are also making data clean rooms more scalable, secure, and accessible to organizations of all sizes.




    Regionally, North America continues to dominate the Data Clean Rooms market due to its mature digital ecosystem, stringent privacy regulations, and early adoption of advanced data management solutions. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, expanding e-commerce, and increasing regulatory focus on data privacy. Europe also holds a significant share, propelled by the strict enforcement of GDPR and the widespread adoption of privacy-centric technologies. Latin America and the Middle East & Africa are gradually catching up, with growing investments in digital infrastructure and rising awareness of data privacy risks. The global outlook remains highly positive, with all regions expected to witness substantial growth through 2033.





    Component Analysis



    The Data Clean Rooms market is segmented by component into Software and Services, each playing a pivotal role in the overall ecosystem. The s

  4. Data Clean Rooms for Travel Brands Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Research Intelo (2025). Data Clean Rooms for Travel Brands Market Research Report 2033 [Dataset]. https://researchintelo.com/report/data-clean-rooms-for-travel-brands-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    Data Clean Rooms for Travel Brands Market Outlook



    According to our latest research, the Global Data Clean Rooms for Travel Brands market size was valued at $1.2 billion in 2024 and is projected to reach $6.8 billion by 2033, expanding at a robust CAGR of 21.7% during 2024–2033. One of the primary growth drivers for this market is the increasing need for privacy-compliant data sharing and collaboration in the travel industry. As travel brands face rising data privacy regulations and restrictions on third-party cookies, data clean rooms have emerged as a vital technology to enable secure, privacy-first analytics and marketing collaboration. This, combined with the growing adoption of advanced analytics and personalization strategies by airlines, hotels, online travel agencies, and car rental services, is fueling significant market expansion globally.



    Regional Outlook



    North America currently holds the largest share in the Data Clean Rooms for Travel Brands market, accounting for approximately 38% of the global market value in 2024. This dominance is attributed to the region's mature digital infrastructure, early adoption of privacy-enhancing technologies, and the presence of leading travel brands and technology vendors. The United States, in particular, exhibits strong demand due to stringent regulatory frameworks such as CCPA and evolving consumer expectations around data privacy. Furthermore, North American travel brands are actively investing in advanced analytics and customer data platforms, further accelerating the integration of data clean rooms into their technology stacks. The region's robust ecosystem of tech startups, established software providers, and strategic alliances between travel and data analytics companies continues to reinforce its leadership position.



    The Asia Pacific region is expected to register the fastest growth, with a projected CAGR exceeding 25% through 2033. This rapid expansion is driven by the surge in digital transformation initiatives across emerging economies, particularly in China, India, and Southeast Asia. Travel brands in this region are increasingly leveraging cloud-based data clean rooms to capitalize on the booming online travel market and the proliferation of mobile-first consumers. Investments in smart tourism, government-backed digitalization programs, and a burgeoning middle class with rising travel aspirations are further propelling demand. Additionally, the region’s technology landscape is witnessing significant inflows of venture capital and strategic partnerships, enabling the development and deployment of innovative data collaboration solutions tailored to local market needs.



    In Latin America, the Middle East, and Africa, adoption of data clean rooms for travel brands is still at a nascent stage but is poised for steady growth. These emerging economies face unique challenges including limited digital infrastructure, evolving regulatory environments, and lower awareness among travel operators. However, as global travel brands expand their footprint and local players seek to enhance customer engagement and operational efficiency, there is increasing interest in secure data collaboration platforms. Policymakers in these regions are also beginning to introduce data privacy regulations, which, while posing initial compliance hurdles, are expected to drive long-term adoption of data clean rooms. Localization of solutions, capacity building, and public-private partnerships will be critical in overcoming barriers and unlocking the market’s full potential.



    Report Scope





    Attributes            Details
    Report Title          Data Clean Rooms for Travel Brands Market Research Report 2033
    By Component          Software, Services
    By Deployment Mode    On-Premises, Cloud
    By Application        Customer Insights & Analytics, Marketing & Advertising, Personalization, Data Collaboration, Others
  5. Data Collaboration Clean Room Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Data Collaboration Clean Room Market Research Report 2033 [Dataset]. https://dataintelo.com/report/data-collaboration-clean-room-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Collaboration Clean Room Market Outlook



    According to our latest research, the global Data Collaboration Clean Room market size reached USD 1.24 billion in 2024, driven by the increasing emphasis on privacy-centric data sharing and analytics across industries. The market is poised to expand at a robust CAGR of 18.7% from 2025 to 2033, reaching a projected value of USD 6.10 billion by 2033. This remarkable growth is attributed to the surging demand for secure and compliant environments that enable organizations to collaborate on sensitive data without compromising privacy or regulatory mandates.




    One of the primary growth factors fueling the Data Collaboration Clean Room market is the intensifying regulatory landscape surrounding data privacy and protection. With the enforcement of strict regulations like GDPR in Europe, CCPA in California, and similar frameworks worldwide, organizations are under mounting pressure to implement solutions that facilitate compliant data collaboration. Clean rooms offer a secure, controlled environment where multiple parties can analyze and share datasets without exposing raw data, thus mitigating legal and compliance risks. This capability is especially critical for industries such as healthcare, finance, and advertising, where data sensitivity and privacy are paramount. The growing frequency of high-profile data breaches has further accelerated the adoption of data clean room technologies, as enterprises seek to safeguard customer trust and avoid punitive fines.




    Another significant driver is the exponential growth of data-driven marketing and advertising practices. As third-party cookies phase out and user tracking becomes increasingly restricted, advertisers and marketers are turning to data clean rooms to enable privacy-compliant audience insights, measurement, and attribution. These secure environments allow brands, publishers, and technology partners to collaborate on campaign performance data while ensuring that individual user identities remain protected. The rise of programmatic advertising, coupled with the need for advanced analytics in a privacy-first world, is pushing agencies and enterprises to invest in sophisticated data collaboration solutions. This trend is expected to intensify as consumer expectations for data privacy continue to rise, making clean rooms a foundational component of the modern marketing tech stack.




    Technological advancements are also playing a pivotal role in the expansion of the Data Collaboration Clean Room market. Innovations in cryptographic techniques, such as secure multi-party computation and differential privacy, are enabling more robust and scalable clean room architectures. Cloud-based deployment models have made these solutions more accessible, reducing the barriers to entry for small and medium enterprises while offering enhanced flexibility and lower operational costs. Additionally, the integration of artificial intelligence and machine learning capabilities within clean rooms is unlocking new possibilities for advanced analytics and predictive modeling, further enhancing the value proposition for end-users across various sectors.




    Regionally, North America continues to dominate the market, accounting for the largest share in 2024, thanks to its mature data ecosystem, early adoption of privacy technologies, and a strong presence of leading clean room solution providers. Europe follows closely, driven by stringent data protection regulations and a rapidly evolving digital economy. Meanwhile, the Asia Pacific region is witnessing the fastest growth, fueled by digital transformation initiatives, a burgeoning e-commerce sector, and increasing regulatory focus on data privacy. Latin America and the Middle East & Africa are also showing promising potential, albeit from a smaller base, as organizations in these regions begin to recognize the strategic importance of secure data collaboration.



    Component Analysis



    The Component segment of the Data Collaboration Clean Room market is bifurcated into Software and Services, each playing a distinct role in the overall ecosystem. Software solutions form the backbone of data clean rooms, providing the infrastructure and tools necessary for secure data sharing, privacy-preserving analytics, and compliance management. These platforms are designed to handle large volumes of sensitive data, offering advanced features such as access controls, encryption, aud

  6. PEM Electrolyzer Stack Health Analytics Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 7, 2025
    Cite
    Growth Market Reports (2025). PEM Electrolyzer Stack Health Analytics Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/pem-electrolyzer-stack-health-analytics-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Oct 7, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    PEM Electrolyzer Stack Health Analytics Market Outlook



    According to our latest research, the global PEM Electrolyzer Stack Health Analytics market size reached USD 412 million in 2024, driven by the accelerating adoption of hydrogen as a clean energy carrier and the growing demand for advanced asset management solutions in the hydrogen production ecosystem. The market is projected to grow at a robust CAGR of 17.8% from 2025 to 2033, reaching a forecasted value of USD 1,434 million by 2033. This remarkable growth trajectory is fueled by the increasing need to optimize operational efficiency, ensure stack reliability, and minimize downtime across various end-user industries leveraging PEM electrolyzer technologies.




    The rapid expansion of the PEM Electrolyzer Stack Health Analytics market can be attributed to several key growth factors. Firstly, the global shift toward decarbonization and the rising prominence of green hydrogen production have placed PEM electrolyzers at the forefront of clean energy strategies. As governments and industries intensify their focus on sustainable energy solutions, the operational efficiency and longevity of PEM electrolyzer stacks become paramount. Health analytics platforms, by delivering real-time monitoring and predictive insights, empower operators to maximize stack uptime, reduce maintenance costs, and extend asset lifecycles. This capability is critical for industries such as power generation, transportation, and chemicals, where uninterrupted hydrogen supply and cost-effectiveness are essential for competitiveness and compliance with stringent environmental regulations.




    Secondly, technological advancements in sensors, Industrial Internet of Things (IIoT), and artificial intelligence have revolutionized the landscape of health analytics for PEM electrolyzer stacks. The integration of high-fidelity sensors and advanced data acquisition systems enables granular monitoring of stack parameters such as voltage, temperature, pressure, and humidity. Coupled with machine learning algorithms, these platforms can detect early signs of degradation, predict component failures, and recommend proactive maintenance actions. Such data-driven approaches not only enhance the reliability of PEM electrolyzer operations but also facilitate data-centric decision-making for asset managers. The increasing availability of cloud-based analytics solutions further democratizes access to sophisticated health analytics, allowing even small and medium enterprises to leverage these technologies without significant upfront investment.




    Another significant driver is the growing emphasis on lifecycle assessment and total cost of ownership optimization in hydrogen infrastructure projects. As capital and operational expenditures remain a concern for stakeholders, the ability to monitor stack health, forecast maintenance needs, and optimize replacement schedules is gaining strategic importance. Health analytics platforms offer comprehensive lifecycle insights, enabling asset owners to plan maintenance budgets, minimize unplanned outages, and ensure regulatory compliance. These benefits are particularly pronounced in large-scale industrial and utility deployments, where the scale and complexity of PEM electrolyzer installations necessitate sophisticated asset management strategies. The convergence of digitalization and sustainability imperatives is thus creating fertile ground for the widespread adoption of PEM Electrolyzer Stack Health Analytics solutions.




    From a regional perspective, Europe currently leads the PEM Electrolyzer Stack Health Analytics market, accounting for over 38% of global revenue in 2024. This dominance is underpinned by robust policy support for hydrogen, ambitious decarbonization targets, and significant investments in green hydrogen infrastructure across countries such as Germany, France, and the Netherlands. North America follows closely, with increasing adoption in the United States and Canada driven by government incentives and the presence of major technology providers. The Asia Pacific region is emerging as a fast-growing market, propelled by large-scale hydrogen projects in China, Japan, and South Korea. As these regions continue to invest in hydrogen ecosystems, the demand for advanced stack health analytics will rise in tandem, shaping the competitive landscape and innovation trajectory of the market.




  7. Moroccan Bank Reviews from Google Maps

    • kaggle.com
    zip
    Updated Mar 13, 2025
    Cite
    Abdelfatah MENNOUN (2025). Moroccan Bank Reviews from Google Maps [Dataset]. https://www.kaggle.com/datasets/m3nnoun/moroccan-bank-reviews-from-google-maps
    Available download formats: zip (1450924 bytes)
    Dataset updated
    Mar 13, 2025
    Authors
    Abdelfatah MENNOUN
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Unlock insights into Moroccan banking customer experiences! 🇲🇦

    This dataset contains scraped and cleaned Google Maps reviews for banks across all cities in Morocco. Collected as part of a collaborative student/freelancer project, it’s perfect for sentiment analysis, market research, or academic projects.

    What’s Inside?

    • 2 Versions:
      • Raw Data: As scraped from Google Maps.
      • Cleaned Data: Filtered to exclude non-bank businesses (e.g., cash services, unrelated entries).
    • Columns:
      City, Business Name, Address, Phone Number, Website, Google Map ID, Review Text, Timestamp, Stars.
    • Cities Sourced from data.gov.ma: Ensured comprehensive coverage of Moroccan regions.

    Methodology:

    1. City Identification: Used official data from data.gov.ma to target cities with banks.
    2. Search Strategy: Queried “bank in [city name]” on Google Maps to compile business links.
    3. Scraping: Extracted business details (name, address, etc.) and latest reviews using Python + Playwright (automation) and BeautifulSoup (parsing).
    4. Cleaning: Removed duplicates and non-bank entries for accuracy.
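The cleaning step in the methodology above (dropping duplicates and non-bank entries) might look roughly like the following Python sketch. It is not the author's code: the listing names Pandas for cleaning, while this version uses only the standard library, and the keyword filter and sample rows are hypothetical because the actual exclusion rule is not documented.

```python
# Hypothetical keywords for spotting non-bank businesses; the real rule is not documented.
NON_BANK_KEYWORDS = ("cash",)


def is_bank(row):
    """Return False when the business name contains an excluded keyword."""
    name = row["Business Name"].lower()
    return not any(keyword in name for keyword in NON_BANK_KEYWORDS)


def clean_reviews(rows):
    """Drop non-bank entries and exact duplicate reviews, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        if not is_bank(row):
            continue
        key = (row["Google Map ID"], row["Review Text"], row["Timestamp"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up rows using a subset of the column names listed in the description.
raw_rows = [
    {"City": "Rabat", "Business Name": "Example Bank", "Google Map ID": "a1",
     "Review Text": "Fast service", "Timestamp": "2025-01-05", "Stars": "5"},
    {"City": "Rabat", "Business Name": "Example Bank", "Google Map ID": "a1",
     "Review Text": "Fast service", "Timestamp": "2025-01-05", "Stars": "5"},
    {"City": "Rabat", "Business Name": "Example Cash Point", "Google Map ID": "b2",
     "Review Text": "Long queue", "Timestamp": "2025-01-06", "Stars": "2"},
    {"City": "Fes", "Business Name": "Sample Bank", "Google Map ID": "c3",
     "Review Text": "Helpful staff", "Timestamp": "2025-01-07", "Stars": "4"},
]
cleaned_rows = clean_reviews(raw_rows)
print([row["Business Name"] for row in cleaned_rows])
```

Keying duplicates on the business ID, review text, and timestamp is one possible choice; a stricter or looser key changes which reviews count as repeats.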

    Potential Use Cases:

    • 📈 Sentiment Analysis: Analyze customer satisfaction trends.
    • 🗺️ Geospatial Visualization: Map bank ratings by city/region.
    • 🔍 Competitor Analysis: Compare bank reputations.
    • 🎓 Academic Projects: Practice NLP, data cleaning, or visualization.

    Tech Stack:

    • Python 🐍
    • Playwright (for browser automation)
    • BeautifulSoup (HTML parsing)
    • Pandas (data cleaning)

    Why This Dataset?

    • First-of-its-kind: Focused on Moroccan banks.
    • Ready-to-use: Cleaned version requires minimal preprocessing.
    • Transparent: Raw data included for reproducibility.

    License: CC0: Public Domain (Free to use, modify, and share).

  8. Company Data | 249 Countries Coverage | 270m+ Companies

    • datarade.ai
    .json, .csv
    + more versions
    Cite
    Rhetorik, Company Data | 249 Countries Coverage | 270m+ Companies [Dataset]. https://datarade.ai/data-products/neuron360-companies-from-rhetorik-rhetorik
    Available download formats: .json, .csv
    Dataset authored and provided by
    Rhetorik
    Area covered
    United States
    Description

    Rhetorik, a Lightcast Company, provides the largest known database of globally compliant profiles of professionals, companies, and contacts, continuously updated to ensure accuracy. Powered by AI, the platform supports a wide range of use cases, including sales, marketing, recruiting, and investment banking, with flexible delivery options through platforms, APIs, and services.

    Rhetorik360 is the ultimate tool to segment and target your market and find your ideal customer. Access detailed Firmographics and Technographics for 200 million+ companies globally to power your sales and marketing efforts and increase your business revenues.

    Available for lead lists, data enrichment, account and contact data hygiene and validation, company technographics, leads, ABM, recruiting and other uses. One time and annual use licensing available.

    Use the Rhetorik360 Company DB with its linked sister database, the Neuron 360 Global B2B Professionals Profiles Database to get the best global coverage of Companies, Offices and Professionals.

    • 230 Million Companies
    • 800 Million Professional Profiles
    • 109 Company Attributes
    • 192 Professional Profile Attributes

    This is a new-to-market, uniquely sourced data set using the power of Rhetorik's proprietary AI. We amalgamate billions of data points from scores of sources to create a world-class B2B Company and Contact data asset.

    Company Profile Information: Micro-target and reach your ideal customer faster by gaining access to your complete company profile. Our global company profile data feed is always clean, accurate, up to date and compliant. Target by Technographics, Firmographics and much more.

    Technographics: Our extensive technographics data sets allow you to understand the tech stack of your prospects and their interest in your products. Look forward to enhanced insight to power your company’s organizational and segmentation efforts, improve your qualification process, increase the effectiveness of your account-based marketing, and shorten your sales cycles. Rhetorik's technology data organizes installed enterprise technologies across all major hardware and software product categories, allowing easy searching and filtering on buyers’ technology assets. We track:

    • 26 Million+ technology installs
    • 20,000+ technology products
    • 7,900+ technology vendors
    • 180+ technology categories

    Firmographics: Our Firmographics will help you to more efficiently and effectively segment your company through comprehensive data-analysis of your target markets! Determining if a business is the right fit for your company has never been easier. Experience the power of targeted messaging which can be adapted to each and every target audience; taking into account business size, budget, and much more!

    Access it where and when you need it. Rhetorik360-Profiles is available via APIs, Snowflake Marketplace, or bulk delivery in JSON and CSV formats and supports a wide range of use cases. Data is refreshed weekly, so you can be sure your information is always up to date!

    • North America: 55M+ Companies
    • EMEA: 70M+ Companies
    • APAC: 45M+ Companies
    • LATAM: 30M+ Companies

  9. Uniform Crime Reporting Program Data: Law Enforcement Officers Killed and...

    • openicpsr.org
    Updated Mar 25, 2018
    + more versions
    Cite
    Jacob Kaplan (2018). Uniform Crime Reporting Program Data: Law Enforcement Officers Killed and Assaulted (LEOKA) 1975-2016 [Dataset]. http://doi.org/10.3886/E102180V4
    Dataset updated
    Mar 25, 2018
    Dataset provided by
    University of Pennsylvania
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1975 - 2015
    Area covered
    United States
    Description

    Version 4 release notes: Add data for 2016. Order rows by year (descending) and ORI.

    Version 3 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.

    The LEOKA data sets contain highly detailed data about the number of officers/civilians employed by an agency and how many officers were killed or assaulted. Each data set contains over 2,200 columns and has a wealth of information about the circumstances of assaults on officers. All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. It was then cleaned in R. The "cleaning" just means that column names were standardized (different years have slightly different spellings for many columns). Standardization of column names is necessary to stack multiple years together. Categorical variables (e.g. state) were also standardized (i.e. fix spelling errors).

    About 7% of all agencies in the data report more officers or civilians than population. As such, I removed the officers/civilians per 1,000 population variables. You should exercise caution if deciding to generate and use these variables yourself.

    I did not make any changes to the numeric columns except for the following. A few years of data had the values "blank" or "missing" as indicators of missing values. Rows in otherwise numeric columns (e.g. jan_asslt_no_injury_knife) with these values were replaced with NA. There were three obvious data entry errors in officers killed by felony/accident that I changed to NA:

    • In 1978 the agency "pittsburgh" (ORI = PAPPD00) reported 576 officers killed by accident during March.
    • In 1979 the agency "metuchen" (ORI = NJ01210) reported 991 officers killed by felony during August.
    • In 1990 the agency "penobscot state police" (ORI = ME010SP) reported 860 officers killed by accident during July.

    No other changes to numeric columns were made.

    Each zip file contains all years as individual monthly files of the specified data type. It also includes a file with all years aggregated yearly and stacked into a single data set. Please note that each monthly file is quite large (2,200+ columns) so it may take time to download the zip file and open each data file.

    For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data

    The UCR Handbook (https://ucr.fbi.gov/additional-ucr-publications/ucr_handbook.pdf/view) describes the LEOKA data as follows:

    "The UCR Program collects data from all contributing agencies ... on officer line-of-duty deaths and assaults. Reporting agencies must submit data on ... their own duly sworn officers feloniously or accidentally killed or assaulted in the line of duty. The purpose of this data collection is to identify situations in which officers are killed or assaulted, describe the incidents statistically, and publish the data to aid agencies in developing policies to improve officer safety."

    "... agencies must record assaults on sworn officers. Reporting agencies must count all assaults that resulted in serious injury or assaults in which a weapon was used that could have caused serious injury or death. They must include other assaults not causing injury if the assault involved more than mere verbal abuse or minor resistance to an arrest. In other words, agencies must include in this section all assaults on officers, whether or not the officers sustained injuries."

    If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com
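The two cleaning steps described above, standardizing column names and replacing "blank" or "missing" markers with NA, can be illustrated with a short sketch. The original cleaning was done in R (see the linked repository); the Python version below is only an analogy, with `None` standing in for NA. The naming rule, the helper names, and the sample record (including its ORI) are assumptions made for illustration.

```python
import re

# Markers the description says were used for missing values in some years.
MISSING_MARKERS = {"blank", "missing"}


def standardize_name(name):
    """Lower-case a column name and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def clean_value(value):
    """Map 'blank'/'missing' markers to None; leave other values unchanged."""
    if value.strip().lower() in MISSING_MARKERS:
        return None
    return value


def clean_row(row):
    """Apply both steps to one record represented as a dict."""
    return {standardize_name(key): clean_value(value) for key, value in row.items()}


# Made-up record; the ORI and column spellings are placeholders.
raw = {"ORI": "XX00000", "Jan Asslt No-Injury Knife": "blank", "Year": "1980"}
cleaned = clean_row(raw)
print(cleaned)
```

With column names reduced to one spelling, yearly files can be stacked on matching keys, which is the reason the description gives for standardizing them.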

  10. Global Air-cooled Fuel Cell Stack Market Key Players and Market Share...

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Cite
    Stats N Data (2025). Global Air-cooled Fuel Cell Stack Market Key Players and Market Share 2025-2032 [Dataset]. https://www.statsndata.org/report/air-cooled-fuel-cell-stack-market-229255
    Explore at:
    Available download formats: pdf, excel
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Air-cooled Fuel Cell Stack market is rapidly evolving as businesses and governments seek sustainable energy solutions to combat climate change and reduce dependence on fossil fuels. These fuel cell stacks provide a pivotal clean energy solution by converting chemical energy directly into electrical energy, utili

  11. Infaunal marine invertebrate fauna inside and outside of bacterial mats,...

    • access.earthdata.nasa.gov
    • researchdata.edu.au
    • +3more
    Updated Mar 15, 2019
    + more versions
    Cite
    (2019). Infaunal marine invertebrate fauna inside and outside of bacterial mats, Casey 2006-07 [Dataset]. http://doi.org/10.26179/5c8b147c9568b
    Explore at:
    Dataset updated
    Mar 15, 2019
    Time period covered
    Nov 10, 2006 - Dec 7, 2006
    Area covered
    Description

    Infaunal marine invertebrates were collected from inside and outside of patches of white bacterial mats from several sites in the Windmill Islands, Antarctica, around Casey station during the 2006-07 summer. Samples were collected from McGrady Cove inner and outer, the tide gauge near the Casey wharf, Stevenson's Cove and Brown Bay inner. Sediment cores of 10cm depth and 5cm diameter were collected by divers using a PVC corer from inside (4 cores) and outside (4 cores) each bacterial patch. The size of each patch varied from site to site. Cores were sieved at 500 microns and the extracted fauna preserved in 4 percent neutral buffered formalin. All fauna were counted and identified to species where possible or assigned to morphospecies based on previous infaunal sampling around Casey.

    An Excel spreadsheet is available for download at the URL given below. The spreadsheet does not represent the complete dataset, and is only the bacterial mat infauna data.

    Regarding the infauna dataset:

    • "in" means inside the mat or patch of bacteria; "out" means in the "normal" sediment surrounding the patch, with no evidence of any bacterial mat presence.
    • Patch numbers were allocated to ensure there was no confusion between patches in the same area.
    • Fauna names are our identification codes for each species. Some we have confirmed identifications for, some not. Species names, where we have them and as we get them, are listed against these codes in the Casey marine soft-sediment fauna identification guide.

    This work was completed as part of ASAC 2201 (ASAC_2201).

  12. Case study: Cyclistic bike-share analysis

    • kaggle.com
    zip
    Updated Mar 25, 2022
    + more versions
    Cite
    Jorge4141 (2022). Case study: Cyclistic bike-share analysis [Dataset]. https://www.kaggle.com/datasets/jorge4141/case-study-cyclistic-bikeshare-analysis
    Explore at:
    Available download formats: zip (131490806 bytes)
    Dataset updated
    Mar 25, 2022
    Authors
    Jorge4141
    Description

    Introduction

    This is a case study called Capstone Project from the Google Data Analytics Certificate.

    In this case study, I am working as a junior data analyst at a fictitious bike-share company in Chicago called Cyclistic.

    Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike.

    Scenario

    The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, our team will design a new marketing strategy to convert casual riders into annual members.

    Primary Stakeholders:

    1: Cyclistic Executive Team

    2: Lily Moreno, Director of Marketing and Manager

    Ask

    1. How do annual members and casual riders use Cyclistic bikes differently?
    2. Why would casual riders buy Cyclistic annual memberships?
    3. How can Cyclistic use digital media to influence casual riders to become members?

    Prepare

    The last four quarters were selected for analysis which cover April 01, 2019 - March 31, 2020. These are the datasets used:

    Divvy_Trips_2019_Q2
    Divvy_Trips_2019_Q3
    Divvy_Trips_2019_Q4
    Divvy_Trips_2020_Q1
    

    The data is stored in CSV files. Each file contains one month of data, for a total of 12 .csv files.

    Data appears to be reliable with no bias. It also appears to be original, current and cited.

    I used Cyclistic’s historical trip data found here: https://divvy-tripdata.s3.amazonaws.com/index.html

    The data has been made available by Motivate International Inc. under this license: https://ride.divvybikes.com/data-license-agreement

    Limitations

    Financial information is not available.

    Process

    Used R to analyze and clean data

    • After installing the R packages, data was collected, wrangled and combined into a single file.
    • Columns were renamed.
    • Looked for inconsistencies in the data frames and converted some columns to character type so that they stack correctly.
    • Combined all quarters into one big data frame.
    • Removed unnecessary columns

    Analyze

    • Inspected new data table to ensure column names were correctly assigned.
    • Formatted columns to ensure proper data types were assigned (numeric, character, etc).
    • Consolidated the member_casual column.
    • Added day, month and year columns to aggregate data.
    • Added ride-length column to the entire dataframe for consistency.
    • Deleted rides with a negative trip duration and rides where bikes were taken out of circulation for quality control.
    • Replaced the word "member" with "Subscriber" and also replaced the word "casual" with "Customer".
    • Aggregated data, compared average rides between members and casual users.
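
The ride-length and aggregation steps above were carried out in R. A minimal Python sketch of the same idea follows; the field names (`started_at`, `ended_at`, `member_casual`) and the sample rows are assumptions made for illustration, not the actual Divvy schema or data.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Illustrative rows only; the values are invented.
rides = [
    {"started_at": "2019-04-01 08:00:00", "ended_at": "2019-04-01 08:10:00", "member_casual": "member"},
    {"started_at": "2019-04-06 14:00:00", "ended_at": "2019-04-06 14:40:00", "member_casual": "casual"},
    {"started_at": "2019-04-07 09:30:00", "ended_at": "2019-04-07 09:20:00", "member_casual": "casual"},
]


def ride_length_seconds(ride):
    """Ride length as end time minus start time, in seconds."""
    start = datetime.strptime(ride["started_at"], FMT)
    end = datetime.strptime(ride["ended_at"], FMT)
    return (end - start).total_seconds()


def average_ride_length(rides):
    """Average ride length per rider type, skipping negative durations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ride in rides:
        length = ride_length_seconds(ride)
        if length < 0:
            continue  # drop rides whose end time precedes their start time
        totals[ride["member_casual"]] += length
        counts[ride["member_casual"]] += 1
    return {rider: totals[rider] / counts[rider] for rider in totals}


averages = average_ride_length(rides)
```

Filtering negative durations before averaging keeps a handful of bad records from distorting the comparison between rider types.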

    Share

    After analysis, visuals were created as shown below with R.

    Act

    Conclusion:

    • Data appears to show that casual riders and members use bike share differently.
    • Casual riders' average ride length is more than twice that of members.
    • Members use bike share for commuting, casual riders use it for leisure and mostly on the weekends.
    • Unfortunately, there's no financial data available to determine which of the two (casual or member) is spending more money.

    Recommendations

    • Offer casual riders a membership package with promotions and discounts.

  13. Winter foraging success of Southern Ocean predators in relation to...

    • access.earthdata.nasa.gov
    • researchdata.edu.au
    • +1more
    cfm
    Updated Apr 26, 2017
    Cite
    (2017). Winter foraging success of Southern Ocean predators in relation to stochastic variation in sea-ice extent and winter water formation [Dataset]. http://doi.org/10.4225/15/554AACBF0C998
    Explore at:
    Available download formats: cfm
    Dataset updated
    Apr 26, 2017
    Time period covered
    Oct 1, 2006 - Mar 31, 2012
    Area covered
    Description

    Metadata record for data from ASAC Project 2794. See the link below for public details on this project.

    Public: This study will use innovative technology to measure the winter spatial foraging patterns and net energy gain of adult female elephant seals (and potentially Weddell seals), while simultaneously providing high-resolution data on the physical nature of the water column in which the seals live. By combining biological and physical data with satellite derived sea-ice information, this study will improve our understanding of predator foraging success (and therefore mechanisms which regulate population trajectories) and provide physical oceanographers with fundamental data on the important mechanisms that determine the winter ice and bottom water formation that under-pin the Antarctic marine ecosystem.

    Project objectives: The extent and nature of Antarctic winter sea ice is thought to have profound impacts on biological productivity, the recruitment of Antarctic krill, and the flow-on effects through the Antarctic marine food web. 1. Winter sea-ice formation is also hypothesised to play an important, yet highly-variable role in ocean circulation patterns through the production of cold, dense winter bottom water. 2. The mechanisms determining the inter-annual variation in winter ice formation are poorly understood, as are the complex feedback processes involved, but they are nonetheless recognised as being vulnerable to human-induced climate change. 3. Given the dynamically-linked nature of winter-ice and biological productivity, long-term climatic changes will have broad scale influences on Antarctic biota.

    This study will use innovative technological developments to quantify the response of one of the major Antarctic marine predators, the southern elephant seal (Mirounga leonina), to inter-annual variation in winter ice conditions. We will measure the winter spatial foraging patterns and net energy gain of adult female elephant seals while simultaneously providing high-resolution data on the physical nature of the water column in which the seals are living. The combination of these biological and physical data with satellite-derived sea-ice information will relate variation in the winter-ice to broad scale biological production through the foraging success (maternal investment and therefore demographic performance) of a top Antarctic marine predator, as well as providing physical oceanographers with fundamental data on the important mechanisms that determine the winter ice and bottom water formation that under-pin the Antarctic marine ecosystem. The specific objectives are to:

    1. Measure the foraging performance of the seals in terms of spatially-specific net energy gain while at sea, in relation to intra- and inter-annual variation in sea-ice and oceanic processes.
    2. Use newly-developed (and tested) animal-borne satellite-linked Conductivity-Temperature-Depth Satellite Relay Data Loggers (CTD-SRDLs) to provide oceanographic quality data on local physical characteristics (temperature and salinity).
    3. Record fine-scale foraging parameters (dive depth, duration, swimming speed) using "Dead-Reckoning" Data Loggers (DRDLs) and feeding events using Stomach Temperature Sensors (STSs).
    4. Integrate these data collected in years and regions of different winter ice extent and conditions.
    5. Assess diet during the winter months using stable isotope and fatty acid signature analysis.
    6. Combine the biological and physical information to refine current models of predator performance based on annual climatic features. These models will be used to examine a range of climate-change scenarios, initially for elephant seals but with a view to broadening the species application at a later stage.

    Taken from the 2008-2009 Progress Report: Progress against objectives: Due to logistic constraints, no satellite telemetry was conducted at Casey or Macquarie Island this year, but preliminary surveys of the region were conducted for both elephant and Weddell seals (see report for 2753). However, we did deploy CTD satellite tags on elephant seals at Isles Kerguelen and Elephant Island to contribute to the IPY MEOP program. These animals either traversed the Southern Ocean to forage over the Antarctic continental shelf, or remained very close to their breeding island, indicating that even within a population there are markedly different foraging strategies.

    Taken from the 2010-2011 Progress Report: Public summary of the season progress: Due to a pre-departure accident involving one of the field team leaders, we were unable to reach Casey this year to complete that component of the program. Forty CTD satellite tags were successfully deployed at Vestfold Hills in January and February 2011. These tags are currently still transmitting from foraging locations along the Antarctic continental shelf and the ice edge.

    Project 2695 (ASAC_2695) was incorporated into this project.

    An Access database containing data from this project is available for download at the provided URL.

    The data have also been loaded into the Australian Antarctic Data Centre's ARGOS tracking database. The database can be accessed at the provided URLs.

  14. Macquarie Island Bulk Aerosol Chemistry Data

    • data.ucar.edu
    • ckanprod.data-commons.k8s.ucar.edu
    ascii
    Updated Oct 7, 2025
    Cite
    Chad Dick (2025). Macquarie Island Bulk Aerosol Chemistry Data [Dataset]. http://doi.org/10.26023/ZNSN-EMBP-RZ0Z
    Explore at:
    Available download formats: ascii
    Dataset updated
    Oct 7, 2025
    Authors
    Chad Dick
    Time period covered
    Nov 22, 1995 - Dec 13, 1995
    Area covered
    Description

    This dataset contains results of IC analysis of low volume Fluoropore aerosol filters collected at Macquarie Island (Clean Air Lab.), using a filter stack to separate sub- and super-micron radius aerosols.

  15. Fuel Cell Stack Health Monitoring Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Fuel Cell Stack Health Monitoring Market Research Report 2033 [Dataset]. https://dataintelo.com/report/fuel-cell-stack-health-monitoring-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Fuel Cell Stack Health Monitoring Market Outlook

    According to our latest research, the global fuel cell stack health monitoring market size reached USD 865.4 million in 2024, demonstrating robust momentum as the industry increasingly focuses on reliability and efficiency in fuel cell systems. The market is projected to expand at a CAGR of 12.7% from 2025 to 2033, ultimately attaining a value of USD 2,527.1 million by 2033. This market growth is propelled by heightened demand for real-time monitoring solutions that ensure optimal performance and longevity of fuel cell stacks, particularly in automotive, stationary power, and industrial applications.
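
As a quick consistency check on the figures above, the compound annual growth rate implied by the 2024 and 2033 values can be computed directly. This is a sketch that assumes nine compounding years (2024 to 2033); the report does not state its exact base year or rounding, so a small gap from the quoted 12.7% would not be surprising.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


start_2024 = 865.4    # USD million, from the description
end_2033 = 2527.1     # USD million, from the description
years = 2033 - 2024   # assumption: nine compounding periods

cagr = implied_cagr(start_2024, end_2033, years)
print(f"Implied CAGR: {cagr:.1%}")
```

Under that assumption the implied rate comes out slightly below the quoted 12.7%, which is consistent with rounding or a different base period in the report.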

    The primary growth driver for the fuel cell stack health monitoring market is the accelerating adoption of fuel cell technology across various sectors, notably in automotive and stationary power generation. As governments worldwide implement stricter emissions regulations and incentivize clean energy solutions, manufacturers are increasingly integrating fuel cell systems into vehicles and power infrastructure. This shift necessitates advanced health monitoring systems that can detect anomalies, predict failures, and optimize maintenance schedules, thereby reducing downtime and maximizing the return on investment for end-users. The demand is further amplified by the growing need for efficient and reliable energy storage solutions as the world transitions towards renewable energy sources and decarbonization.

    Another significant factor contributing to market expansion is the continuous advancement in sensor technologies and data analytics. The integration of sophisticated hardware and software enables real-time data collection and analysis, providing valuable insights into the operational status of fuel cell stacks. With the advent of Internet of Things (IoT) and artificial intelligence (AI), monitoring systems have become more intelligent and proactive, capable of delivering predictive maintenance and remote diagnostics. These technological innovations not only enhance the performance and safety of fuel cell systems but also reduce operational costs by minimizing unscheduled maintenance and extending system lifespans.

    Furthermore, the increasing investments in research and development by both public and private entities are fostering innovation in the fuel cell stack health monitoring market. Governments across North America, Europe, and Asia Pacific are allocating substantial funds to support the commercialization of hydrogen and fuel cell technologies, creating a fertile environment for market growth. Collaborative efforts among automotive OEMs, research institutes, and technology providers are accelerating the development of next-generation health monitoring solutions that are scalable, cost-effective, and adaptable to diverse applications. This trend is expected to sustain the market’s upward trajectory over the forecast period.

    Regionally, Asia Pacific dominates the market, accounting for the largest share in 2024, followed by North America and Europe. The rapid industrialization, robust automotive manufacturing base, and strong government support for clean energy initiatives in countries such as China, Japan, and South Korea are key contributors to the region’s leadership. North America is also witnessing significant growth, driven by investments in hydrogen infrastructure and the presence of leading technology innovators. Europe, with its stringent environmental regulations and ambitious decarbonization targets, is emerging as a critical market for fuel cell stack health monitoring solutions, particularly in automotive and stationary power applications.

    Component Analysis

    The component segment of the fuel cell stack health monitoring market is categorized into hardware, software, and services. Hardware forms the backbone of health monitoring systems, comprising sensors, data acquisition units, controllers, and communication modules. The demand for advanced and miniaturized sensors capable of withstanding harsh operational conditions is on the rise, as they enable precise measurement of critical parameters such as voltage, temperature, and pressure within fuel cell stacks. Modern hardware solutions are designed for high reliability and accuracy, ensuring continuous and real-time monitoring of stack health, which is vital for the safety and efficiency of fuel cell systems deployed in automotive and industrial environments.

  16. Windmill Islands of vegetation transects, surveyed 2012 to 13 (10 years)

    • access.earthdata.nasa.gov
    • researchdata.edu.au
    • +1more
    cfm
    Updated Jun 18, 2020
    Cite
    (2020). Windmill Islands of vegetation transects, surveyed 2012 to 13 (10 years) [Dataset]. http://doi.org/10.4225/15/55B5766F7FF9E
    Explore at:
    Available download formats: cfm
    Dataset updated
    Jun 18, 2020
    Time period covered
    Jan 1, 2013 - Jan 31, 2013
    Area covered
    Description

    Metadata ID: AAS_4046_Transects_2012-13 title: Windmill Islands vegetation transects, surveyed 2012/13 (10 years)

    This record contains data associated with the Windmill Islands vegetation 10 year survey conducted in 2012/13, under AAS_4046. The transects were established in 2002/03, as described in metadata ID: ASAC_1313_Transects_2002-03, where details of experimental design and data collection are provided.

    Descriptions of data associated with this record are provided below under the following headings: 1. LOCATION (GPS) DATA (and MAPS) 2. QUADRAT PHOTOS 3. NOTEBOOK SCANS 4. MICROSCOPY SCORE SHEETS 5. FINESCALE SPECIES ABUNDANCE (MICROSCOPY) 6. BROADSCALE PERCENT COVER (IMAGE ANALYSIS) 7. ENVIRONMENTAL VARIABLES (e.g. MOISTURE, TEMPERATURE) 8. PROCESSED/COMPILED/WORKED

    Descriptions of data provided:

    1. LOCATION (GPS) DATA (and MAPS) Quadrat location data are provided in metadata ID: AAS_4046_quadrat_locations (http://data.aad.gov.au/aadc/metadata/metadata.cfm?entry_id=AAS_4046_quadrat_locations) and shown in two maps, which are available via the AADC map catalogue: http://data.aad.gov.au/aadc/mapcat/display_map.cfm?map_id=14450 http://data.aad.gov.au/aadc/mapcat/display_map.cfm?map_id=14451

    2. QUADRAT PHOTOS TO BE PROVIDED - all quadrat (and transect/site) photos collected 2013.

    3. NOTEBOOK SCANS TO BE PROVIDED -

    4. MICROSCOPY SCORE SHEETS TO BE PROVIDED - for samples collected 2013.

    5. FINESCALE SPECIES ABUNDANCE (MICROSCOPY) TO BE PROVIDED - raw data from microscopy scoring for samples collected 2013.

    6. BROADSCALE PERCENT COVER (IMAGE ANALYSIS) TO BE PROVIDED - has this data been generated? May be part of Diana King PhD thesis, which is due to be submitted 2016.

    7. ENVIRONMENTAL VARIABLES (e.g. MOISTURE, TEMPERATURE) TO BE PROVIDED - raw stable isotope data collected 2013. Raw moisture content (CWC) data collected 2013.

    8. PROCESSED/COMPILED/WORKED OPTIONAL to provide if relevant

    9. MULTI-YEAR COMPILATIONS AND COMPARISONS

    FILE: Transects Data Summary_2000-2013.xlsx This excel file provides a summary of transect data collected to 2013. This file was originally prepared by Taylor Benny (2013) and has been updated by Jane Wasley (2015).

    Four worksheets: 1. Worksheet: "Vocabulary"- provides a detailed description of methods, terms and abbreviations. 2. Worksheet: "DataCollection" provides a summary of project personnel (including field collections, laboratory analyses and data analysis) for all survey years from 1999. 3. Worksheet: "Quadrat" provides a schematic of the quadrats used in this study, providing details of the size used for photos (25 x 25 cm), sample collection (20 x 20 cm) and grid interval details. 4. Worksheet "Data" includes the following data types: GPS locations of quadrats, species composition of vegetation quadrats (referred to as: fine scale vegetation analysis), moss moisture contents (referred to as: community water content; CWC) and vegetation temperature. The species composition data presented in this file are the overall relative abundance scores for each species/taxa for each quadrat. These data are based on presence/absence scores for nine samples collected per quadrat (raw individual sample data not provided here). The score range for each taxa for each quadrat is 0-9, where nine indicates taxa present in all nine samples in a given quadrat.
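
The 0-9 relative abundance score described above is a count of the samples in which a taxon was present. A minimal sketch follows, using invented presence/absence values (the raw per-sample data are not provided in the file) and a hypothetical function name:

```python
SAMPLES_PER_QUADRAT = 9  # nine samples are collected per quadrat


def abundance_score(presence):
    """Count the samples (out of nine) in which a taxon was recorded."""
    if len(presence) != SAMPLES_PER_QUADRAT:
        raise ValueError("expected nine presence/absence values per quadrat")
    return sum(1 for present in presence if present)


# Invented presence/absence records for one quadrat; not real survey data.
quadrat = {
    "Ceratodon purpureus": [1, 1, 0, 0, 1, 0, 0, 0, 0],
    "Bryum pseudotriquetrum": [1] * 9,
    "crustose lichens": [0] * 9,
}
scores = {taxon: abundance_score(values) for taxon, values in quadrat.items()}
```

A score of 9 therefore means the taxon appeared in every sample from the quadrat, and 0 means it appeared in none.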

    FILE: Taylor Benny 2013_Thesis.pdf PDF file of Honours thesis for Taylor Benny (2013).

    FILE NAME: ASAC_4046-Transects 2013-SOE summary.pdf Public summary of results, describing the state and trends of continental Antarctic vegetation communities. Presentation format based on template from Australia: State of the Environment 2011 (Hatton et al. 2011). Trends presented based on results of transects surveyed 2003 to 2013. The PDF file is an extract from Benny 2013 thesis (P80).

    FILE NAME: AAS_4046-Transects-change maps-2013.pdf Maps produced by Taylor Benny (2013), showing schematic summary of biological change observed between survey periods (2002/03 vs 2007/08 and 2007/08 vs 2012/13) for Windmill Islands vegetation transects at Robinson Ridge and ASPA 135 sites.

    The PDF file contains six pages; each page shows a map for each of the two study sites: ASPA 135 and Robinson Ridge (2 maps per page). Data collection as described in metadata ID: Windmill Islands Vegetation Transects. Unless otherwise provided in Benny 2013, details of the origin of the map imagery and quadrat position data are not known (data likely collected via octocopter instruments deployed by Arko Lucier, or his team). The six pages are an extract from Benny 2013, and are labelled with page numbers as indicated in brackets ( ) below. They present data for: 1. Ceratodon purpureus (P74) 2. Schistidium antarctici (P75) 3. Bryum pseudotriquetrum (P76) 4. crustose lichens (P77) 5. Community Water Content (P78) 6. % live moss (P79)

    Data were collected from ASPA 135 and Robinson Ridge, as shown in maps 14450 and 14451 in the SCAR Map Catalogue.

  17. MELD Preprocessed

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Cite
    Argish Abhangi (2025). MELD Preprocessed [Dataset]. https://www.kaggle.com/datasets/argish/meld-preprocessed
    Explore at:
    Available download formats: zip (3527202381 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Argish Abhangi
    Description

    The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.

    Data Sources

    • Audio: Waveforms extracted from the original video files.
    • Video: Video files are processed to sample frames at a target frame rate (default: 2 fps) and to detect faces using a Haar Cascade classifier.
    • Text: Utterances from the dialogue, which are cleaned using custom encoding functions to fix potential byte encoding issues.
    • Emotion Labels: Each sample is associated with an emotion label.

    Preprocessing Pipeline

    The preprocessing script performs several key steps:

    1. Text Cleaning:

      • fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.
      • replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).
    2. Audio Processing:

      • Extracts raw audio waveform from each sample.
      • Computes a Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format).
      • Converts the spectrogram to a logarithmic scale for numerical stability.
    3. Video Processing:

      • Reads video frames at a specified target FPS (default: 2 fps) using OpenCV.
      • For each video, samples frames evenly based on the original video's FPS.
      • Applies Haar Cascade face detection on the frames to extract the first detected face.
      • Resizes the detected face to 224x224 and converts it to RGB. If no face is detected, a default black image (224x224x3) is returned.
    4. Saving Processed Samples:

      • Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).
      • The filename is derived from the original video filename (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).
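
The dataset's own `fix_encoding_with_bytes` and `replace_double_encoding` functions are not shown in this description. As a rough illustration of the general idea only, the sketch below repairs one common kind of double encoding (UTF-8 bytes that were mistakenly decoded as cp1252) using the standard library. The function name `repair_mojibake` is hypothetical, and this is not the authors' implementation.

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-cp1252 mix-up; return the input unchanged on failure."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not double-encoded in this particular way.
        return text


garbled = "don\u00e2\u20ac\u2122t"   # a right single quote that was mis-decoded
fixed = repair_mojibake(garbled)     # the right single quote is restored
```

Falling back to the original string keeps already-clean utterances untouched, so the function can be applied to every sample without a separate detection step.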

    Data Format

    Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:

    • utterance (str): The cleaned textual utterance.
    • emotion (str/int): The corresponding emotion label.
    • video_path (str): Original path to the video file from which the sample was extracted.
    • audio (Tensor): Raw audio waveform tensor of shape [channels, time].
    • audio_sample_rate (int): The sampling rate of the audio waveform.
    • audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].
    • face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.

    Directory Structure

    The preprocessed files are organized into splits:

    preprocessed_data/
    ├── train/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    ├── dev/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    └── test/
        ├── dia0_utt0.pt
        ├── dia1_utt1.pt
        └── ...

    Loading and Using the Dataset

    A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:

    Dataset Class

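    from torch.utils.data import Dataset
    import os
    import torch
    
    class PreprocessedMELDDataset(Dataset):
      def __init__(self, data_dir):
        """
        Args:
          data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        # Collect the path of every preprocessed sample in the split directory.
        self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')]
        
      def __len__(self):
        return len(self.files)
      
      def __getitem__(self, idx):
        # Each .pt file holds one dictionary of preprocessed features.
        # Note: recent PyTorch releases default to weights_only=True, which may
        # reject the NumPy 'face' array; pass weights_only=False only for trusted files.
        sample_path = self.files[idx]
        sample = torch.load(sample_path)
        return sample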
    

    Custom Collate Function

    def preprocessed_collate_fn(batch):
      """
      Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
      Modify this function to pad or stack tensor data if needed.
      """
      collated = {}
      collated['utterance'] = [sample['utterance'] for sample in batch]
      collated['emotion'] = [sample['emotion'] for sample in batch]
      collated['video_path'] = [sample['video_path'] for sample in batch]
      collated['audio'] = [sample['audio'] for sample in batch]
      collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
      collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
      collated['face'] = [sample['face'] for sample in batch]
      return collated
    

    Creating DataLoaders

    from torch.utils.data import DataLoader
    
    # Define paths for each split
    train_data_dir = "preprocessed_data/train"
    dev_data_dir = "preproces...
    
  18. From Simulation to Classification: A Scalable Rule-Based SQL Injection...

    • data.mendeley.com
    Updated May 23, 2025
    Cite
    Mohammad Abu Obaida Mullick (2025). From Simulation to Classification: A Scalable Rule-Based SQL Injection Dataset Creation and Machine Learning Evaluation [Dataset]. http://doi.org/10.17632/xz4d5zj5yw.1
    Explore at:
    Dataset updated
    May 23, 2025
    Authors
    Mohammad Abu Obaida Mullick
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset has been developed to support advanced research and development in the detection of SQL injection (SQLi) vulnerabilities. It contains a total of 10,304,026 structured entries, of which 2,813,146 are labeled as malicious and 7,490,880 as benign. The malicious entries are categorized into six distinct types of SQL injection attacks: Union-based (758,600 samples), Stack queries-based (746,480 samples), Time-based (531,580 samples), Meta-based (481,280 samples), Boolean-based (226,080 samples), and Error-based (69,126 samples).
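
The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the published numbers; the variable names are chosen here for illustration.

```python
# Counts quoted in the description above.
malicious_by_type = {
    "Union-based": 758_600,
    "Stack queries-based": 746_480,
    "Time-based": 531_580,
    "Meta-based": 481_280,
    "Boolean-based": 226_080,
    "Error-based": 69_126,
}
benign = 7_490_880

malicious = sum(malicious_by_type.values())
total = malicious + benign
malicious_share = malicious / total

print(f"malicious={malicious:,} total={total:,} share={malicious_share:.1%}")
```

The per-type counts add up to the stated 2,813,146 malicious entries, so a little over a quarter of the dataset is malicious; evaluations on it should take that class imbalance into account.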

    The malicious payloads for Union-based, Time-based, and Error-based injection types were sourced directly from the widely used and reputable open-source GitHub repository "Payloads All The Things – SQL Injection Payload List" (https://github.com/payloadbox/sql-injection-payload-list). Moreover, ChatGPT was employed to generate additional payloads for Boolean-based, Stack queries-based, and Meta-based injection categories. This hybrid approach ensures that the dataset reflects both known attack patterns and intelligently simulated variants, contributing to a broader representation of SQLi techniques.

    All payloads were carefully curated, anonymized, and structured during preprocessing. Sensitive data was replaced with secure placeholders, preserving semantic meaning while protecting data integrity and privacy. The dataset also underwent a thorough sanitization process to ensure consistency and usability. To support scalability and reproducibility, a rule-based classification algorithm was used to automate the labeling and organization of each payload by type. This methodology promotes standardization and ensures that the dataset is ready for use in machine learning pipelines, anomaly detection models, and intrusion detection systems. In addition to being comprehensive, the dataset provides a substantial volume of clean (benign) data, making it well-suited for supervised learning, comparative experiments, and robustness testing in cybersecurity research.
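
    The description does not publish the rule set itself, so the following is only a minimal sketch of what rule-based labeling of payloads by type could look like. Every pattern, the rule order, the label names, and the reading of "Meta-based" as schema or version queries are assumptions made for illustration, not the dataset author's actual rules.

```python
import re

# Hypothetical rules (NOT the dataset's real rule set): ordered (label, pattern)
# pairs. The first pattern that matches decides the label.
RULES = [
    ("union", re.compile(r"\bunion\b(\s+all)?\s+select\b", re.I)),
    ("time", re.compile(r"\b(sleep|benchmark|pg_sleep)\s*\(|waitfor\s+delay", re.I)),
    ("error", re.compile(r"\b(extractvalue|updatexml)\s*\(", re.I)),
    ("stacked", re.compile(r";\s*(select|insert|update|delete|drop|create|exec)\b", re.I)),
    # Assumption: "meta" is read here as queries against schema/version metadata.
    ("meta", re.compile(r"\binformation_schema\b|@@version|\bsqlite_master\b", re.I)),
    # Tautologies such as OR 1=1, detected with a backreference to the left operand.
    ("boolean", re.compile(r"\b(or|and)\b\s+('?\w+'?)\s*=\s*\2", re.I)),
]


def classify(payload: str) -> str:
    """Return the label of the first matching rule, or 'benign' if none match."""
    for label, pattern in RULES:
        if pattern.search(payload):
            return label
    return "benign"


if __name__ == "__main__":
    samples = [
        "1' UNION SELECT username, password FROM users--",
        "1'; DROP TABLE users--",
        "1' AND SLEEP(5)--",
        "1' OR 1=1--",
        "SELECT name FROM products WHERE id = 42",
    ]
    for s in samples:
        print(f"{classify(s):8s} {s}")
```

    Because the first matching rule wins, the order of the list resolves payloads that fit several categories. A real pipeline would need a far larger rule set and validation against hand-labeled samples.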

    This dataset is intended to facilitate progress in the development of more accurate and generalizable SQL injection detection systems and to serve as a reliable benchmark for the broader security and machine learning communities.

  19. Movie Metadata Collection: 4900+ Films

    • kaggle.com
    zip
    Updated Apr 10, 2025
    Sonal Gire (2025). Movie Metadata Collection: 4900+ Films [Dataset]. https://www.kaggle.com/datasets/sonalgire/movie-metadata-collection-4900-films
    Explore at:
    Available download formats: zip (176273 bytes)
    Dataset updated
    Apr 10, 2025
    Authors
    Sonal Gire
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🎬 Latest 4900+ Movies Dataset (Fetched via API)

    This dataset contains metadata for 4900+ recently added movies, collected through an API using Python’s requests library. It includes a wide range of movie attributes such as:

    • 🎥 Movie Title
    • 📅 Year of Release
    • ⭐ IMDb Rating
    • 🏷️ Genres
    • ⏱️ Runtime
    • 🌍 Language
    • 📆 Date Added
    • 🎞️ Torrent Information (where available)

    🔍 Why This Dataset?

    This collection is up-to-date, clean, and ideal for real-world data practice. Whether you're a data enthusiast, ML learner, or just a film buff, this dataset opens the door to exciting insights and creative projects.

    ✅ Applications

    • 📊 Data Analysis & Visualization: Explore top genres, ratings over time, or language trends
    • 🧠 Machine Learning Projects: Build models for rating prediction, genre classification, clustering, etc.
    • 🎯 Recommendation Systems: Create content-based or collaborative filtering models
    • 🤖 AI/NLP Integration: Combine with reviews to analyze sentiment or perform text classification
    • 🧪 Capstone or Portfolio Projects: Showcase your skills with real-time movie data

    🚀 Tech Stack

    • Python
    • Jupyter Notebook
    • requests, pandas

    Feel free to use this dataset in your own projects, share insights, or remix it into something even better!
    If you like the dataset, please give it an upvote ❤️ and leave a comment with your ideas!


  20. CHP Fuel Cell for Data Center Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Growth Market Reports (2025). CHP Fuel Cell for Data Center Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/chp-fuel-cell-for-data-center-market
    Explore at:
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    CHP Fuel Cell for Data Center Market Outlook



    According to our latest research, the CHP Fuel Cell for Data Center market size reached USD 1.84 billion in 2024 on a global scale. The market is projected to expand at a robust CAGR of 16.2% from 2025 to 2033, reaching an estimated value of USD 7.29 billion by 2033. This impressive growth trajectory is primarily driven by the escalating demand for sustainable, reliable, and efficient power solutions in data center operations worldwide, as organizations seek to minimize carbon emissions and ensure uninterrupted service delivery.
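
    The growth figures above follow the usual compound annual growth rate (CAGR) relation, end = start × (1 + rate)^years. The sketch below shows that arithmetic. The function names are illustrative, and the choice of nine compounding periods (2024 to 2033) is an assumption: the report does not state its base year or convention, so the computed value need not match the quoted figure exactly.

```python
def project_value(start: float, cagr: float, years: int) -> float:
    """Compound `start` forward by `cagr` (e.g. 0.162 for 16.2%) over `years` periods."""
    return start * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Growth rate per period that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    start_2024 = 1.84   # USD billion, from the description
    end_2033 = 7.29     # USD billion, from the description
    rate = 0.162        # 16.2% CAGR, from the description
    years = 9           # assumption: compounding from 2024 to 2033

    print(f"projected 2033 value: {project_value(start_2024, rate, years):.2f}")
    print(f"implied CAGR: {implied_cagr(start_2024, end_2033, years):.2%}")
```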




    A major growth factor propelling the CHP Fuel Cell for Data Center market is the increasing emphasis on energy efficiency and sustainability in the data center industry. Data centers are notorious for their immense energy consumption, and operators are under mounting pressure to reduce their environmental footprint. Combined Heat and Power (CHP) fuel cells offer a compelling solution by providing both electricity and thermal energy from a single fuel source, significantly improving overall energy utilization rates. This dual-generation capability not only reduces operational costs but also aligns with stringent regulatory frameworks and corporate sustainability goals. As data centers continue to proliferate, especially in regions with high digitalization rates, the adoption of CHP fuel cell systems is poised to accelerate further.




    Another significant driver for the CHP Fuel Cell for Data Center market is the need for enhanced reliability and resilience in power supply. Data centers cannot afford power interruptions, as downtime can result in substantial financial losses and reputational damage. CHP fuel cell systems are inherently reliable, offering continuous power generation independent of the grid, and are less susceptible to fluctuations or failures in the external power supply. Moreover, these systems can seamlessly integrate with existing backup solutions, such as batteries and diesel generators, to provide layered redundancy. The ability to operate in island mode during grid outages is particularly attractive to hyperscale and colocation data centers, which prioritize uptime above all else.




    Technological advancements and declining costs of fuel cell components are also contributing to the expansion of the CHP Fuel Cell for Data Center market. Innovations in fuel cell stack design, catalyst materials, and system integration have led to improved performance, longer lifespans, and reduced maintenance requirements. Furthermore, as manufacturing scales up and supply chains mature, the overall cost of deploying CHP fuel cell systems is gradually decreasing, making them more accessible to a broader range of data center operators. Government incentives, tax credits, and favorable policies supporting clean energy adoption are further bolstering market growth, particularly in North America, Europe, and parts of Asia Pacific.



    In recent years, hydrogen fuel cells for data centers have garnered significant attention as an alternative to traditional energy sources. Hydrogen fuel cells offer a clean and efficient way to power data centers, with the added benefit of producing only water as a byproduct. This technology is particularly appealing for data centers aiming to achieve zero-emission operations while maintaining high levels of reliability and performance. The integration of hydrogen fuel cells can also enhance energy security by reducing dependency on fossil fuels and grid-based electricity. As the industry continues to explore sustainable power solutions, hydrogen fuel cells are emerging as a viable option for data centers looking to align with global decarbonization goals.




    From a regional perspective, North America currently dominates the CHP Fuel Cell for Data Center market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The United States, in particular, has witnessed significant installations of CHP fuel cell systems in both enterprise and colocation data centers, driven by strong regulatory support and a mature technological ecosystem. Europe is rapidly catching up, propelled by ambitious decarbonization targets and substantial investments in green data center infrastructure. Meanwhile, Asia Pacific is emerging as a high-growth region, fueled by rapid digital transformation, expanding clou


Fahad Rehman (2025). Sales Dataset (Raw v Clean)#fic_PAK_region_data [Dataset]. https://www.kaggle.com/datasets/fahad0rehman/sales-dataset-raw-v-cleanfic-pak-region-data

Sales Dataset (Raw v Clean)#fic_PAK_region_data

Raw & Clean Dataset with Power BI stack chart

Explore at:
Available download formats: zip (98904 bytes)
Dataset updated
Aug 21, 2025
Authors
Fahad Rehman
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Dataset

This dataset was created by Fahad Rehman

Released under Attribution 4.0 International (CC BY 4.0)

Contents
