4 datasets found
  1. Data from: AdvSCanner: Generating Adversarial Smart Contracts to Exploit...

    • figshare.com
    Updated Sep 13, 2024
    Cite
    yin wu (2024). AdvSCanner: Generating Adversarial Smart Contracts to Exploit Reentrancy Vulnerabilities Using LLM and Static Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.26014876.v4
    Available download formats: text/x-script.python
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    yin wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AGEStatic

    AGEStatic is a project aimed at enhancing the security of Ethereum smart contracts by automatically generating exploit smart contracts. It leverages large language models (LLMs) and static analysis to generate adversarial smart contracts (ASCs) designed to exploit reentrancy vulnerabilities in victim contracts, which are among the most critical security issues in smart contracts.

    Dataset

    We have collected and integrated multiple smart contracts with reentrancy vulnerabilities from various sources. To obtain more representative samples, we filtered out ineligible and duplicate smart contracts according to the standards listed below, resulting in a total of 78 unique smart contracts (14 are duplicates).

    Size: The dataset includes 78 smart contracts (14 duplicates), each verified for relevance and uniqueness, drawn from sources such as ERAP, ESC, Smartbugs, RSD, ATR, and SSE.

    Standards for Dataset Collection:

    • Solidity Smart Contract: The AGEStatic tool we designed is aimed at Solidity smart contracts, with Solidity versions ranging from 0.4.0 to 0.8.25.
    • Open-source and Peer-reviewed Dataset: The reentrancy vulnerability datasets are collected from widely used or peer-reviewed open-source datasets that have gained general public acceptance and are applied in relevant research.
    • Marked as Reentrancy Vulnerability: The most vital standard requires the existence of a reentrancy vulnerability, which falls into one of two types: manually injected vulnerability (MI) and real-world vulnerability (RW).
    • Detection by Static Analysis Tool: The contracts in the dataset should be identified as having a reentrancy vulnerability by traditional static analysis tools that output reentrancy reports for each contract.
    • Fully Functional Characteristics: Smart contracts with only partial functions cannot support attack verification experiments; therefore, the contracts satisfy logical integrity and full functionality characteristics.

    Physical Experiment

    This section describes the environment and code used for running the static analysis experiments and generating exploit contracts.

    Static Analysis: The static analysis experiments, obtained from GitHub, are run on an Ubuntu 22.04 system with the following hardware specifications:

    • Operating System: Ubuntu 22.04
    • CPU: Intel(R) Core(TM) i7-9750H @ 2.60GHz (2 cores and 2 threads)
    • Cache Size: 12288 KB
    • Memory Size: 6085248 KB

    Exploit Contract Generation: We leverage the APIs of gpt-3.5-turbo, gpt-4, or gpt-4o using Python. The environment specifications are as follows:

    Required Packages:

    • python==3.10.0
    • openai==0.28.0
    • py-solc-x==2.0.2

    Experiment Results

    The experimental results include RQ1, RQ2, RQ3, and RQ4.
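
    The Solidity version standard quoted in this description (0.4.0 to 0.8.25) can be illustrated with a small filter. This is a minimal sketch, not the authors' tooling: the helper names `pragma_version` and `in_supported_range` are hypothetical, and it assumes that the first version number in a contract's `pragma solidity` line is the one to compare against the range.

```python
import re

# Version bounds quoted in the dataset description (treated as inclusive here).
MIN_SOLC = (0, 4, 0)
MAX_SOLC = (0, 8, 25)

# Captures the first x.y.z version in a pragma line, e.g. "pragma solidity ^0.4.24;".
PRAGMA_RE = re.compile(r"pragma\s+solidity\s+[^;]*?(\d+)\.(\d+)\.(\d+)")


def pragma_version(source: str):
    """Return the first version named in the pragma as a tuple, or None."""
    match = PRAGMA_RE.search(source)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def in_supported_range(source: str) -> bool:
    """True if the pragma's first version lies within 0.4.0 to 0.8.25."""
    version = pragma_version(source)
    return version is not None and MIN_SOLC <= version <= MAX_SOLC
```

    Comparing versions as integer tuples avoids the string-ordering pitfall where "0.8.25" would sort before "0.8.3". Contracts without a pragma are treated as out of range, which is a design choice of this sketch rather than a rule stated in the description.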

  2. Job Dataset

    • kaggle.com
    zip
    Updated Sep 17, 2023
    Cite
    Ravender Singh Rana (2023). Job Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
    Available download formats: zip (479575920 bytes)
    Dataset updated
    Sep 17, 2023
    Authors
    Ravender Singh Rana
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Dataset

    This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.

    Descriptions for each of the columns in the dataset:

    1. Job Id: A unique identifier for each job posting.
    2. Experience: The required or preferred years of experience for the job.
    3. Qualifications: The educational qualifications needed for the job.
    4. Salary Range: The range of salaries or compensation offered for the position.
    5. Location: The city or area where the job is located.
    6. Country: The country where the job is located.
    7. Latitude: The latitude coordinate of the job location.
    8. Longitude: The longitude coordinate of the job location.
    9. Work Type: The type of employment (e.g., full-time, part-time, contract).
    10. Company Size: The approximate size or scale of the hiring company.
    11. Job Posting Date: The date when the job posting was made public.
    12. Preference: Special preferences or requirements for applicants (e.g., Only Male, Only Female, or Both).
    13. Contact Person: The name of the contact person or recruiter for the job.
    14. Contact: Contact information for job inquiries.
    15. Job Title: The job title or position being advertised.
    16. Role: The role or category of the job (e.g., software developer, marketing manager).
    17. Job Portal: The platform or website where the job was posted.
    18. Job Description: A detailed description of the job responsibilities and requirements.
    19. Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).
    20. Skills: The skills or qualifications required for the job.
    21. Responsibilities: Specific responsibilities and duties associated with the job.
    22. Company Name: The name of the hiring company.
    23. Company Profile: A brief overview of the company's background and mission.
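
    The column list above can serve as a quick schema check before any analysis. The following is a minimal sketch using only the standard library; it assumes the CSV header spells each column exactly as written in the list (the actual file may differ in case or wording), and `missing_columns` is a hypothetical helper name.

```python
import csv
import io

# Column names copied from the description; the real header may differ.
EXPECTED_COLUMNS = [
    "Job Id", "Experience", "Qualifications", "Salary Range", "Location",
    "Country", "Latitude", "Longitude", "Work Type", "Company Size",
    "Job Posting Date", "Preference", "Contact Person", "Contact",
    "Job Title", "Role", "Job Portal", "Job Description", "Benefits",
    "Skills", "Responsibilities", "Company Name", "Company Profile",
]


def missing_columns(csv_text: str) -> list:
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = set(reader.fieldnames or [])
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

    For the full file, the same check can be run on just the first line of the CSV rather than loading the whole archive into memory.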

    Potential Use Cases:

    • Building predictive models to forecast job market trends.
    • Enhancing job recommendation systems for job seekers.
    • Developing NLP models for resume parsing and job matching.
    • Analyzing regional job market disparities and opportunities.
    • Exploring salary prediction models for various job roles.

    Acknowledgements:

    We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.

    Note:

    Please note that the examples provided are fictional and for illustrative purposes. You can tailor the descriptions and examples to match the specifics of your dataset. It is not suitable for real-world applications and should only be used within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com

  3. TSMC Stock Daily Updated

    • kaggle.com
    zip
    Updated Nov 22, 2025
    Cite
    The Hidden Layer (2025). TSMC Stock Daily Updated [Dataset]. https://www.kaggle.com/datasets/isaaclopgu/tsmc-stock-daily-updated
    Available download formats: zip (297189 bytes)
    Dataset updated
    Nov 22, 2025
    Authors
    The Hidden Layer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this Dataset

    This dataset offers a comprehensive, up-to-date look at the historical stock performance of Taiwan Semiconductor Manufacturing Company (TSMC), the world's largest contract chip manufacturer. The data is provided in a clean, daily format, making it an excellent resource for financial analysis, machine learning, and time series modeling.

    About the Company

    Taiwan Semiconductor Manufacturing Company, Ltd. (TSMC) is a Taiwanese multinational semiconductor contract manufacturing and design company. Founded in 1987 and headquartered in Hsinchu, Taiwan, it is a key player in the global technology supply chain, producing chips for many of the world's leading tech companies, including Apple, NVIDIA, and AMD. TSMC's stock performance is a significant indicator of the health of the semiconductor industry and global demand for advanced electronics.

    Key Features

    Daily OHLCV Data: The dataset contains essential Open, High, Low, Close, and Volume metrics for each trading day.

    Comprehensive History: Includes data from TSMC's early trading history to the present, offering a long-term perspective.

    Regular Updates: The dataset is designed for regular, automated updates to ensure data freshness for time-sensitive projects.

    Data Dictionary

    Date: The date of the trading session in YYYY-MM-DD format.

    ticker: The standard ticker symbol for Taiwan Semiconductor Manufacturing Company Ltd. on the NYSE: 'TSM'.

    name: The full name of the company: 'Taiwan Semiconductor Manufacturing Company Ltd.'.

    Open: The stock price in USD at the start of the trading session.

    High: The highest price reached during the trading day in USD.

    Low: The lowest price recorded during the trading day in USD.

    Close: The final stock price at market close in USD.

    Volume: The total number of shares traded on that day.
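
    As an illustration of how the fields in this data dictionary fit together, the sketch below computes simple daily returns from consecutive Close values. It assumes the CSV header uses the field names exactly as listed; `daily_returns` is a hypothetical helper, and the sample rows are made up for illustration and are not actual TSM prices.

```python
import csv
import io


def daily_returns(csv_text: str) -> list:
    """Return (date, simple return) pairs computed from consecutive Close prices."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # YYYY-MM-DD dates sort chronologically when compared as text.
    rows.sort(key=lambda row: row["Date"])
    returns = []
    for prev, curr in zip(rows, rows[1:]):
        prev_close = float(prev["Close"])
        curr_close = float(curr["Close"])
        returns.append((curr["Date"], curr_close / prev_close - 1.0))
    return returns


# Made-up rows for illustration only; not actual TSM prices.
SAMPLE = """Date,ticker,name,Open,High,Low,Close,Volume
2024-01-02,TSM,Taiwan Semiconductor Manufacturing Company Ltd.,100.0,102.0,99.0,100.0,1000
2024-01-03,TSM,Taiwan Semiconductor Manufacturing Company Ltd.,100.0,111.0,100.0,110.0,1200
2024-01-04,TSM,Taiwan Semiconductor Manufacturing Company Ltd.,110.0,110.0,98.0,99.0,900
"""
```

    Calling `daily_returns(SAMPLE)` yields one entry per trading day after the first, since the first day has no previous close to compare against.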

    Data Collection

    The data for this dataset is collected using the yfinance Python library, which pulls information directly from the Yahoo Finance API.

    Potential Use Cases

    Financial Analysis: Analyze historical price trends, volatility, and trading volume of TSMC stock.

    Machine Learning: Develop and test models for stock price prediction and time series forecasting.

    Educational Projects: A perfect real-world dataset for students and data enthusiasts to practice data cleaning, visualization, and modeling.

  4. Augmented Texas 7000-bus synthetic grid

    • search.dataone.org
    Updated Oct 29, 2025
    Cite
    Aravena, Ignacio; Jiyu Wang (2025). Augmented Texas 7000-bus synthetic grid [Dataset]. http://doi.org/10.7910/DVN/AKUDJT
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Aravena, Ignacio; Jiyu Wang
    Description

    Augmented version of the synthetic Texas 7k dataset published by Texas A&M University. The system has been populated with high-resolution distributed photovoltaic (PV) generation, comprising 4,499 PV plants of varying sizes with associated time series for 1 year of operation. This high-resolution dataset was produced from publicly available data and is free of CEII. Details on the procedure followed to generate the PV dataset can be found in the Open COG Grid Project Year 1 Report (Chapter 6). The technical data of the system is provided using the (open) CTM specification for easy accessibility from Python without additional packages (data can be loaded as a dictionary). The time series for demand and PV production are provided as an HDF5 file, also loadable with standard open-source tools. We additionally provide example scripts for parsing the data in Python. Prepared by LLNL under Contract DE-AC52-07NA27344. LLNL control number: LLNL-DATA-2001833.
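
    The statement that the technical data can be loaded as a dictionary without additional packages might look like the sketch below. It assumes the CTM file is JSON with an object at the top level, which the description implies but does not state; `summarize_grid` is a hypothetical helper, and no file or key names from the dataset are assumed.

```python
import json
from pathlib import Path


def summarize_grid(path) -> dict:
    """Load a JSON file as a dictionary and report the size of each top-level entry."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {}
    for key, value in data.items():
        # Lists and dicts report their length; scalars are reported as-is.
        summary[key] = len(value) if isinstance(value, (list, dict)) else value
    return summary
```

    A summary like this is a cheap first look at an unfamiliar file before working with individual records. The HDF5 time series would need a separate reader and is not covered here.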

