2 datasets found
  1. Data from: Can LLMs Replace Manual Annotation of Software Engineering...

    • zenodo.org
    text/x-python, zip
    Updated Oct 10, 2024
    Cite
    Blinded; Blinded (2024). Can LLMs Replace Manual Annotation of Software Engineering Artifacts? [Dataset]. http://doi.org/10.5281/zenodo.13208088
    Available download formats: zip, text/x-python
    Dataset updated
    Oct 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Blinded; Blinded
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Required Libraries

    The following libraries are required to run the scripts in this repository. You can install them using `pip`:

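    ```bash
    pip install pandas numpy openai krippendorff scikit-learn seaborn matplotlib together anthropic google-generativeai
    ```

    The scripts also rely on argparse, json, time, random, copy, and statistics, which are part of the Python standard library and need no installation. Note that scikit-learn is installed under that name but imported as `sklearn`.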

    Make sure to also install any other dependencies required by the specific model API if you plan on using models like GPT-4 or Claude:

    • openai
    • anthropic
    • together
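Before running the scripts, it can help to confirm that the third-party packages are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for the packages above (`sklearn` for scikit-learn, `google.generativeai` for google-generativeai).

```python
import importlib.util

# Import names assumed for the pip packages listed above.
THIRD_PARTY = [
    "pandas", "numpy", "openai", "krippendorff", "sklearn",
    "seaborn", "matplotlib", "together", "anthropic",
    "google.generativeai",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(THIRD_PARTY)
    print("Missing modules:", ", ".join(absent) if absent else "none")
```

Only the packages for the model API you intend to call need to be present if you just want to regenerate results for a single model.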

    All experiments were run with Python 3.10.11.

    Each dataset has its own folder containing process.py, heatmap.py, and ira_sample.py, along with the relevant data and plots.

    File Description:

    1. data_result: This folder contains the dataset file and the few-shot samples. After running process.py, all results are accumulated in the data_result folder. The folder already contains all the data and model-generated results as .jsonl files, so you do not need to run process.py to obtain them.
    2. Plots: This folder contains the plots generated by running heatmap.py and ira_sample.py.
    3. process.py: This script generates the results/annotations from a model based on the given parameters. The commands for running it are listed at the bottom. Running it requires API keys from the respective providers; however, all model-generated results are already shared in the data_result folder.
    4. heatmap.py: Running this script generates the heatmaps presented in Figures 1-5 of the paper. The generated plots are stored in the "Plots" folder.
    5. ira_sample.py: Running this script generates the plots presented in Figures 7-10 of the paper. The generated plots are stored in the "Plots" folder.
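Since the shared results are stored as .jsonl files (one JSON object per line), they can be inspected without calling any model API. The sketch below uses only the standard library; the function name `load_jsonl` and the file name in the usage comment are hypothetical, and the record fields depend on the dataset, so none are assumed here.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a .jsonl file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Hypothetical usage; replace the file name with one that exists
# in the data_result folder of the dataset you are inspecting:
# rows = load_jsonl("data_result/example.jsonl")
# print(len(rows), "records; keys of the first:", sorted(rows[0]))
```

If pandas is installed, `pandas.read_json(path, lines=True)` loads the same file into a DataFrame.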

    Commands for datasets (Except Code Summarization):

    Generating samples for different models:

    python process.py --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    For Figures 1-5:

    python heatmap.py

    For Figures 7-10:

    python ira_sample.py
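The six process.py commands above differ only in the `--model` value, so they can be assembled programmatically. This is a rough sketch under stated assumptions: the helper `build_command` and the environment variable names in `KEY_FLAGS` are hypothetical conveniences, not something the repository defines; only the flags and model names come from the commands above.

```python
import os
import shlex

# Model names as they appear in the commands above.
MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]

# Flag -> environment variable holding the key (variable names are assumptions).
KEY_FLAGS = {
    "--openai_key": "OPENAI_API_KEY",
    "--together_key": "TOGETHER_API_KEY",
    "--claude_key": "ANTHROPIC_API_KEY",
    "--google_key": "GOOGLE_API_KEY",
}


def build_command(model, fewshot="yes", env=None):
    """Return the process.py argument list for one model."""
    env = os.environ if env is None else env
    cmd = ["python", "process.py", "--model", model, "--fewshot", fewshot]
    for flag, variable in KEY_FLAGS.items():
        # Fall back to the "xxxx" placeholder when the variable is unset.
        cmd += [flag, env.get(variable, "xxxx")]
    return cmd


if __name__ == "__main__":
    # Dry run with an empty environment so no real keys are printed.
    for model in MODELS:
        print(shlex.join(build_command(model, env={})))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside a dataset folder; building a list rather than a string avoids spacing slips between adjacent arguments.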

    Commands for datasets (Code Summarization):

    python process.py --what accurate --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    python process.py --what accurate --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

    For Figures 1-5:

    python heatmap.py

    For Figures 7-10:

    python ira_sample.py

    The --what option takes one of the following values: "accurate", "adequate", "concise", "similarity".
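For Code Summarization, the commands above show only the "accurate" criterion. Assuming every listed criterion can be combined with every model (the source does not state this explicitly), the full set of runs can be enumerated as below. The function name `summarization_runs` is hypothetical, and the API key flags are omitted for brevity.

```python
from itertools import product

# Criteria and model names as listed above.
CRITERIA = ["accurate", "adequate", "concise", "similarity"]
MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]


def summarization_runs(fewshot="yes"):
    """Return one process.py argument list per (criterion, model) pair."""
    return [
        ["--what", criterion, "--model", model, "--fewshot", fewshot]
        for criterion, model in product(CRITERIA, MODELS)
    ]


if __name__ == "__main__":
    for args in summarization_runs():
        print("python process.py", " ".join(args))
```

The key flags (`--openai_key`, `--together_key`, `--claude_key`, `--google_key`) would still need to be appended to each argument list before running.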

  2. Finding Optimal Locations for Polling Stations in San Diego County

    • hub.arcgis.com
    Updated Feb 11, 2021
    Cite
    University of California San Diego (2021). Finding Optimal Locations for Polling Stations in San Diego County [Dataset]. https://hub.arcgis.com/documents/2263c48dceaa4bd6b2ea981d849b5712
    Dataset updated
    Feb 11, 2021
    Dataset authored and provided by
    University of California San Diego
    Description

    This project studied the polling location optimization problem in three steps: creating a heatmap to visualize the distribution of registered voters in each precinct, building the polling location area type layer through a spatial join, and performing Location-Allocation Analysis on candidate polling locations and demand locations, with weights based on the rate of in-person voters in each election precinct.

    Notable modules used:
    Python: pandas, geopandas, matplotlib, seaborn
    ArcGIS: dissolve_boundaries, find_existing_locations, enrich, aggregate_points, create_buffers, solve_location_allocation, Storymaps, WebMap
