
Data from: Can LLMs Replace Manual Annotation of Software Engineering Artifacts?
zenodo.org
text/x-python, zip
Updated Oct 10, 2024
Cite
Blinded; Blinded (2024). Can LLMs Replace Manual Annotation of Software Engineering Artifacts? [Dataset]. http://doi.org/10.5281/zenodo.13208088
Available download formats: zip, text/x-python
Unique identifier: https://doi.org/10.5281/zenodo.13208088
Dataset updated: Oct 10, 2024
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: Blinded; Blinded
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
Description
Required Libraries

The following libraries are required to run the scripts in this repository. You can install them using `pip`:

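```bash
pip install pandas numpy openai krippendorff scikit-learn seaborn matplotlib together anthropic google-generativeai
```

`argparse`, `json`, `time`, `random`, `copy`, and `statistics` are part of the Python standard library and do not need to be installed with `pip`. scikit-learn is installed as `scikit-learn` but imported as `sklearn`.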

If you plan to use hosted models such as GPT-4 or Claude, make sure the client library for each provider's API is installed (all of them are already part of the command above):

- `openai`
- `anthropic`
- `together`
- `google-generativeai` (Gemini)

All experiments were run with Python 3.10.11.
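Because the scripts depend on several third-party packages, a quick environment check before running them can save a failed run. The sketch below is not part of the repository. It prints the running Python version and lists which packages from the install command cannot be imported. The names are import names, not pip names (for example `sklearn` for scikit-learn and `google.generativeai` for google-generativeai).

```python
import importlib
import platform

# Import names (not pip names) of the third-party packages in the install command.
REQUIRED_MODULES = [
    "pandas", "numpy", "openai", "krippendorff", "sklearn", "seaborn",
    "matplotlib", "together", "anthropic", "google.generativeai",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", platform.python_version(), "(the experiments used 3.10.11)")
    for name in missing_modules():
        print("missing:", name)
```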

Each dataset has its own folder containing `process.py`, `heatmap.py`, and `ira_sample.py`, along with the relevant data and plots.

File Description:

`data_result`: Contains the dataset and the few-shot samples. Running `process.py` writes its results to this folder. The folder already contains all the data and the model-generated results as `.jsonl` files, so you do not need to run `process.py` to obtain the results.

`Plots`: Contains the plots generated by `heatmap.py` and `ira_sample.py`.

`process.py`: Generates the results (annotations) from the selected model according to the given parameters. The commands to run it are listed below. Running it requires API keys from the model providers; however, all model-generated results are already included in the `data_result` folder.

`heatmap.py`: Generates the heatmaps shown in Figures 1-5 of the paper and stores them in the `Plots` folder.

`ira_sample.py`: Generates the plots shown in Figures 7-10 of the paper and stores them in the `Plots` folder.
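The results in `data_result` are stored as `.jsonl` (JSON Lines) files, with one JSON object per line, so they can be inspected without running `process.py`. The minimal reader below makes no assumptions about the field names. The file name and the `id`/`label` fields in its usage example are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Load a JSON Lines file (one JSON object per line) into a list of dicts."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err
    return records


if __name__ == "__main__":
    # Hypothetical contents; the real field names are whatever process.py writes.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "example.jsonl"
        sample.write_text('{"id": 1, "label": "yes"}\n{"id": 2, "label": "no"}\n',
                          encoding="utf-8")
        rows = read_jsonl(sample)
        print(len(rows), rows[0]["label"])  # prints: 2 yes
```

To inspect a real file, call `read_jsonl` on one of the `.jsonl` files in `data_result`. With pandas, `pd.read_json(path, lines=True)` loads the same format directly into a DataFrame.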

Commands for all datasets except Code Summarization:

Generating samples for different models:

python process.py --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx
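The commands above paste every provider's key into the command line. As a hypothetical convenience that is not part of the repository, a small driver can assemble the same commands from environment variables, which keeps the keys out of shell history. The flags are the ones used above, and every command passes all four key flags, so the sketch does the same. The environment-variable names are assumptions.

```python
import os
import shlex

MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]

# Flags taken from the commands above; the environment-variable names are hypothetical.
KEY_FLAGS = {
    "--openai_key": "OPENAI_API_KEY",
    "--together_key": "TOGETHER_API_KEY",
    "--claude_key": "ANTHROPIC_API_KEY",
    "--google_key": "GOOGLE_API_KEY",
}


def build_command(model, what=None, env=os.environ):
    """Return the argument list for one process.py run."""
    cmd = ["python", "process.py"]
    if what is not None:  # only needed for the Code Summarization dataset
        cmd += ["--what", what]
    cmd += ["--model", model, "--fewshot", "yes"]
    for flag, var in KEY_FLAGS.items():
        cmd += [flag, env.get(var, "xxxx")]  # keep the placeholder if the variable is unset
    return cmd


def redacted(cmd):
    """Copy of `cmd` with every key value replaced by '***', safe to print or log."""
    out = list(cmd)
    for i, arg in enumerate(cmd[:-1]):
        if arg in KEY_FLAGS:
            out[i + 1] = "***"
    return out


if __name__ == "__main__":
    for model in MODELS:
        cmd = build_command(model)
        # To actually run it: subprocess.run(cmd, check=True)
        print(shlex.join(redacted(cmd)))
```

Building an argument list instead of a shell string also avoids the missing-space problem (`llama3--fewshot`), because each flag is a separate list element.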

To generate Figures 1-5:

python heatmap.py
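The requirements include the `krippendorff` package, which suggests that agreement between annotators is measured with Krippendorff's alpha. Assuming that the heatmaps in Figures 1-5 show pairwise agreement between annotators (human or model), the sketch below computes such a matrix for nominal labels. This assumption should be checked against `heatmap.py`. The sketch is a self-contained re-implementation for illustration, not the code in `heatmap.py`. With the `krippendorff` package, `krippendorff.alpha(reliability_data=..., level_of_measurement="nominal")`, given one row per annotator, computes the same coefficient. The annotator names and labels in the example are made up.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(*raters):
    """Krippendorff's alpha for nominal labels; one list per rater, no missing values."""
    coincidences = Counter()
    for unit in zip(*raters):  # each unit holds one label per rater
        m = len(unit)
        # Every ordered pair of labels from different raters, weighted by 1/(m-1).
        for c, k in permutations(unit, 2):
            coincidences[(c, k)] += 1 / (m - 1)
    totals = Counter()
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        return float("nan")  # only one label value occurs: alpha is undefined
    return 1 - (n - 1) * observed / expected


def agreement_matrix(annotations):
    """Pairwise alpha for a dict mapping annotator name -> list of labels."""
    names = list(annotations)
    return {
        # The diagonal is set to 1.0 (perfect self-agreement) by definition.
        (a, b): 1.0 if a == b else nominal_alpha(annotations[a], annotations[b])
        for a in names for b in names
    }


if __name__ == "__main__":
    labels = {  # made-up labels for illustration only
        "human_1": [1, 1, 0, 0, 1],
        "human_2": [1, 1, 0, 1, 1],
        "gpt-4": [1, 0, 0, 0, 1],
    }
    for (a, b), alpha in agreement_matrix(labels).items():
        print(f"{a:>8} vs {b:<8} {alpha:.3f}")
```

The resulting values can be arranged into a square table (for example, a pandas DataFrame) and rendered with `seaborn.heatmap`.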

To generate Figures 7-10:

python ira_sample.py

Commands for the Code Summarization dataset:

python process.py --what accurate --model gpt-4 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model gpt-3.5-turbo --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model llama3 --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model mixtral --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model claude --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

python process.py --what accurate --model gemini --fewshot yes --openai_key xxxx --together_key xxxx --claude_key xxxx --google_key xxxx

To generate Figures 1-5:

python heatmap.py

To generate Figures 7-10:

python ira_sample.py

The `--what` argument takes one of `accurate`, `adequate`, `concise`, or `similarity` (the commands above use `accurate`).
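To make the command-line interface explicit, here is a hypothetical `argparse` parser that mirrors the flags used in the commands above. It is a reconstruction for illustration; the actual parser in `process.py` may differ in defaults and validation. Restricting `--model` and `--what` to fixed choices is an assumption. With that restriction, a run-together argument such as `llama3--fewshot` is rejected immediately instead of being passed through.

```python
import argparse

MODELS = ["gpt-4", "gpt-3.5-turbo", "llama3", "mixtral", "claude", "gemini"]
ASPECTS = ["accurate", "adequate", "concise", "similarity"]


def make_parser():
    """Hypothetical parser mirroring the flags in the commands above."""
    parser = argparse.ArgumentParser(description="Annotate samples with an LLM.")
    parser.add_argument("--model", choices=MODELS, required=True)
    parser.add_argument("--fewshot", default="yes")  # the commands above always pass "yes"
    parser.add_argument("--what", choices=ASPECTS,
                        help="Code Summarization only: which aspect to annotate")
    for provider in ["openai", "together", "claude", "google"]:
        # Creates --openai_key, --together_key, --claude_key, --google_key.
        parser.add_argument(f"--{provider}_key", default=None)
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args(
        ["--what", "concise", "--model", "claude", "--fewshot", "yes"])
    print(args.what, args.model, args.fewshot)
```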
Finding Optimal Locations for Polling Stations in San Diego County
hub.arcgis.com
Updated Feb 11, 2021
Cite
University of California San Diego (2021). Finding Optimal Locations for Polling Stations in San Diego County [Dataset]. https://hub.arcgis.com/documents/2263c48dceaa4bd6b2ea981d849b5712
Dataset updated: Feb 11, 2021
Dataset authored and provided by: University of California San Diego
Description
This project studied the polling-location optimization problem. It created a heatmap showing the distribution of registered voters in each precinct and built a layer of polling-location area types through a spatial join. It also performed a Location-Allocation Analysis on candidate polling locations and demand locations, weighting each precinct by its rate of in-person voters.

Notable modules used:
- Python: pandas, geopandas, matplotlib, seaborn
- ArcGIS: dissolve_boundaries, find_existing_locations, enrich, aggregate_points, create_buffers, solve_location_allocation, Storymaps, WebMap

No permission to access the contents.
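The Location-Allocation step chooses a subset of candidate sites that best serves weighted demand. ArcGIS's `solve_location_allocation` does this with its own solver. As a swapped-in illustration only, the sketch below uses a simple greedy heuristic for the weighted p-median problem, which minimizes the total weighted distance from each demand point to its nearest chosen site. The coordinates, the weights (standing in for a precinct's expected in-person voters), and the use of straight-line distance are all assumptions.

```python
import math


def total_cost(sites, demand):
    """Sum of weight * distance from each demand point to its nearest chosen site."""
    return sum(w * min(math.dist(p, s) for s in sites) for p, w in demand)


def greedy_p_median(candidates, demand, p):
    """Pick p candidate sites, each time adding the one that lowers total cost most."""
    chosen = []
    for _ in range(min(p, len(candidates))):
        best = min(
            (c for c in candidates if c not in chosen),
            key=lambda c: total_cost(chosen + [c], demand),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical precinct centroids (x, y) with weights such as expected in-person voters.
    demand = [((0, 0), 120), ((1, 0), 80), ((10, 0), 200), ((11, 1), 50)]
    candidates = [(0, 0), (5, 0), (10, 0)]
    print(greedy_p_median(candidates, demand, p=2))  # prints: [(10, 0), (0, 0)]
```

A greedy pick is fast but not guaranteed optimal. Dedicated solvers, such as the one behind `solve_location_allocation`, typically refine the solution further and use network travel distances instead of straight lines.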