Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
```sh
cd $Project_Path
bash run.sh
```
## Parameter setting description

- To adjust the weight ranges of positive samples, modify the softmax operation for 'ai' on line 158 of 'utils.py'.
- To adjust the weight ranges of negative samples, adjust 'bi' on line 163 of 'utils.py'.

## Code Search

The dataset file contains the code retrieval datasets and the code classification datasets.

```sh
python run.py \
    --output_dir=./python \
    --config_name=/graphcodebert-base \
    --model_name_or_path=/graphcodebert-base \
    --tokenizer_name=/graphcodebert-base \
    --lang=python \
    --do_train \
    --train_data_file=/dataset/CSN-Python/train.jsonl \
    --eval_data_file=/dataset/CSN-Python/test.jsonl \
    --test_data_file=/dataset/CSN-Python/test.jsonl \
    --codebase_file=/dataset/CSN-Python/codebase.jsonl \
    --num_train_epochs 20 \
    --code_length 318 \
    --data_flow_length 64 \
    --nl_length 256 \
    --train_batch_size 32 \
    --eval_batch_size 64 \
    --learning_rate 2e-5 \
    --seed 42
```
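As a hedged illustration only (this is not the repository's actual `utils.py`; the tensor names, score direction, and temperature are assumptions), the softmax-based sample weighting referred to above has roughly this shape:

```python
import torch

def sample_weights(pos_scores: torch.Tensor, neg_scores: torch.Tensor, tau: float = 1.0):
    """Hypothetical sketch of softmax-based sample weighting.

    The repository applies a softmax for 'ai' (positive-sample weights) and
    adjusts 'bi' (negative-sample weights); the direction of the scores and
    the temperature tau here are illustrative assumptions.
    """
    ai = torch.softmax(pos_scores / tau, dim=-1)  # weights over positive samples
    bi = torch.softmax(neg_scores / tau, dim=-1)  # weights over negative samples
    return ai, bi
```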
https://www.icpsr.umich.edu/web/ICPSR/studies/37519/terms
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who use or do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth, along with 7,207 "shadow youth" (children ages 9 to 11 sampled at Wave 1), make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent.

At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled primary sampling units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.

At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This "second replenishment sample" was combined for estimation and analysis purposes with the Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Restricted-Use Files User Guide, which provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts.

Wave 4.5 was a special data collection for youth only who were aged 12 to 17 at the time of the Wave 4.5 interview. Wave 4.5 was the fourth annual follow-up wave for those who were members of the Wave 1 Cohort. For those who were sampled at Wave 4, Wave 4.5 was the first annual follow-up wave. Wave 5.5, conducted in 2020, was a special data collection for Wave 4 Cohort youth and young adults ages 13 to 19 at the time of the Wave 5.5 interview. Also in 2020, a subsample of Wave 4 Cohort adults ages 20 and older were interviewed via the PATH Study Adult Telephone Survey (PATH-ATS). Wave 7.5 was a special collection for Wave 4 and Wave 7 Cohort youth and young adults ages 12 to 22 at the time of the Wave 7.5 interview. For those who were sampled at Wave 7, Wave 7.5 was the first annual follow-up wave.
Dataset 1002 (DS1002) contains the data from the Wave 4.5 Youth and Parent Questionnaire. This file contains 1,617 variables and 13,131 cases. Of these cases, 11,378 are continuing youth who completed a prior Youth Interview. The other 1,753 cases are "aged-up youth" who were previously sampled as "shadow youth." Datasets 1112, 1212, and 1222 (DS1112, DS1212, and DS1222) are data files comprising the weight variables for Wave 4.5. The "all-waves" weight file contains weights for participants in the Wave 1 Cohort who completed a Wave 4.5 Youth Interview and completed interviews (if old enough to do so) or verified their information with the study (if not old enough to be interviewed) in Waves 1, 2, 3, and 4. There are two separate files with "single-wave" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "single-wave" weight file for the Wave 1 Cohort contains weights for youth who c
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The R2R dataset consists of human-annotated instructions corresponding to the paths in these graphs. Each path consists of a sequence of viewpoints encountered by the agent during navigation. A derived dataset, the Fine-Grained R2R (FGR2R) [12] dataset, annotated parts of instructions with corresponding graph edges to obtain a fine-grained dataset. Existing works in VLN have shown that more instruction examples can improve an agent's performance in previously unseen environments.

Hence, to augment training data, we mix parts of paths from the FGR2R dataset to obtain additional instruction-trajectory pairs. The paths are mixed from other neighboring paths that are part of the same house, which sustains both view and instruction consistency. To mix paths, we identify all the edges in the graph corresponding to the start of navigation, ε_start, and the end of navigational episodes, ε_end. These edges are important for mixing as they correspond to micro-instructions ("Walk away from the desk", "Turn right", etc.) that refer to start and stop positions in the house, while other edges correspond to instructions that back-reference previous locations. The remaining transition edges ε_trans are mixed to obtain a path ε_start → ε_trans → ε_end. Not all edges are inter-connectable, as some of the nodes could be spatially close to each other, reducing the visual variety of viewpoints or resulting in the repetition of micro-instructions (short but actionable instructions) in the final instruction. Accordingly, the edges are connected based on the following criteria: (1) the distance between any 2 nodes should be greater than 3m, and the angle between edges should not be acute, to prevent navigating in loops; (2) the distance between the start and end nodes should be greater than 3m to ensure that the path ends up in a different room; (3) the start and end nodes cannot have a common edge; (4) micro-instructions from common edges of different paths are chosen randomly. The final instruction is the sequence of micro-instructions and the path is the sequence of edges (Figure 2). Using this method, we generate 162k instruction-trajectory pairs with path lengths between 5m and 30m. The final dataset has on average 7.27 views per path, a mean trajectory length of 14.4m, and an average of 82 words per instruction.
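A minimal sketch of the connection test these criteria imply, assuming 2-D node coordinates; the function and argument names are ours, and criterion (3) is only noted in a comment since it requires the graph structure:

```python
import math

def angle_at(a, b, c):
    """Angle (degrees) at node b formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def can_join(prev_node, node, next_node, start_node, end_node):
    """Criteria (1) and (2) from the text; criterion (3), that the start and
    end nodes share no common edge, needs the graph and is omitted here."""
    if math.dist(node, next_node) <= 3.0:            # (1) nodes > 3 m apart
        return False
    if angle_at(prev_node, node, next_node) < 90.0:  # (1) turn must not be acute
        return False
    if math.dist(start_node, end_node) <= 3.0:       # (2) end in a different room
        return False
    return True
```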
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a source code file; the code language is MATLAB. We propose an improved algorithm based on the traditional A* algorithm that expands the search step and search angle: Improv-A*. This algorithm not only improves the search speed but also enhances search efficiency, reducing the total planned distance. To achieve a combination of static global path planning and dynamic local path planning, we attempt to integrate the Improv-A* algorithm with the artificial potential field method to achieve dynamic path planning for unmanned aerial vehicles.
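The dataset itself is MATLAB source; as a hedged illustration of the core idea only, here is a Python sketch of an A* search whose neighbor set is expanded to several step lengths along multiple headings (the grid representation, step set, and obstacle test are assumptions, and the artificial-potential-field coupling is not shown):

```python
import heapq
import math

def neighbors(node, steps=(1, 2, 3)):
    """Expanded neighbor set: several step lengths along 8 headings,
    approximating an enlarged search step and search angle."""
    x, y = node
    for s in steps:
        for dx, dy in [(1,0), (-1,0), (0,1), (0,-1), (1,1), (1,-1), (-1,1), (-1,-1)]:
            yield (x + dx * s, y + dy * s)

def a_star(start, goal, passable):
    """Plain A* over the expanded neighbor set; 'passable' tests obstacles."""
    open_set = [(0.0, start)]
    g = {start: 0.0}
    came = {}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:                      # reconstruct the path
            path = [cur]
            while cur in came:
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        for nb in neighbors(cur):
            if not passable(nb):
                continue
            cand = g[cur] + math.dist(cur, nb)
            if cand < g.get(nb, float("inf")):
                g[nb] = cand
                came[nb] = cur
                # Euclidean distance as an admissible heuristic
                heapq.heappush(open_set, (cand + math.dist(nb, goal), nb))
    return None
```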
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for PathVQA
Dataset Description
PathVQA is a dataset of question-answer pairs on pathology images. The dataset is intended to be used for training and testing Medical Visual Question Answering (VQA) systems. The dataset includes both open-ended questions and binary "yes/no" questions. The dataset is built from two publicly available pathology textbooks, "Textbook of Pathology" and "Basic Pathology", and a publicly available digital library: "Pathology… See the full description on the dataset page: https://huggingface.co/datasets/flaviagiammarino/path-vqa.
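Assuming the dataset is loaded via the `datasets` library from the Hub path above (the split and field names below are assumptions), a minimal usage sketch:

```python
from datasets import load_dataset

# Load PathVQA from the Hugging Face Hub (split name assumed)
ds = load_dataset("flaviagiammarino/path-vqa", split="train")
example = ds[0]
# Field names assumed from the card's description of question-answer pairs
print(example["question"], "->", example["answer"])
```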
This dataset provides information about the number of properties, residents, and average property values for View Path cross streets in The Villages, FL.
A wayfinding system is important for people on campus. However, the existing wayfinding system of UBC does not consider some walkable paths that are not shown on the street map. The wayfinding system also ignores barriers on the paths, such as stairs, which can be obstacles for wheelchair users. LiDAR has developed rapidly in recent years; it can collect elevation information about objects on the ground. The University of British Columbia (UBC) collects and publishes a LiDAR dataset of the campus every year. This project uses the elevation and point-intensity information from the LiDAR point dataset to identify the walkable paths and the barriers on them. Two algorithms are proposed. The first is an intensity-based path identification algorithm, which assumes that concrete paths have a homogeneous intensity. The other is a barrier identification algorithm based on the Canny edge detection algorithm. Both algorithms work well in the research area, and they have the potential to be developed into an automatic process and to become part of the wayfinding system.
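The two algorithms are described only at a high level, so the following is a hedged Python sketch rather than the project's implementation: the rasters, thresholds, and intensity band are placeholders; only the use of Canny edge detection follows the text directly.

```python
import cv2
import numpy as np

def detect_barriers(elevation: np.ndarray, low: int = 50, high: int = 150):
    """Find sharp elevation discontinuities (e.g., stairs) with Canny.

    'elevation' is assumed to be a LiDAR-derived raster; the Canny
    thresholds are illustrative, not the project's values.
    """
    # Normalize to 8-bit, since cv2.Canny expects a uint8 image
    norm = cv2.normalize(elevation, None, 0, 255, cv2.NORM_MINMAX)
    edges = cv2.Canny(norm.astype(np.uint8), low, high)
    return edges  # non-zero pixels mark candidate barriers

def identify_paths(intensity: np.ndarray, lo: float = 0.4, hi: float = 0.6):
    """Intensity-based path mask, assuming concrete paths return a narrow,
    homogeneous intensity band; the band limits are placeholders."""
    return (intensity >= lo) & (intensity <= hi)
```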
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Honea Path population by age cohorts (Children: Under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort group along with its percentage relative to the total population of Honea Path. The dataset can be utilized to understand the population distribution across children, working population and senior population for dependency ratio, housing requirements, ageing, migration patterns etc.
Key observations
The largest age group was 18 to 64 years, with a population of 2,230 (59.72% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Honea Path Population by Age.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
After determining that there is no direct connection between ports in the network diagram, the direct connection distances between ports are obtained through the port.sol.com.cn, SeaRates.com, and McDistance shipping calculation tools. If there is a big difference between the three queried values, the average-value method is used for optimization, yielding the table Port Distance. Using the Floyd algorithm, the path between two ports in the port network graph is solved on the basis of the table Port Distance; there may be multiple shortest paths between two ports, but this situation is not considered here, and the only result kept is the result of the Python simulation, yielding the table Port Shortest Path. After getting the Port Shortest Path, the value of the shortest path between two ports is calculated, yielding the table Port Shortest Path Value. According to the shortest paths between ports, the number of routes for each port is counted, and K-Medoids is then used to construct the model of strategic importance of ports, yielding the table The number of ports is crossed by the shortest path. According to the principle of the Betweenness Centrality model, the Betweenness Centrality of each port in the whole network is obtained from the table Port Shortest Path, and K-Medoids is then used to get the table Port Betweenness Centrality. The values and contents of the table The number of ports is crossed by the shortest path and the table Betweenness Centrality Group are combined to get the table Total Group, to facilitate data search.
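Under the stated setup, the Floyd step can be sketched as follows, assuming ports are indexed 0..n-1, `dist` is an n×n matrix initialized from the table Port Distance (0 on the diagonal, `math.inf` where there is no direct connection), and exactly one shortest path per pair is kept, as the text describes:

```python
import math

def floyd_warshall(dist):
    """dist: n x n matrix from the Port Distance table (math.inf = no link).
    Returns shortest-path values and a successor matrix for reconstruction."""
    n = len(dist)
    nxt = [[j if dist[i][j] < math.inf else None for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]  # keep exactly one shortest path
    return dist, nxt

def shortest_path(nxt, i, j):
    """Reconstruct the single retained shortest path from port i to port j."""
    if nxt[i][j] is None:
        return []
    path = [i]
    while i != j:
        i = nxt[i][j]
        path.append(i)
    return path
```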
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered where the book is "Pennsylvania off the beaten path: discover your fun". It features 2 columns including publication dates.
My Dataset
This dataset includes research articles with metadata and images.
Features
The dataset contains the following features:
pmid: The PubMed ID of the article (string).
pmcid: The PubMed Central ID of the article (string).
title: The title of the article (string).
abstract: The abstract of the article (string).
fulltext: The full text of the article (string).
images: Contains image data with the following fields:
  bytes: Binary image data.
  path: Relative path to… See the full description on the dataset page: https://huggingface.co/datasets/Power108/trial_image_dataset.
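Assuming the dataset loads through the `datasets` library (the split name below is an assumption), the listed fields can be accessed like this:

```python
from datasets import load_dataset

ds = load_dataset("Power108/trial_image_dataset", split="train")  # split assumed
rec = ds[0]
print(rec["pmid"], rec["title"])          # metadata fields from the card
for img in rec["images"]:
    print(img["path"], len(img["bytes"]))  # relative path and raw image bytes
```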
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Programming Languages Infrastructure as Code (PL-IaC) enables IaC programs written in general-purpose programming languages like Python and TypeScript. The currently available PL-IaC solutions are Pulumi and the Cloud Development Kits (CDKs) of Amazon Web Services (AWS) and Terraform. This dataset provides metadata and initial analyses of all public GitHub repositories in August 2022 with an IaC program, including their programming languages, applied testing techniques, and licenses. Further, we provide a shallow copy of the head state of those 7104 repositories whose licenses permit redistribution. The dataset is available under the Open Data Commons Attribution License (ODC-By) v1.0.
Contents:
This artifact is part of the ProTI Infrastructure as Code testing project: https://proti-iac.github.io.
The dataset's metadata comprises three tabular CSV files containing metadata about all analyzed repositories, IaC programs, and testing source code files.
repositories.csv:
programs.csv:
testing-files.csv:
scripts-and-logs.zip contains all scripts and logs of the creation of this dataset. In it, executions/executions.log documents the commands that generated this dataset in detail. On a high level, the dataset was created as follows:
The repositories are searched through search-repositories.py and saved in a CSV file. The script takes these arguments in the following order:
Pulumi projects have a Pulumi.yaml or Pulumi.yml (case-sensitive file name) file in their root folder, i.e., (3) is Pulumi and (4) is yml,yaml. https://www.pulumi.com/docs/intro/concepts/project/
AWS CDK projects have a cdk.json (case-sensitive file name) file in their root folder, i.e., (3) is cdk and (4) is json. https://docs.aws.amazon.com/cdk/v2/guide/cli.html
CDK for Terraform (CDKTF) projects have a cdktf.json (case-sensitive file name) file in their root folder, i.e., (3) is cdktf and (4) is json. https://www.terraform.io/cdktf/create-and-deploy/project-setup
The script uses the GitHub code search API and inherits its limitations:
More details: https://docs.github.com/en/search-github/searching-on-github/searching-code
The results of the GitHub code search API are not stable. However, the generally more robust GraphQL API does not support searching for files in repositories: https://stackoverflow.com/questions/45382069/search-for-code-in-github-using-graphql-v4-api
download-repositories.py downloads all repositories in CSV files generated through search-repositories.py and generates an overview CSV file of the downloads. The script takes these arguments in the following order:
The script only downloads a shallow recursive copy of the HEAD of the repo, i.e., only the main branch's most recent state, including submodules, without the rest of the git history. Each repository is downloaded to a subfolder named by the repository's ID.
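As a sketch of what such a shallow download amounts to (this is not the actual `download-repositories.py` source; the function and folder names are assumptions), standard git flags reproduce the behavior described above:

```python
import subprocess

def download(clone_url: str, repo_id: int, dest_root: str = "repos"):
    """Shallow copy of the main branch's most recent state, incl. submodules."""
    subprocess.run(
        [
            "git", "clone",
            "--depth", "1",            # HEAD only, no git history
            "--recurse-submodules",    # include submodules
            "--shallow-submodules",    # submodules also at depth 1
            clone_url,
            f"{dest_root}/{repo_id}",  # subfolder named by the repository's ID
        ],
        check=True,
    )
```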
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The entity relatedness problem refers to the question of exploring a knowledge base, represented as an RDF graph, to discover and understand how two entities are connected. More precisely, this problem can be defined as: "Given an RDF graph G and a pair of entities a and b, represented in G, compute the paths in G from a to b that best describe the connectivity between them."

This dataset supports the evaluation of approaches that address the entity relatedness problem and contains a total of 240 ranked lists with 50 relationship paths each between entity pairs in two familiar domains, music and movies, in two subsets of DBpedia that we call DBpedia21M and DBpedia45M. Specifically, we extracted data from the following two publicly available subsets of the English DBpedia corpus to form our two knowledge bases:

1. mappingbased-objects: https://downloads.dbpedia.org/repo/dbpedia/mappings/mappingbased-objects/2021.03.01/mappingbased-objects_lang=en.ttl.bz2
2. infobox-properties: https://downloads.dbpedia.org/repo/dbpedia/generic/infobox-properties/2021.03.01/infobox-properties_lang=en.ttl.bz2

DBpedia21M contains the statements in the mappingbased-objects dataset, and DBpedia45M contains the union of the statements in mappingbased-objects and in infobox-properties. In both cases, we exclude statements involving literals or blank nodes.

For each dataset (DBpedia21M and DBpedia45M), the ground truth contains 120 ranked lists with 50 relationship paths each. Each list corresponds to the most relevant paths between one of the 20 entity pairs, 10 pairs from the music domain and 10 from the movie domain, found using different path search strategies. A path search strategy consists of an entity similarity measure and a path ranking measure. The ground truth was created using the following 6 strategies:

1. Jaccard Index & Predicate Frequency Inverse Triple Frequency (PF-ITF)
2. Jaccard Index & Exclusivity-based Relatedness (EBR)
3. Jaccard Index & Pointwise Mutual Information (PMI)
4. Wikipedia Link-based Measure (WLM) & PF-ITF
5. WLM & EBR
6. WLM & PMI

The filename of a file that contains the ranked list of 50 relationship paths between a pair of entities has the following format:

[Dataset].[EntityPairID].[SearchStrategyID].[Entity1-Entity2].txt

Example 1: DBpedia21M.1.2.Michael_Jackson-Whitney_Houston.txt
Example 2: DBpedia45M.27.4.Paul_Newman-Joanne_Woodward.txt

The file in Example 1 contains the top-50 most relevant paths between Michael Jackson and Whitney Houston in DBpedia21M using search strategy number 2 (Jaccard Index & EBR). The file in Example 2 contains the top-50 most relevant paths between Paul Newman and Joanne Woodward in DBpedia45M using search strategy number 4 (WLM & PF-ITF).

The data is split into two files, one for each dataset, compressed in .zip format:

DBpedia21M.GT.zip: contains 180 .txt files representing the ranked lists of relationship paths between entity pairs in the DBpedia21M dataset.
DBpedia45M.GT.zip: contains 180 .txt files representing the ranked lists of relationship paths between entity pairs in the DBpedia45M dataset.
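Given that filename format, the components can be split out mechanically; a minimal sketch (the helper name is ours, and entity names containing hyphens would need extra care):

```python
def parse_ground_truth_filename(name: str):
    """Parse '[Dataset].[EntityPairID].[SearchStrategyID].[Entity1-Entity2].txt'."""
    dataset, pair_id, strategy_id, entities = name[:-len(".txt")].split(".", 3)
    entity1, entity2 = entities.split("-", 1)
    return dataset, int(pair_id), int(strategy_id), entity1, entity2

# parse_ground_truth_filename("DBpedia21M.1.2.Michael_Jackson-Whitney_Houston.txt")
# -> ('DBpedia21M', 1, 2, 'Michael_Jackson', 'Whitney_Houston')
```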
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Path to a SAEIV line’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/61a025be66bcd934d64ed79e on 17 January 2022.
--- Dataset description provided by original source is as follows ---
This non-graphic dataset from the Operating Assistance and Traveller Information System (SAEIV) represents the variants of line paths used by vehicles in the TBM network. A line path is an ordered sequence of consecutive sections, with a direction (outbound or return).
This dataset can be linked to Course of a vehicle on a path, SAEIV commercial line, Elementary Pathway Round, Deviation Round, physical stop on the network, bus schedules for the next 14 days and Vehicle in service on the network
This dataset is refreshed every hour. Note that, for performance reasons, this dataset (Table, Map, Analysis and Export tabs) may be updated less frequently than the source, so a deviation may exist. We also invite you to use our web services (see the Webservices BM tab) to retrieve the freshest data.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Do institutions rule when explaining cross-country divergence? By employing regression tree analysis to uncover the existence and nature of multiple development clubs and growth regimes, this paper finds that to a large extent they do. However, the role of ethnic fractionalization cannot be dismissed. The findings suggest that sufficiently high-quality institutions may be necessary for the negative impact on development from high levels of ethnic fractionalization to be mitigated. Interestingly, I find no role for geographic factors (neither those associated with climate nor those associated with physical isolation) in explaining divergence. There is also no evidence to suggest a role for religious fractionalization.
https://choosealicense.com/licenses/other/
Terms of Use
By using the dataset, you agree to comply with the dataset license (CC-by-4.0-Deed).
Download Instructions
To download one file, please use:

```python
from huggingface_hub import hf_hub_download

local_directory = 'LOCAL_DIRECTORY'
filepath = 'FILE_PATH'
repo_id = "climateset/climateset"
repo_type = "dataset"
# Completion of the card's truncated snippet, using the standard
# hf_hub_download keyword arguments.
hf_hub_download(repo_id=repo_id, repo_type=repo_type,
                filename=filepath, local_dir=local_directory)
```

See the full description on the dataset page: https://huggingface.co/datasets/climateset/climateset.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
After determining that there is no direct connection between ports in the network diagram, the direct connection distances between ports are obtained through the port.sol.com.cn, SeaRates.com, and McDistance shipping calculation tools. If there is a big difference between the three queried values, the average-value method is used for optimization, yielding the table Port Distance.
Using the Floyd algorithm, the path between two ports in the port network graph is solved on the basis of the table Port Distance; there may be multiple shortest paths between two ports, but this situation is not considered here, and the only result kept is the result of the Python simulation, yielding the table Port Shortest Path.
After getting the Port Shortest Path, the value of the shortest path between two ports is calculated, yielding the table Port Shortest Path Value.
According to the shortest paths between ports, the number of routes for each port is counted, and K-Medoids is then used to construct the model of strategic importance of ports, yielding the table Number of ports crossed by the shortest path.
According to the principle of the Betweenness Centrality model, the Betweenness Centrality of each port in the whole network is obtained from the table Port Shortest Path, and K-Medoids is then used to get the table Port Betweenness Centrality.
The values and contents of the table Number of ports crossed by the shortest path and the table Betweenness Centrality Group are combined to get the table Total Group, to facilitate data search.
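A minimal sketch of the counting described above, assuming the table Port Shortest Path is available as a list of port sequences; this is the simple, unnormalized count of how many shortest paths cross each port (the K-Medoids grouping step is not shown):

```python
from collections import Counter

def betweenness_counts(shortest_paths):
    """shortest_paths: iterable of port sequences from the Port Shortest Path
    table. Counts how many paths cross each port as an intermediate node."""
    counts = Counter()
    for path in shortest_paths:
        for port in path[1:-1]:  # endpoints are not 'crossed'
            counts[port] += 1
    return counts
```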
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
https://www.usgs.gov/information-policies-and-instructions/acknowledging-or-crediting-usgs
This dataset shows the tiling grid and their Row and Path IDs for Landsat 4 - 9 satellite imagery. The IDs are useful for selecting imagery of an area of interest. Landsat 4 - 9 are a series of Earth observation satellites, part of the US Landsat program aimed at monitoring Earth's land surfaces since 1982.
The Worldwide Reference System (WRS) is a global notation system used for cataloging and indexing Landsat imagery. It employs a grid-based system consisting of path and row numbers, where the path indicates the longitude and the row indicates the latitude, allowing users to easily locate and identify specific scenes covering a particular area on Earth.
Landsat satellites 4, 5, 7, 8, and 9 follow WRS-2, which this dataset describes.
This dataset corresponds to the descending Path/Row identifiers, as these correspond to daytime scenes.
eAtlas Notes: It should be noted that the extent boundaries of the scene polygons in this dataset are only indicative of the imagery extent. For Landsat 5, the individual images move around by about 10 km, and the shapes of the Landsat 8 and 9 images do not match the shape of the WRS-2 polygons: the top and bottom edges of the polygons are at a different angle than those of the imagery, which is more square in shape, and the left and right edges of the polygons are smaller than the imagery. As a result, this dataset is probably not suitable as a clipping mask for the imagery of these satellites.
This dataset is suitable for determining the approximate extent of the imagery and the associated Row and Path IDs for a given scene.
Why is this dataset in the eAtlas?: Landsat imagery is very useful for studying and mapping reef systems. Selecting imagery for study often requires knowing the Path and Row numbers for the area of interest. This dataset is intended as a reference layer, and this metadata record is included so it can be linked to from the associated mapping layer. The eAtlas is not the custodian of this dataset, and copies of the data should be obtained from the original sources. The eAtlas does, however, keep a cached version of the dataset from the time this record was set up, to make available should the original dataset no longer be available.
eAtlas Processing: The original data was sourced from USGS (See links). No modifications to the underlying data were performed.
Location of the data:
This dataset is filed in the eAtlas enduring data repository at: data\non-custodian\2020-2024\World_USGS_Landsat-WRS-2
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The test data for Inter-Domain Path Computation under the Domain Uniqueness constraint (IDPCDU).
Since no public dataset is available for the IDPC-NDU problem, two distinct types of instances were created based on the dataset of IDPC-EDU, which is also a shortest-path problem. We first generated three parameters for each instance: the number of nodes, the number of domains, and the number of edges. After that, we generated an optimal path p in which the weight of every edge is equal to 1 and the number of domains on p is approximately the input graph's domain number. Next, noise is added to the instance: for every node in p, besides random-weight edges, several random weight-1 edges from that node to nodes not in p, and some random edges with weights greater than the total cost of p, are added. These traps make it harder for simple greedy algorithms to find the optimal solution. In Type 2 instances especially, feasible paths whose length is less than three are removed. The datasets are categorized into two kinds regarding dimensionality: small instances, each of which has between 50 and 2,000 vertices, and large instances, each of which has over 2,000 vertices.
Filename idpc_
The first line of a file contains two integers N and D, which are the number of nodes and the number of domains, respectively. The second line contains two integers s and t, which are the source node and the terminal node. Every following line contains four integers u, v, w, d, representing an edge (u, v) that has weight w and belongs to domain d.
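This format maps directly onto a small reader; a minimal sketch (the function name is ours):

```python
def read_idpc_instance(path):
    """Parse an IDPC instance: 'N D' on line 1, 's t' on line 2,
    then one 'u v w d' edge per line."""
    with open(path) as f:
        n, d = map(int, f.readline().split())
        s, t = map(int, f.readline().split())
        edges = [tuple(map(int, line.split())) for line in f if line.strip()]
    return n, d, s, t, edges
```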
Development plan "See Path, Building Lines" of the city of Sachsenheim, transformed in accordance with INSPIRE, based on an XPlanung dataset in version 5.0.