https://creativecommons.org/publicdomain/zero/1.0/
An academic journal or research journal is a periodical publication in which research articles relating to a particular academic discipline are published, according to Wikipedia. Currently, more than 25,000 peer-reviewed journals are indexed in citation index databases such as Scopus and Web of Science. Journals in these indexes are ranked on the basis of various metrics such as CiteScore, H-index, etc. The metrics are calculated from the journal's yearly citation data. Considerable effort goes into designing metrics that reflect a journal's quality.
This is a comprehensive dataset on academic journals covering their metadata as well as citation, metrics, and ranking information. Detailed data on their subject area is also given in this dataset. The dataset is collected from the following indexing databases:
- Scimago Journal Ranking
- Scopus
- Web of Science Master Journal List
The data was collected by scraping and then cleaned; details can be found HERE.
The rest of the features provide further details on the journal's subject area or category:
- Life Sciences: Top level subject area.
- Social Sciences: Top level subject area.
- Physical Sciences: Top level subject area.
- Health Sciences: Top level subject area.
- 1000 General: ASJC main category.
- 1100 Agricultural and Biological Sciences: ASJC main category.
- 1200 Arts and Humanities: ASJC main category.
- 1300 Biochemistry, Genetics and Molecular Biology: ASJC main category.
- 1400 Business, Management and Accounting: ASJC main category.
- 1500 Chemical Engineering: ASJC main category.
- 1600 Chemistry: ASJC main category.
- 1700 Computer Science: ASJC main category.
- 1800 Decision Sciences: ASJC main category.
- 1900 Earth and Planetary Sciences: ASJC main category.
- 2000 Economics, Econometrics and Finance: ASJC main category.
- 2100 Energy: ASJC main category.
- 2200 Engineering: ASJC main category.
- 2300 Environmental Science: ASJC main category.
- 2400 Immunology and Microbiology: ASJC main category.
- 2500 Materials Science: ASJC main category.
- 2600 Mathematics: ASJC main category.
- 2700 Medicine: ASJC main category.
- 2800 Neuroscience: ASJC main category.
- 2900 Nursing: ASJC main category.
- 3000 Pharmacology, Toxicology and Pharmaceutics: ASJC main category.
- 3100 Physics and Astronomy: ASJC main category.
- 3200 Psychology: ASJC main category.
- 3300 Social Sciences: ASJC main category.
- 3400 Veterinary: ASJC main category.
- 3500 Dentistry: ASJC main category.
- 3600 Health Professions: ASJC main category.
This API is designed to find the rankings by any geography type within the state for a specific census metric (population or household) and ranking metric (any of the metrics from provider, demographic, technology or speed). Only the top ten and bottom ten rankings are returned through the API if the result set is greater than 500; otherwise the full ranking list is returned.
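The truncation rule described above can be sketched as follows. This is a minimal illustration, not the API's actual implementation: the function name `limit_rankings` and its parameters are hypothetical, and the input is assumed to be a list already sorted by rank.

```python
def limit_rankings(ranked_rows, threshold=500, keep=10):
    """Apply the stated rule to an already-sorted ranking list.

    Hypothetical helper: if the list has more than `threshold` rows,
    return only the first `keep` and last `keep` rows; otherwise
    return the full list.
    """
    if len(ranked_rows) > threshold:
        return ranked_rows[:keep] + ranked_rows[-keep:]
    return list(ranked_rows)


# Illustrative inputs: ranks 1..600 (truncated) and 1..500 (returned in full).
large = limit_rankings(list(range(1, 601)))
small = limit_rankings(list(range(1, 501)))
```

Under this reading, a result set of exactly 500 rows is returned in full, because the rule applies only when the set is greater than 500.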
This statistic illustrates the ranking of product types returned in Poland in 2019. As of 2019, roughly *** percent of the Polish respondents returned home furnishings that they had purchased online in the past year. The most frequently returned products were clothing and footwear.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
C program implementing the method described in the paper; GCC makefile for compiling the C program; example data for use with the C program; Python program for generating synthetic test data; instructions for use of the other files
Types of harmful content that had max ranking differences (Δrank) of at least 33, or half of the total number of rank positions.
A ranking of North American tree species based on keyword searches of genus and species using Web of Science, AGRICOLA, and CAB Abstracts. A bibliographic analysis using three bibliographic databases was conducted to understand the importance of North American tree species in literature published since 1900 and since the development of the last Silvics of North America in the 1980s. The Silvics of North America is the most comprehensive guide to North American tree-like species containing about 200 of the 696 tree-like species found in the literature. Depending on the bibliographic database, over 100,000 to over 600,000 publications on the 696 plant species can be obtained. At least half of the publication records are from 1985 forward. The ranking in this database is based on the average rank of six bibliographic searches (three bibliographic databases and two time frames: 1900-2020 and 1985-2020).
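The averaging step described above can be sketched as follows. The publication counts are hypothetical placeholders, and the tie handling (tied species share the better rank) is an assumption, since the description does not say how ties were resolved.

```python
def rank_descending(counts):
    """Map each species to its rank (1 = most publications).

    Assumption: tied species share the better rank.
    """
    ordered = sorted(counts.values(), reverse=True)
    return {name: ordered.index(value) + 1 for name, value in counts.items()}


def average_rank(searches):
    """Average each species' rank over all searches."""
    per_search = [rank_descending(counts) for counts in searches]
    names = per_search[0].keys()
    return {n: sum(r[n] for r in per_search) / len(per_search) for n in names}


# Hypothetical counts for six searches (three databases x two time frames).
searches = [
    {"Pinus taeda": 900, "Acer rubrum": 400, "Quercus alba": 300},
    {"Pinus taeda": 500, "Acer rubrum": 250, "Quercus alba": 260},
    {"Pinus taeda": 700, "Acer rubrum": 350, "Quercus alba": 200},
    {"Pinus taeda": 400, "Acer rubrum": 150, "Quercus alba": 180},
    {"Pinus taeda": 1200, "Acer rubrum": 600, "Quercus alba": 500},
    {"Pinus taeda": 650, "Acer rubrum": 320, "Quercus alba": 300},
]

averages = average_rank(searches)
# Final ranking: lowest average rank first.
final_order = sorted(averages, key=averages.get)
```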
💬Also have a look at
💡 COUNTRIES Research & Science Dataset - SCImagoJR
💡 UNIVERSITIES & Research INSTITUTIONS Rank - SCImagoIR
☢️❓The entire dataset is obtained from public and open-access data of ScimagoJR (SCImago Journal & Country Rank)
ScimagoJR Journal Rank
SCImagoJR About Us
https://www.usa.gov/government-works/
The dataset represents the County Health Rankings of all states, taking into account various factors. The County Health Rankings can be used to highlight regional variations in health, increase public understanding of the various factors that affect health, and inspire actions to improve community health. The Rankings capitalize on our innate desire to compete by enabling comparisons across adjacent or comparable counties within states.
The CSV file contains the rankings and data details for the measures used in the 2022/23 County Health Rankings.
1) Outcomes and Factors Rankings --Ranks are all calculated and reported WITHIN states
2) Outcomes and Factors SubRankings --Ranks are all calculated and reported WITHIN states
3) Ranked Measure Data --The measures themselves are listed in bold.
4) Ranked Measure Sources & Years
5) Additional Measure Data --These are supplemental measures reported on the Rankings web site but not used in calculating the rankings.
6) Additional Measure Sources & Years
The data types of all columns are automatically set to "object".
To convert them, use data.apply(pd.to_numeric)
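With its default settings, `pd.to_numeric` raises an error on values it cannot parse, so applying it to every column can fail when some columns hold text such as county names. The sketch below is one possible workaround, not part of the dataset's own instructions: it converts only the columns whose values all parse as numbers. The tiny frame and the helper name `to_numeric_where_possible` are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the CSV: every column holds strings ("object").
data = pd.DataFrame(
    {
        "County": ["Adams", "Baker"],
        "Rank": ["1", "2"],
        "Score": ["0.52", "0.48"],
    },
    dtype="object",
)


def to_numeric_where_possible(frame):
    """Convert columns that are entirely numeric; leave the others as they are."""
    out = frame.copy()
    for name in out.columns:
        converted = pd.to_numeric(out[name], errors="coerce")
        # Keep the conversion only if no non-missing value was lost.
        if converted.notna().sum() == out[name].notna().sum():
            out[name] = converted
    return out


converted = to_numeric_where_possible(data)
```

Here `Rank` and `Score` become numeric while `County` stays as text.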
This statistic shows data on the most popular types of magazines in Germany from 2014 to 2015. In 2014, **** million Germans aged 14 years and older had read a television magazine within the last three months.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ranking of the first four study designs to address the types of question in which consensus was reached in stage 2.
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated the following evaluation tasks:
Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.
Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The evaluation set contains 76 distinct target companies, each of which has 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for similar companies to A.
Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. It resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.
Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
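To make the CR and SR tasks more concrete, here is a minimal scoring sketch. It is not taken from the CompanyKG code base: the function names, the choice of recall@K as the CR metric, and the toy inputs are all assumptions made for illustration.

```python
def recall_at_k(retrieved, competitors, k):
    """CR sketch: fraction of annotated competitors found in the top-k results."""
    top_k = set(retrieved[:k])
    return len(top_k & set(competitors)) / len(competitors)


def sr_accuracy(questions):
    """SR sketch: share of ranking questions answered correctly.

    Each question is (similarity to candidate 0, similarity to candidate 1,
    index of the candidate the annotators chose).
    """
    correct = 0
    for sim0, sim1, label in questions:
        predicted = 0 if sim0 >= sim1 else 1
        correct += predicted == label
    return correct / len(questions)


# Hypothetical target company with competitors C, E, F and a retrieved list.
cr_score = recall_at_k(["B", "C", "D", "E"], ["C", "E", "F"], k=3)

# Hypothetical similarity scores for three SR questions.
sr_score = sr_accuracy([(0.9, 0.2, 0), (0.1, 0.4, 1), (0.6, 0.3, 1)])
```

A method that retrieves all N competitors within the top K would reach a recall of 1.0 for that target company.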
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
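The embedding-space search mentioned above can be illustrated with a small sketch. The two-dimensional vectors and company labels are hypothetical stand-ins for real text embeddings, and `most_similar` is an illustrative helper rather than part of any published tooling.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(seed, embeddings, top_n=2):
    """Rank all other companies by cosine similarity to the seed company."""
    scores = [
        (name, cosine_similarity(embeddings[seed], vector))
        for name, vector in embeddings.items()
        if name != seed
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical embeddings; real ones would come from a text embedding model.
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
}

neighbours = most_similar("A", embeddings)
```

Starting from seed company "A", the search returns "B" first because its vector points in nearly the same direction.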
However, a graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
Source Code and Tutorial: https://github.com/llcresearch/CompanyKG2
Paper: to be published
This dataset provides related gridded outputs of future modeled forest carbon sequestration priority and related species richness and habitat suitability for the western United States. The primary dataset is of the ranking of forest lands in the western U.S. for preservation based on the ability of these lands to sequester carbon over the coming century. The preservation ranking was derived from the results of simulations of future potential forest net ecosystem productivity (NEP) and vulnerability to drought and wildfire, as modeled from 2020 to 2099 at 4 km x 4 km resolution using a modified version of the Community Land Model (CLM 4.5). In addition, data files of potential forest NEP ranking and the forest vulnerability ranking are also provided. Co-located data of species richness for amphibians, birds, mammals, and reptiles are included to illustrate habitat suitability in relation to forest carbon preservation rankings. There are two files for each vertebrate class, one reflecting all western U.S. species included in the USGS GAP Analysis Project and a second for the subset of species listed as threatened or endangered by the U.S. Fish and Wildlife Service. Establishing this forest carbon preservation priority ranking for forest lands in the western U.S. will help guide the conservation of land for climate change mitigation activities and improved harvest management in the region.
https://dataful.in/terms-and-conditions
The dataset contains data, compiled by ranking and academic year, by university/institute, and by state, on the capital and operational expenditure incurred by educational institutions, as per the National Institutional Ranking Framework (NIRF) data. The different types of expenditure covered in the dataset include expenditure incurred towards seminars/conferences/workshops, salaries (teaching and non-teaching staff), maintenance, other running expenditures and creation of capital assets, new equipment for laboratories, libraries, etc.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Electoral rules can affect who wins, who loses, and how voters feel about the electoral process. Most cities select office holders through plurality rule, but an alternative, ranked choice voting (RCV), has become increasingly popular. RCV requires voters to rank candidates, instead of simply selecting their most preferred candidate. Observers debate whether RCV will cure a variety of electoral ills or undermine representation. We test the effect of RCV on voters’ choices and perceptions of representation using survey experiments with large, representative samples of respondents. We find that candidates of color are significantly penalized in both plurality and RCV elections, with no significant difference between the rule types. However, providing respondents with candidates’ partisan affiliation significantly increases support for candidates of color.
The dataset includes Land Use/Land Cover types throughout the Chenier Eco-Region in Southwest Louisiana. Using the 2015 National Aerial Imagery Program (NAIP) dataset (1m) as the basemap, eCognition image objects were derived from the multiresolution segmentation algorithm at 75 and 250 segments. Attempts to refine the data training methods using eCognition, in order to automate the extrapolation of these categories to the entire map, resulted in exceedingly low accuracy. Therefore, a raster was produced by piecing together several data resources, which provide reliable data for specific Land Use/Land Cover (LULC) categories. This process involved stitching together more reliable sources for specific categories to apply to the higher resolution (75) segmentation product. Reference datasets include: 12,000 aerial points assigned to image objects derived from the 75 segmentation settings (previously used to develop scripts for data training); a mask created from the National Wetlands Inventory (NWI) 2008 including water, wetland forested, upland forested and scrub/shrub categories; Bureau of Ocean Energy and Management (BOEM) marsh classes; National Land Cover Dataset (NLCD) urban areas; and Cropescape Data Layer (CDL) data. The raster produced from this process was applied to the vector image objects derived from the 250 segmentation settings, using a majority filter (greater than 50 percent). The series of draft shapefiles was manually edited and merged, resulting in the final dataset. This vector dataset was then converted into a 10 meter raster dataset (https://doi.gov/10.5066/F7KW5DJW). We used the Tabulate Area tool within the Spatial Analyst Tools in ArcGIS 10.4 (ESRI, Redlands, CA) to estimate the percentage of classified grasslands occurring on each soil type. Soil types with the highest percentages of grasslands occurring on them were identified. Most of these soils occurred in Calcasieu parish.
Because each parish has different soil "MUSYM" values, we could not simply select by "MUSYM"; instead, we had to manually identify the previously identified soils across parish lines. The following structured query language statement was built to identify those crosswalks between parishes:
"SOILDATA_Merge_Clip_Project.MUNAME" LIKE '% silt loam, 0 to 1 percent slopes%' OR
"SOILDATA_Merge_Clip_Project.MUNAME" LIKE '% silt loams, 0 to 1 percent slopes%' OR
"SOILDATA_Merge_Clip_Project.MUNAME" = 'Crowley-Vidrine complex' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Mr' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Mn' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Ju' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Mt' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'MoA' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Pa' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Co' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Pt' OR
"SOILDATA_Merge_Clip_Project.MUSYM" = 'Cu'
This selection was then used to mask the LULC areas to further prioritize. Prioritization or ranking of the LULC types was then accomplished by reclassifying the LULC types between 1 and 10, with 10 being the highest priority areas.
The priority ranking was then grouped into High, Medium and Low areas of potential grassland areas to restore (High 10-8; Medium 7-4; Low 3-1).

| Code | Class Name | Reclass_Rank | Rank_Group |
| --- | --- | --- | --- |
| 10 | Herbaceous Marsh | 2 | Low |
| 11 | Fresh Marsh | 1 | Low |
| 12 | Intermediate Marsh | 1 | Low |
| 13 | Brackish Marsh | 1 | Low |
| 14 | Saline Marsh | 1 | Low |
| 20 | Upland Forest | 4 | Medium |
| 21 | Upland Forested Evergreen | 7 | Medium |
| 22 | Upland Forested Deciduous | 4 | Medium |
| 23 | Upland Forested Mixed | 4 | Medium |
| 30 | Upland SS | 7 | Medium |
| 31 | Upland SS Evergreen | 8 | High |
| 32 | Upland SS Deciduous | 7 | Medium |
| 33 | Upland SS Mixed | 7 | Medium |
| 40 | Wetland Forest | 3 | Low |
| 41 | Wetland Forested Evergreen | 5 | Medium |
| 42 | Wetland Forested Deciduous | 5 | Medium |
| 43 | Wetland Forested Mixed | 4 | Medium |
| 50 | Wetland SS | 5 | Medium |
| 51 | Wetland SS Evergreen | 5 | Medium |
| 52 | Wetland SS Deciduous | 5 | Medium |
| 53 | Wetland SS Mixed | 5 | Medium |
| 60 | Swamp | 1 | Low |
| 70 | Agriculture | 8 | High |
| 71 | Row Crop | 8 | High |
| 72 | Rice | 8 | High |
| 73 | Sugarcane | 8 | High |
| 74 | Grassland | 10 | High |
| 75 | Pasture | 9 | High |
| 76 | Orchard | 8 | High |
| 80 | Urban | 1 | Low |
| 81 | High Density Developed | 1 | Low |
| 82 | Medium Density Developed | 2 | Low |
| 83 | Low Density Developed | 5 | Medium |
| 90 | Barren | 1 | Low |
| 100 | Water | 1 | Low |
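The grouping rule stated above (High 10-8; Medium 7-4; Low 3-1) can be expressed as a short function. This is an illustrative sketch: the function name `rank_group` is hypothetical, and the three sample rows are copied from the reclassification values listed above.

```python
def rank_group(reclass_rank):
    """Map a reclassified priority rank (1-10) to its group label."""
    if not 1 <= reclass_rank <= 10:
        raise ValueError("reclass_rank must be between 1 and 10")
    if reclass_rank >= 8:
        return "High"
    if reclass_rank >= 4:
        return "Medium"
    return "Low"


# Sample classes and ranks taken from the listing above.
sample = {"Grassland": 10, "Upland Forested Evergreen": 7, "Wetland Forest": 3}
groups = {name: rank_group(rank) for name, rank in sample.items()}
```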
GIC created the habitat cores model using the National Land Cover Database (NLCD) 2019 land cover data (the most recent land cover available when this project began). The NLCD provides nation-wide data on land cover and land cover change at the Landsat Thematic Mapper (TM) 30-meter resolution (30 x 30 meter pixels of analysis) and is appropriate for mapping rural landscapes.
To be considered a habitat core, the native landscape must encompass more than 100 acres of intact area. This acreage standard is based on studies evaluating the minimum acreage for terrestrial species to survive and thrive. For example, interior forest dwelling birds such as cerulean warblers need 100 acres of interior forest habitat for adequate foraging and nesting habitats. Large, intact forest cores are less impacted by disturbances and can better support area-sensitive and extinction-prone species because they retain larger populations, and their habitat is less likely to degrade through time (Ewers et al. 2006). Forest fragments or woodlands less than 100 acres (known as patches) were also mapped to aid in identification of corridors or pathways for species to migrate across the landscape. These fragments, while not ideal habitat for larger species, can provide quality refugia for some species. Fragments can also act as stepping stones, allowing species to move across the landscape while minimizing their exposure to predators and other disturbances. 2019 NLCD landcover types such as forests and wetlands were then evaluated to determine their intactness by identifying features that fragment them, such as roads, buildings, transmission corridors, large rivers, and so on. These features bisect the landscape into smaller units (see maps). If an area is bisected too often, it does not contain a large enough habitat area to support interior nesting species and thus is too small to function as a habitat core.
To ensure that there is enough interior habitat, GIC’s analysts first subtract (clip out) the outer edge for a distance of 300 feet to ensure that potentially disturbed area is not counted as interior habitat. Edge areas are more likely to contain invasive species, suffer from wind impacts leading to dryness and blowdowns, and attract opportunistic predators such as domestic cats and dogs. In the final map of intact habitats, this edge area is added back in, but does not count towards the 100-acre minimum core size. The next step in the process is to divide the acreage into quintiles or “natural breaks.” This sorts the cores by size, which is the most important element for contributing to species abundance: bigger landscapes can generally support more species. However, there are other landscape factors that contribute to species abundance, such as surface waters. Thus, in addition to geometry and extent, habitat cores are ranked based on additional environmental attributes. Assigning attributes to each core allows for the identification and prioritization of specific high-quality and high-value habitat during strategy development. Not all habitats will be protected, and resources for management or conservation are usually limited. Ranking habitat cores by their quality allows land-use planners, agency officials, and landowners or site managers to prioritize specific landscapes that provide the highest value for species.
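The quintile step can be sketched as follows. This is a simplified illustration that assumes equal-count quintiles; it does not implement the "natural breaks" alternative, and the acreages and the function name `assign_quintiles` are hypothetical.

```python
def assign_quintiles(acreages):
    """Assign each core a size quintile from 1 (smallest) to 5 (largest).

    Assumption: quintiles are equal-count groups by sorted position.
    """
    order = sorted(range(len(acreages)), key=lambda i: acreages[i])
    count = len(acreages)
    quintiles = [0] * count
    for position, index in enumerate(order):
        quintiles[index] = position * 5 // count + 1
    return quintiles


# Hypothetical interior acreages for ten habitat cores (all over 100 acres).
core_acres = [120, 450, 101, 3000, 800, 250, 1500, 175, 640, 2200]
core_quintiles = assign_quintiles(core_acres)
```

In this sketch the two largest cores (3000 and 2200 acres) fall into quintile 5 and the two smallest into quintile 1.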
The rankings use landscape-based environmental and ecological attributes. Examples of environmental attribute data used to rank cores include the number of wetlands found within a core; the presence of rare, threatened or endangered species; species richness; soil diversity; the length of stream miles; and topography. These factors all influence the diversity of plants, insects, animals and other biota within a forest or even a wetland core. Core Ranking is represented in the Habitat Core layer. To access it, download the Habitat Core Layer and view the “Score Weight” attribute field.
In 2020, more than ************ Poles bought clothes and shoes in online shops abroad, establishing this category as the most popular. Other types of goods that enjoyed popularity were home electronics and home furnishings, respectively.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Marine accident key variables for top-most dangerous goods, means of containment and instigating events. Annual data is available from 1987.
1 = most severe; 66 = least severe. The World ranking is a country-agnostic ranking instead of an aggregation from individual country rankings.
This graph shows a ranking of the most popular types of bread spreads in Germany in 2018 and 2020. Marmalades and jams are the most popular type of bread spread, with almost ** percent of the population consuming them in 2020, followed by honey with **** percent.