Dataset Card for AIDS
Dataset Summary
The AIDS dataset contains compounds checked for evidence of anti-HIV activity.
Supported Tasks and Leaderboards
AIDS should be used for molecular classification, a binary classification task. The evaluation metric is accuracy, estimated with cross-validation.
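The card does not say how many folds are used. The sketch below assumes plain k-fold cross-validation. `fit` is a hypothetical callback that trains on one split and returns a predict function, and the majority-class baseline is only a stand-in for a real graph classifier. The reported score is the mean of the per-fold accuracies.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them round-robin into k folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(xs: Sequence, ys: Sequence, fit: Callable, k: int = 10,
                       seed: int = 0) -> Tuple[float, List[float]]:
    """Mean and per-fold accuracy; fit(train_xs, train_ys) must return a predict function."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores), scores


def majority_baseline(train_xs, train_ys):
    """Hypothetical stand-in classifier: always predicts the most common training label."""
    label = Counter(train_ys).most_common(1)[0][0]
    return lambda x: label
```

With a real model, `xs` would hold the graphs and `ys` their binary labels. Shuffling once with a fixed seed keeps the folds reproducible across runs.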
External Use
PyGeometric
To load in PyGeometric, do the following:
from datasets import load_dataset
from torch_geometric.data import Data
from…
See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/AIDS.
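The loading snippet above is truncated. As a dependency-free sketch, the code below assumes that each record is a dict with an `edge_index` field (a pair of source and target lists), a `num_nodes` field and a `y` field. These field names are an assumption, so check them against the dataset page. The code builds an undirected adjacency list and counts connected components, which is a quick way to spot molecular graphs made of several fragments.

```python
from collections import deque
from typing import Dict, Set


def to_adjacency(record: dict) -> Dict[int, Set[int]]:
    """Build an undirected adjacency list from an assumed edge_index = [sources, targets]."""
    sources, targets = record["edge_index"]
    adj: Dict[int, Set[int]] = {v: set() for v in range(record["num_nodes"])}
    for u, v in zip(sources, targets):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def count_components(adj: Dict[int, Set[int]]) -> int:
    """Count connected components with breadth-first search."""
    seen: Set[int] = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components


# Hypothetical record with two fragments: {0, 1, 2} and {3, 4}.
example = {"edge_index": [[0, 1, 1, 2, 3, 4], [1, 0, 2, 1, 4, 3]], "num_nodes": 5, "y": [1]}
```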
AIDS is a graph dataset. It consists of 2000 graphs representing molecular compounds, constructed from the AIDS Antiviral Screen Database of Active Compounds. The source database contains 4395 chemical compounds, of which 423 belong to class CA, 1081 to class CM, and the remaining compounds to class CI.
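Taking the counts above at face value, the size of class CI and the share of each class follow directly. This is only arithmetic on the stated figures.

```python
total = 4395
counts = {"CA": 423, "CM": 1081}
counts["CI"] = total - sum(counts.values())  # 4395 - 423 - 1081 = 2891

# Fraction of the database in each class (roughly 0.096, 0.246 and 0.658).
shares = {label: n / total for label, n in counts.items()}
```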
HIV/AIDS funding by the National Institutes of Health (NIH) stood at around 3.3 billion U.S. dollars in fiscal year (FY) 2023. This graph shows total NIH HIV/AIDS funding from FY 2013 to FY 2023, along with estimates for FY 2024 and FY 2025.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
BRFSS combined landline and cell phone prevalence data, 2011 to present. BRFSS is a continuous, state-based surveillance system that collects information about modifiable risk factors for chronic diseases and other leading causes of death. Data will be updated annually as they become available. Detailed information on sampling methodology and quality assurance can be found on the BRFSS website (http://www.cdc.gov/brfss). Methodology: http://www.cdc.gov/brfss/factsheets/pdf/DBS_BRFSS_survey.pdf Glossary: https://chronicdata.cdc.gov/Behavioral-Risk-Factors/Behavioral-Risk-Factor-Surveillance-System-BRFSS-H/iuq5-y9ct
The AIDS Antiviral Screen dataset contains screening results for tens of thousands of compounds tested for evidence of anti-HIV activity. The screened compounds are available as chemical graph-structured data.
As of 2023, South Africa was the country with the highest number of people living with HIV in Africa, at around 7.7 million. In Mozambique, the country with the second-highest number of HIV-positive people in Africa, around 2.4 million people were living with HIV.
Which country in Africa has the highest prevalence of HIV?
Although South Africa has the highest total number of people living with HIV in Africa, it does not have the highest prevalence on the continent. Eswatini currently has the highest prevalence of HIV in Africa and worldwide, with almost 26 percent of the population living with HIV. South Africa has the third-highest prevalence, at around 18 percent. Eswatini also has the highest rate of new HIV infections per 1,000 population worldwide, followed by Lesotho and South Africa. However, South Africa had the highest total number of new HIV infections in 2023, with around 150,000 people newly infected that year.
Deaths from HIV in Africa
Thanks to advances in treatment and awareness, HIV/AIDS no longer accounts for a significant share of deaths in many countries. However, the disease is still the fourth leading cause of death in Africa, accounting for around 5.6 percent of all deaths. In 2023, South Africa and Nigeria had the highest numbers of AIDS-related deaths worldwide, with 50,000 and 45,000 such deaths, respectively. Although not every country in the top 25 for AIDS-related deaths is in Africa, African countries make up the majority of the list. Fortunately, HIV treatment has become more accessible in Africa over the years, and up to 95 percent of people living with HIV in Eswatini now receive antiretroviral therapy (ART). Access to ART varies from country to country, however: around 77 percent of HIV-positive people in South Africa receive ART, compared with only 31 percent in the Congo.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This database, derived from the AIDS Antiviral Screen Database of Active Compounds, is composed of 2000 chemical compounds, some of which are disconnected. These compounds have been screened as active or inactive against HIV, and they are split into three different sets:
Results on the AIDS dataset.
No. | Method | Classification accuracy (%)
(1) | Riesen and Bunke (2008) | 97.3
(2) | Suard et al. (2002) | 98.5
(3) | Vishwanathan et al. (2010) | 98.5
(4) | Neuhaus and Bunke (2007) | 99.7
(5) | Riesen et al. (2007) | 98.2
(6) | Graph Laplacian kernel | 99.3
(7) | Gauzere et al. (2012) | 99.1
Data on new HIV diagnoses, AIDS diagnoses and deaths are collected from HIV outpatient clinics, laboratories and other healthcare settings. Data relating to people living with HIV are collected from HIV outpatient clinics. The data relate to England, Wales, Northern Ireland and Scotland, unless otherwise stated.
HIV testing, pre-exposure prophylaxis, and post-exposure prophylaxis data relates to activity at sexual health services in England only.
Our statistical practice is regulated by the Office for Statistics Regulation (OSR). The OSR sets the standards of trustworthiness, quality and value in the Code of Practice for Statistics (https://code.statisticsauthority.gov.uk/) that all producers of Official Statistics should adhere to.
Additional information on HIV surveillance can be found in the HIV Action Plan for England monitoring and evaluation framework reports. Other HIV in the UK reports published by Public Health England (PHE) are available online.
Among all countries worldwide, those in sub-Saharan Africa have the highest rates of HIV. The countries with the highest rates include Eswatini, Lesotho, and South Africa. In 2023, Eswatini had the highest prevalence of HIV, with a rate of around ** percent. Other countries, such as Zimbabwe, have significantly decreased their HIV prevalence. Community-based HIV services are considered crucial to the prevention and treatment of HIV.
HIV Worldwide
The human immunodeficiency virus (HIV) is transmitted via exposure to infected semen, blood, vaginal and anal fluids, and breast milk. HIV destroys the human immune system, rendering the host unable to fight off secondary infections. Globally, the number of people living with HIV has generally increased over the past two decades, while the number of HIV-related deaths has decreased significantly in recent years. Although HIV is a serious illness that affects millions of people, medications called antiretroviral drugs effectively manage the progression of the virus in the body.
HIV Treatment
Global access to antiretroviral treatment has generally increased in recent years. However, although antiretroviral drugs are available worldwide, not all adults have access to them. Europe and North America have the highest rates of antiretroviral use among people living with HIV. Many different antiretroviral drugs are available on the market. As of 2024, ********, an antiretroviral marketed by Gilead, was the leading HIV treatment by revenue.
This graph illustrates the gross rate of health insurance coverage for HIV in France in 2019, by age. The highest gross rate was found among people aged 55 to 64, at around 4.7 per thousand people.
This graph depicts the share of HIV-positive people in China who reported various types of discrimination in 2009. Among HIV-positive women, 2.9 percent reported having been physically assaulted.
UNAIDS estimated that some ******* people worldwide died from acquired immune deficiency syndrome (AIDS) in 2023. This statistic depicts the total number of annual AIDS-related deaths worldwide from 2000 to 2023.
HIV/AIDS burden
A majority of the countries with the highest burden of HIV and AIDS are in Africa. In 2023, the highest numbers of AIDS-related deaths occurred in South Africa and Nigeria, and the highest prevalence of HIV was found in Eswatini. Although access to life-saving antiretroviral therapy (ART) has increased globally in recent years, many individuals living with HIV still lack access to it.
Barriers and interventions
Partly due to the development of ART, the number of people living with HIV worldwide continues to increase, reaching almost ** million in 2023. Important public health measures to combat the disease include a combination of biomedical and behavioral interventions, such as pre- and post-exposure prophylaxis, and context-specific structural interventions that reduce barriers to supplies and education. One prominent barrier faced by people living with HIV is stigma, which can cause disadvantages in many areas of life, including employment, use of health services, and social support.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Previous research suggests that graphs are relevant decision aids for tasks that involve interpreting numerical information. Moreover, the literature shows that different types of graphical information can help or harm the decision-making accuracy of accountants and financial analysts. We conducted a 4×2 mixed-design experiment to examine the effects of numerical information disclosure on financial analysts' accuracy, and we investigated the role of overconfidence in decision making. Results show that, compared to text, column graphs produced the largest improvement in decision-making accuracy, followed by line graphs. No difference was found between table and textual disclosure. Overconfidence harmed accuracy, and both genders behaved overconfidently. Additionally, the type of disclosure (text, table, line graph or column graph) did not affect individuals' overconfidence, providing evidence that overconfidence is a personal trait. This study makes three contributions. First, using a larger sample of financial analysts (295) rather than a smaller sample of students, it provides evidence that graphs are relevant decision aids for tasks that involve interpreting numerical information. Second, it uses text as a baseline to test how different forms of information disclosure (line graphs, column graphs and tables) can enhance the understandability of information. Third, it brings an internal factor into this process: overconfidence, a personal trait that harms individuals' decision making. At the end of the paper, several research paths are highlighted to further study the effect of internal factors (personal traits) on financial analysts' decision-making accuracy when numerical information is presented in graphical form. In addition, we offer suggestions concerning practical implications for professional accountants, auditors, financial analysts and standard setters.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Area under the curve (AUC) of a receiver operating characteristic (ROC) curve comparing the accuracy of the PwD, BED, and LAg assays in identifying HIV infection recency.
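For readers unfamiliar with the metric, the AUC of an empirical ROC curve equals the probability that a randomly chosen positive case (here, a recent infection) receives a higher assay score than a randomly chosen negative case, with ties counting one half. The scores below are hypothetical and are not taken from the PwD, BED or LAg data.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (Mann-Whitney form); ties count as half a win."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical assay scores for recent (positive) and long-term (negative) infections.
recent = [0.9, 0.7, 0.4]
long_term = [0.5, 0.2, 0.1]
```

The pairwise loop is O(n*m), which is fine for assay-sized samples. For large samples, a rank-based version gives the same value faster.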
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Controlling severe outbreaks remains the most important problem in the area of infectious disease. With time, this problem will only become more severe as population density in urban centers grows. Social interactions play a very important role in determining how infectious diseases spread, and the organization of people along social lines gives rise to non-spatial networks through which infections spread. Infection networks differ for diseases with different transmission modes, but are likely to be identical or highly similar for diseases that spread the same way. Hence, infection networks estimated from common infections can be useful for containing epidemics of a more severe disease with the same transmission mode. Here we present a proof-of-concept study demonstrating the effectiveness of epidemic mitigation based on such estimated infection networks. We first generate artificial social networks of different sizes and average degrees, but with roughly the same clustering characteristics. We then start SIR epidemics on these networks, censor the simulated incidence data, and use them to reconstruct the infection network. Next, we efficiently fragment the estimated network by removing the smallest number of nodes identified by a graph partitioning algorithm. Finally, we demonstrate the effectiveness of this targeted strategy, compared with traditional untargeted strategies, in slowing down and reducing the size of advancing epidemics.
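To make the evaluation step concrete, here is a minimal discrete-time SIR simulation on a contact network in which a set of removed nodes can neither catch nor pass on infection. The study's network reconstruction and graph-partitioning steps are not reproduced. The removed set is supplied by the caller, so a targeted strategy and an untargeted (for example, random) strategy can be compared by passing different sets. The parameter names `beta` (per-contact infection probability) and `gamma` (per-step recovery probability) follow common SIR convention and are assumptions here.

```python
import random
from typing import Dict, FrozenSet, Iterable, Set


def sir_final_size(adj: Dict[int, Set[int]], seeds: Iterable[int], beta: float,
                   gamma: float, removed: FrozenSet[int] = frozenset(),
                   seed: int = 0) -> int:
    """Discrete-time SIR on a network; returns how many nodes were ever infected.

    Each step, every infected node infects each susceptible neighbour with
    probability beta, then recovers with probability gamma. Removed nodes
    are skipped entirely.
    """
    if not 0 < gamma <= 1:
        raise ValueError("gamma must be in (0, 1] so the epidemic terminates")
    rng = random.Random(seed)
    infected = {s for s in seeds if s not in removed}
    recovered: Set[int] = set()
    while infected:
        newly_infected: Set[int] = set()
        for u in sorted(infected):
            for v in sorted(adj[u]):
                susceptible = not (v in removed or v in infected
                                   or v in recovered or v in newly_infected)
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        newly_recovered = {u for u in sorted(infected) if rng.random() < gamma}
        recovered |= newly_recovered
        infected = (infected - newly_recovered) | newly_infected
    return len(recovered)


def mean_final_size(adj, seeds, beta, gamma, removed=frozenset(), runs=200):
    """Average final epidemic size over independently seeded runs."""
    return sum(sir_final_size(adj, seeds, beta, gamma, removed, seed=r)
               for r in range(runs)) / runs
```

On a path 0-1-2-3-4 with `beta=1.0` and `gamma=1.0`, an outbreak seeded at node 0 reaches all five nodes. Removing node 2 confines it to nodes 0 and 1, which is the fragmentation effect the targeted strategy aims for.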
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Life table of a cohort of HIV/AIDS patients attending follow-up at public health facilities in Arba Minch Town, Ethiopia.
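The source does not say which life-table method was used. As an illustration, the sketch below follows the standard actuarial convention, in which patients withdrawn (censored) during an interval count as at risk for half of it. For each interval, the effective number at risk is n' = n - w/2, the conditional probability of death is q = d/n', and cumulative survival is the running product of (1 - q). The cohort size and counts are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def actuarial_life_table(n0: int, intervals: Sequence[Tuple[int, int]]) -> List[Dict[str, float]]:
    """Build an actuarial life table from (deaths, withdrawals) per interval."""
    rows = []
    at_risk = float(n0)
    cum_survival = 1.0
    for deaths, withdrawals in intervals:
        effective = at_risk - withdrawals / 2        # withdrawals at risk for half the interval
        q = deaths / effective if effective > 0 else 0.0
        cum_survival *= 1 - q
        rows.append({"entering": at_risk, "effective": effective,
                     "q": q, "cum_survival": cum_survival})
        at_risk -= deaths + withdrawals              # cohort entering the next interval
    return rows


# Hypothetical cohort of 100 patients followed over two intervals.
table = actuarial_life_table(100, [(10, 10), (5, 5)])
```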
The number of new HIV cases diagnosed in the UK fluctuated over the observed period. In 2023, there were ***** new HIV cases recorded in the UK, the highest in the period shown. Cases of AIDS in the UK were significantly lower, with *** cases in 2023.
STIs in the UK
Other common STIs in the UK are herpes, gonorrhea, and chlamydia. For gonorrhea and chlamydia in particular, cases increased between 2012 and 2019, while in 2020 and 2021 figures fell dramatically due to the COVID-19 pandemic and the resulting lockdowns and social distancing.
HIV in Europe
New cases of HIV in Europe amounted to roughly **** thousand in 2023, of which **** thousand were among males. Among males, the most common mode of HIV transmission in Europe in 2023 was sex between men.
SciGraphQA is a large-scale, open-domain dataset focused on generating multi-turn conversational question-answering dialogues centered around understanding and describing scientific graphs and figures. It contains over 300,000 samples derived from academic research papers in computer science and machine learning domains.
Each sample in SciGraphQA consists of a scientific graph image sourced from papers on ArXiv, accompanied by rich textual context that includes the paper's title, abstract, figure caption, and a paragraph from the paper referencing the figure. Using this context, the dataset employs a large language model to produce multi-turn question-answer dialogues that explain the given graph in an interactive, conversational format. On average, each sample contains 2-3 question-answer turns.
The key motivation behind SciGraphQA is providing a large-scale resource to support research and development of multi-modal AI systems that can engage in informative, open-ended conversations about graphs and data visualizations. The multi-turn dialogue format presents a more natural and interactive setting compared to standard visual question answering datasets that use fixed sets of standalone questions.
Potential use cases of SciGraphQA include pre-training and benchmarking multi-modal conversational models for scientific graph comprehension, building AI assistants that can discuss data insights, and developing aids to help individuals understand complex figures and diagrams interactively. The academic source material also provides a way to evaluate model capabilities on expert-level graphs spanning diverse topics and complex visual encodings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset was created to implement a content-based recommender system for the Open Research Knowledge Graph (ORKG). The recommender system takes a research paper's title and abstract as input and recommends existing ORKG predicates that are semantically relevant to the paper.
The paper instances in the dataset are grouped by ORKG comparisons; therefore, the data.json file is more comprehensive than training_set.json and test_set.json.
data.json
The main JSON object consists of a list of comparisons. Each comparison object has an ID, a label, a list of papers and a list of predicates. Each paper object has an ID, label, DOI, research field, research problems and abstract, and each predicate object has an ID and a label. See an example instance below.
{
  "comparisons": [
    {
      "id": "R108331",
      "label": "Analysis of approaches based on required elements in way of modeling",
      "papers": [
        {
          "id": "R108312",
          "label": "Rapid knowledge work visualization for organizations",
          "doi": "10.1108/13673270710762747",
          "research_field": {
            "id": "R134",
            "label": "Computer and Systems Architecture"
          },
          "research_problems": [
            {
              "id": "R108294",
              "label": "Enterprise engineering"
            }
          ],
          "abstract": "Purpose \u2013 The purpose of this contribution is to motivate a new, rapid approach to modeling knowledge work in organizational settings and to introduce a software tool that demonstrates the viability of the envisioned concept.Design/methodology/approach \u2013 Based on existing modeling structures, the KnowFlow toolset that aids knowledge analysts in rapidly conducting interviews and in conducting multi\u2010perspective analysis of organizational knowledge work is introduced.Findings \u2013 This article demonstrates how rapid knowledge work visualization can be conducted largely without human modelers by developing an interview structure that allows for self\u2010service interviews. Two application scenarios illustrate the pressing need for and the potentials of rapid knowledge work visualizations in organizational settings.Research limitations/implications \u2013 The efforts necessary for traditional modeling approaches in the area of knowledge management are often prohibitive. This contribution argues that future research needs ..."
        },
        ....
      ],
      "predicates": [
        {
          "id": "P37126",
          "label": "activities, behaviours, means [for knowledge development and/or for knowledge conveyance and transformation"
        },
        {
          "id": "P36081",
          "label": "approach name"
        },
        ....
      ]
    },
    ....
  ]
}
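As a sketch of how data.json relates to the flat training and test files, the code below expands each comparison into one record per (comparison, paper) pair. It uses the same `<comparison_id>x<paper_id>` instance_id shape and joins title and abstract with a space, as in the instance example for training_set.json. Attaching the comparison's predicate labels as the recommendation target is an assumption made for illustration. In practice, `data` would come from `json.load` on the real data.json.

```python
import json
from typing import Dict, List


def comparisons_to_instances(data: dict) -> List[Dict]:
    """Flatten data.json comparisons into one record per (comparison, paper) pair."""
    instances = []
    for comparison in data["comparisons"]:
        predicate_labels = [p["label"] for p in comparison["predicates"]]
        for paper in comparison["papers"]:
            instances.append({
                "instance_id": f"{comparison['id']}x{paper['id']}",
                "comparison_id": comparison["id"],
                "paper_id": paper["id"],
                # Title followed by abstract, separated by a single space.
                "text": f"{paper['label']} {paper.get('abstract', '')}".strip(),
                # Assumed target: the predicates attached to the comparison.
                "predicates": predicate_labels,
            })
    return instances


# Hypothetical, heavily shortened sample in the data.json shape.
sample = json.loads("""
{"comparisons": [{"id": "R1", "label": "Example comparison",
  "papers": [{"id": "R2", "label": "A title", "abstract": "An abstract."}],
  "predicates": [{"id": "P1", "label": "approach name"}]}]}
""")
```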
training_set.json and test_set.json
The main JSON object consists of a list of training/test instances. Each instance has an instance_id of the form <comparison_id>x<paper_id> (for example, R108331xR108301), a comparison_id, a paper_id and a text. The text is the concatenation of the paper's label (title) and abstract. See an example instance below.
Note that test instances are not duplicated and do not occur in the training set. Training instances are also not duplicated, but the same training paper can appear in several instances, each paired with a different comparison.
{
  "instances": [
    {
      "instance_id": "R108331xR108301",
      "comparison_id": "R108331",
      "paper_id": "R108301",
      "text": "A notation for Knowledge-Intensive Processes Business process modeling has become essential for managing organizational knowledge artifacts. However, this is not an easy task, especially when it comes to the so-called Knowledge-Intensive Processes (KIPs). A KIP comprises activities based on acquisition, sharing, storage, and (re)use of knowledge, as well as collaboration among participants, so that the amount of value added to the organization depends on process agents' knowledge. The previously developed Knowledge Intensive Process Ontology (KIPO) structures all the concepts (and relationships among them) to make a KIP explicit. Nevertheless, KIPO does not include a graphical notation, which is crucial for KIP stakeholders to reach a common understanding about it. This paper proposes the Knowledge Intensive Process Notation (KIPN), a notation for building knowledge-intensive processes graphical models."
    },
    ...
  ]
}
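The duplication rules above can be checked mechanically. The sketch below splits an instance_id at its first "x". This assumes that ORKG IDs such as R108331 never contain the letter x. It then counts duplicate instances and overlaps between splits. Shared papers are reported rather than rejected, because the same paper may legitimately appear with different comparisons.

```python
from typing import Dict, List, Tuple


def split_instance_id(instance_id: str) -> Tuple[str, str]:
    """Split '<comparison_id>x<paper_id>' at the first 'x' (assumes IDs contain no 'x')."""
    comparison_id, sep, paper_id = instance_id.partition("x")
    if not sep or not comparison_id or not paper_id:
        raise ValueError(f"unexpected instance_id: {instance_id!r}")
    return comparison_id, paper_id


def split_report(train: List[Dict], test: List[Dict]) -> Dict[str, int]:
    """Count duplicates and train/test overlaps at the instance and paper level."""
    train_ids = [i["instance_id"] for i in train]
    test_ids = [i["instance_id"] for i in test]
    train_papers = {split_instance_id(i)[1] for i in train_ids}
    test_papers = {split_instance_id(i)[1] for i in test_ids}
    return {
        "duplicate_train_instances": len(train_ids) - len(set(train_ids)),
        "duplicate_test_instances": len(test_ids) - len(set(test_ids)),
        "instances_in_both": len(set(train_ids) & set(test_ids)),
        "papers_in_both": len(train_papers & test_papers),
    }
```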
Dataset Statistics:
Statistic | Papers | Predicates | Research Fields | Research Problems
Min/Comparison | 2 | 2 | 1 | 0
Max/Comparison | 202 | 112 | 5 | 23
Avg./Comparison | 21.54 | 12.79 | 1.20 | 1.09
Total | 4060 | 1816 | 46 | 178
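Statistics of this kind can be recomputed from data.json. The sketch below assumes that the Min/Max/Avg rows are taken over comparisons and that the Total row counts distinct IDs. Neither assumption is stated in the source. Only papers and predicates are covered, and the toy input is hypothetical.

```python
from statistics import mean
from typing import Dict


def comparison_statistics(data: dict) -> Dict[str, Dict[str, float]]:
    """Per-comparison min/max/avg counts plus the number of distinct IDs."""
    comparisons = data["comparisons"]

    def summarize(counts, distinct_ids):
        return {"min": min(counts), "max": max(counts),
                "avg": round(mean(counts), 2), "total": len(distinct_ids)}

    paper_counts = [len(c["papers"]) for c in comparisons]
    predicate_counts = [len(c["predicates"]) for c in comparisons]
    paper_ids = {p["id"] for c in comparisons for p in c["papers"]}
    predicate_ids = {p["id"] for c in comparisons for p in c["predicates"]}
    return {"papers": summarize(paper_counts, paper_ids),
            "predicates": summarize(predicate_counts, predicate_ids)}


# Hypothetical toy input: paper "b" belongs to both comparisons.
toy = {"comparisons": [
    {"id": "R1", "papers": [{"id": "a"}, {"id": "b"}], "predicates": [{"id": "P1"}, {"id": "P2"}]},
    {"id": "R2", "papers": [{"id": "b"}, {"id": "c"}, {"id": "d"}], "predicates": [{"id": "P1"}]},
]}
```

Because a paper can belong to several comparisons, the distinct total can be smaller than the sum of the per-comparison counts.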
Dataset Splits:
Split | Papers | Comparisons
Training Set | 2857 | 214
Test Set | 1203 | 180
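As a quick consistency check on the split sizes, the training and test paper counts add up to 4060, the same figure as the Total papers entry in the statistics table. This corresponds to roughly a 70/30 split by papers.

```python
train_papers, test_papers = 2857, 1203

total_papers = train_papers + test_papers   # 4060
train_share = train_papers / total_papers   # about 0.704
test_share = test_papers / total_papers     # about 0.296
```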
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Types of opportunistic infections (OI) that recurred among a cohort of HIV/AIDS patients attending ART at public health facilities in Arba Minch Town, Ethiopia.