This is the third national probability survey of American Muslims conducted by Pew Research Center; the first was conducted in 2007 (https://www.thearda.com/data-archive?fid=MUSLIMS) and the second in 2011 (https://www.thearda.com/data-archive?fid=MUSAM11). Results from this study were published in the Pew Research Center (https://www.pewresearch.org/) report 'U.S. Muslims Concerned About Their Place in Society, but Continue to Believe in the American Dream' (https://www.pewresearch.org/religion/2017/07/26/findings-from-pew-research-centers-2017-survey-of-us-muslims/). The report is included in the materials that accompany the public-use dataset.
The survey included interviews with 1,001 adult Muslims living in the United States. Interviewing was conducted from January 23 to May 2, 2017, in English, Arabic, Farsi and Urdu. The survey employed a complex design to obtain a probability sample of Muslim Americans. Before working with the dataset, data analysts are strongly encouraged to carefully review the 'Survey Methodology' section of the report.
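Because the sample comes from a complex design, estimates are normally computed with survey weights rather than from raw counts. The sketch below covers only the point estimate of a weighted proportion, assuming each respondent row carries a weight; the column names `weight` and `q1` are hypothetical placeholders, not names taken from the Pew codebook. Correct standard errors also require the design information described in the methodology section, which this sketch ignores.

```python
# Minimal sketch: weighted proportion for a single survey item.
# Assumption: the column names "weight" and "q1" are hypothetical placeholders.

def weighted_proportion(rows, question, answer, weight_col="weight"):
    """Return the weighted share of respondents whose answer to `question` equals `answer`."""
    total_weight = 0.0
    matching_weight = 0.0
    for row in rows:
        w = float(row[weight_col])
        total_weight += w
        if row[question] == answer:
            matching_weight += w
    if total_weight == 0:
        raise ValueError("sum of weights is zero")
    return matching_weight / total_weight


# Made-up respondents, for illustration only.
respondents = [
    {"q1": "yes", "weight": 0.5},
    {"q1": "no", "weight": 1.5},
    {"q1": "yes", "weight": 2.0},
]
print(weighted_proportion(respondents, "q1", "yes"))  # 0.625 (the unweighted share would be 2/3)
```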
In addition to the report, the materials accompanying the public-use dataset also include the survey questionnaire, which reports the full details on question wording. Data users should treat the questionnaire (and not this codebook) as the authoritative reflection of question wording and order.
By Throwback Thursday
The dataset includes data on Christianity, Islam, Judaism, Buddhism, Hinduism, Sikhism, Shintoism, Baha'i Faith, Taoism, Confucianism, Jainism and various other syncretic and animist religions. For each religion or denomination category, it provides both the total population count and the percentage representation in relation to the overall population.
Additionally:
- Columns labeled with Population give the total number of individuals belonging to a particular religion or denomination.
- Columns labeled with Percent give the percentage of a given population that belongs to a specific religion or denomination.
- Columns that begin with ** indicate primary categories (e.g., Christianity), while columns without this prefix refer to subcategories (e.g., Christianity - Roman Catholics).
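These naming conventions can be applied mechanically to sort the columns. A minimal sketch; the example headers are hypothetical, and treating `Year`, `StateNme` and `Population` as the only non-religion columns is an assumption to check against the actual file.

```python
# Minimal sketch: classify column headers using the naming conventions above.
# Assumption: NON_RELIGION_COLUMNS and the example headers are hypothetical.

NON_RELIGION_COLUMNS = {"Year", "StateNme", "Population"}


def classify_column(name):
    """Return (level, measure) for a religion column, or None for other columns."""
    if name in NON_RELIGION_COLUMNS:
        return None
    if "Percent" in name:
        measure = "percent"
    elif "Population" in name:
        measure = "population"
    else:
        return None
    # A leading "**" marks a primary category; anything else is a subcategory.
    level = "primary" if name.startswith("**") else "subcategory"
    return level, measure


headers = [
    "Year",
    "StateNme",
    "**Christianity (Percent)",
    "Christianity - Roman Catholics (Population)",
]
for h in headers:
    print(h, "->", classify_column(h))
```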
In addition to data on specific religions and denominations across multiple years, the dataset records geographic location: the StateNme column holds the name of the state or country.
This dataset is useful for researchers studying global religious trends and can support analysis in fields such as sociology, anthropology, and cultural studies.
Introduction:
Understanding the Columns:
Year: Represents the year in which the data was recorded.
StateNme: Represents the name of the state or country for which data is recorded.
Population: Represents the total population of individuals.
Total Religious: Represents the total percentage and population of individuals who identify as religious, regardless of specific religion.
Non Religious: Represents the percentage and population of individuals who identify as non-religious or atheists.
Identifying Specific Religions: The dataset includes columns for different religions such as Christianity, Judaism, Islam, Buddhism, Hinduism, etc. Each religion is further categorized into specific denominations or types within that religion (e.g., Roman Catholics within Christianity). You can find relevant information about these religions by focusing on specific columns related to each one.
Analyzing Percentages vs. Population: Some columns provide percentages while others provide actual population numbers for each category. Depending on your analysis requirement, you can choose either column type for your calculations and comparisons.
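Where a religion has both a Population and a Percent column, one can be checked against the other. A minimal sketch, assuming Percent values are on a 0-100 scale (if they are stored as fractions, drop the factor of 100) and using an arbitrary tolerance of half a percentage point:

```python
# Minimal sketch: derive a percentage from counts and compare it with a reported value.
# Assumptions: Percent columns use a 0-100 scale; the 0.5-point tolerance is arbitrary.

def percent_from_counts(group_population, total_population):
    """Share of the total population, in percent."""
    if total_population <= 0:
        raise ValueError("total population must be positive")
    return 100.0 * group_population / total_population


def percent_consistent(reported_percent, group_population, total_population, tol=0.5):
    """True if the reported percentage is within `tol` points of the value implied by the counts."""
    derived = percent_from_counts(group_population, total_population)
    return abs(derived - reported_percent) <= tol


print(percent_from_counts(250, 1000))       # 25.0
print(percent_consistent(25.3, 250, 1000))  # True
print(percent_consistent(30.0, 250, 1000))  # False
```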
Accessing Historical Data: The dataset includes records from multiple years, allowing you to analyze trends in religious populations over time. You can filter data by year using spreadsheet filters (e.g., in Excel) or a programming language such as Python.
Filtering Data by State/Country: If you are interested in understanding religious populations in a particular state or country, use filters to focus on that region's data only.
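The year and region filters can also be expressed in a few lines of Python. A minimal sketch using only the standard library; it assumes the file has `Year` and `StateNme` columns as described above, and the sample rows are made-up placeholders.

```python
import csv
import io

# Minimal sketch: keep rows from a given year onward and, optionally, one StateNme.

def filter_rows(rows, min_year=None, state=None):
    selected = []
    for row in rows:
        # float() first in case years are stored as e.g. "2005.0"
        year = int(float(row["Year"]))
        if min_year is not None and year < min_year:
            continue
        if state is not None and row["StateNme"] != state:
            continue
        selected.append(row)
    return selected


# Made-up placeholder rows; with the real file, pass open(path, newline="") instead.
sample = io.StringIO(
    "Year,StateNme,Population\n"
    "1995,Region A,100\n"
    "2005,Region A,120\n"
    "2005,Region B,80\n"
)
rows = list(csv.DictReader(sample))
for row in filter_rows(rows, min_year=2000, state="Region A"):
    print(row["Year"], row["StateNme"], row["Population"])
```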
Example - Extracting Information:
Let's say you want to analyze Hinduism's growth globally from 2000 onwards:
- Identify Relevant Columns:
  - Year: to filter data from 2000 onwards.
  - Hindu - Total (Percent): to analyze the percentage of individuals identifying as Hindus globally.
- Filter Data:
  - Set a filter on the Year column and select values greater than or equal to 2000.
  - Look for rows where Hindu - Total (Percent) has values.
- Analyze Results: You can now visualize and calculate the growth of Hinduism worldwide after filtering out irrelevant data. Use statistical methods or graphical representations like line charts to understand trends over time.
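The filter-then-compare steps above can be sketched as follows. This computes a percentage-point change between the earliest and latest years with data, which is only one of several reasonable growth measures; the rows should already be restricted to a single StateNme (or aggregated), and the values below are made up.

```python
# Minimal sketch: change in a percentage column between the first and last
# available years at or after `start_year`, for the rows of ONE region.
# Assumption: empty cells mean "no value".

def change_since(rows, column, start_year):
    """Return (first_year, last_year, change_in_points), or None if fewer than two values."""
    points = []
    for row in rows:
        year = int(float(row["Year"]))
        value = (row.get(column) or "").strip()
        if year >= start_year and value:
            points.append((year, float(value)))
    if len(points) < 2:
        return None
    points.sort()
    (first_year, first_value), (last_year, last_value) = points[0], points[-1]
    return first_year, last_year, last_value - first_value


COLUMN = "Hindu - Total (Percent)"
# Made-up values for a single hypothetical region.
region_rows = [
    {"Year": "1995", COLUMN: "10.0"},
    {"Year": "2000", COLUMN: "12.5"},
    {"Year": "2010", COLUMN: ""},
    {"Year": "2015", COLUMN: "14.0"},
]
print(change_since(region_rows, COLUMN, 2000))  # (2000, 2015, 1.5)
```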
Conclusion: This guide has provided you with an overview of how to use the Rel
- Comparing religious populations across different countries: With data available for different states and countries, this dataset allows for comparisons of religious populations across regions. Researchers can analyze how different religions are distributed geographically and compare their percentages or total populations across various locations.
- Studying the impact of historical events on religious demographics: Since the dataset includes records categorized by year, it can be used to study how historical events such as wars, migration, or political changes have influenced religious demographics over time. By comparing population numbers before and after specific events, resea...
In the aftermath of the attacks of September 11, 2001, and subsequent terrorist attacks elsewhere around the world, a key counterterrorism concern was the possible radicalization of Muslims living in the United States. The purpose of the study was to examine and identify characteristics and practices of four American Muslim communities that have experienced varying levels of radicalization. The communities were selected because they were home to Muslim-Americans who had experienced isolated instances of radicalization. They were located in four distinct regions of the United States, and each had a distinctive history and pattern of ethnic diversity. This objective was pursued mainly through interviews with over 120 Muslims in four Muslim-American communities across the country (Buffalo, New York; Houston, Texas; Seattle, Washington; and Raleigh-Durham, North Carolina); a comprehensive review of studies and literature on Muslim-American communities; a review of websites and publications of Muslim-American organizations; and a compilation of data on prosecutions of Muslim-Americans for violent terrorism-related offenses.
By Throwback Thursday
The dataset contains information on a wide range of religions, including Christianity, Judaism, Islam, Buddhism, Hinduism, Sikhism, Shintoism, Baha'i Faith, Taoism, Confucianism, Jainism, Zoroastrianism, Syncretic Religions (religious practices that blend elements from multiple faiths) and Animism (belief in spiritual beings in nature), as well as Non-Religious individuals (those without any religious affiliation).
For each religion and region/country combination recorded in the dataset, the following information is available:
- Total population: The total population of the region or country.
- Religious affiliation percentages: The percentages of the population that identify with specific religious affiliations.
- Subgroup populations/percentages: The populations or percentages within specific denominations or sects of each religion.
The dataset also provides additional variables like Year and State Name (for regional data) for further analysis.
Understanding the Columns
The dataset contains several columns with different categories of information. Here's a brief explanation of some important columns:
- Year: The year in which the data was recorded.
- Total Population: The total population of a country or region.
- State Name (StateNme): The name of the state or region.
Each religion (e.g., Christianity, Buddhism, Islam, Hinduism, Judaism, Taoism, Shintoism) has its own columns, giving the percentage and population for each category or denomination within that religion.
Selecting Specific Data
If you are interested in exploring data related to a particular religion or geographic location:
To filter data by Religion: Identify relevant columns associated with that religion such as 'Christianity', 'Buddhism', 'Islam', etc., and extract their respective percentage and population values for analysis.
Example: If you want to analyze Christianity specifically, extract columns related to Christianity like 'Christianity (Percent)', 'Christianity (Population)', etc.
Note: There might be multiple columns related to a specific religion indicating different categories or denominations within that religion.
To filter data by Geographic Location: Use the 'State Name' column ('StateNme') to separate the data for different states/regions.
Example: If you want to analyze religious demographics for a particular state/region like California or India:
i) Keep only the rows where State Name equals California or India.
ii) Extract relevant columns associated with your selected religion as mentioned above.
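The two steps above (pick the rows for one region, then the columns for one religion) can be sketched as below. It assumes religion columns can be found by a case-insensitive substring match on the header; that test can also catch unintended columns whose names merely contain the search term, so review the matched list. The headers and values are hypothetical.

```python
# Minimal sketch: the columns for one religion and the rows for one region.
# Assumption: matching is a case-insensitive substring test on header names.

def religion_columns(header, religion):
    needle = religion.lower()
    return [name for name in header if needle in name.lower()]


def extract(rows, header, state, religion):
    """Return Year plus the religion's columns for rows whose StateNme equals `state`."""
    columns = ["Year"] + religion_columns(header, religion)
    return [
        {name: row[name] for name in columns}
        for row in rows
        if row["StateNme"] == state
    ]


header = ["Year", "StateNme", "Christianity (Percent)",
          "Christianity (Population)", "Islam (Percent)"]
rows = [
    {"Year": "2010", "StateNme": "Region A", "Christianity (Percent)": "40",
     "Christianity (Population)": "400", "Islam (Percent)": "5"},
    {"Year": "2010", "StateNme": "Region B", "Christianity (Percent)": "10",
     "Christianity (Population)": "50", "Islam (Percent)": "60"},
]
print(extract(rows, header, "Region A", "christianity"))
```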
Finding Trends and Insights
Once you have selected the specific data you are interested in, examine patterns and trends over time or across different regions.
Plotting data using visualizations: Use graphical tools such as line charts, bar charts, or pie charts to visualize how religious demographics have changed over the years or vary across different regions.
Analyzing population proportions: By comparing the percentage values of different religions for a given region or over time, you can gather insights into changes in religious diversity.
Comparing Religions
If you wish to compare multiple religions:
- Comparing religious affiliations across different countries or regions: With data on various religions such as Christianity, Islam, Buddhism, Judaism, Hinduism, etc., researchers can compare the religious affiliations of different countries or regions. This can help in understanding the cultural and religious diversity within different parts of the world.
- Exploring the growth or decline of specific religions: By examining population numbers for specific religions such as Jainism, Taoism, Zoroastrianism, etc., this dataset can be used to investigate the growth or decline of these religious groups over time. Researchers can analyze factors contributing to their popularity or decline in particular regions or countries.
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: ThrowbackDataThursday 201912 - Religion.csv | Column name...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 5 rows and is filtered to the book "Muslim American women on campus: undergraduate social life and identity". It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
CC0 1.0 Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The Cellular Towers in USA dataset is a collection of data that provides information about cellular towers located in the United States. The dataset includes information about the location, ownership, and technical specifications of over 47,000 cellular towers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered to the book "Burqas, baseball, and apple pie: being Muslim in America". It features 5 columns, including author, publication date, book publisher, and BNB id.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IndQNER
IndQNER is a Named Entity Recognition (NER) benchmark dataset that was created by manually annotating 8 chapters in the Indonesian translation of the Quran. The annotation was performed using a web-based text annotation tool, Tagtog, and the BIO (Beginning-Inside-Outside) tagging format. The dataset contains:
3117 sentences
62027 tokens
2475 named entities
18 named entity categories
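In the BIO format, each token is tagged B-X (first token of an entity of class X), I-X (continuation of that entity) or O (outside any entity). A minimal sketch of turning a tag sequence into entity spans; the tag names below are hypothetical and not necessarily the label strings used in the IndQNER files, and an I- tag that does not continue an open entity is leniently treated as starting a new one.

```python
# Minimal sketch: convert a BIO tag sequence into (label, start, end) spans.
# Assumption: tag names are hypothetical; stray I- tags open a new entity.

def bio_to_spans(tags):
    """Convert BIO tags into (label, start, end) spans with an exclusive end index."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            if label is not None:
                spans.append((label, start, i))
            label, start = None, None
        elif tag.startswith(("B-", "I-")):
            kind = tag[2:]
            # B- always opens a new entity; an I- that does not continue the
            # currently open entity is treated leniently as opening one too.
            if tag.startswith("B-") or kind != label:
                if label is not None:
                    spans.append((label, start, i))
                label, start = kind, i
        else:
            raise ValueError(f"not a BIO tag: {tag!r}")
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tags = ["B-Person", "I-Person", "O", "O", "B-Location"]
print(bio_to_spans(tags))  # [('Person', 0, 2), ('Location', 4, 5)]
```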
Named Entity Classes
The named entity classes were initially defined by analyzing the existing Quran concepts ontology. The initial classes were updated based on the information acquired during the annotation process. Finally, there are 20 classes, as follows:
Allah
Allah's Throne
Artifact
Astronomical body
Event
False deity
Holy book
Language
Angel
Person
Messenger
Prophet
Sentient
Afterlife location
Geographical location
Color
Religion
Food
Fruit
The book of Allah
Annotation Stage
There were eight annotators who contributed to the annotation process. They were informatics engineering students at the State Islamic University Syarif Hidayatullah Jakarta.
Anggita Maharani Gumay Putri
Muhammad Destamal Junas
Naufaldi Hafidhigbal
Nur Kholis Azzam Ubaidillah
Puspitasari
Septiany Nur Anggita
Wilda Nurjannah
William Santoso
Verification Stage
We found many named entity and class candidates during the annotation stage. To verify the candidates, we consulted Quran and Tafseer (content) experts who are lecturers at the Quran and Tafseer Department of the State Islamic University Syarif Hidayatullah Jakarta.
Dr. Eva Nugraha, M.Ag.
Dr. Jauhar Azizy, MA
Dr. Lilik Ummi Kultsum, MA
Evaluation
We evaluated the annotation quality of IndQNER by performing experiments in two settings: supervised learning (BiLSTM+CRF) and transfer learning (IndoBERT fine-tuning).
Supervised Learning Setting
The implementation of BiLSTM and CRF utilized IndoBERT to provide word embeddings. All experiments used a batch size of 16. These are the results:
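Maximum sequence length | Number of epochs | Precision | Recall | F1 score
----------------------- | ---------------- | --------- | ------ | --------
256 | 10 | 0.94 | 0.92 | 0.93
256 | 20 | 0.99 | 0.97 | 0.98
256 | 40 | 0.96 | 0.96 | 0.96
256 | 100 | 0.97 | 0.96 | 0.96
512 | 10 | 0.92 | 0.92 | 0.92
512 | 20 | 0.96 | 0.95 | 0.96
512 | 40 | 0.97 | 0.95 | 0.96
512 | 100 | 0.97 | 0.95 | 0.96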
Transfer Learning Setting
We performed several experiments with different parameters in IndoBERT fine-tuning. All experiments used a learning rate of 2e-5 and a batch size of 16. These are the results:
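Maximum sequence length | Number of epochs | Precision | Recall | F1 score
----------------------- | ---------------- | --------- | ------ | --------
256 | 10 | 0.67 | 0.65 | 0.65
256 | 20 | 0.60 | 0.59 | 0.59
256 | 40 | 0.75 | 0.72 | 0.71
256 | 100 | 0.73 | 0.68 | 0.68
512 | 10 | 0.72 | 0.62 | 0.64
512 | 20 | 0.62 | 0.57 | 0.58
512 | 40 | 0.72 | 0.66 | 0.67
512 | 100 | 0.68 | 0.68 | 0.67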
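Precision, recall and F1 for NER are commonly computed at the entity level, counting a prediction as correct only when its class and boundaries exactly match a gold entity. The sketch below shows that exact-match, micro-averaged variant. The source does not say how the scores in the tables above were averaged, so this is an illustration rather than a reproduction of the authors' evaluation; the spans are hypothetical (sentence id, class, start, end) tuples.

```python
# Minimal sketch: exact-match, micro-averaged entity-level precision/recall/F1.
# Spans are (sentence_id, label, start, end) tuples; the example spans are hypothetical.

def entity_prf1(gold_spans, pred_spans):
    gold = set(gold_spans)
    pred = set(pred_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [(0, "Person", 0, 2), (0, "Location", 4, 5)]
pred = [(0, "Person", 0, 2), (0, "Location", 3, 5)]  # second span has a wrong start
print(entity_prf1(gold, pred))  # (0.5, 0.5, 0.5)
```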
This dataset is also part of the NusaCrowd project which aims to collect Natural Language Processing (NLP) datasets for Indonesian and its local languages.
How to Cite
@InProceedings{10.1007/978-3-031-35320-8_12,
  author="Gusmita, Ria Hari and Firmansyah, Asep Fajar and Moussallem, Diego and Ngonga Ngomo, Axel-Cyrille",
  editor="M{\'e}tais, Elisabeth and Meziane, Farid and Sugumaran, Vijayan and Manning, Warren and Reiff-Marganiec, Stephan",
  title="IndQNER: Named Entity Recognition Benchmark Dataset from the Indonesian Translation of the Quran",
  booktitle="Natural Language Processing and Information Systems",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="170--185",
  abstract="Indonesian is classified as underrepresented in the Natural Language Processing (NLP) field, despite being the tenth most spoken language in the world with 198 million speakers. The paucity of datasets is recognized as the main reason for the slow advancements in NLP research for underrepresented languages. Significant attempts were made in 2020 to address this drawback for Indonesian. The Indonesian Natural Language Understanding (IndoNLU) benchmark was introduced alongside IndoBERT pre-trained language model. The second benchmark, Indonesian Language Evaluation Montage (IndoLEM), was presented in the same year. These benchmarks support several tasks, including Named Entity Recognition (NER). However, all NER datasets are in the public domain and do not contain domain-specific datasets. To alleviate this drawback, we introduce IndQNER, a manually annotated NER benchmark dataset in the religious domain that adheres to a meticulously designed annotation guideline. Since Indonesia has the world's largest Muslim population, we build the dataset from the Indonesian translation of the Quran. The dataset includes 2475 named entities representing 18 different classes. To assess the annotation quality of IndQNER, we perform experiments with BiLSTM and CRF-based NER, as well as IndoBERT fine-tuning. The results reveal that the first model outperforms the second model achieving 0.98 F1 points. This outcome indicates that IndQNER may be an acceptable evaluation metric for Indonesian NER tasks in the aforementioned domain, widening the research's domain range.",
  isbn="978-3-031-35320-8"
}
Contact
If you have any questions or feedback, feel free to contact us at ria.hari.gusmita@uni-paderborn.de or ria.gusmita@uinjkt.ac.id
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 3 rows and is filtered to the book "American heretics: Catholics, Jews, Muslims, and the history of religious intolerance". It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides comprehensive census data at the district level for India. It includes detailed demographic, religious, educational, and workforce-related attributes, making it a rich resource for socio-economic analysis.
District_code: A unique numeric code for each district.
State_name: Name of the state to which the district belongs.
District_name: Name of the district.
Population: Total population of the district.
Male: Total male population in the district.
Female: Total female population in the district.
Literate: Total number of literate individuals in the district.
Workers: Total number of workers in the district.
Male_Workers: Total number of male workers in the district.
Female_Workers: Total number of female workers in the district.
Cultivator_Workers: Number of workers engaged as cultivators.
Agricultural_Workers: Number of workers engaged in agricultural labor.
Household_Workers: Number of workers engaged in household industries.
Hindus: Total number of Hindus in the district.
Muslims: Total number of Muslims in the district.
Christians: Total number of Christians in the district.
Sikhs: Total number of Sikhs in the district.
Buddhists: Total number of Buddhists in the district.
Jains: Total number of Jains in the district.
Secondary_Education: Number of individuals with secondary education.
Higher_Education: Number of individuals with higher education qualifications.
Graduate_Education: Number of individuals with graduate-level education.
Age_Group_0_29: Population in the age group 0–29 years.
Age_Group_30_49: Population in the age group 30–49 years.
Age_Group_50: Population aged 50 years and above.
Number of Districts: 640
Number of Columns: 25
Non-null Values: All columns are complete with no missing data.
The data give a detailed breakdown of population by gender, age group, literacy level, and workforce distribution. Religious composition and education statistics are also included for each district.
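Because the file stores raw counts, rates have to be derived. A minimal sketch computing per-district shares from a row keyed by the column names listed above; the sample values are made up. The literate share here is a simple ratio over the total population, which can differ from official literacy rates that use a different denominator (such as the population aged 7 and above). The religion shares need not sum to 1, since only six religions are listed.

```python
# Minimal sketch: derive per-district shares from the raw counts.
# Assumption: values may arrive as strings (e.g. from csv.DictReader), so they are cast to int.

RELIGION_COLUMNS = ("Hindus", "Muslims", "Christians", "Sikhs", "Buddhists", "Jains")


def district_shares(row):
    population = int(row["Population"])
    if population <= 0:
        raise ValueError("Population must be positive")
    shares = {
        "literate_share": int(row["Literate"]) / population,
        "worker_share": int(row["Workers"]) / population,
    }
    for column in RELIGION_COLUMNS:
        shares[column.lower() + "_share"] = int(row[column]) / population
    return shares


# Made-up counts for one hypothetical district.
row = {"Population": "1000", "Literate": "600", "Workers": "400",
       "Hindus": "700", "Muslims": "200", "Christians": "50",
       "Sikhs": "20", "Buddhists": "10", "Jains": "5"}
print(district_shares(row))
```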
Data Analysis and Visualization: Explore patterns in population distribution, literacy rates, workforce composition, and religious demographics.
Machine Learning Applications: Build predictive models to classify districts or forecast demographic trends.
Social Research: Investigate correlations between education levels, workforce participation, and religion.
Policy Planning: Help policymakers target specific demographics or regions for intervention.
Educational Insights: Analyze the impact of education levels on workforce participation or literacy.
Total Rows: 640
Total Columns: 25
This dataset provides a unique opportunity to understand India's socio-economic and demographic composition at a granular district level.
This Religion and State-Minorities (RASM) dataset is supplemental to the Religion and State Round 2 (RAS2) dataset. It codes the RAS religious discrimination variable using the minority as the unit of analysis (RAS2 uses the country as the unit of analysis and is a general measure of all discrimination in the country). RASM codes religious discrimination by governments against all 566 minorities in 175 countries that meet a minimum population cutoff: any religious minority that is at least 0.25 percent of the population, or that has a population of at least 500,000 (in countries with populations of 200 million or more), is included. The dataset also includes all Christian minorities in Muslim countries and all Muslim minorities in Christian countries, for a total of 597 minorities. The data cover 1990 to 2008 with yearly codings.
These religious discrimination variables are designed to examine restrictions the government places on the practice of religion by minority religious groups. Two points need clarification. First, these variables focus on restrictions on minority religions. Restrictions that apply to all religions are not coded in this set of variables, because restricting or regulating the religious practices of minorities is qualitatively different from restricting or regulating all religions. Second, this set of variables focuses only on restrictions of the practice of religion itself or on religious institutions and does not include other types of restrictions on religious minorities. The reasoning is that a religious motivation is much more likely for restrictions on the practice of religion than for political, economic, or cultural restrictions on a religious minority. These secular types of restrictions, while potentially motivated by religion, can also have other causes. The fact that political, economic, and cultural restrictions are often placed on ethnic minorities who share the same religion as the majority group in their state demonstrates this.
This set of variables is essentially a list of specific types of religious restrictions that a government may place on some or all minority religions. These variables are identical to those included in the RAS2 dataset, except that one RAS2 variable is omitted because it concerns foreign missionaries, whereas this set of variables focuses on minorities living in the country. Each of the items in this category is coded on the following scale:
0. The activity is not restricted or the government does not engage in this practice.
1. The activity is restricted slightly or sporadically or the government engages in a mild form of this practice or a severe form sporadically.
2. The activity is significantly restricted or the government engages in this activity often and on a large scale.
A composite version combining the variables to create a measure of religious discrimination against minority religions which ranges from 0 to 48 also is included.
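If the composite is a simple sum of the individual items, each coded 0 to 2, a range of 0 to 48 would correspond to 24 items. Both the additive form and the item count are inferences from the stated range, not documented facts, so the sketch below takes the item count as a parameter and validates each code.

```python
# Minimal sketch: additive composite of items coded 0, 1 or 2.
# Assumption: the composite is a plain sum of 24 items (inferred from the 0-48 range).

VALID_CODES = {0, 1, 2}


def composite_score(item_codes, expected_items=24):
    if len(item_codes) != expected_items:
        raise ValueError(f"expected {expected_items} items, got {len(item_codes)}")
    for code in item_codes:
        if code not in VALID_CODES:
            raise ValueError(f"invalid item code: {code!r}")
    return sum(item_codes)


print(composite_score([0] * 24))             # 0
print(composite_score([2] * 24))             # 48
print(composite_score([1] * 10 + [0] * 14))  # 10
```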
ARDA Note: This file was revised on October 6, 2017. At the PIs' request, we removed the variable reporting the minority's percentage of a country's population after finding inconsistencies in the reported values. For detailed data on religious demographics, see the Religious Characteristics of States Dataset Project (/data-archive?fid=RCSREG2).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 4 rows and is filtered to the book "Soundtrack to a movement: African American Islam, jazz, and Black internationalism". It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
https://doi.org/10.17026/fp39-0x58
This dataset contains the Arab-West Report special reports that were published in 2006. It mainly contains the writings of Cornelis Hulsman, Drs., among other authors, on topics related to Muslim-Christian relations and interfaith dialogue. The writings are mostly reports concerning Coptic Christian culture, Muslim-Christian dialogue, and the state of the Christian faith in Egypt. Some of the articles address the controversial book "The Da Vinci Code" and the debates over its historicity and freedom of expression that followed its publication. Additionally, this dataset contains recommendations of the work of Arab-West Report by other public figures and material on the development of its affiliated NGO, the Center for Arab West Understanding. Furthermore, it contains commentary on and critique of material published by other sources (media critique).
Some of the themes that characterize this dataset:
Development of the Center for Arab West Understanding (CAWU) and recommendations of the work of Arab West Report:
- Recommendations for Arab-West Report and the Center for Arab-West Understanding from Dutch musician and entertainer Herman van Veen, Pastor Dave Petrescue (Maadi Community Church in Cairo, Egypt) and Lord Carey of Clifton, former Archbishop of Canterbury. Additionally, this dataset contains special recommendations of the work of Cornelis 'Kees' Hulsman and Sawsan Gabra by Dr. Jan Slomp, member of the Advisory Editorial Board of the Journal of Muslim Minority Affairs in Jeddah. Dr. Slomp acknowledges that Arab West Report's use of reliable information works toward strengthening Muslim-Christian relations by providing source material for cultural, educational and religious dialogue and cooperation.
- Another report mentions that former Dutch Prime Minister Andreas van Agt visited Egypt to support the founding of the Center for Arab-West Understanding.
- A report about the NGO status of CAWU, "After Three Years of Struggle". This report followed the February 18 ruling of the Egyptian Council of State that granted the Center recognition as an NGO under Egyptian law.
- Annual report: Arab-West Report presents its annual report for 2005.
- Arab West Report's American intern writes about 220 years of religious freedom in the U.S., arguing that one standard must be applied to all.
- A discussion of homosexuality and Egyptian law, taken from a bachelor's thesis on Egyptian law.
- Book review of Jamal Al-Banna's "My Coptic Brethren".
- "Christian Minorities in the Islamic World, an Egyptian Perspective": a paper presented at the annual interfaith dialogue meeting of the Anglican Communion and the Permanent Committee of the Azhar al-Sharif for Dialogue with the Monotheistic Religions. This paper prompted criticism from Metropolitan Seraphim for its portrayal of Muslim-Christian relations in Egypt.
Media Critique:
- An author criticizes an article by the German magazine Der Spiegel about Christians in the Middle East, claiming that it distorts the reality of the declining Christian communities in the region.
- Interview with Egyptian artist Farid Fadil, including discussion of his views on Muslim-Christian relations in Egypt, 'Christian art', Leonardo da Vinci and the controversial book The Da Vinci Code.
- Excerpts from the speeches of Mr. Ahmad Māhir, former foreign minister of Egypt; Sir Derek Plumbly, British ambassador to Egypt; Mr. Tjeerd de Zwaan, Dutch ambassador to Egypt; Mr. Lasse Seim, Norwegian ambassador to Egypt; and Cornelis Hulsman, Drs., director of the Center for Arab-West Understanding, on 'Freedom of expression and respect for the other. How to respond if one is offended.'
- Highlights of the meeting held at El-Sawy Culture Wheel on May 7, 2006, to launch the CAWU website. Highlights include a welcome address by Mr. Muhammad al-Sāwī and comments from former ministers Dr. Mamdouh al-Biltājī, Mr. Ahmed Māhir and Dr. Ahmad Juwaylī; the head of the Protestant Community Council, Dr. Safwat al Bayādī; and former prime minister of the Netherlands, Prof. Van Agt.
- Arab West Report asked our former intern Maria Roeder, a student of media science at the University of Jena in Germany, to summarize a study commissioned by the Austrian Federal Ministry of Interior. The study compares Austrian media reporting on Muslims with reporting by media in Muslim-majority countries on the integration of Muslims in Europe.
- A review of the media coverage following the Alexandria church stabbings concludes that both Muslims and Christians condemned the attacks and spoke of the need for change in the citizenship rights of Christians.
- The Apostolic Nuncio to Egypt, Archbishop Fitzgerald, responds to the polarization following the Regensburg lecture of H.H. Pope Benedict XVI.
- Cornelis Hulsman, Drs., presented a text at the recent roundtable discussions of the European Institute of the Mediterranean, concerning "Journalism and freedom of...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Index time series for the Wahed Dow Jones Islamic World ETF. Observations are daily, and moving average series are also typically included. The fund is an actively managed exchange-traded fund ("ETF") that seeks to invest in equity securities of global companies (excluding U.S.-domiciled companies) whose characteristics meet the requirements of Shariah and are consistent with Islamic principles as interpreted by subject-matter experts (each, a "Shariah Compliant Company"). The adviser seeks to invest the fund's assets in securities similar to the components of, and to achieve returns similar to those of, the Dow Jones Islamic Market International Titans 100 Index (the "Index"). The fund is non-diversified.
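The listing does not say which moving averages are included or over what windows. As an illustration only, a trailing simple moving average over a hypothetical window length can be computed like this:

```python
# Minimal sketch: trailing simple moving average of a daily series.
# Assumption: the window length is a free parameter, not taken from the dataset.

def simple_moving_average(values, window):
    """Element i is the mean of values[i-window+1 : i+1], or None until enough data exist."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window:i + 1]) / window)
    return averages


print(simple_moving_average([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
```

Recomputing each window's sum keeps the code simple and avoids accumulated floating-point drift; for very long series a running sum would be faster.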
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IndQNER
IndQNER is a Named Entity Recognition (NER) benchmark dataset that was created by manually annotating 8 chapters in the Indonesian translation of the Quran. The annotation was performed using a web-based text annotation tool, Tagtog, and the BIO (Beginning-Inside-Outside) tagging format. The dataset contains:
3117 sentences
62027 tokens
2475 named entities
18 named entity categories
Named Entity Classes
The named entity classes were initially defined by analyzing the existing Quran concepts ontology. The initial classes were updated based on the information acquired during the annotation process. Finally, there are 20 classes, as follows:
Allah
Allah's Throne
Artifact
Astronomical body
Event
False deity
Holy book
Language
Angel
Person
Messenger
Prophet
Sentient
Afterlife location
Geographical location
Color
Religion
Food
Fruit
The book of Allah
Annotation Stage
There were eight annotators who contributed to the annotation process. They were informatics engineering students at the State Islamic University Syarif Hidayatullah Jakarta.
Anggita Maharani Gumay Putri
Muhammad Destamal Junas
Naufaldi Hafidhigbal
Nur Kholis Azzam Ubaidillah
Puspitasari
Septiany Nur Anggita
Wilda Nurjannah
William Santoso
Verification Stage
We found many named entity and class candidates during the annotation stage. To verify the candidates, we consulted Quran and Tafseer (content) experts who are lecturers at Quran and Tafseer Department at the State Islamic University Syarif Hidayatullah Jakarta.
Dr. Eva Nugraha, M.Ag.
Dr. Jauhar Azizy, MA
Dr. Lilik Ummi Kultsum, MA
Evaluation
We evaluated the annotation quality of IndQNER by performing experiments in two settings: supervised learning (BiLSTM+CRF) and transfer learning (IndoBERT fine-tuning).
Supervised Learning Setting
The implementation of BiLSTM and CRF utilized IndoBERT to provide word embeddings. All experiments used a batch size of 16. These are the results:
| Maximum sequence length | Epochs | Precision | Recall | F1 score |
|---|---|---|---|---|
| 256 | 10 | 0.94 | 0.92 | 0.93 |
| 256 | 20 | 0.99 | 0.97 | 0.98 |
| 256 | 40 | 0.96 | 0.96 | 0.96 |
| 256 | 100 | 0.97 | 0.96 | 0.96 |
| 512 | 10 | 0.92 | 0.92 | 0.92 |
| 512 | 20 | 0.96 | 0.95 | 0.96 |
| 512 | 40 | 0.97 | 0.95 | 0.96 |
| 512 | 100 | 0.97 | 0.95 | 0.96 |
Transfer Learning Setting
We performed several experiments with different parameters in IndoBERT fine-tuning. All experiments used a learning rate of 2e-5 and a batch size of 16. These are the results:
| Maximum sequence length | Epochs | Precision | Recall | F1 score |
|---|---|---|---|---|
| 256 | 10 | 0.67 | 0.65 | 0.65 |
| 256 | 20 | 0.60 | 0.59 | 0.59 |
| 256 | 40 | 0.75 | 0.72 | 0.71 |
| 256 | 100 | 0.73 | 0.68 | 0.68 |
| 512 | 10 | 0.72 | 0.62 | 0.64 |
| 512 | 20 | 0.62 | 0.57 | 0.58 |
| 512 | 40 | 0.72 | 0.66 | 0.67 |
| 512 | 100 | 0.68 | 0.68 | 0.67 |
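Both result tables report precision, recall, and F1, but the source does not say how these scores were matched or averaged. As a minimal sketch, assuming exact-match, micro-averaged entity-level scoring (a common NER convention, not confirmed for IndQNER), the three metrics can be computed from sets of gold and predicted (label, start, end) spans:

```python
def entity_prf(
    gold: set[tuple[str, int, int]], pred: set[tuple[str, int, int]]
) -> tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall, and F1 over entity spans."""
    true_positives = len(gold & pred)  # spans identical in label, start, and end
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


# Hypothetical spans: one prediction has the wrong boundary, one entity is missed.
gold = {("Prophet", 1, 2), ("Person", 3, 5), ("Allah", 7, 8), ("Angel", 9, 10)}
pred = {("Prophet", 1, 2), ("Person", 3, 4), ("Allah", 7, 8)}
print(entity_prf(gold, pred))  # precision 2/3, recall 1/2, F1 4/7 (about 0.571)
```

As a check on the formula, the supervised row with precision 0.99 and recall 0.97 gives 2(0.99)(0.97)/1.96 ≈ 0.980, matching the reported F1 of 0.98.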
This dataset is also part of the NusaCrowd project, which aims to collect Natural Language Processing (NLP) datasets for Indonesian and its local languages.
How to Cite
@InProceedings{10.1007/978-3-031-35320-8_12,
  author="Gusmita, Ria Hari and Firmansyah, Asep Fajar and Moussallem, Diego and Ngonga Ngomo, Axel-Cyrille",
  editor="M{\'e}tais, Elisabeth and Meziane, Farid and Sugumaran, Vijayan and Manning, Warren and Reiff-Marganiec, Stephan",
  title="IndQNER: Named Entity Recognition Benchmark Dataset from the Indonesian Translation of the Quran",
  booktitle="Natural Language Processing and Information Systems",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="170--185",
  abstract="Indonesian is classified as underrepresented in the Natural Language Processing (NLP) field, despite being the tenth most spoken language in the world with 198 million speakers. The paucity of datasets is recognized as the main reason for the slow advancements in NLP research for underrepresented languages. Significant attempts were made in 2020 to address this drawback for Indonesian. The Indonesian Natural Language Understanding (IndoNLU) benchmark was introduced alongside IndoBERT pre-trained language model. The second benchmark, Indonesian Language Evaluation Montage (IndoLEM), was presented in the same year. These benchmarks support several tasks, including Named Entity Recognition (NER). However, all NER datasets are in the public domain and do not contain domain-specific datasets. To alleviate this drawback, we introduce IndQNER, a manually annotated NER benchmark dataset in the religious domain that adheres to a meticulously designed annotation guideline. Since Indonesia has the world's largest Muslim population, we build the dataset from the Indonesian translation of the Quran. The dataset includes 2475 named entities representing 18 different classes. To assess the annotation quality of IndQNER, we perform experiments with BiLSTM and CRF-based NER, as well as IndoBERT fine-tuning. The results reveal that the first model outperforms the second model achieving 0.98 F1 points. This outcome indicates that IndQNER may be an acceptable evaluation metric for Indonesian NER tasks in the aforementioned domain, widening the research's domain range.",
  isbn="978-3-031-35320-8"
}
Contact
If you have any questions or feedback, feel free to contact us at ria.hari.gusmita@uni-paderborn.de or ria.gusmita@uinjkt.ac.id
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 11 verified Indian Muslim restaurant businesses in New York, United States with complete contact information, ratings, reviews, and location data.
https://doi.org/10.17026/fp39-0x58
This dataset contains the Arab-West Report (AWR) special reports published in 2004. It mainly contains writings by Cornelis Hulsman, Drs., among other authors, on topics related to Muslim-Christian relations and interfaith dialogue between the West and the Islamic world. It also contains reports on certain Muslim-Christian incidents and on allegations of forced conversions of Coptic girls. Some of the articles address the issue of missionaries, and further reports address monastic life and endorsements of Arab-West Report's work by other public figures. The dataset also includes commentary on material published by other sources (reviews and critiques of articles from other media).
Some of the themes that characterize this dataset:
- A description of the history of the conflicts around the development of the convent of Patmos on the Cairo-Suez road.
- An overview of the book “Christians versus Muslims in Modern Egypt: The Century-Long Struggle for Coptic Equality” by S. S. Hasan.
- Rumors of forced conversions of Coptic girls: a report by Hulsman noted that the US Copts Association published a press release on March 25, 2004, titled “Coptic Pope Denounces Forced Conversion of Coptic Girls.” He criticized the US Copts Association for making little, if any, effort to check the veracity of the rumors.
- A Glimpse into Monastic Life in Egypt: A Visit to St. Maqarius Monastery.
- A report on the incident in which a priest and two members of the church board of Taha al-ʿAmeda died after an accident with a speeding car driven by a police officer.
- A critique of the newspaper al-Usbuʿa: the author accused the newspaper of cherry-picking statements by Coptic extremists, who are much disliked in the US Coptic community and have no following. He argued that quoting such isolated radicals gives readers the impression that they represent much more than a few individuals, and that al-Usbuʿa appears to have highlighted these radicals to create fear and harm the reputation of US Copts in Egypt.
- Several reports on the visit of Dr George Carey (Lord Carey), former Archbishop of Canterbury, to the Azhar and the speech he delivered there, entitled “Muslims/Christian Relationships: A New Age Of Hope?”
- A report on the first visit by Archbishop Rowan Williams to the Diocese of Egypt since he became Archbishop of Canterbury. The archbishop met with President Mubarak; Dr. Muhammad Sayyed Tantawi, the Grand Imam of the Azhar; and Pope Shenouda, and laid the foundation stone of the Harpur Community Health Centre in Sadat City.
- Updates on AWR’s work to create an electronic archive of information on relations between Muslims and Christians in the Arab world in general and Egypt in particular, including updates on the web-based Electronic Documentation Center (EDC) of the Center for Arab-West Understanding (CAWU), then under construction, which covers contemporary information on Arab-West and Muslim-Christian relations.
- A report on misconceptions of Christians in Islam.
- An editorial on the assassination of Theo van Gogh, which sparked a debate in Dutch media about the limits of freedom of expression.
- An article urging Western readers to treat stories of Christian persecution in Egypt with caution: they may be true, but they may also be rumours.
- The Muslim World And The West; What Can Be Done To Reduce Tensions?
- The text of a lecture for students and professors from different faculties at the University of Copenhagen about plans to establish the Center for Arab-West Understanding in Cairo, Egypt.
- Escalations following the alleged conversion of a priest’s wife to Islam.
The authors featured in this dataset are: Cornelis Hulsman, Drs.; Wolfram Reiss, Rev. Dr.; John H. Watson; Kim Kwang-Chan, Dr.; Kamal Abu al-Majd; Fiona McCallum; Mary Picard; Jeff Adams, Dr., Rev.; Jennie Marshall; Marcos Emil Mikhael; Usamah W. al-Ahwani; Sawsan Jabrah and Nirmin Fawzi; Hānī Labīb; George Carey (Lord); Rowan Williams; Lambeth Palace Press Office; H.G. Bishop Munir Hanna Anis Armanius; Eildert Mulder; Rīhām Saʿīd; Tharwat al-Kharabāwī; Geir Valle; Janique Blattman; Iqbal Barakah; Munā ʿUmar; Dieter Tewes; ʿAmr Asʿad Khalīl, Dr.; Janique Blattmann; Vera Milackova; Tamir Shukri; and Christiane Paulus.
All reports are written in English, though some feature Arabic text or cite Arabic sources.
This study, designed and carried out by the Association of Statisticians of American Religious Bodies (ASARB, http://www.asarb.org/), compiled data on 372 religious bodies by county in the United States. Of these, the ASARB was able to gather data on congregations and adherents for 217 religious bodies and on congregations only for 155. Participating bodies included 354 Christian denominations, associations, or communions (including Latter-day Saints, Messianic Jews, and Unitarian/Universalist groups); counts of Jain, Shinto, Sikh, Tao, Zoroastrian, American Ethical Union, and National Spiritualist Association congregations; and counts of congregations and adherents from Baha'i, three Buddhist groupings, two Hindu groupings, four Jewish groupings, and Muslims. The 372 groups reported a total of 356,642 congregations with 161,224,088 adherents, comprising 48.6 percent of the total U.S. population of 331,449,281. Membership totals were estimated for some religious groups.
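The totals quoted in this description are internally consistent. A short arithmetic check using only the figures given above (grouping the counts into a dictionary is purely illustrative):

```python
# Figures quoted in the ASARB study description.
bodies_with_adherents = 217      # congregations and adherents reported
bodies_congregations_only = 155  # congregations only
total_bodies = bodies_with_adherents + bodies_congregations_only

# Composition of the participating groups as listed in the description.
composition = {
    "Christian": 354,
    "Jain/Shinto/Sikh/Tao/Zoroastrian/AEU/NSA (congregations only)": 7,
    "Baha'i": 1,
    "Buddhist groupings": 3,
    "Hindu groupings": 2,
    "Jewish groupings": 4,
    "Muslim": 1,
}

adherents = 161_224_088
us_population = 331_449_281
adherent_share_pct = 100 * adherents / us_population

print(total_bodies, sum(composition.values()), round(adherent_share_pct, 1))  # 372 372 48.6
```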
In January 2024, the ARDA added 21 religious tradition (RELTRAD) variables to this dataset. These variables start at variable #12 (TOTCNG_2020). Categories were assigned based on pages 88-94 of the original 2020 U.S. Religion Census Report (https://www.usreligioncensus.org/index.php/node/1638).
Visit the frequently asked questions page (https://www.thearda.com/us-religion/sources-for-religious-congregations-membership-data) for more information about the ARDA's religious congregation and membership data sources.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the Stata and R data files and replication code for "The Media Matters: Muslim American Portrayals and the Effects on Mass Attitudes."
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper primarily employs qualitative research methods, analyzing articles from French media about Muslim and Ukrainian immigrants. The study used the Europresse media database, with the filter category set to domestic French media. By examining these texts, the study identifies relevant sentences and key terms that reveal the discursive characteristics of the French media itself, as well as the discursive images of Muslim and Ukrainian immigrants. These findings are then subjected to further interpretive analysis.