Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supporting tables and figures.
Table S1. The impact of different effect sizes on gene selection strategies when the sample size is fixed and relatively small. Mean (STD) of true positives computed from SIMU1 with 20 repetitions are reported. Sample size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
Table S2. The impact of different effect sizes on gene selection strategies when the sample size is fixed and relatively small. Mean (STD) of false positives computed from SIMU1 with 20 repetitions are reported. Sample size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
Table S3. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively small. Mean (STD) of true positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
Table S4. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively small. Mean (STD) of false positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
Table S5. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively large. Mean (STD) of true positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
Table S6. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively large. Mean (STD) of false positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
Table S7. The impact of different sample sizes on gene selection strategies with simulation based on biological data. Mean (STD) of true positives computed from SIMU-BIO with 20 repetitions are reported. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
Table S8. The impact of different sample sizes on gene selection strategies with simulation based on biological data. Mean (STD) of false positives computed from SIMU-BIO with 20 repetitions are reported. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
Table S9. The numbers of differentially expressed genes detected by different selection strategies. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
Figure S1. Histogram of pairwise Pearson correlation coefficients between genes computed from HYPERDIP without normalization. Number of genes: 9005. Number of arrays: 88. (PDF)
Explore the progression of average salaries for graduates in Math And Statistics from 2020 to 2023 through this detailed chart. It compares these figures against the national average for all graduates, offering a comprehensive look at the earning potential of Math And Statistics relative to other fields. This data is essential for students assessing the return on investment of their education in Math And Statistics, providing a clear picture of financial prospects post-graduation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spain: PISA math scores: The latest value from 2022 is 473.14 index points, a decline from 481.393 index points in 2018. In comparison, the world average is 439.569 index points, based on data from 78 countries. Historically, the average for Spain from 2003 to 2022 is 481.893 index points. The minimum value, 473.14 index points, was reached in 2022 while the maximum of 485.843 index points was recorded in 2015.
Explore the progression of average salaries for graduates in Applied And Computational Math And Statistics from 2020 to 2023 through this detailed chart. It compares these figures against the national average for all graduates, offering a comprehensive look at the earning potential of Applied And Computational Math And Statistics relative to other fields. This data is essential for students assessing the return on investment of their education in Applied And Computational Math And Statistics, providing a clear picture of financial prospects post-graduation.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Saudi Arabia phone number data is another important collection of phone numbers. These numbers come from trusted sources. We carefully check every number. This means you only get real numbers from reliable places. Furthermore, this data includes source URLs. You can use these URLs to find out where the numbers came from. This adds transparency to the data. If you have questions, you can get help anytime. Support is available 24/7. Moreover, the phone data has an opt-in feature. With customer support always on hand to help, you can feel confident using this data.
Saudi Arabia number data is a special collection of phone numbers. Besides, this list includes numbers from people living in Saudi Arabia. Each number in this database has verification for accuracy. If you ever find a number that does not work, there is a replacement guarantee. This means any invalid number gets replaced with a valid one at no extra cost. The data comes from people who have given permission. Thus, this respect for privacy makes it a great tool for businesses. At List to Data, we help you find important phone numbers easily and quickly.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Biological data were analysed with all 6 methods; the mean period value is reported in the table (standard deviation in brackets). The expected period is 24 h, as the clock is entrained by a 24 h light:dark cycle. 1) The data were collected in two different conditions, LD and SD, monitoring 5 output genes in each of them. 2) (All) represents aggregated results from all data sets. 3) NoCAT3 represents aggregated results from all data sets except the CAT3 marker. +) Cases for which the mean period is not statistically different from 24 h are marked with +.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
No of Transaction: Check data was reported at 23,310.000 Unit in Dec 2019. This records an increase from the previous number of 20,747.000 Unit for Sep 2019. No of Transaction: Check data is updated quarterly, averaging 167,515.000 Unit from Mar 2013 (Median) to Dec 2019, with 28 observations. The data reached an all-time high of 250,046.000 Unit in Dec 2016 and a record low of 20,747.000 Unit in Sep 2019. No of Transaction: Check data remains active status in CEIC and is reported by State Bank of Vietnam. The data is categorized under Global Database’s Vietnam – Table VN.KA009: Domestic Transaction: Means of Liquidity. [COVID-19-IMPACT]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Industrial Enterprise: Number of Employee: Average data was reported at 72,892.000 Person th in Mar 2025. This records an increase from the previous number of 72,243.000 Person th for Feb 2025. China Industrial Enterprise: Number of Employee: Average data is updated monthly, averaging 77,451.100 Person th from Dec 1992 (Median) to Mar 2025, with 186 observations. The data reached an all-time high of 99,772.100 Person th in Dec 2014 and a record low of 54,408.390 Person th in Dec 2001. China Industrial Enterprise: Number of Employee: Average data remains active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Industrial Sector – Table CN.BF: Industrial Financial Data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China: PISA math scores: The latest value, from 2015, is 531.296 index points; no earlier value is available for comparison. In comparison, the world average is 463.913 index points, based on data from 67 countries. Because 2015 is the only year recorded for China, the historical average, minimum, and maximum are all 531.296 index points.
"In order to complete an economic study of UCG, a preliminary design must be made for the process. The design and economic study both require the estimation of a considerable number of variables such as depth and thickness of the coal seam, well spacing, gas heating value and production rate, air injection requirements, percentage coal recovery, thermal efficiency of the process, and rate of advance of the gasification zone. Almost never will sufficient experimental data be available to determine all variables with confidence. Furthermore, not all of the variables cited are independent of each other. The purpose of this paper is to show how mathematical models and laboratory data can lead to a major reduction in uncertainties resulting from assumptions associated with economic analyses of UCG. An economic analysis is used to illustrate this method. Mathematical model calculations are used to establish the relationships between variables such as the gas heating value and thermal efficiency. The actual correlations are developed from operating data from the Hanna field tests, but model calculations provide the theoretical explanation for the shape and sensitivity of the experimental curves. Model calculations also allow confident interpolation and extrapolation of the experimental data. The end result is an economic analysis with improved accuracy and few assumptions. Finally, it is shown how economic studies can provide valuable feedback for an ongoing research program. Certain variables have yet to be fully determined, such as maximum well spacing, probable gas leakage due to subsidence, and long-term average gas heating value. Economic studies show which of these variables have the greatest economic impact. Those variables with maximum impact should receive the greatest emphasis in research."
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper.
The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section.
The unit of observation, or the single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}). This means that for each pair (j,k) we have nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all.
For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=John Smith, k=Calculus, n=2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}.
Two attachments:
- Word file with variables description
- Rdata file with the data set (for R language)
Appendix 1. The SET questionnaire used for this paper.
Evaluation survey of the teaching staff of [university name]
Please, complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5- I strongly agree; 4- I agree; 3- Neutral; 2- I don’t agree; 1- I strongly don’t agree.
Questions (each answered on the scale 1 2 3 4 5):
- I learnt a lot during the course.
- I think that the knowledge acquired during the course is very useful.
- The professor used activities to make the class more engaging.
- If it was possible, I would enroll for the course conducted by this lecturer again.
- The classes started on time.
- The lecturer always used time efficiently.
- The lecturer delivered the class content in an understandable and efficient way.
- The lecturer was available when we had doubts.
- The lecturer treated all students equally regardless of their race, background and ethnicity.
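The averaging that defines SET_score_avg(j,k,n) can be sketched as follows. This is a minimal illustration only: the function name, the tuple layout, and the sample answers are hypothetical, and the published data set contains the averaged scores, not the student-level answers used here.

```python
from collections import defaultdict
from statistics import mean


def set_score_avg(responses):
    """Average Likert answers per (teacher j, course k, question n).

    `responses` is an iterable of (teacher_id, course_id, question_no, answer)
    tuples; this layout is an assumption made for the sketch.
    """
    grouped = defaultdict(list)
    for teacher_id, course_id, question_no, answer in responses:
        grouped[(teacher_id, course_id, question_no)].append(answer)
    # One output row per triplet, mirroring the unit of observation.
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical student-level answers on the 1-5 Likert scale.
responses = [
    ("John Smith", "Calculus", 2, 5),
    ("John Smith", "Calculus", 2, 4),
    ("John Smith", "Calculus", 2, 3),
    ("John Smith", "Calculus", 1, 4),
]
scores = set_score_avg(responses)
```

A triplet with no answers simply produces no entry, which matches the description of pairs (j,k) that have fewer than nine rows.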
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The data are in long form, with some studies spanning multiple lines, and include a sample of children ranging from 3.54 to 13.75 years old. The main effect size is the correlation coefficient r, and the accompanying sample size is also included. Each article is coded with a study number, the article name, and its authors, as well as a set of moderators. The moderators are as follows:
- grade_new2 = sample grade category, where 1 = preschool/kindergarten, 2 = secondary
- HME_comp_new = HME component, where 1 = direct activities, 2 = indirect activities, 3 = combination direct and indirect activities, 4 = parent attitudes and/or beliefs, 5 = parent math expectations, 6 = spatial activities, 7 = math talk
- hme_type_nocombo = HME measurement method, where 1 = frequency-based scale, 2 = rating scale, 3 = checklist, 4 = observation
- obs_pr = two-level HME measurement method variable, where 1 = observation-based, 2 = parent-report
- math_dom_nospat = math domain, where 1 = arithmetic operations, 2 = relations, 3 = numbering, 4 = multiple domains
- symbolic_nonsymbolic = refers to math assessment, where 1 = symbolic, 2 = non-symbolic, 3 = combination symbolic and non-symbolic
- timed_new = refers to math assessment, where 1 = timed, 2 = untimed, . = combination timed and untimed
- composite = refers to math assessment, where 1 = composite, 2 = single math assessment
- std_new = refers to math assessment, where 1 = standardized, 2 = unstandardized, . = combination standardized and unstandardized
- hme_calc = hme calculation method, where 1 = latent factor score, 2 = sum score, 3 = single item
- age = sample age in years
- long_new = refers to effect size, where 1 = longitudinal relation, 2 = concurrent relation
- low_SES = sample SES, where 1 = low SES (50% or more), 2 = average SES, 3 = high SES (50% or more)
- parent_ed = sample SES in terms of parent education level, based on the percentage of parents reported to have completed any post-secondary education (included a vocational certification, attended some college, and/or completed an associate’s, bachelor’s, or graduate degree program). The percentage was converted into a decimal value ranging from .00 to 1.00.
Explore the progression of average salaries for graduates in Math Of Finance from 2020 to 2023 through this detailed chart. It compares these figures against the national average for all graduates, offering a comprehensive look at the earning potential of Math Of Finance relative to other fields. This data is essential for students assessing the return on investment of their education in Math Of Finance, providing a clear picture of financial prospects post-graduation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
San Marino: PISA math scores: No values are reported for this series. The latest value, the previous value, the world-average comparison, the historical average, and the minimum and maximum are all blank in the source.
https://creativecommons.org/publicdomain/zero/1.0/
Description: This dataset provides comprehensive movie statistics compiled from multiple sources, including Wikipedia, The Numbers, and IMDb. It offers a rich collection of information and insights into various aspects of movies, such as movie titles, production dates, genres, runtime minutes, director information, average ratings, number of votes, approval index, production budgets, domestic gross earnings, and worldwide gross earnings.
The dataset combines data scraped from Wikipedia, which includes details about movie titles, production dates, genres, runtime minutes, and director information, with data from The Numbers, a reliable source for box office statistics. Additionally, IMDb data is integrated to provide information on average ratings, number of votes, and other movie-related attributes.
With this dataset, users can analyze and explore trends in the film industry, assess the financial success of movies, identify popular genres, and investigate the relationship between average ratings and box office performance. Researchers, movie enthusiasts, and data analysts can leverage this dataset for various purposes, including data visualization, predictive modeling, and deeper understanding of the movie landscape.
Features:
- Movie_title
- Production_date
- Genres
- Runtime_minutes
- Director_name (primaryName)
- Director_professions (primaryProfession)
- Director_birthYear
- Director_deathYear
- Movie_averageRating: the average rating given by online users for a particular movie
- Movie_numberOfVotes: the number of votes given by online users for a particular movie
- Approval_Index: a normalized indicator (on a scale of 0-10) calculated by multiplying the logarithm of the number of votes by the average user rating. It provides a concise measure of a movie's overall popularity and approval among online viewers, penalizing both films that got too few reviews and blockbusters that got too many.
- Production_budget ($)
- Domestic_gross ($)
- Worldwide_gross ($)
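The Approval_Index description can be illustrated with a rough sketch. The source states only that the logarithm of the vote count is multiplied by the average rating and that the result is on a 0-10 scale; the rescaling step used here (dividing by the largest raw value) and the function name are assumptions, not the dataset author's documented method.

```python
import math


def approval_index(movies):
    """Return a 0-10 index for each (average_rating, number_of_votes) pair.

    Raw score = rating * log(votes), as described in the feature list.
    Rescaling by the maximum raw score is an ASSUMPTION for illustration;
    with this rescaling the choice of logarithm base cancels out.
    Vote counts must be at least 1.
    """
    raw = [rating * math.log(votes) for rating, votes in movies]
    top = max(raw)
    if top <= 0:
        return [0.0 for _ in raw]
    return [10.0 * value / top for value in raw]


# Hypothetical movies: (Movie_averageRating, Movie_numberOfVotes)
movies = [(8.0, 100_000), (8.0, 100), (5.0, 100_000)]
index = approval_index(movies)
```

In this sketch the logarithm compresses large vote counts, so a thousandfold difference in votes between the first two movies changes the index far less than a thousandfold.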
Potential Applications:
- Box office analysis: Analyze the relationship between production budgets, domestic and worldwide gross earnings, and profitability.
- Genre analysis: Identify the most popular genres based on movie counts and analyze their performance.
- Rating analysis: Explore the relationship between average ratings, number of votes, and financial success.
- Director analysis: Investigate the impact of directors on movie ratings and financial performance.
- Time-based analysis: Study movie trends over different production years and observe changes in production budgets, box office earnings, and genre preferences.
By utilizing this dataset, users can gain valuable insights into the movie industry and uncover patterns that can inform decision-making, market research, and creative strategies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Netherlands: PISA math scores: The latest value from 2022 is 492.676 index points, a decline from 519.231 index points in 2018. In comparison, the world average is 439.569 index points, based on data from 78 countries. Historically, the average for the Netherlands from 2003 to 2022 is 520.206 index points. The minimum value, 492.676 index points, was reached in 2022 while the maximum of 537.823 index points was recorded in 2003.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thailand Average Monthly Revenue: Per Mobile Phone Number: Prepaid data was reported at 151.000 THB in Sep 2019. This records a decrease from the previous number of 152.000 THB for Jun 2019. Thailand Average Monthly Revenue: Per Mobile Phone Number: Prepaid data is updated quarterly, averaging 152.000 THB from Mar 2014 (Median) to Sep 2019, with 23 observations. The data reached an all-time high of 165.000 THB in Mar 2016 and a record low of 134.000 THB in Sep 2014. Thailand Average Monthly Revenue: Per Mobile Phone Number: Prepaid data remains active status in CEIC and is reported by Office of The National Broadcasting and Telecommunications Commission. The data is categorized under Global Database’s Thailand – Table TH.TB006: Telecommunication Statistics: Office of The National Broadcasting and Telecommunications Commission.
Normalized 2020 and 2050 First Street flood risk data aggregated at the census-tract level. A lower number indicates less risk (0 is minimum) and a higher number indicates more risk (1 is maximum). The normalization process subtracts the mean from the local value and divides it by the standard deviation: ((tract_value - overall mean) / stand_dev). The overall mean is the national average of all census tracts.
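The stated formula, (tract_value - overall mean) / stand_dev, can be sketched as below. The function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which estimator is used.

```python
from statistics import fmean, pstdev


def normalize(tract_values):
    """Apply (tract_value - overall_mean) / stand_dev to every tract.

    Uses the population standard deviation (an assumption; the source
    does not specify population vs. sample).
    """
    overall_mean = fmean(tract_values)
    stand_dev = pstdev(tract_values)
    if stand_dev == 0:
        # All tracts identical: every value sits exactly at the mean.
        return [0.0 for _ in tract_values]
    return [(value - overall_mean) / stand_dev for value in tract_values]


# Hypothetical raw tract-level risk values.
tract_values = [0.10, 0.20, 0.30, 0.40, 0.50]
z = normalize(tract_values)
```

Values produced by this formula alone are centered on zero and can be negative or greater than 1. The description also refers to a 0-to-1 range, so some further rescaling is presumably applied; that step is not described in the source and is not shown here.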
If you are interested in acquiring First Street flood data, you can request to access the data here. More information on First Street's flood risk statistics can be found here and information on First Street's hazards can be found here.
https://creativecommons.org/publicdomain/zero/1.0/
LinkedIn is a place for building connections and showcasing skills and achievements. This data set was collected to study features such as promotions, regional patterns, and facial characteristics of profile images.
The data consists of around 15000 profiles. It covers features such as region, the way the images were uploaded, the emotions shown in them, and the growth of the users over time.
The attributes are described below.
User ids are private and should not be disclosed, although their characteristics can be shared in order to understand the behavior patterns of people on LinkedIn.
- c id: name for each record; forms the primary key
Profession Columns
- avg time in previous position: The amount of time, in years, spent in the previous position
- avg current position length: The average amount of time the user has been in the current position
- avg previous position length: The average amount of time the user was in the previous position
- m urn: The user id for each profile
- m urn id: The user id reduced to a distinct code
- no of promotions: Total number of times the user was promoted
- no of previous positions: The number of previous positions the user has held
- current position length: The number of months the person has been in the current position
- age: The age of the person
- gender: Male or Female
- ethnicity: The percentage of ethnicity
- n followers: Number of followers
Image Clarity
- beauty: The beauty is the index for the analysis of the
- beauty female: Predicts whether the user image is more likely to be female
- beauty male: Predicts whether the user image is more likely to be male
- blur: The degree of shadiness of the image
Emotion Captured
- emo anger: The percentage of anger found
- emo disgust: The percentage of disgust found
- emo fear: The percentage of fear found
- emo happiness: The percentage of happiness found
- emo neutral: The percentage of neutral
- emo sadness: The percentage of sadness
- emo surprise: The percentage of surprise
Orientation & Facial Accessories
- glass: Whether the person is wearing glasses or sunglasses
- head pitch: The orientation of the head (up or down)
- head roll: The orientation of the head (sideways rolling; horizontal or vertical)
- head yaw: The orientation of the head (side facing; left or right)
- mouth close: The percentage of closed mouth
- mouth mask: The percentage of masked mouth
- mouth open: The percentage of open mouth
- mouth other: The percentage of other mouth things
- skin acne: The percentage of skin tone
- skin dark_circle: The percentage of dark circle on skin
- skin health: The growth of the skin percentage
- skin stain: The stain percentage on skin
- smile: The smile percentage
Region Columns
nationality: The nationality belonging
Followed by the percentage of each:-
african
celtic english
east asian
european
greek
hispanic
jewish
muslim
nordic
south asian
face_quality: The quality of the face recognized.
Always wanted to contribute to the data science community and open up to questions.
https://data.norge.no/nlod/en/2.0/
Data set of phone numbers for state enterprises, municipalities and county authorities. It is intended to be used together with the data set of the units of public administration. This dataset is part of several data sets about public enterprises. The data sets are referred to as the agency base and were previously on Norge.no. They contain an overview of public enterprises, i.e. government agencies and enterprises’ central, regional and local units, county municipalities and municipalities. The data sets are not updated. They contain information about the name of the enterprise, visiting address, postal address, telephone number, e-mail address, web address (URL), map coordinates (position), coverage (which municipalities the enterprise covers), organisation number, overarching activity, type of organisation, type of affiliation (the way in which an enterprise is linked to the executive government) and quality assessments of the website. Search for the keyword/tag "agency base" to see the other data sets. The agency base is closed and is no longer maintained by the Directorate of Digitalisation (formerly Difi). The data sets were last updated in January 2012. Note that this does not mean that all data was updated in January 2012, but that the last changes were made at that time.
Reference to the source
When using this dataset, we ask that the source be referred to as follows (cf. the NLOD license): The service is based on open data sets from the Directorate of Digitalisation and is subject to the Norwegian License for Public Data (NLOD). The data was last updated in 2012 and is no longer maintained by the Directorate of Digitalisation.