Chapter 4: Statistical Modeling and Inference

4.1 Probability

Definition of probability: Probability measures how likely an event is to occur. It ranges from 0 to 1, where 0 indicates impossibility and 1 indicates certainty.

Sample spaces and events: A sample space is the set of all possible outcomes of an experiment, and an event is a subset of the sample space.

Probability calculations: When all outcomes are equally likely, the probability of an event can be calculated by dividing the number of favorable outcomes by the total number of possible outcomes. Example: when tossing a fair coin, the probability of getting heads is 1/2, because there is one favorable outcome (heads) out of two equally likely outcomes (heads or tails).

Under this classical definition, the probability of an event A is the ratio of the number of favorable outcomes (F) to the total number of possible outcomes (S):

P(A) = F / S

where:
P(A) represents the probability of event A occurring.
F is the number of favorable outcomes, i.e., t...
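The classical formula P(A) = F / S can be sketched in a few lines of Python. This is an illustrative sketch, not a method from the text. The function name `classical_probability` is hypothetical, and the sketch assumes every outcome in the sample space is equally likely, which is the condition under which F / S applies. The sample space and the event are both modeled as Python sets, so "an event is a subset of the sample space" becomes a subset check. `Fraction` keeps the answer exact (1/2 rather than 0.5).

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """Hypothetical helper: return P(A) = F / S for equally likely outcomes."""
    outcomes = set(sample_space)   # S = len(outcomes)
    favorable = set(event)         # F = len(favorable)
    if not outcomes:
        raise ValueError("sample space must be non-empty")
    if not favorable <= outcomes:  # an event must be a subset of the sample space
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(favorable), len(outcomes))


# Fair coin: one favorable outcome (heads) out of two.
coin = {"heads", "tails"}
print(classical_probability(coin, {"heads"}))  # 1/2

# Illustrative extra example (not from the text): two fair dice, event "sum is 7".
two_dice = set(product(range(1, 7), repeat=2))          # 36 ordered pairs
sum_is_seven = {roll for roll in two_dice if sum(roll) == 7}
print(classical_probability(two_dice, sum_is_seven))    # 1/6
```

The dice example shows why enumerating the sample space helps: 6 of the 36 ordered pairs sum to 7, so F / S = 6/36 = 1/6. The formula gives misleading answers if the outcomes are not equally likely (for example, a biased coin).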
Chapter 3: Exploratory Data Analysis

3.1 Introduction to Exploratory Data Analysis (EDA)

EDA is the process of analyzing and visualizing data to uncover patterns, identify outliers, and gain insights. The goals of EDA include understanding the data, detecting anomalies, exploring relationships between variables, and preparing data for modeling. Key techniques used in EDA include descriptive statistics, data visualization, correlation analysis, and handling missing data.

3.2 Descriptive Statistics

Descriptive statistics summarize and describe the main characteristics of a dataset. Measures of central tendency (mean, median, mode) describe the typical or central value of a variable. Measures of dispersion (range, variance, standard deviation) show the spread or variability of the data.

Skewness and kurtosis describe the shape of the distribution: skewness measures its asymmetry, and kurtosis measures how heavy its tails are (often described informally as peakedness). Data visualization techniques, such as histograms and box plots, help visualize the distribution and summary statis...
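The descriptive statistics listed above can be computed with Python's standard `statistics` module. The sketch below is illustrative only. The function `describe` and the sample data are hypothetical. Variance and standard deviation use the sample (n - 1) versions from `statistics`. The module has no skewness or kurtosis functions, so those are computed by hand from moment-based estimators (assumed here: g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3, where mk is the k-th central moment with divisor n). Other software may use bias-corrected variants that give slightly different values.

```python
import statistics


def describe(data):
    """Hypothetical helper: central tendency, dispersion, and shape of a numeric list."""
    if len(set(data)) < 2:
        raise ValueError("need at least two distinct values")
    n = len(data)
    mean = statistics.fmean(data)
    # Central moments with divisor n, used only for the shape measures.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        # Central tendency
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Dispersion
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(data),      # sample standard deviation
        # Shape
        "skewness": m3 / m2 ** 1.5,             # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,    # 0 for a normal distribution
    }


# Hypothetical sample data for illustration.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(f"{name:>16}: {value:.4f}")
```

For this data the mean (5) is above the median (4.5), and the skewness is positive, which is consistent with the longer tail toward the larger values 7 and 9.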