The Influence of Gender and Study Time on The GPA of 36 Batch 2020 & 2021 Students in Indonesia Using Python

Apr 16, 2024

BACKGROUND

Students must make a range of efforts to obtain a good GPA. Studying is one of them, and benefiting fully from the learning process requires ample study time. The purpose of this study is to clarify how study hours affect the GPA of students in the 2020 and 2021 classes. The number of study hours used in this study is based on an assumption about how many hours a student should study. The length of time spent studying and involvement in groups are included as independent factors in the analysis. The sample consisted of active students from a variety of universities. The study uses primary data, collected through questionnaires given directly to enrolled students at various universities. Google Colab is used for data analysis with multiple regression models. The test results indicate that the GPA of students in the 2020 and 2021 classes is significantly influenced, at least in part, by the amount of time they spend studying.

Study time is one factor that significantly affects a student's grade point average (GPA). At this level, students want their studies to go well, to finish on schedule, and to graduate with honors. Gender may also influence learning outcomes. Adolescents, both boys and girls, take on distinct roles and develop distinct abilities as they grow, and they have developmental tasks to complete alongside their education.
The purpose of this study is to determine the correlation between study time and the grade point average (GPA), as well as the correlation between learning outcomes and gender.

The research uses correlation and regression tests on primary data obtained by distributing a Google Forms questionnaire to students in Indonesia's classes of 2020 and 2021, which yielded 36 respondents. Because the questionnaire was distributed at random and every student in the population had an equal chance of being selected, the sampling technique is treated as Simple Random Sampling, in which every member of the population has the same opportunity to be included in the sample.

Based on the problem description above, the author intends to investigate and evaluate the relationship between the amount of time that students in the classes of 2020 and 2021 devote to their studies and their GPA. In addition, the author intends to examine the Chi-Squared test results regarding the impact of gender and study time on students' GPAs. Ultimately, the author hopes that the analysis can inform how GPA develops and can support recommendations or suggestions for policy makers.

OBJECTIVES

Examine the normality test results for the effect of study time on the GPA of students in the 2020 and 2021 classes.
Examine the results of the hypothesis test on means for the impact of study time and gender on the GPA of students in the 2020 and 2021 classes.
Examine the findings of the design data analysis pertaining to the impact of study time on the GPA of students in the 2020 and 2021 classes.
Examine the regression and correlation test results for the effects of gender and study time on the GPA of students in the 2020 and 2021 classes.
Assist students in the 2020 and 2021 classes with effective management of their study time, which has an impact on their GPA.

DATASET

In this study, the author used primary data collected by distributing a Google Forms questionnaire to Indonesian students in the 2020 and 2021 classes. The benefit of primary data is that it comes directly from the community the author wishes to examine, which makes it more reliable and accurate. The data also offer more current information, because the data collection period coincides with the study period. The columns the author wants to use are gender, GPA, and study duration.

Each of the eight columns in the above table has the following description:

The "Name" column holds qualitative data: the names of the 36 Indonesian students from the 2020 and 2021 classes who make up the population.
The "Gender" column holds qualitative data: the gender of each of the 36 students. It is an independent variable, because it will be examined to determine whether it affects GPA.
The "Length of study time in one day (hours)" column holds each student's daily study time, measured in hours. It is continuous quantitative data, because it can take any value within a certain range. It is also an independent variable, because it will be examined to determine whether it affects GPA.
The "GPA" column holds each student's learning outcome (GPA) for one semester. It is continuous quantitative data, with values ranging from 1 to 4. It is the dependent variable, because the analysis tests whether it is influenced by the independent variables, gender and daily study time.
The "Semester" column holds qualitative data: the semester in which each student is currently enrolled.
The "Age" column holds quantitative data: the age of each student.
The "University Origin" column holds qualitative data: the university each student attends.
The "Class" column holds qualitative data: the class (2020 or 2021) to which each student belongs.

For sampling, a Google Forms questionnaire was created and distributed at random to students who met the requirement of belonging to the 2020 or 2021 class. Because the questionnaire was distributed at random and only 36 responses were collected, the author treats the procedure as simple random sampling, a technique in which each member of the population has the same chance of being selected for the sample.

The sampling process breaks down as follows. The first step was to identify the theoretical population of the target sample, that is, all Indonesian students. The next step was to identify the study population, the part of that population that is accessible; from this point on, only students in the same age group as the study population were contacted. For the sampling frame, a questionnaire was created and distributed at random via social media. Lastly, the sample itself consists of everyone who completed the questionnaire, since random distribution of the questionnaire means that respondents were already part of the sampling process at the data collection stage.
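As a point of reference for the definition above, simple random sampling from a known sampling frame can be sketched in a few lines of Python. The frame of 200 student IDs and the seed are hypothetical and are not taken from the study; the sketch only illustrates drawing 36 members so that each has the same chance of selection.

```python
import random

# Hypothetical sampling frame: IDs standing in for the accessible students.
sampling_frame = [f"student_{i:03d}" for i in range(1, 201)]

# A fixed seed makes the draw reproducible.
rng = random.Random(42)

# Draw 36 members without replacement; every member is equally likely.
sample = rng.sample(sampling_frame, k=36)

print(len(sample), "students drawn, all distinct:", len(set(sample)) == 36)
```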

DATA DESCRIPTION

The following library can be used in Google Colab with the Python programming language to obtain descriptive statistics and data visualizations:

Next, the author uses the following command to obtain descriptive values, including the average, standard deviation, maximum value, minimum value, and percentile:
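A minimal sketch of one common way to obtain these values, assuming the responses are loaded into a pandas DataFrame. The rows below are made up for illustration and are not the survey data; the column names follow the dataset description.

```python
import pandas as pd

# Made-up rows for illustration only; not the survey data.
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Length of study time in one day (hours)": [2.0, 3.0, 4.0, 1.0, 3.0, 5.0],
    "GPA": [3.50, 3.65, 3.80, 3.30, 3.70, 3.90],
})

# describe() reports count, mean, std, min, the 25%/50%/75% percentiles, and max
# for each numeric column.
summary = df[["Length of study time in one day (hours)", "GPA"]].describe()
print(summary)
```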

This table shows that the GPA variable has an average of 3.61, a maximum of 4, and a minimum of 3.11. The standard deviation of 0.19 indicates that the sample's values are distributed fairly close to the mean. The table also displays the 25%, 50%, and 75% percentiles. The 25% percentile of 3.5 indicates that 25% of the population has a GPA of 3.5 or below. The 50% percentile of 3.65 means that 50% of the population has a GPA of 3.65 or below. Likewise, the 75% percentile shows that 75% of the population has a GPA of 3.7 or below.

In addition to the GPA variable, the table covers the study time variable, which has an average of 2.9 hours, a standard deviation of 1.6 hours, and a range of 0 to 8 hours. Like the GPA variable, it has 25%, 50%, and 75% percentiles. According to the 25% percentile, 25% of the population studies for two hours or less per day, and the 50% and 75% percentiles show that 50% and 75% of the population, respectively, study for three hours or less per day.

The image shows a trend in which GPA rises as study time increases. This suggests that the two variables might have a linear relationship.

NORMALITY TEST

Testing for data normality is the initial step in the data analysis process. The normality test used is the Shapiro-Wilk test, which involves the following steps:

Formulation of Hypotheses
H0: The data are normally distributed.
H1: The data are not normally distributed.
Set the significance level at α = 0.05.
Establish the test criteria.
If the p-value exceeds α, do not reject H0.
Conclusion: if H0 is not rejected, the data are treated as normally distributed.
In that case, the data are used as they are, without any transformation.

Python is used with the following libraries and syntax to facilitate the calculations:
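A minimal sketch of the Shapiro-Wilk steps, assuming SciPy is available (it is in a default Google Colab runtime). The 36 values are synthetic stand-ins generated for illustration, not the survey data, so the printed numbers say nothing about the study's result.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one numeric column (36 values); not the survey data.
rng = np.random.default_rng(0)
values = rng.normal(loc=3.6, scale=0.2, size=36)

alpha = 0.05  # significance level from the test steps

# shapiro() returns the W statistic and the p-value.
statistic, p_value = stats.shapiro(values)

# Decision rule: do not reject H0 (normality) when the p-value exceeds alpha.
decision = "do not reject H0" if p_value > alpha else "reject H0"
print(f"W = {statistic:.4f}, p-value = {p_value:.4f}: {decision}")
```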

CORRELATION AND REGRESSION

The next step is the correlation and regression analysis.
Correlation step. The correlation test used at this stage is the Pearson correlation. The dependent variable in this test is GPA, and the independent variable is the amount of study time in a day (measured in hours). The test stages are as follows:

Formulate the hypotheses
H0: ρ = 0 (The amount of time spent studying is not related to GPA)
H1: ρ ≠ 0 (The amount of time spent studying and GPA are related)
Compute the sample correlation coefficient.
Set the significance level at α = 0.05.
Find the critical region: v = n - 2 = 36 - 2 = 34, so t(α/2; v) = t(0.025; 34) = 2.032
Draw a conclusion

The following library will be used with Python to facilitate calculations:

Python will be used with the following syntax to facilitate calculations:
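A minimal sketch of the correlation step, assuming SciPy is available. The 36 pairs are synthetic stand-ins rather than the survey data, and the variable names are illustrative. It computes the sample correlation, the t statistic for H0: ρ = 0, and the two-sided critical value with v = n - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (36 pairs); not the survey data.
rng = np.random.default_rng(1)
study_hours = rng.uniform(0, 8, size=36)
gpa = 3.4 + 0.06 * study_hours + rng.normal(0, 0.15, size=36)

alpha = 0.05
r, p_value = stats.pearsonr(study_hours, gpa)  # sample correlation and p-value

n = len(study_hours)
dof = n - 2                                     # v = n - 2 = 34
t_stat = r * np.sqrt(dof) / np.sqrt(1 - r**2)   # t statistic for H0: rho = 0
t_crit = stats.t.ppf(1 - alpha / 2, dof)        # two-sided critical value

print(f"r = {r:.3f}, t = {t_stat:.3f}, critical t = {t_crit:.3f}, p = {p_value:.4f}")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```

Comparing |t| with the critical value and comparing the p-value with α lead to the same decision; the sketch prints both so they can be cross-checked.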

Regression step. This stage examines the mathematical model and the pattern of the relationship between the variables. The dependent variable in this test is GPA, and the independent variable is the amount of study time in a day (measured in hours). The test stages are as follows:

Formulate the hypotheses
● Study Time and GPA Relationship:
H0_1: ρ = 0 (Differences in study duration are not related to changes in GPA)
H1_1: ρ ≠ 0 (Differences in study duration are related to changes in GPA)
● GPA and Gender Relationship:
H0_2: ρ = 0 (Gender differences have no effect on changes in GPA)
H1_2: ρ ≠ 0 (Gender differences have an effect on changes in GPA)
Determine the value, R2 value, slope, and intercept.
Set the significance level at α = 0.05.
Find the critical region: v = n - 2 = 36 - 2 = 34, so t(α/2; v) = t(0.025; 34) = 2.032
Establish the regression model
Draw a conclusion

The following library will be used with Python to facilitate calculations:

Python will be used with the following syntax to facilitate calculations:
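A minimal sketch of fitting a linear model with study time and gender as inputs, using NumPy least squares (not necessarily the library used for the results reported in this article). The data are synthetic stand-ins, and the 0/1 coding of gender is an assumption made for illustration.

```python
import numpy as np

# Synthetic stand-in data; the 0/1 gender coding is an assumption.
rng = np.random.default_rng(2)
study_hours = rng.uniform(0, 8, size=36)
gender = rng.integers(0, 2, size=36)
gpa = 3.45 + 0.07 * study_hours - 0.10 * gender + rng.normal(0, 0.1, size=36)

# Design matrix with an intercept column: [1, study_hours, gender].
X = np.column_stack([np.ones(len(gpa)), study_hours, gender])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
intercept, b_study, b_gender = coef

# Coefficient of determination: 1 - residual variation / total variation.
predicted = X @ coef
ss_res = np.sum((gpa - predicted) ** 2)
ss_tot = np.sum((gpa - gpa.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"intercept = {intercept:.4f}, study time = {b_study:.4f}, gender = {b_gender:.4f}")
print(f"R2 = {r2:.4f}")
```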

ANALYSIS

NORMALITY TEST RESULT

The following are the results of the normality test conducted on the length of study time and student GPA for the 2020 and 2021 classes:

Formulation of Hypotheses
H0: The data are normally distributed
H1: The data are not normally distributed
Significance Level: α = 0.05
Establish the Test Criteria
The results obtained with Python are shown in the image that follows.

The output shows a p-value of 1.00, which is greater than the significance level of 0.05. The decision is therefore not to reject H0.

Conclusion: since H0 is not rejected, the data are concluded to be normally distributed at the 5% (0.05) significance level. As seen in the picture below, a QQ plot was made to support this conclusion.

The QQ plot shows that most of the points lie close to the red line. This further supports the conclusion that the data are normally distributed.
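The quantities behind a QQ plot can be computed without drawing anything. The sketch below, which assumes SciPy and uses synthetic values rather than the survey data, returns the theoretical normal quantiles, the ordered sample values, and the fitted reference line (the "red line" in such plots).

```python
import numpy as np
from scipy import stats

# Synthetic stand-in values (36 points); not the survey data.
rng = np.random.default_rng(3)
values = rng.normal(loc=3.6, scale=0.2, size=36)

# probplot() pairs theoretical normal quantiles with the sorted sample values
# and fits a straight line through them.
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(values, dist="norm")

# r close to 1 means the points lie close to the reference line.
print(f"reference line: y = {intercept:.3f} + {slope:.3f} * quantile, r = {r:.3f}")
```

Plotting `ordered_values` against `theoretical_q` together with the fitted line gives the usual QQ plot.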

CORRELATION AND REGRESSION ANALYSIS

To determine the strength of the relationship between the dependent and independent variables, the correlation coefficient between the two variables is computed. The author uses Python's df.corr() to simplify the calculation; the output looks like the image below.

GPA and the amount of study time have a correlation coefficient of r = 0.58, indicating a moderate positive association. The positive sign indicates that a student's GPA tends to increase with study time. However, the sample data alone do not support this conclusively.
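As a worked check of the test steps listed earlier, the reported r = 0.58 with n = 36 can be converted into a t statistic and compared with the critical value of 2.032. This is a sketch based only on those reported figures; the formula t = r√(n - 2) / √(1 - r²) is the standard test statistic for H0: ρ = 0.

```python
import math

r = 0.58        # correlation coefficient reported in this section
n = 36          # number of respondents
t_crit = 2.032  # critical value t(0.025; 34) from the test steps

# t statistic for H0: rho = 0 with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

print(f"t = {t_stat:.2f}")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```

The statistic comes out at about 4.15, which is above 2.032, so under these figures H0 would be rejected at the 5% level.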

MULTIPLE LINEAR REGRESSION

The output obtained by running the Python program is as follows:

In the multiple linear regression results, the slope (b) holds the coefficient of each input feature, study time and gender. Study time has a coefficient of 0.06548525, while gender has a coefficient of -0.10054577. These numbers indicate how each feature affects the output (GPA). The intercept (a) is the value that the model predicts for the output when all input features are zero; in this instance it is 3.46754692. The mean absolute error indicates how well the model predicts the output (GPA).

It is calculated as the mean of the absolute differences between the predicted and actual values. A lower mean absolute error indicates a better model. An additional metric for assessing model performance is the mean squared error (MSE), which is computed as the mean of the squared differences between the predicted and actual values. Lower MSE values indicate a better model. The coefficient of determination, or R2-score, indicates how well the model accounts for variation in the output. It is computed as the ratio of the explained variation to the total variation.
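The three metrics can be written out directly from their definitions. The sketch below uses plain Python and four made-up actual and predicted values, not the study's data; the function names are illustrative. R2 is computed as 1 minus the ratio of residual to total variation, which for a least-squares fit with an intercept equals the ratio of explained to total variation.

```python
def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [3.2, 3.5, 3.6, 3.9]
predicted = [3.3, 3.4, 3.7, 3.8]

print(round(mean_absolute_error(actual, predicted), 4))  # 0.1
print(round(mean_squared_error(actual, predicted), 4))   # 0.01
print(round(r2_score(actual, predicted), 4))             # 0.84
```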

With an R2-score (coefficient of determination) of 0.4034, the model's linear relationship between the input features (X) and student GPA (Y) accounts for approximately 40.34% of the variation in GPA, with other variables accounting for the remaining 59.66%. In addition, a regression model was acquired, which is as follows:
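Written out with the intercept and coefficients reported above, the model is roughly GPA = 3.4675 + 0.0655 × (study time) - 0.1005 × (gender). The sketch below evaluates it in Python. How gender is encoded numerically is not stated in this article, so the 0/1 coding and the function name are assumptions made for illustration.

```python
# Intercept and coefficients as reported in the regression output above.
INTERCEPT = 3.46754692
COEF_STUDY_TIME = 0.06548525
COEF_GENDER = -0.10054577


def predict_gpa(study_hours, gender_code):
    """Predicted GPA; gender_code is assumed to be 0 or 1 (coding not stated)."""
    return INTERCEPT + COEF_STUDY_TIME * study_hours + COEF_GENDER * gender_code


# Three hours of study per day under each assumed gender code.
print(round(predict_gpa(3, 0), 4))  # 3.664
print(round(predict_gpa(3, 1), 4))  # 3.5635
```

Each additional hour of study raises the prediction by the study time coefficient, about 0.065, regardless of the gender code.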

CONCLUSION

The following conclusion is reached:

At the 95% confidence level, the data on study time and GPA for students in the 2020 and 2021 classes are normally distributed. In the QQ plot, the data points stay close to the red line; points lying far from the line would indicate that the data are not normally distributed. The data points also cluster close to the average value.
Study time and GPA among students in the 2020 and 2021 classes have a moderately strong positive association, as shown by the correlation coefficient of 0.58. This implies that a student's GPA tends to increase with the amount of study time they put in.
The results of the multiple linear regression indicate that students' GPAs are influenced by both gender and the amount of study time. The positive study time coefficient means that the longer the study time, the higher the GPA. Conversely, the negative gender coefficient denotes a negative relationship between gender and GPA. According to the fitted regression model, y = 3.47 + 0.065x with gender held fixed, the GPA increases by about 0.065 for every additional hour of study time. The R2-score of 0.4034 indicates that the model explains approximately 40.34% of the variation in GPA.

SUGGESTION

The author offers the following recommendations based on the completed research:

Students should learn to manage their study time better, because the amount of time spent studying significantly affects their GPA. This can be achieved by planning a regular study schedule or by studying during free time.
In addition, a student's gender affects their GPA. As a result, it is preferable to apply the same guidelines to every student, regardless of gender.
The results of this study may not apply to students in other classes, because it covered only students in the 2020 and 2021 classes. To reach more general conclusions, future research should use a larger sample and include students from other cohorts.