Commit 32233ac

[Edit] Python: NumPy - .mean()
1 parent 778865c commit 32233ac

1 file changed

+220
-34
lines changed
  • content/numpy/concepts/built-in-functions/terms/mean

@@ -1,66 +1,252 @@
11
---
22
Title: '.mean()'
3-
Description: 'Computes the arithmetic mean along the specified axis.'
3+
Description: 'Calculates the arithmetic mean of elements in a NumPy array along the specified axis.'
44
Subjects:
55
- 'Computer Science'
66
- 'Data Science'
77
Tags:
8-
- 'Data Structures'
98
- 'Arrays'
109
- 'Functions'
1110
- 'NumPy'
11+
- 'Statistics'
1212
CatalogContent:
1313
- 'learn-python-3'
1414
- 'paths/data-science'
1515
---
1616

17-
The **`.mean()`** method calculates and returns the arithmetic mean, i.e., average, for an array of numbers. If the axis is specified, the average is calculated over that axis. Otherwise, the mean is calculated across the flattened array.
17+
The **`.mean()`** method calculates and returns the arithmetic mean of the elements in a NumPy array. It sums the elements along the specified axis and divides by the number of elements summed. It is one of NumPy's basic statistical functions for describing the central tendency of numerical data.
18+
19+
NumPy's `.mean()` can average an entire array or reduce along one or more axes. It is commonly used in data analysis, scientific computing, and machine learning for tasks such as feature normalization, statistical analysis, and data preprocessing.
1820

1921
## Syntax
2022

2123
```pseudo
22-
numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
24+
numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
25+
```
26+
27+
**Parameters:**
28+
29+
- `a`: The array containing numbers whose mean is to be calculated.
30+
- `axis` (Optional): Axis or axes along which the means are computed. If `None`, the array is flattened before computation.
31+
- `dtype` (Optional): The data type used for calculating the mean. By default, `float64` is used for integers, and the input data type is preserved for floating-point numbers.
32+
- `out` (Optional): Alternative output array in which to store the result. It must have the same shape as the expected output.
33+
- `keepdims` (Optional): If `True`, the reduced axes are kept in the result as dimensions of size one, so the result broadcasts correctly against the input array.
34+
- `where` (Optional): Boolean mask selecting the elements to include in the mean. It must be broadcastable to the shape of `a`.
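
As a rough illustration of the optional parameters listed above, the sketch below applies `keepdims`, `where`, and `out` to a small 3×4 array. The names `A`, `mask`, and `col_means` are chosen here for illustration only.

```py
import numpy as np

A = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

# keepdims=True keeps the reduced axis as a length-1 dimension
row_means = np.mean(A, axis=1, keepdims=True)
print(row_means.shape)  # (3, 1)

# where: a (3, 1) mask broadcasts across the columns, so only row 0 is averaged
mask = np.array([[True], [False], [False]])
first_row_mean = np.mean(A, where=mask)
print(first_row_mean)  # 1.5

# out: write the column means into a preallocated array of the expected shape
col_means = np.empty(4)
np.mean(A, axis=0, out=col_means)
print(col_means)  # [4. 5. 6. 7.]
```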
35+
36+
**Return value:**
37+
38+
The `.mean()` method returns an [ndarray](https://www.codecademy.com/resources/docs/numpy/ndarray) containing the mean values. If `axis` is `None`, the result is a scalar value.
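
A small sketch of the two kinds of return value described above, using a hypothetical 2×2 array `A`:

```py
import numpy as np

A = np.array([[1, 2], [3, 4]])

total = np.mean(A)               # axis=None: a single NumPy scalar
per_column = np.mean(A, axis=0)  # axis given: an ndarray of means

print(type(total).__name__, total)                  # float64 2.5
print(type(per_column).__name__, per_column.shape)  # ndarray (2,)
```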
39+
40+
## Example 1: Basic Mean Calculation
41+
42+
This example demonstrates how to calculate the mean of a one-dimensional NumPy array:
43+
44+
```py
45+
import numpy as np
46+
47+
# Create a 1D array
48+
array1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
49+
50+
# Calculate the mean of the array
51+
avg = np.mean(array1)
52+
53+
print("Array:", array1)
54+
print("Mean value:", avg)
55+
```
56+
57+
This example results in the following output:
58+
59+
```shell
60+
Array: [0 1 2 3 4 5 6 7]
61+
Mean value: 3.5
62+
```
63+
64+
In this basic example, a 1D array with the values 0 through 7 is created and its arithmetic mean is calculated: the elements sum to 28, and 28 divided by the 8 elements gives 3.5.
65+
66+
## Example 2: Calculating Mean Across Different Axes
67+
68+
This example shows how to compute the mean along different axes of a multi-dimensional array, which is useful in many data analysis scenarios:
69+
70+
```py
71+
import numpy as np
72+
73+
# Create a 3D array
74+
array1 = np.array([[[1, 2], [3, 4]],
75+
                   [[5, 6], [7, 8]]])
76+
77+
# Print the array shape and the array itself
78+
print("Array shape:", array1.shape)
79+
print("Array:\n", array1)
80+
81+
# Find the mean of entire array
82+
mean1 = np.mean(array1)
83+
84+
# Find the mean across axis 0
85+
mean2 = np.mean(array1, axis=0)
86+
87+
# Find the mean across axes 0 and 1
88+
mean3 = np.mean(array1, (0, 1))
89+
90+
print("\nMean of the entire array:", mean1)
91+
print("Mean across axis 0:\n", mean2)
92+
print("Mean across axis 0 and 1:", mean3)
93+
```
94+
95+
This example results in the following output:
96+
97+
```shell
98+
Array shape: (2, 2, 2)
99+
Array:
100+
[[[1 2]
101+
[3 4]]
102+
103+
[[5 6]
104+
[7 8]]]
105+
106+
Mean of the entire array: 4.5
107+
Mean across axis 0:
108+
[[3. 4.]
109+
[5. 6.]]
110+
Mean across axis 0 and 1: [4. 5.]
23111
```
24112

25-
- `a`: The array of numbers for which the mean is to be calculated. If the input is not a list, an error is raised.
26-
- `axis` (Optional): The axis or axes over which the mean is to be computed.
27-
- `dtype` (Optional): The data type for which the mean is to be calculated. By default, `float64` is used for integers and for floating point inputs, it is the same as the input `dtype`.
28-
- `out` (Optional): Allows storing the calculated mean in an existing array instead of creating a new array. It must have the same shape that is expected of the resulting mean.
29-
- `keepdims` (Optional): If `True`, it specifies whether the reduced dimensions should be kept in the result.
30-
- `where` (Optional): It clarifies which elements should be included in the mean calculation.
113+
When no axis is specified, all eight elements are averaged into a single value. With `axis=0`, the mean is taken across the first dimension, so the two 2×2 blocks are averaged element by element and the result is a 2D array. With axes `(0, 1)`, both leading dimensions are reduced, leaving a 1D array with one mean for each position along the last axis: 4.0 is the mean of 1, 3, 5, 7 and 5.0 is the mean of 2, 4, 6, 8.
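
To make the multi-axis case concrete, the following sketch recomputes the `(0, 1)` result by hand, taking one mean per index along the last axis. The helper name `manual` is introduced here only for illustration.

```py
import numpy as np

array1 = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])

# Averaging over axes 0 and 1 leaves only the last axis (length 2)
mean_01 = np.mean(array1, axis=(0, 1))

# Same result by hand: one mean per index k along the last axis
manual = np.array([array1[:, :, k].mean() for k in range(array1.shape[2])])

print(mean_01)                          # [4. 5.]
print(np.array_equal(mean_01, manual))  # True
```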
31114

32-
## Example
115+
## Example 3: Data Analysis with Real-world Data
33116

34-
The following example creates an array and then uses the `.mean()` method with different attributes to compute the mean of its elements:
117+
This example demonstrates how to use `.mean()` to analyze temperature data, a common application in environmental science and meteorology:
35118

36119
```py
37120
import numpy as np
38121

39-
A = np.array([[0,1,2,3],[4,5,6,7],[8,9,10,11]])
122+
# Monthly average temperatures (°C) for a city over 2 years
123+
# Rows: Years (2023, 2024)
124+
# Columns: Months (Jan to Dec)
125+
temperatures = np.array([
126+
[5.2, 6.8, 9.3, 13.5, 18.2, 22.6, 25.1, 24.3, 19.7, 14.2, 9.1, 6.3], # 2023
127+
[4.8, 6.5, 8.9, 14.1, 17.9, 23.2, 26.0, 25.2, 19.5, 13.8, 8.5, 5.9] # 2024
128+
])
129+
130+
print("Temperature data shape:", temperatures.shape)
131+
132+
# Calculate the average temperature for each year
133+
yearly_avg = np.mean(temperatures, axis=1)
134+
print("\nYearly average temperatures:")
135+
for year, avg in zip([2023, 2024], yearly_avg):
136+
    print(f"{year}: {avg:.2f}°C")
40137

41-
print("A:", A)
42-
print("np.mean(A):", np.mean(A))
43-
print("np.mean(A, axis=0):", np.mean(A, axis=0))
44-
print("np.mean(A, axis=0, keepdims=True):", np.mean(A, axis=0, keepdims=True))
45-
print("np.mean(A, axis=1):", np.mean(A, axis=1))
46-
print("np.mean(A, axis=1, keepdims=True):", np.mean(A, axis=1, keepdims=True))
47-
print("np.mean(A, dtype=np.float64):", np.mean(A, dtype=np.float64)) # Computing the mean in 'float64' is more accurate
48-
print("np.mean(A, where=[[True], [False], [False]]):", np.mean(A, where=[[True], [False], [False]]))
138+
# Calculate the average temperature for each month across years
139+
monthly_avg = np.mean(temperatures, axis=0)
140+
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
141+
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
142+
143+
print("\nMonthly average temperatures across years:")
144+
for month, avg in zip(months, monthly_avg):
145+
    print(f"{month}: {avg:.2f}°C")
146+
147+
# Calculate the overall average temperature
148+
overall_avg = np.mean(temperatures)
149+
print("\nOverall average temperature: {:.2f}°C".format(overall_avg))
49150
```
50151

51-
This produces the following output:
152+
This example results in the following output:
52153

53154
```shell
54-
A: [[ 0 1 2 3]
55-
[ 4 5 6 7]
56-
[ 8 9 10 11]]
57-
np.mean(A): 5.5
58-
np.mean(A, axis=0): [4. 5. 6. 7.]
59-
np.mean(A, axis=0, keepdims=True): [[4. 5. 6. 7.]]
60-
np.mean(A, axis=1): [1.5 5.5 9.5]
61-
np.mean(A, axis=1, keepdims=True): [[1.5]
62-
[5.5]
63-
[9.5]]
64-
np.mean(A, dtype=np.float64): 5.5
65-
np.mean(A, where=[[True], [False], [False]]): 1.5
155+
Temperature data shape: (2, 12)
156+
157+
Yearly average temperatures:
158+
2023: 14.52°C
159+
2024: 14.53°C
160+
161+
Monthly average temperatures across years:
162+
Jan: 5.00°C
163+
Feb: 6.65°C
164+
Mar: 9.10°C
165+
Apr: 13.80°C
166+
May: 18.05°C
167+
Jun: 22.90°C
168+
Jul: 25.55°C
169+
Aug: 24.75°C
170+
Sep: 19.60°C
171+
Oct: 14.00°C
172+
Nov: 8.80°C
173+
Dec: 6.10°C
174+
175+
Overall average temperature: 14.53°C
176+
```
177+
178+
This example shows how `.mean()` can be used to analyze temperature data by calculating yearly averages, monthly averages across years, and the overall average temperature.
179+
180+
## Codebyte Example: Student Exam Score Analysis
181+
182+
This example demonstrates how to use `.mean()` to analyze student exam scores, a common task in educational assessment:
183+
184+
```codebyte/python
185+
import numpy as np
186+
187+
# Student scores for 4 exams during a semester
188+
# Each row represents a student, each column represents an exam
189+
exam_scores = np.array([
190+
[85, 90, 88, 92], # Student 1
191+
[78, 82, 80, 84], # Student 2
192+
[92, 95, 89, 96], # Student 3
193+
[65, 70, 75, 68], # Student 4
194+
[88, 87, 84, 90] # Student 5
195+
])
196+
197+
# Calculate each student's average score
198+
student_averages = np.mean(exam_scores, axis=1)
199+
200+
# Calculate the class average for each exam
201+
exam_averages = np.mean(exam_scores, axis=0)
202+
203+
# Calculate the overall class average
204+
class_average = np.mean(exam_scores)
205+
206+
print("Student average scores:")
207+
for i, avg in enumerate(student_averages, 1):
208+
    print(f"Student {i}: {avg:.1f}")
209+
210+
print("\nClass average for each exam:")
211+
for i, avg in enumerate(exam_averages, 1):
212+
    print(f"Exam {i}: {avg:.1f}")
213+
214+
print(f"\nOverall class average: {class_average:.1f}")
66215
```
216+
217+
This example shows how `.mean()` can be used to calculate several averages from a set of student exam scores, giving insight into individual student performance, exam difficulty, and overall class performance.
218+
219+
## Best Practices
220+
221+
1. **Choose the appropriate axis**: When working with multi-dimensional arrays, carefully select the axis parameter to ensure calculations are performed along the intended dimension.
222+
223+
2. **Consider data type precision**: Integer input is averaged in `float64` by default, but floating-point input is accumulated in its own type, so the mean of a large `float32` array can lose precision. Pass `dtype=np.float64` when accuracy matters, and keep `float32` only where memory efficiency outweighs precision.
224+
225+
3. **Use keepdims for dimensional consistency**: Set `keepdims=True` when the output must have the same number of dimensions as the input, for example so that the means broadcast against the original array.
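
The precision and `keepdims` practices above can be sketched as follows. The arrays `a` and `scores` are made-up illustration data, and the exact digits of the `float32` result may vary between NumPy versions, so no specific value is shown for it.

```py
import numpy as np

# float32 input is accumulated in float32 unless dtype says otherwise
a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

mean32 = np.mean(a)                    # computed and returned as float32
mean64 = np.mean(a, dtype=np.float64)  # accumulated and returned as float64
print(mean32.dtype, mean64.dtype)      # float32 float64

# keepdims=True lets the row means broadcast back against the original array
scores = np.array([[85.0, 90.0, 88.0],
                   [78.0, 82.0, 80.0]])
centered = scores - scores.mean(axis=1, keepdims=True)
print(centered.shape)  # (2, 3)
```

Without `keepdims=True`, the row means would have shape `(2,)` and could not be subtracted from the `(2, 3)` array directly.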
226+
227+
## FAQs
228+
229+
<details>
230+
<summary>1. What's the difference between `np.mean()` and `np.average()`?</summary>
231+
<p>While both calculate the arithmetic mean, `np.average()` allows specifying weights for elements, enabling weighted averages, whereas `np.mean()` treats all values equally.</p>
232+
</details>
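
A minimal sketch of the difference, using made-up scores and hypothetical weights in which the last score counts double:

```py
import numpy as np

scores = np.array([80.0, 90.0, 100.0])
weights = np.array([1, 1, 2])  # hypothetical weights for illustration

plain = np.mean(scores)                         # (80 + 90 + 100) / 3
weighted = np.average(scores, weights=weights)  # (80 + 90 + 2 * 100) / 4

print(plain)     # 90.0
print(weighted)  # 92.5
```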
233+
234+
<details>
235+
<summary>2. How does NumPy's `.mean()` handle `NaN` values?</summary>
236+
<p>By default, `.mean()` will return `NaN` if any of the values being averaged are `NaN`. To ignore `NaN` values, use `np.nanmean()` instead.</p>
237+
</details>
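
For instance, with a small made-up array containing one missing reading:

```py
import numpy as np

readings = np.array([1.0, np.nan, 3.0])

plain = np.mean(readings)       # NaN propagates through the sum
ignored = np.nanmean(readings)  # NaN entries are skipped: (1.0 + 3.0) / 2

print(plain)    # nan
print(ignored)  # 2.0
```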
238+
239+
<details>
240+
<summary>3. Can `.mean()` calculate the mean of strings or other non-numeric data?</summary>
241+
<p>No, `.mean()` works only with numeric data. Attempting to calculate the mean of non-numeric data will result in a `TypeError`.</p>
242+
</details>
243+
244+
<details>
245+
<summary>4. How can dimensions be preserved when calculating means along an axis?</summary>
246+
<p>Pass `keepdims=True`. Each reduced axis is then kept in the output with length one, so the result has the same number of dimensions as the original array.</p>
247+
</details>
248+
249+
<details>
250+
<summary>5. Is there a performance difference between using `.mean()` method and the `np.mean()` function?</summary>
251+
<p>No significant performance difference exists between `arr.mean()` and `np.mean(arr)` as they both call the same underlying implementation. Choose the syntax that makes code more readable.</p>
252+
</details>
