Learn DataFrames.jl

学习Julia

Julia
Published

June 19, 2023

1 导入库

using CSV
using DataFrames

2 读取数据

df = CSV.read("../data/203st_idw_interp_0.05deg.csv", DataFrame)
# df = CSV.read("I:/order_data/Xujw/203st_idw_interp_0.05deg.csv", DataFrame)
df # copy(df) 复制df

203 rows × 7 columns

site lon lat Eh V Ni Ba
String15 Float64 Float64 Float64 Float64 Float64 Float64
1 19G26 111.8 29.07 33.2 0.08 1.55 126.99
2 22G-80 111.93 28.94 163.6 0.45 2.77 71.83
3 19G22 111.95 28.98 -48.0 0.17 1.18 161.81
4 22G-83 111.98 29.11 64.8 0.25 0.59 136.21
5 22G-81 112.03 29.04 143.1 0.97 1.83 124.54
6 20DT-19G 112.07 29.09 26.0 0.853692 1.28 173.21
7 19G21 112.14 28.87 143.0 0.06 1.16 5.54
8 20DT-17G 112.14 29.0 -60.8 0.590868 0.7 258.56
9 22G-82 112.2 29.08 96.1 0.36 1.04 183.63
10 22G-90 112.2 29.66 -88.1 0.06 1.8 167.24
11 22G-88 112.24 29.27 115.5 0.51 2.29 20.98
12 22G-85 112.26 29.18 90.5 0.310593 4.08 476.34
13 DT02G 112.26 29.59 -29.3 0.53 2.16 234.38
14 18S-38G 112.29 29.07 158.0 0.08 0.71 22.6
15 18S-25G 112.3 29.77 -88.1 0.02 0.22 112.0
16 18S-24G 112.3 29.81 -80.0 0.14 0.2 153.0
17 22G-87 112.31 29.29 40.9 0.55 2.34 421.78
18 DT01G 112.31 29.32 -29.1 0.24 2.13 240.8
19 DT03G 112.31 29.62 81.1 0.14 1.59 404.23
20 18S-23G 112.31 29.84 -103.0 0.2 1.0 210.0
21 18S-27G 112.32 29.67 97.3386 0.26 1.88 125.25
22 21DT-26G 112.33 29.14 -48.6 0.191896 1.70282 502.1
23 21DT-23G 112.33 29.21 -31.0 0.372661 2.26989 904.77
24 18S-37G 112.35 29.05 -49.0 0.19 0.0 65.6
25 22G-84 112.35 29.18 122.5 0.28 2.35 257.15
26 18S-22G 112.35 29.85 -120.0 0.265757 3.79 615.0
27 20DT-29G 112.37 28.96 -88.2 0.179396 1.98 274.92
28 22G-92 112.38 29.53 -35.1 0.1 1.93 111.83
29 18S-20G 112.38 29.88 -142.0 0.47 3.3 541.0
30 22G-91 112.39 29.66 177.2 1.88 4.19 164.61

3 取出某一行

3.1 方法1

df.V # df."V"
203-element Vector{Float64}:
 0.08
 0.45
 0.17
 0.25
 0.97
 0.85369167324573
 0.06
 0.590868123957716
 0.36
 0.06
 0.51
 0.310592788968676
 0.53
 ⋮
 0.11
 0.08
 0.47
 0.15
 0.11
 0.82
 0.07
 0.36
 0.12
 0.2
 0.3
 0.288668959788242

3.2 方法2

df[:, :V] # 这种情况下取值更改不会改变之前的数据框内容
203-element Vector{Float64}:
 0.08
 0.45
 0.17
 0.25
 0.97
 0.85369167324573
 0.06
 0.590868123957716
 0.36
 0.06
 0.51
 0.310592788968676
 0.53
 ⋮
 0.11
 0.08
 0.47
 0.15
 0.11
 0.82
 0.07
 0.36
 0.12
 0.2
 0.3
 0.288668959788242

4 列名

names(df, AbstractString) # 筛选出不同数据类型的列名
names(df, Float64)
6-element Vector{String}:
 "lon"
 "lat"
 "Eh"
 "V"
 "Ni"
 "Ba"

5 创建一个0行但包含全部列名的df

empty(df) # empty!()直接让目标df清空

0 rows × 7 columns

site lon lat Eh V Ni Ba
String15 Float64 Float64 Float64 Float64 Float64 Float64

6 查看df基本信息

df维度:

size(df) # 维度
nrow(df) # ncol()
203

统计信息,类似R语言summary

describe(df, cols=4:7)

4 rows × 7 columns

variable mean min median max nmissing eltype
Symbol Float64 Float64 Float64 Float64 Int64 DataType
1 Eh -54.8574 -278.3 -72.2 177.2 0 Float64
2 V 0.344441 0.0 0.259441 1.88 0 Float64
3 Ni 2.46611 0.0 2.14 8.39 0 Float64
4 Ba 253.259 0.01 195.07 1752.0 0 Float64

7 筛选df

df[:, [:lon, :lat]]

203 rows × 2 columns

lon lat
Float64 Float64
1 111.8 29.07
2 111.93 28.94
3 111.95 28.98
4 111.98 29.11
5 112.03 29.04
6 112.07 29.09
7 112.14 28.87
8 112.14 29.0
9 112.2 29.08
10 112.2 29.66
11 112.24 29.27
12 112.26 29.18
13 112.26 29.59
14 112.29 29.07
15 112.3 29.77
16 112.3 29.81
17 112.31 29.29
18 112.31 29.32
19 112.31 29.62
20 112.31 29.84
21 112.32 29.67
22 112.33 29.14
23 112.33 29.21
24 112.35 29.05
25 112.35 29.18
26 112.35 29.85
27 112.37 28.96
28 112.38 29.53
29 112.38 29.88
30 112.39 29.66

8 条件筛选

9 常规

df[:, Not([:site, :lon, :lat])]

203 rows × 4 columns

Eh V Ni Ba
Float64 Float64 Float64 Float64
1 33.2 0.08 1.55 126.99
2 163.6 0.45 2.77 71.83
3 -48.0 0.17 1.18 161.81
4 64.8 0.25 0.59 136.21
5 143.1 0.97 1.83 124.54
6 26.0 0.853692 1.28 173.21
7 143.0 0.06 1.16 5.54
8 -60.8 0.590868 0.7 258.56
9 96.1 0.36 1.04 183.63
10 -88.1 0.06 1.8 167.24
11 115.5 0.51 2.29 20.98
12 90.5 0.310593 4.08 476.34
13 -29.3 0.53 2.16 234.38
14 158.0 0.08 0.71 22.6
15 -88.1 0.02 0.22 112.0
16 -80.0 0.14 0.2 153.0
17 40.9 0.55 2.34 421.78
18 -29.1 0.24 2.13 240.8
19 81.1 0.14 1.59 404.23
20 -103.0 0.2 1.0 210.0
21 97.3386 0.26 1.88 125.25
22 -48.6 0.191896 1.70282 502.1
23 -31.0 0.372661 2.26989 904.77
24 -49.0 0.19 0.0 65.6
25 122.5 0.28 2.35 257.15
26 -120.0 0.265757 3.79 615.0
27 -88.2 0.179396 1.98 274.92
28 -35.1 0.1 1.93 111.83
29 -142.0 0.47 3.3 541.0
30 177.2 1.88 4.19 164.61
df[:, Between(:site, :lat)]

203 rows × 3 columns

site lon lat
String15 Float64 Float64
1 19G26 111.8 29.07
2 22G-80 111.93 28.94
3 19G22 111.95 28.98
4 22G-83 111.98 29.11
5 22G-81 112.03 29.04
6 20DT-19G 112.07 29.09
7 19G21 112.14 28.87
8 20DT-17G 112.14 29.0
9 22G-82 112.2 29.08
10 22G-90 112.2 29.66
11 22G-88 112.24 29.27
12 22G-85 112.26 29.18
13 DT02G 112.26 29.59
14 18S-38G 112.29 29.07
15 18S-25G 112.3 29.77
16 18S-24G 112.3 29.81
17 22G-87 112.31 29.29
18 DT01G 112.31 29.32
19 DT03G 112.31 29.62
20 18S-23G 112.31 29.84
21 18S-27G 112.32 29.67
22 21DT-26G 112.33 29.14
23 21DT-23G 112.33 29.21
24 18S-37G 112.35 29.05
25 22G-84 112.35 29.18
26 18S-22G 112.35 29.85
27 20DT-29G 112.37 28.96
28 22G-92 112.38 29.53
29 18S-20G 112.38 29.88
30 22G-91 112.39 29.66

9.1 正则表达式

df[:, r"l"]

203 rows × 2 columns

lon lat
Float64 Float64
1 111.8 29.07
2 111.93 28.94
3 111.95 28.98
4 111.98 29.11
5 112.03 29.04
6 112.07 29.09
7 112.14 28.87
8 112.14 29.0
9 112.2 29.08
10 112.2 29.66
11 112.24 29.27
12 112.26 29.18
13 112.26 29.59
14 112.29 29.07
15 112.3 29.77
16 112.3 29.81
17 112.31 29.29
18 112.31 29.32
19 112.31 29.62
20 112.31 29.84
21 112.32 29.67
22 112.33 29.14
23 112.33 29.21
24 112.35 29.05
25 112.35 29.18
26 112.35 29.85
27 112.37 28.96
28 112.38 29.53
29 112.38 29.88
30 112.39 29.66

10 转换函数

共五种方法:

  • combine

  • select

  • transformselect

导入分组统计所需包

using Statistics

V的平均值

combine(df, :V => mean => :mean_V)

1 rows × 1 columns

mean_V
Float64
1 0.344441

筛选lon

select(df, :lon => mean => :mean_lon)

203 rows × 1 columns

mean_lon
Float64
1 112.868
2 112.868
3 112.868
4 112.868
5 112.868
6 112.868
7 112.868
8 112.868
9 112.868
10 112.868
11 112.868
12 112.868
13 112.868
14 112.868
15 112.868
16 112.868
17 112.868
18 112.868
19 112.868
20 112.868
21 112.868
22 112.868
23 112.868
24 112.868
25 112.868
26 112.868
27 112.868
28 112.868
29 112.868
30 112.868

随便选

select(df, r"l", "site", :V)

203 rows × 4 columns

lon lat site V
Float64 Float64 String15 Float64
1 111.8 29.07 19G26 0.08
2 111.93 28.94 22G-80 0.45
3 111.95 28.98 19G22 0.17
4 111.98 29.11 22G-83 0.25
5 112.03 29.04 22G-81 0.97
6 112.07 29.09 20DT-19G 0.853692
7 112.14 28.87 19G21 0.06
8 112.14 29.0 20DT-17G 0.590868
9 112.2 29.08 22G-82 0.36
10 112.2 29.66 22G-90 0.06
11 112.24 29.27 22G-88 0.51
12 112.26 29.18 22G-85 0.310593
13 112.26 29.59 DT02G 0.53
14 112.29 29.07 18S-38G 0.08
15 112.3 29.77 18S-25G 0.02
16 112.3 29.81 18S-24G 0.14
17 112.31 29.29 22G-87 0.55
18 112.31 29.32 DT01G 0.24
19 112.31 29.62 DT03G 0.14
20 112.31 29.84 18S-23G 0.2
21 112.32 29.67 18S-27G 0.26
22 112.33 29.14 21DT-26G 0.191896
23 112.33 29.21 21DT-23G 0.372661
24 112.35 29.05 18S-37G 0.19
25 112.35 29.18 22G-84 0.28
26 112.35 29.85 18S-22G 0.265757
27 112.37 28.96 20DT-29G 0.179396
28 112.38 29.53 22G-92 0.1
29 112.38 29.88 18S-20G 0.47
30 112.39 29.66 22G-91 1.88

改列名

select(df, :lon => :new_lon, :)

203 rows × 8 columns

new_lon site lon lat Eh V Ni Ba
Float64 String15 Float64 Float64 Float64 Float64 Float64 Float64
1 111.8 19G26 111.8 29.07 33.2 0.08 1.55 126.99
2 111.93 22G-80 111.93 28.94 163.6 0.45 2.77 71.83
3 111.95 19G22 111.95 28.98 -48.0 0.17 1.18 161.81
4 111.98 22G-83 111.98 29.11 64.8 0.25 0.59 136.21
5 112.03 22G-81 112.03 29.04 143.1 0.97 1.83 124.54
6 112.07 20DT-19G 112.07 29.09 26.0 0.853692 1.28 173.21
7 112.14 19G21 112.14 28.87 143.0 0.06 1.16 5.54
8 112.14 20DT-17G 112.14 29.0 -60.8 0.590868 0.7 258.56
9 112.2 22G-82 112.2 29.08 96.1 0.36 1.04 183.63
10 112.2 22G-90 112.2 29.66 -88.1 0.06 1.8 167.24
11 112.24 22G-88 112.24 29.27 115.5 0.51 2.29 20.98
12 112.26 22G-85 112.26 29.18 90.5 0.310593 4.08 476.34
13 112.26 DT02G 112.26 29.59 -29.3 0.53 2.16 234.38
14 112.29 18S-38G 112.29 29.07 158.0 0.08 0.71 22.6
15 112.3 18S-25G 112.3 29.77 -88.1 0.02 0.22 112.0
16 112.3 18S-24G 112.3 29.81 -80.0 0.14 0.2 153.0
17 112.31 22G-87 112.31 29.29 40.9 0.55 2.34 421.78
18 112.31 DT01G 112.31 29.32 -29.1 0.24 2.13 240.8
19 112.31 DT03G 112.31 29.62 81.1 0.14 1.59 404.23
20 112.31 18S-23G 112.31 29.84 -103.0 0.2 1.0 210.0
21 112.32 18S-27G 112.32 29.67 97.3386 0.26 1.88 125.25
22 112.33 21DT-26G 112.33 29.14 -48.6 0.191896 1.70282 502.1
23 112.33 21DT-23G 112.33 29.21 -31.0 0.372661 2.26989 904.77
24 112.35 18S-37G 112.35 29.05 -49.0 0.19 0.0 65.6
25 112.35 22G-84 112.35 29.18 122.5 0.28 2.35 257.15
26 112.35 18S-22G 112.35 29.85 -120.0 0.265757 3.79 615.0
27 112.37 20DT-29G 112.37 28.96 -88.2 0.179396 1.98 274.92
28 112.38 22G-92 112.38 29.53 -35.1 0.1 1.93 111.83
29 112.38 18S-20G 112.38 29.88 -142.0 0.47 3.3 541.0
30 112.39 22G-91 112.39 29.66 177.2 1.88 4.19 164.61

11 尝试管道符

参考学习链接:Comparing dplyr vs DataFrames.jl

using Pipe
@time @pipe df |>
  combine(_, :V => (x -> x) => :V_mean, :) |>
  (df -> begin
    df.new_col = df.V
    df
  end) |>
  filter(:V_mean => v -> v >0, _) |>
  sort(_, :new_col)
  0.973227 seconds (3.08 M allocations: 205.192 MiB, 5.03% gc time, 99.52% compilation time)

193 rows × 9 columns

V_mean site lon lat Eh V Ni Ba new_col
Float64 String15 Float64 Float64 Float64 Float64 Float64 Float64 Float64
1 0.01 12JH093 112.85 29.94 -218.5 0.01 2.09 184.509 0.01
2 0.02 18S-25G 112.3 29.77 -88.1 0.02 0.22 112.0 0.02
3 0.04 12JH076 112.58 30.29 -90.0 0.04 2.47 170.86 0.04
4 0.04 18G-05 113.51 29.84 87.5 0.04 0.31 59.45 0.04
5 0.05 22G-86 112.43 29.39 -7.6 0.05 1.47 160.13 0.05
6 0.05 22G-73 112.81 29.42 -78.5 0.05 3.49 120.7 0.05
7 0.05 18G-11 113.26 29.61 -83.2 0.05 0.06 275.26 0.05
8 0.0558746 22G-70 113.56 29.88 -50.4 0.0558746 8.39 58.55 0.0558746
9 0.06 19G21 112.14 28.87 143.0 0.06 1.16 5.54 0.06
10 0.06 22G-90 112.2 29.66 -88.1 0.06 1.8 167.24 0.06
11 0.06 19G14 112.69 29.15 -81.0 0.06 0.99 120.31 0.06
12 0.07 18G-19 112.63 29.78 -14.3 0.07 0.45 510.65 0.07
13 0.07 18G-25 112.64 29.95 -72.0 0.07 1.31 58.46 0.07
14 0.07 18G-26 112.68 29.97 -103.0 0.07 0.52 34.09 0.07
15 0.07 18G-28 112.72 29.96 -104.0 0.07 0.8 63.87 0.07
16 0.07 12JH163 113.02 29.55 -135.8 0.07 3.62 430.359 0.07
17 0.07 18G-03 113.61 29.94 -42.7 0.07 1.62 289.92 0.07
18 0.07 11JH183 113.77 30.07 -54.8787 0.07 0.44 155.31 0.07
19 0.08 19G26 111.8 29.07 33.2 0.08 1.55 126.99 0.08
20 0.08 18S-38G 112.29 29.07 158.0 0.08 0.71 22.6 0.08
21 0.08 19G18 112.65 28.98 55.0 0.08 0.49 70.17 0.08
22 0.08 22G-74 112.78 29.29 -64.1 0.08 4.06 412.63 0.08
23 0.08 11JH164 113.39 30.07 -77.5002 0.08 0.59 157.01 0.08
24 0.08 11JH161 113.69 30.09 -54.3 0.08 0.63 213.16 0.08
25 0.09 18G-27 112.72 29.92 -75.6 0.09 0.09 39.12 0.09
26 0.09 22G-20 112.97 29.52 -124.0 0.09 7.03 370.84 0.09
27 0.09 22G-12 113.06 29.96 -86.4 0.09 3.28 180.59 0.09
28 0.09 11JH162 113.64 30.09 -72.7851 0.09 0.59 100.8 0.09
29 0.1 22G-92 112.38 29.53 -35.1 0.1 1.93 111.83 0.1
30 0.1 19G19 112.75 28.91 78.5 0.1 0.45 15.27 0.1
Back to top