Data Science 100 Knocks (Structured Data Processing) - R Part 5 (Q81 to Q100)

Explanation:

This code does the following.

It calculates the mean of the 'unit_price' and 'unit_cost' columns of the data frame 'df_product' and rounds each result to the nearest integer.
price_mean <- round(mean(df_product$unit_price, na.rm = TRUE))
cost_mean <- round(mean(df_product$unit_cost, na.rm = TRUE))

The round() function rounds the mean to the nearest integer.
na.rm = TRUE excludes missing values (NA) from the calculation.

It replaces the missing values in the 'unit_price' and 'unit_cost' columns of 'df_product' with the respective means.
df_product_2 <- df_product %>% replace_na(list(unit_price = price_mean, unit_cost = cost_mean))

replace_na(), from the 'tidyr' package, replaces the missing values with the means calculated above.

The %>% operator (also called the pipe operator) chains two operations together: the output of the first operation becomes the input of the second.

It uses the sapply() function to count the missing values in each column of the modified 'df_product_2' data frame.
sapply(df_product_2, function(x) sum(is.na(x)))

sapply() applies the function (function(x) sum(is.na(x))) to each column of the 'df_product_2' data frame.

The sum(is.na(x)) part counts the missing values (NA) in each column (x) of 'df_product_2'.

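The mean-imputation steps can be tried end to end on a small table. The sketch below is only an illustration: the four-row df_product and its values are made up for the example, and it assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for df_product
df_product <- data.frame(
  product_cd = c("P01", "P02", "P03", "P04"),
  unit_price = c(100, NA, 205, 300),
  unit_cost  = c(80, 150, NA, 221)
)

# Means over the non-missing values, rounded to whole numbers
price_mean <- round(mean(df_product$unit_price, na.rm = TRUE))
cost_mean  <- round(mean(df_product$unit_cost,  na.rm = TRUE))

# Fill each column's NA with that column's rounded mean
df_product_2 <- df_product %>%
  replace_na(list(unit_price = price_mean, unit_cost = cost_mean))

# Count the remaining NA values per column
na_counts <- sapply(df_product_2, function(x) sum(is.na(x)))
print(na_counts)
```

After the replacement every entry of na_counts should be zero, which is the check the sapply() call performs in the walkthrough.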
Explanation:

This code does the following.

It calculates the median of the 'unit_price' and 'unit_cost' columns of the data frame 'df_product' and rounds each result to the nearest integer.
price_median <- round(median(df_product$unit_price, na.rm = TRUE))
cost_median <- round(median(df_product$unit_cost, na.rm = TRUE))

The median() function calculates the median of the 'unit_price' and 'unit_cost' columns.

The round() function rounds the median to the nearest integer.

na.rm = TRUE excludes missing values from the calculation.

It replaces the missing values in the 'unit_price' and 'unit_cost' columns of 'df_product' with the respective medians.
df_product_3 <- df_product %>% replace_na(list(unit_price = price_median, unit_cost = cost_median))

replace_na(), from the 'tidyr' package, replaces the missing values with the medians calculated above.

The %>% operator (also called the pipe operator) chains two operations together, passing the output of the first operation to the second as its input.

It uses the sapply() function to count the missing values in each column of the modified 'df_product_3' data frame.
sapply(df_product_3, function(x) sum(is.na(x)))

sapply() applies the function (function(x) sum(is.na(x))) to each column of the 'df_product_3' data frame.

The sum(is.na(x)) part counts the missing values (NA) in each column (x) of 'df_product_3'.

Explanation:

This code does the following. The fragments shown below are successive stages of a single pipeline; each later fragment continues from the previous one with %>%.

It groups the 'df_product' data frame by the 'category_small_cd' column.
df_product_4 <- df_product %>% group_by(category_small_cd)

The %>% operator chains two operations together: the output of the first operation becomes the input of the second.

The 'group_by()' function from the 'dplyr' package groups the 'df_product' data frame by the 'category_small_cd' column.

For each group of the grouped data frame, it calculates the median of the 'unit_price' and 'unit_cost' columns, rounds the results to the nearest integer, and summarises them into a new data frame.
%>% summarise(price_median = round(median(unit_price, na.rm = TRUE)), cost_median = round(median(unit_cost, na.rm = TRUE)), .groups = "drop")

The 'summarise()' function from the 'dplyr' package calculates the median of the 'unit_price' and 'unit_cost' columns for each group of the grouped data frame.

The 'round()' function rounds the median to the nearest integer.

na.rm = TRUE excludes missing values from the calculation.

The .groups = "drop" argument removes the grouping information from the output.

It joins the summarised data frame with the original 'df_product' data frame on the 'category_small_cd' column.
%>% inner_join(df_product, by = "category_small_cd")

The 'inner_join()' function from the 'dplyr' package joins the summary data frame and the original 'df_product' data frame on the 'category_small_cd' column.

The resulting data frame has all the columns of both data frames, so every product row also carries the medians of its category.

It replaces the missing values in the 'unit_price' and 'unit_cost' columns of the joined data frame with the corresponding median of the row's 'category_small_cd' group.
%>% mutate(unit_price = ifelse(is.na(unit_price), price_median, unit_price), unit_cost = ifelse(is.na(unit_cost), cost_median, unit_cost))

The 'mutate()' function from the 'dplyr' package overwrites the 'unit_price' and 'unit_cost' columns of the joined data frame with the updated values.

The ifelse() function checks whether a value in the 'unit_price' or 'unit_cost' column is missing (NA) and, if so, replaces it with the median of the 'category_small_cd' group the row belongs to.

It uses the sapply() function to count the missing values in each column of the resulting 'df_product_4' data frame.
sapply(df_product_4, function(x) sum(is.na(x)))

sapply() applies the function (function(x) sum(is.na(x))) to each column of the 'df_product_4' data frame.

The sum(is.na(x)) part counts the missing values (NA) in each column (x) of 'df_product_4'.

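Because the group-wise imputation is described in pieces, it may help to see the stages assembled into one pipeline. The following is a minimal sketch on made-up data (the five products and two categories are hypothetical) and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: two small categories with some missing values
df_product <- data.frame(
  product_cd        = c("P01", "P02", "P03", "P04", "P05"),
  category_small_cd = c("A", "A", "A", "B", "B"),
  unit_price        = c(100, NA, 200, 50, NA),
  unit_cost         = c(70, 80, NA, NA, 30)
)

df_product_4 <- df_product %>%
  group_by(category_small_cd) %>%
  # One row per category holding the rounded medians
  summarise(price_median = round(median(unit_price, na.rm = TRUE)),
            cost_median  = round(median(unit_cost,  na.rm = TRUE)),
            .groups = "drop") %>%
  # Attach the category medians to every product row
  inner_join(df_product, by = "category_small_cd") %>%
  # Fill NA values with the median of the row's own category
  mutate(unit_price = ifelse(is.na(unit_price), price_median, unit_price),
         unit_cost  = ifelse(is.na(unit_cost),  cost_median,  unit_cost))

na_counts <- sapply(df_product_4, function(x) sum(is.na(x)))
print(df_product_4)
```

In this toy data, P02 receives the category A price median and P05 the category B price median, so the two categories are filled with different values, unlike the global mean or median approach.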
Explanation:

This code performs several data manipulations on a data frame called df_receipt, which presumably contains information about sales transactions. Below is a breakdown of what each line of the code does.

df_receipt_2019 <- df_receipt %>% filter(20190101 <= sales_ymd & sales_ymd <= 20191231) %>% group_by(customer_id) %>% summarise(amount_2019 = sum(amount), .groups = "drop")

This line creates a new data frame called df_receipt_2019 by filtering the original df_receipt data frame down to the sales transactions that took place between January 1, 2019 and December 31, 2019. It then groups the data by customer_id, calculates the total amount each customer spent during that period, and stores the result in a new column called amount_2019.

df_receipt_all <- df_receipt %>% group_by(customer_id) %>% summarise(amount_all = sum(amount), .groups = "drop")

This line creates a new data frame called df_receipt_all by grouping df_receipt by customer_id, calculating the total amount each customer spent across all transactions, and storing the result in a new column called amount_all.

df_sales_rate <- left_join(df_customer["customer_id"], df_receipt_2019, by = "customer_id") %>% left_join(df_receipt_all, by = "customer_id") %>% replace_na(list(amount_2019 = 0, amount_all = 0)) %>% mutate(amount_rate = ifelse(amount_all == 0, 0, amount_2019 / amount_all)) %>% filter(amount_rate > 0) %>% slice(1:10)

This line creates a new data frame called df_sales_rate by joining the df_customer data frame (which most likely holds customer attributes) with df_receipt_2019 and df_receipt_all on the customer_id column. It then replaces the missing values in the amount_2019 and amount_all columns with 0 and calculates a new column called amount_rate, the share of the customer's total spending that occurred in 2019 (that is, amount_2019 / amount_all, or 0 when amount_all is 0). Next it filters the data to keep only customers who spent money in 2019 (amount_rate > 0), and finally keeps the first 10 rows with slice(1:10). Because the pipeline contains no sorting step, these are simply the first 10 matching customers in the existing row order, not the 10 customers with the highest amount_rate.

Overall, the code computes for each customer the share of total spending that occurred in 2019 and displays 10 customers for whom that share is positive, which could support targeted marketing and customer retention efforts.

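The difference between "the first 10 rows" and "the top 10 by amount_rate" is easy to see on a tiny example. The sketch below uses made-up customers and receipts, assumes dplyr and tidyr are installed, and adds an arrange(desc(amount_rate)) step that is not part of the code being explained.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data
df_customer <- data.frame(customer_id = c("C1", "C2", "C3", "C4"))
df_receipt <- data.frame(
  customer_id = c("C1", "C1", "C2", "C4"),
  sales_ymd   = c(20190105L, 20180301L, 20170620L, 20190810L),
  amount      = c(100, 300, 50, 70)
)

df_receipt_2019 <- df_receipt %>%
  filter(20190101 <= sales_ymd & sales_ymd <= 20191231) %>%
  group_by(customer_id) %>%
  summarise(amount_2019 = sum(amount), .groups = "drop")

df_receipt_all <- df_receipt %>%
  group_by(customer_id) %>%
  summarise(amount_all = sum(amount), .groups = "drop")

df_rate <- left_join(df_customer["customer_id"], df_receipt_2019, by = "customer_id") %>%
  left_join(df_receipt_all, by = "customer_id") %>%
  replace_na(list(amount_2019 = 0, amount_all = 0)) %>%
  mutate(amount_rate = ifelse(amount_all == 0, 0, amount_2019 / amount_all)) %>%
  filter(amount_rate > 0)

# As in the walkthrough: first rows in the existing order
df_first10 <- df_rate %>% slice(1:10)

# Added step (an assumption about intent): sort before slicing to get a true top 10
df_top10 <- df_rate %>% arrange(desc(amount_rate)) %>% slice(1:10)
```

Here C1 has a rate of 0.25 and C4 a rate of 1, so df_first10 lists C1 before C4 while df_top10 lists C4 first.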
Explanation:

This code performs the following operations.

It selects three columns (postal_cd, longitude, latitude) from the df_geocode data frame and creates a new data frame called df_geocode_1.

The df_geocode_1 data frame is grouped by postal_cd, and the mean longitude and latitude of each group are calculated with the summarise() function. The resulting data frame has one row per unique postal_cd value and the columns postal_cd, m_longitude, and m_latitude.

The df_customer data frame is joined with df_geocode_1 using inner_join(), which matches rows on the shared postal_cd column. The resulting data frame contains all the columns of both data frames.

The head() function displays the first 10 rows of the resulting data frame (df_customer_1). These rows contain the customer information together with the geographic position (latitude and longitude).

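Since the code itself is not shown for this step, the following is a minimal sketch of what the description implies. The postal codes, coordinates, and customers are invented for illustration, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data: one postal code has two geocode entries
df_geocode <- data.frame(
  postal_cd = c("100-0001", "100-0001", "200-0002"),
  longitude = c(139.0, 139.2, 135.5),
  latitude  = c(35.0, 35.4, 34.5)
)
df_customer <- data.frame(
  customer_id = c("C1", "C2", "C3"),
  postal_cd   = c("100-0001", "200-0002", "999-9999")
)

# One row per postal code with averaged coordinates
df_geocode_1 <- df_geocode %>%
  select(postal_cd, longitude, latitude) %>%
  group_by(postal_cd) %>%
  summarise(m_longitude = mean(longitude),
            m_latitude  = mean(latitude),
            .groups = "drop")

# Customers whose postal code has no geocode entry are dropped by the inner join
df_customer_1 <- inner_join(df_customer, df_geocode_1, by = "postal_cd")
head(df_customer_1, 10)
```

Averaging first guarantees a single coordinate pair per postal code, so the join cannot duplicate customer rows.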
Explanation:

This code performs the following operations.

It defines a user-defined function called calc_distance() that takes four parameters: x1, y1, x2, and y2. The function calculates the distance between two geographic positions (given as latitude and longitude) using the Haversine formula. The calculated distance is returned in kilometers.

The df_customer_1 data frame is joined with the df_store data frame using inner_join(). The join matches the application_store_cd column of df_customer_1 with the store_cd column of df_store.

The rename() function renames the two address columns of the result so that the customer address and the store address can be told apart.

The mutate() function adds a new column called distance to the result, using calc_distance() to compute the distance between each customer and store. The function is called with four arguments: the mean latitude and longitude from df_customer_1 (m_latitude and m_longitude) and the latitude and longitude from df_store. The resulting distance is stored in the distance column.

Next, the select() function reduces the result to the columns customer_id, customer_address, store_address, and distance.

Finally, the slice() function displays the first 10 rows of the result. These rows contain the customer ID, the customer address, the store address, and the distance between customer and store.

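The body of calc_distance() is not reproduced in this section, so the sketch below is one possible Haversine implementation rather than the original function. It assumes that x is a latitude and y a longitude in decimal degrees (matching the argument order described above) and uses a mean Earth radius of 6371 km, which is also an assumption.

```r
# Haversine great-circle distance in kilometers.
# Assumed convention: x1/x2 are latitudes, y1/y2 are longitudes, in decimal degrees.
calc_distance <- function(x1, y1, x2, y2) {
  earth_radius_km <- 6371          # assumed mean Earth radius
  to_rad <- pi / 180
  dlat <- (x2 - x1) * to_rad
  dlon <- (y2 - y1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(x1 * to_rad) * cos(x2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * earth_radius_km * asin(pmin(1, sqrt(a)))
}

# Identical points are 0 km apart
d_same <- calc_distance(35.0, 139.0, 35.0, 139.0)

# A quarter turn along the equator is a quarter of the circumference
d_quarter <- calc_distance(0, 0, 0, 90)
```

The function is vectorised, so it can be used directly inside mutate() on whole columns.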
Explanation:

This code performs the following steps.

It calculates the total sales amount for each customer from the df_receipt data frame using group_by() and summarise(), and stores the result in a new data frame, df_sales_amount.

df_sales_amount <- df_receipt %>% group_by(customer_id) %>% summarise(sum_amount = sum(amount), .groups = "drop")

It performs a left join between df_customer and df_sales_amount on the customer_id column, uses mutate() to replace NA values in the sum_amount column with 0, sorts the result by sum_amount in descending order and then by customer_id in ascending order, and keeps only the first row for each unique combination of customer_name and postal_cd.

df_customer_u <- left_join(df_customer, df_sales_amount, by = "customer_id") %>% mutate(sum_amount = ifelse(is.na(sum_amount), 0, sum_amount)) %>% arrange(desc(sum_amount), customer_id) %>% distinct(customer_name, postal_cd, .keep_all = TRUE)

It prints the number of rows in the original df_customer data frame, the number of rows in the resulting df_customer_u data frame, and their difference, to show how many duplicate rows were removed.

print(paste("df_customer_cnt:", nrow(df_customer), "df_customer_u_cnt:", nrow(df_customer_u), "diff:", nrow(df_customer) - nrow(df_customer_u)))

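The effect of sorting before distinct() is that, among customers sharing a name and postal code, the one with the highest total sales survives. A minimal sketch on invented data (assuming dplyr is installed) shows this:

```r
library(dplyr)

# Hypothetical toy data: C1 and C2 are the same person registered twice
df_customer <- data.frame(
  customer_id   = c("C1", "C2", "C3", "C4"),
  customer_name = c("Sato", "Sato", "Suzuki", "Tanaka"),
  postal_cd     = c("100-0001", "100-0001", "200-0002", "300-0003")
)
df_receipt <- data.frame(
  customer_id = c("C1", "C2", "C2", "C3"),
  amount      = c(100, 200, 50, 80)
)

df_sales_amount <- df_receipt %>%
  group_by(customer_id) %>%
  summarise(sum_amount = sum(amount), .groups = "drop")

df_customer_u <- left_join(df_customer, df_sales_amount, by = "customer_id") %>%
  mutate(sum_amount = ifelse(is.na(sum_amount), 0, sum_amount)) %>%
  # Highest sales first; ties broken by the smaller customer_id
  arrange(desc(sum_amount), customer_id) %>%
  # distinct() keeps the first row of each (name, postal code) pair
  distinct(customer_name, postal_cd, .keep_all = TRUE)

diff_cnt <- nrow(df_customer) - nrow(df_customer_u)
print(paste("diff:", diff_cnt))
```

In this example C2 (total 250) is kept for "Sato" and C1 (total 100) is dropped, so one duplicate row is removed.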
Explanation:

This code performs the following operations.

It performs an inner join between df_customer and df_customer_u, using the columns "customer_name" and "postal_cd" as the join keys.
The resulting data frame is assigned to df_customer_n.

The "customer_id.x" column of df_customer_n, which comes from df_customer, is renamed to "customer_id".

The "customer_id.y" column, which comes from df_customer_u, is renamed to "integration_id".

It counts the unique values in the "customer_id" and "integration_id" columns of df_customer_n and prints the difference between the two counts.

The purpose of this code is to compare the number of unique customer IDs in the original df_customer data frame with the number of unique customer IDs in df_customer_u, the deduplicated data frame built from df_customer and the total sales amount per customer (df_sales_amount). Because the same combination of customer name and postal code can appear more than once in df_customer, df_customer has at least as many unique customer IDs as df_customer_u. The difference between the two counts printed at the end is the number of duplicate customer IDs that were merged into another ID by this name-and-postal-code matching.

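The ".x" and ".y" suffixes arise because both inputs contain a customer_id column that is not a join key. The sketch below illustrates this on made-up data; it assumes dplyr is installed, and df_customer_u is written out by hand here instead of being derived from sales totals.

```r
library(dplyr)

# Hypothetical toy data: C1 and C2 share a name and postal code
df_customer <- data.frame(
  customer_id   = c("C1", "C2", "C3"),
  customer_name = c("Sato", "Sato", "Suzuki"),
  postal_cd     = c("100-0001", "100-0001", "200-0002")
)
# Assume the deduplication step kept C2 as the representative for "Sato"
df_customer_u <- df_customer[c(2, 3), ]

df_customer_n <- inner_join(
  df_customer,
  df_customer_u[c("customer_id", "customer_name", "postal_cd")],
  by = c("customer_name", "postal_cd")
) %>%
  # customer_id.x comes from df_customer, customer_id.y from df_customer_u
  rename(customer_id = customer_id.x, integration_id = customer_id.y)

customer_id_cnt    <- length(unique(df_customer_n$customer_id))
integration_id_cnt <- length(unique(df_customer_n$integration_id))
print(paste("difference:", customer_id_cnt - integration_id_cnt))
```

Every original customer keeps its own customer_id, while duplicates share one integration_id (C1 and C2 both map to C2 here).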
Explanation:

This code uses R's rsample package to split a data set into a training set and a test set.

set.seed(71) sets the random seed so that the results are reproducible.

df_sales_customer is created by grouping df_receipt by customer_id and calculating the total amount for each customer. Only customers whose total is positive are kept.

df_tmp is created by joining df_customer and df_sales_customer on customer_id.

split is created with the initial_split() function from the rsample package. The prop parameter is set to 0.8, which means the data set is split into 80% training data and 20% test data.

df_customer_train and df_customer_test are the training set and the test set obtained from the split.

Finally, print() statements confirm the proportions of training data and test data.

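A minimal sketch of the split itself is shown below. The 100-row df_tmp is invented for the example, the rsample package is assumed to be installed, and training() and testing() are the rsample accessors assumed to be used for extracting the two sets.

```r
library(rsample)

set.seed(71)  # makes the random split reproducible

# Hypothetical stand-in for the joined customer/sales table
df_tmp <- data.frame(
  customer_id = sprintf("C%03d", 1:100),
  sum_amount  = seq(100, 10000, by = 100)
)

split <- initial_split(df_tmp, prop = 0.8)
df_customer_train <- training(split)  # 80% of the rows
df_customer_test  <- testing(split)   # remaining 20%

train_ratio <- nrow(df_customer_train) / nrow(df_tmp)
test_ratio  <- nrow(df_customer_test) / nrow(df_tmp)
print(paste("train:", train_ratio, "test:", test_ratio))
```

The two sets are disjoint and together cover every row of df_tmp.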
Explanation:

This code is an example of using the createTimeSlices function from the caret package to create time-based cross-validation folds.

The first step summarises the monthly sales amount over the whole data set using the group_by and summarise functions. The resulting data frame is then given more meaningful column names.

Next, the createTimeSlices function is used to create three time-based folds over the data. The function takes a vector of time points (here, the sales months) and splits it into sets of training indices and test indices for each fold. The initialWindow parameter specifies the size of the first training window, the horizon parameter specifies the size of the test window, the skip parameter specifies how many candidate folds are skipped between consecutive folds, and the fixedWindow parameter specifies whether the training window keeps a fixed size (TRUE) or always starts at the first sample and grows (FALSE).

Finally, the indices produced by createTimeSlices are used to build the df_train and df_test data frames for each fold. These data frames can be used for time-based cross-validation of a model.
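The way the window parameters determine the folds can be checked on a plain index vector. The values below (20 months, initialWindow = 12, horizon = 6, skip = 0) are chosen for illustration and are not taken from the original code; the caret package is assumed to be installed.

```r
library(caret)

# Hypothetical: 20 consecutive months, represented by their positions
months <- 1:20

slices <- createTimeSlices(
  months,
  initialWindow = 12,   # each training window covers 12 months
  horizon       = 6,    # each test window covers the following 6 months
  skip          = 0,    # keep every candidate fold
  fixedWindow   = TRUE  # training window slides instead of growing
)

n_folds <- length(slices$train)

# Show the index ranges of every fold
for (i in seq_len(n_folds)) {
  cat("fold", i,
      "train:", range(slices$train[[i]]),
      "test:",  range(slices$test[[i]]), "\n")
}
```

With these settings the first fold trains on months 1 to 12 and tests on 13 to 18, and each later fold shifts both windows forward by one month until the test window reaches month 20.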

Explanation:

This code performs downsampling using R's recipes package.

First, it calculates the total sales amount for each customer and joins it with the customer information. It then creates a binary variable called is_buy_flag that indicates, based on whether a sales amount exists, whether the customer has made a purchase.

Next, it creates a recipe object with the recipe() function. It then uses the step_downsample() function, with a random seed of 71, to downsample the majority class (customers who did not purchase) to match the number of observations in the minority class (customers who purchased).

The prep() function prepares the recipe, and the juice() function extracts the processed data set. Finally, the data is grouped by is_buy_flag and the number of observations in each group is counted to verify the result of the downsampling.

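To make the idea of downsampling concrete without depending on a particular recipes version, the sketch below reproduces the same effect in base R. It is a stand-in technique, not the step_downsample() call described above, and the ten-row data frame is invented for the example.

```r
set.seed(71)

# Hypothetical toy data: 7 non-buyers (0) and 3 buyers (1)
df <- data.frame(
  customer_id = 1:10,
  is_buy_flag = c(rep(0, 7), rep(1, 3))
)

# Size of the smallest class
n_min <- min(table(df$is_buy_flag))

# Row indices per class, then n_min random rows from each class
idx_by_class <- split(seq_len(nrow(df)), df$is_buy_flag)
idx <- unlist(
  lapply(idx_by_class, function(i) i[sample.int(length(i), n_min)]),
  use.names = FALSE
)
df_down <- df[idx, ]

# Verify: both classes now have the same number of rows
class_counts <- table(df_down$is_buy_flag)
print(class_counts)
```

All three buyers are retained and three of the seven non-buyers are drawn at random, so both classes end up with three rows.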
Explanation:

The code above manipulates a data frame called 'df_customer', which stores customer information. The following describes what each line of the code does.

df_gender_std = unique(df_customer[c("gender_cd", "gender")]) - This line creates a new data frame called df_gender_std. It selects two columns from df_customer, namely "gender_cd" and "gender", and then uses the unique() function to remove duplicate rows, producing a data frame that contains every unique combination of gender code and gender value found in the original data frame.

df_customer_std = df_customer[, colnames(df_customer) != "gender"] - This line creates another new data frame called df_customer_std. It copies every column of df_customer except the "gender" column, by using the colnames() function and the comparison operator != to select all columns whose name is not "gender". The resulting data frame has the same rows as the original df_customer data frame but lacks the "gender" column.

With these two lines, the original df_customer data frame is split into one part that contains only the unique gender codes and values and another part that contains all other customer information without the gender column.

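The split into a lookup table and a slimmed-down customer table can be seen on a three-row example. The data below is made up; only base R is used.

```r
# Hypothetical toy data
df_customer <- data.frame(
  customer_id = c("C1", "C2", "C3"),
  gender_cd   = c(0, 1, 0),
  gender      = c("male", "female", "male")
)

# Lookup table: one row per distinct (gender_cd, gender) pair
df_gender_std <- unique(df_customer[c("gender_cd", "gender")])

# Customer table without the redundant gender label
df_customer_std <- df_customer[, colnames(df_customer) != "gender"]

print(df_gender_std)
print(df_customer_std)
```

The gender label can be recovered at any time by joining df_customer_std back to df_gender_std on gender_cd, which is the point of this normalization.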
Explanation:

The code above works with two data frames, 'df_product' and 'df_category'. The following describes what each line of the code does.

df_product_full <- inner_join(df_product, df_category[c("category_small_cd", "category_major_name", "category_medium_name", "category_small_name")], by = "category_small_cd") - This line creates a new data frame called df_product_full. It uses the inner_join() function to join the df_product and df_category data frames on their shared "category_small_cd" column. Before the join, df_category is subset to the columns "category_small_cd", "category_major_name", "category_medium_name", and "category_small_name". The result contains, for the rows whose "category_small_cd" values match, all columns of df_product plus the three category name columns.

head(df_product_full, n = 3) - This line is used to inspect the contents of df_product_full; the head() function displays the first three rows of the data frame. It is a quick way to check that the join succeeded and to see the shape of the output.

Explanation:

The code above writes the data frame df_product_full to a CSV file in UTF-8 encoding. The arguments of the write.csv() function play the following roles.

df_product_full - The data frame to be written to the CSV file.

"../data/R_df_product_full_UTF-8_header.csv" - The path and name of the output CSV file. The leading .. refers to the parent of the current working directory, so the file is saved in the data directory under that parent, and R_df_product_full_UTF-8_header.csv is the name of the output file.

row.names = FALSE - Specifies that row names are not written to the output CSV file.

fileEncoding = "UTF-8" - Specifies the encoding of the output CSV file. UTF-8 is a character encoding that covers the characters of a wide range of languages, so it is well suited to handling non-English text.

In summary, the write.csv() function writes the df_product_full data frame to a CSV file in UTF-8 encoding at the specified path and file name. The column names are written as a header row, and the resulting CSV file does not contain row names.

Explanation:

The code above writes the data frame df_product_full to a CSV file in CP932 encoding. The arguments of the write.csv() function do the following.

df_product_full - The data frame to be written to the CSV file.

"../data/R_df_product_full_CP932_header.csv" - The path and name of the output CSV file. The leading .. refers to the parent of the current working directory, so the file is saved in the data directory under that parent, and R_df_product_full_CP932_header.csv is the name of the output file.

row.names = FALSE - Specifies that row names are not written to the output CSV file.

fileEncoding = "CP932" - Specifies the encoding of the output CSV file. CP932 is a character encoding used for Japanese text (Microsoft's variant of Shift_JIS) and supports the characters used in Japanese.

In summary, the write.csv() function writes the df_product_full data frame to a CSV file in CP932 encoding at the specified path and file name. The resulting CSV file does not contain row names. This encoding is useful when the Japanese text data must be opened by tools that expect it.

Explanation:

The code above writes the data frame df_product_full to a text file in UTF-8 encoding. The arguments of the write.table() function do the following.

df_product_full - The data frame to be written to the text file.

"../data/R_df_product_full_UTF-8_noh.csv" - The path and name of the output text file. The leading .. refers to the parent of the current working directory, so the file is saved in the data directory under that parent, and R_df_product_full_UTF-8_noh.csv is the name of the output file.

row.names = FALSE - Specifies that row names are not written to the output text file.

col.names = FALSE - Specifies that column names are not written to the output text file, so the file has no header row.

sep = "," - Specifies the separator placed between columns in the output text file. Here a comma is used.

fileEncoding = "UTF-8" - Specifies the encoding of the output text file. UTF-8 is a character encoding that covers the characters of a wide range of languages, so it is well suited to handling non-English text.

In summary, the write.table() function writes the df_product_full data frame to a text file in UTF-8 encoding at the specified path and file name. The resulting file contains neither row names nor column names and uses a comma as the separator.

Explanation:

The code above reads the CSV file R_df_product_full_UTF-8_header.csv into a data frame called df_product_full. The arguments of the read.csv() function do the following.

"../data/R_df_product_full_UTF-8_header.csv" - The path and name of the input CSV file. The leading .. refers to the parent of the current working directory.

colClasses = c_class - A vector that specifies the class of each column of the data frame. Here columns 1 and 5 to 9 are set to NA, which means their class is determined automatically, and columns 2 to 4 are set to character, which means they are read as strings.

fileEncoding = "UTF-8" - Specifies the encoding of the input CSV file. UTF-8 is a character encoding that covers the characters of a wide range of languages.

head(df_product_full, 3) - Displays the first three rows of the df_product_full data frame on the console.

In summary, the read.csv() function reads the file R_df_product_full_UTF-8_header.csv into a data frame called df_product_full using the specified column classes and file encoding. The first rows of the resulting data frame are then printed to the console with head().

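The reason for forcing some columns to character becomes visible in a small round trip. The sketch below writes a made-up three-column table to a temporary file (tempfile() is used here instead of the ../data path) and reads it back with and without colClasses.

```r
# Hypothetical toy data: a category code with a leading zero
df_product_full <- data.frame(
  product_cd        = c("P040101001", "P040101002"),
  category_major_cd = c("04", "04"),
  unit_price        = c(198, 218)
)

path <- tempfile(fileext = ".csv")
write.csv(df_product_full, path, row.names = FALSE, fileEncoding = "UTF-8")

# NA = detect the class automatically, "character" = keep the text as is
c_class <- c(NA, "character", NA)
df_typed <- read.csv(path, colClasses = c_class, fileEncoding = "UTF-8")

# Without colClasses the code column is converted to a number
df_auto <- read.csv(path, fileEncoding = "UTF-8")

str(df_typed)
str(df_auto)
```

With colClasses the code "04" survives unchanged, whereas automatic detection turns it into the number 4 and the leading zero is lost.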
Explanation:

The code above reads the CSV file R_df_product_full_UTF-8_noh.csv into a data frame called df_product_full and then assigns new column names to the data frame. The arguments of the read.csv() function do the following.

"../data/R_df_product_full_UTF-8_noh.csv" - The path and name of the input CSV file. The leading .. refers to the parent of the current working directory.

colClasses = c_class - A vector that specifies the class of each column of the data frame. Here columns 1 and 5 to 9 are set to NA, which means their class is determined automatically, and columns 2 to 4 are set to character, which means they are read as strings.

header = FALSE - Specifies that the input CSV file has no header row.

fileEncoding = "UTF-8" - Specifies the encoding of the input CSV file. UTF-8 is a character encoding that covers the characters of a wide range of languages.

colnames(df_product_full) <- c("product_cd", "category_major_cd", "category_medium_cd", "category_small_cd", "unit_price", "unit_cost", "category_major_name", "category_medium_name", "category_small_name") - This command assigns new column names to the data frame. The nine names are applied in the order listed, from "product_cd" through "category_small_name".

head(df_product_full, 3) - Displays the first three rows of the df_product_full data frame on the console.

In summary, the read.csv() function reads the file R_df_product_full_UTF-8_noh.csv into a data frame called df_product_full using the specified column classes, header setting, and file encoding. New column names are then assigned to the data frame, and its first rows are printed to the console with head().

Explanation:

The code above reads the CSV file R_df_product_full_UTF-8_noh.csv into a data frame called df_product_full, assigning the column names while reading. The arguments of the read.csv() function do the following.

"../data/R_df_product_full_UTF-8_noh.csv" - The path and name of the input CSV file. The leading .. refers to the parent of the current working directory.

col.names = c_names - Specifies the column names to use in the data frame. c_names is a character vector that lists the new column names in the same order as the columns appear in the input file.

colClasses = c_class - Specifies the class of each column of the data frame. c_class is a vector with one entry per column. Here the first column is set to NA, which means its class is determined automatically, the second to fourth columns are set to character, which means they are read as strings, and the fifth and sixth columns are set to NA so that their class is determined automatically. The last three columns are also set to NA; this again means automatic class detection, not that the values are treated as missing.

header = FALSE - Specifies that the input CSV file has no header row.

fileEncoding = "UTF-8" - Specifies the encoding of the input CSV file. UTF-8 is a character encoding that covers the characters of a wide range of languages.

head(df_product_full, 3) - Displays the first three rows of the df_product_full data frame on the console.

In summary, the read.csv() function reads the file R_df_product_full_UTF-8_noh.csv into a data frame called df_product_full using the specified column names, column classes, header setting, and file encoding. The first rows of the resulting data frame are then printed to the console with head().

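Writing a header-less file and naming the columns at read time can be sketched as a round trip. The three-column table and the temporary file path are invented for the example; c_names and c_class follow the roles described above.

```r
# Hypothetical toy data
df_product_full <- data.frame(
  product_cd        = c("P040101001", "P040101002"),
  category_major_cd = c("04", "04"),
  unit_price        = c(198, 218)
)

path <- tempfile(fileext = ".csv")

# No row names, no header row, comma separated
write.table(df_product_full, path, row.names = FALSE, col.names = FALSE,
            sep = ",", fileEncoding = "UTF-8")

first_line <- readLines(path, n = 1)  # first data row, not a header

# Column names and classes supplied by the reader, in file order
c_names <- c("product_cd", "category_major_cd", "unit_price")
c_class <- c(NA, "character", NA)
df_read <- read.csv(path, col.names = c_names, colClasses = c_class,
                    header = FALSE, fileEncoding = "UTF-8")

head(df_read, 3)
```

Passing col.names to read.csv() gives the same result as assigning colnames() afterwards, in a single step.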
Explanation:

The code above writes the contents of the data frame df_product_full to a tab-separated values (TSV) file called R_df_product_full_UTF-8_header.tsv. The arguments of the write.table() function do the following.

df_product_full - The data frame to be written to the TSV file.

"../data/R_df_product_full_UTF-8_header.tsv" - The path and name of the output TSV file. The leading .. refers to the parent of the current working directory.

row.names = FALSE - Specifies that row names are not written to the output file.

sep = "\t" - Specifies the separator placed between columns. Here the separator is the tab character.

fileEncoding = "UTF-8" - Specifies the encoding of the output TSV file. UTF-8 is a character encoding that covers the characters of a wide range of languages.

In summary, the write.table() function writes the contents of the df_product_full data frame to a TSV file called R_df_product_full_UTF-8_header.tsv, using a tab as the separator and UTF-8 as the file encoding.

Explanation:

The code above reads the tab-separated values (TSV) file R_df_product_full_UTF-8_header.tsv and stores its contents in a data frame called df_product_tmp. The arguments of the read.table() function play the following roles.

c_class - A vector that specifies the class of each column in the TSV file. Here the first column is set to NA and the other columns are given the class "character".

"../data/R_df_product_full_UTF-8_header.tsv" - The path and name of the input TSV file.

colClasses = c_class - Specifies the column classes for the input TSV file.

header = TRUE - Specifies that the first row of the input file contains the column names.

fileEncoding = "UTF-8" - Specifies the character encoding of the input TSV file.

After the TSV file is read, the head() function displays the first 3 rows of the resulting df_product_tmp data frame.

Overall, this code is similar to the earlier code that reads a CSV file into the df_product_full data frame. The difference is that it reads a TSV file and stores the contents in a temporary data frame called df_product_tmp. A temporary data frame may be used so that the data can be cleaned or manipulated before the final result is stored under a different name.
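A TSV round trip can be sketched in the same way as the CSV one. The toy table and temporary path are made up, and the explicit sep = "\t" in read.table() is an addition in this sketch (it is not listed among the arguments above) so that values containing spaces are not split.

```r
# Hypothetical toy data; one name contains a space
df_product_full <- data.frame(
  product_cd          = c("P040101001", "P040101002"),
  category_major_cd   = c("04", "04"),
  category_major_name = c("side dishes", "sweets")
)

path <- tempfile(fileext = ".tsv")

# Tab separated, header row kept, no row names
write.table(df_product_full, path, row.names = FALSE, sep = "\t",
            fileEncoding = "UTF-8")

c_class <- c(NA, "character", "character")
df_product_tmp <- read.table(path, colClasses = c_class, header = TRUE,
                             sep = "\t", fileEncoding = "UTF-8")

head(df_product_tmp, 3)
```

The data frame read back has the same column names and values as the one written, including the leading zero in the code column.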
