Data Science 100 Knocks (Structured Data Processing) - Python Part 2 (Q21 to Q40)

This article explains the solutions to Data Science 100 Knocks (Structured Data Processing) - Python Part 2 (Q21 to Q40).

Explanation:

The code len(df_receipt) returns the length of the DataFrame df_receipt.

Each component is described below.

len(): a Python built-in function that returns the length of an object such as a list or a string.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

Combining the two, the DataFrame df_receipt is passed as the argument to len(), which returns the number of rows in df_receipt.

In short, this code simply returns the number of rows in the DataFrame.
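
As a quick illustration, the sketch below builds a tiny stand-in for df_receipt (the rows are invented for this example and are not taken from the real dataset) and counts its rows in two equivalent ways.

```python
import pandas as pd

# Hypothetical three-row stand-in for df_receipt; the values are made up.
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C2", "C1"],
    "amount": [100, 200, 300],
})

n_rows = len(df_receipt)          # number of rows
n_rows_alt = df_receipt.shape[0]  # same count, read from the (rows, columns) tuple
print(n_rows, n_rows_alt)
```

Note that len() counts rows only; the number of columns is the second element of df_receipt.shape.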

Explanation:

len(df_receipt['customer_id'].unique())

This code returns the number of unique customer IDs in the customer_id column of the df_receipt DataFrame.

Each component is described below.

len(): a Python built-in function that returns the length of an object such as a list or a string.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

['customer_id']: a DataFrame indexing operation that selects the column labeled customer_id from df_receipt. The result is a pandas Series.

.unique(): a pandas Series method that returns the unique values in the Series as a NumPy array. Here it returns an array of the unique customer IDs.

Combining len() with df_receipt['customer_id'].unique(), the array of unique customer IDs is passed to len(), which returns how many unique customer IDs there are.

In short, this code returns the number of unique customer IDs in the DataFrame.
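
A minimal sketch of the same count on invented data. It also contrasts len(...unique()) with Series.nunique(), which is an alternative not mentioned above; the two differ only when the column contains missing values.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt with one missing customer_id.
df_receipt = pd.DataFrame({"customer_id": ["C1", "C2", "C1", None]})

unique_ids = df_receipt["customer_id"].unique()   # distinct values, missing value included
n_unique = len(unique_ids)                        # counts the missing value as one entry
n_unique_non_null = df_receipt["customer_id"].nunique()  # skips missing values by default
print(n_unique, n_unique_non_null)
```

If the column has no missing values, both expressions give the same number.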

Explanation:

The code df_receipt.groupby('store_cd').agg({'amount':'sum', 'quantity':'sum'}).reset_index() groups the rows of the df_receipt DataFrame by the store_cd column and applies the sum() aggregation to the amount and quantity columns within each group. Finally, it resets the index of the resulting DataFrame.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('store_cd'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the store_cd column. It creates a GroupBy object that can be used to apply functions to each group separately.

.agg({'amount':'sum', 'quantity':'sum'}): a method of the pandas GroupBy object that applies the sum() aggregation to the amount and quantity columns of each group. The result is a DataFrame with two columns, amount and quantity, holding each group's totals.

.reset_index(): a pandas DataFrame method that resets the index of the resulting DataFrame to the default integer index.

In short, this code groups the rows of a DataFrame by the values of one column, sums two other columns within each group, and returns a DataFrame of those per-group totals with a default index.

Explanation:

This code performs a groupby operation on the df_receipt DataFrame using the store_cd column and aggregates the amount and quantity columns by summing them.

The resulting DataFrame has one row per unique store_cd value and two columns, amount and quantity, each holding the sum of that column over all rows that share the same store_cd.

The .reset_index() method is then called on the result, turning store_cd from the index into an ordinary column.
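
The effect of reset_index() is easiest to see by looking at the result before and after the call. The sketch below uses a made-up three-row df_receipt; only the column names come from the explanation above.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; values are invented.
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S1", "S2"],
    "amount": [100, 200, 300],
    "quantity": [1, 2, 3],
})

# After agg(), store_cd is the index of the result, not a column.
grouped = df_receipt.groupby("store_cd").agg({"amount": "sum", "quantity": "sum"})

# reset_index() moves store_cd back into a regular column
# and replaces the index with 0, 1, 2, ...
result = grouped.reset_index()
print(result)
```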

Explanation:

This code runs a groupby operation on the df_receipt DataFrame based on the customer_id column. It then aggregates the sales_ymd values of each group with the max function to obtain each customer's most recent purchase date.

The aggregation yields each customer's latest sales_ymd, indexed by customer_id. reset_index then restores the default index, so that customer_id becomes a regular column alongside sales_ymd, and head(10) displays the first 10 rows of the result.

In other words, the output is a DataFrame showing the latest purchase date (sales_ymd) for each customer_id, limited to the first 10 rows.

Explanation:

df_receipt.groupby('customer_id').agg({'sales_ymd': 'max'}).reset_index().head(10) groups the rows of the df_receipt DataFrame by the customer_id column and applies the max() aggregation to the sales_ymd column of each group. Finally, it resets the index of the resulting DataFrame and returns the first 10 rows.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('customer_id'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the customer_id column. It creates a GroupBy object that can be used to apply functions to each group separately.

.agg({'sales_ymd': 'max'}): a method of the pandas GroupBy object that applies the max() aggregation to the sales_ymd column of each group. The result is a DataFrame with a single sales_ymd column holding each group's maximum.

.reset_index(): a pandas DataFrame method that resets the index of the resulting DataFrame to the default index.

.head(10): a pandas DataFrame method that returns the first 10 rows of the DataFrame.

In short, this code groups the rows of a DataFrame by the values of one column, finds the maximum of another column within each group, resets the index, and returns the first 10 rows. It is useful for finding each customer's most recent purchase date.

Explanation:

The code df_tmp = df_receipt.groupby('customer_id').agg({'sales_ymd':['max','min']}).reset_index() followed by df_tmp.query('sales_ymd_max != sales_ymd_min').head(10) groups the rows of the df_receipt DataFrame by the customer_id column and applies two aggregations, max() and min(), to the sales_ymd column of each group, producing a DataFrame with each customer's latest and earliest purchase dates. The query() method then keeps only the rows where those two dates differ, and the first 10 rows are returned.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('customer_id'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the customer_id column. It creates a GroupBy object that can be used to apply functions to each group separately.

.agg({'sales_ymd':['max','min']}): a method of the pandas GroupBy object that applies two aggregations, max() and min(), to the sales_ymd column of each group. The result is a DataFrame holding the maximum and minimum sales_ymd of each group. pandas labels these two columns with two-level names, ('sales_ymd', 'max') and ('sales_ymd', 'min'); the query below refers to them as sales_ymd_max and sales_ymd_min, which assumes the two levels have been joined with an underscore.

.reset_index(): a pandas DataFrame method that resets the index of the resulting DataFrame to the default index.

.query('sales_ymd_max != sales_ymd_min'): a pandas DataFrame method that keeps only the rows where the value of sales_ymd_max is not equal to the value of sales_ymd_min. The condition is a Boolean expression passed to query() as a string.

.head(10): a pandas DataFrame method that returns the first 10 rows of the DataFrame.

In short, this code groups the rows of a DataFrame by the values of one column, finds the maximum and minimum of another column within each group, and resets the index. It then keeps only the rows where the maximum and minimum differ and returns the first 10 of them. It is useful for identifying customers who made purchases on more than one date.
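
The following sketch runs the same steps on invented data. It includes one step that the explanation above does not spell out and that is therefore an assumption here: flattening the two-level column labels produced by agg() so that query() can refer to sales_ymd_max and sales_ymd_min.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt: C1 bought on two dates, C2 on one.
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2"],
    "sales_ymd": [20190101, 20190315, 20181224],
})

df_tmp = (
    df_receipt.groupby("customer_id")
    .agg({"sales_ymd": ["max", "min"]})
    .reset_index()
)

# Columns are now pairs such as ('sales_ymd', 'max'); join each pair with "_"
# and drop the trailing underscore left on ('customer_id', '').
df_tmp.columns = ["_".join(pair).rstrip("_") for pair in df_tmp.columns]

result = df_tmp.query("sales_ymd_max != sales_ymd_min").head(10)
print(result)
```

Only C1 survives the filter, because C2's earliest and latest purchase dates are the same.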

Explanation:

df_receipt.groupby('store_cd').agg({'amount':'mean'}).reset_index().sort_values('amount', ascending=False).head(5) groups the rows of the df_receipt DataFrame by the store_cd column, computes the mean of the amount column for each group, and returns a DataFrame of those means together with store_cd. It then resets the index, sorts by mean amount in descending order, and returns the first 5 rows.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('store_cd'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the store_cd column. It creates a GroupBy object that can be used to apply functions to each group separately.

.agg({'amount':'mean'}): a method of the pandas GroupBy object that applies the mean() aggregation to the amount column of each group. The result is a DataFrame with a single amount column holding each group's mean.

.reset_index(): a pandas DataFrame method that resets the index of the resulting DataFrame to the default index.

.sort_values('amount', ascending=False): a pandas DataFrame method that sorts the rows by the values in the amount column. The ascending=False parameter requests descending order.

.head(5): a pandas DataFrame method that returns the first 5 rows of the DataFrame.

In short, this code groups the rows of a DataFrame by the values of one column, computes the mean of another column within each group, resets the index, sorts by the mean in descending order, and returns the first 5 rows. It is useful for identifying the stores with the highest average sales amount.
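
A small sketch of the ranking on made-up data with three stores, so head(5) simply returns all of them in ranked order.

```python
import pandas as pd

# Hypothetical stand-in for df_receipt; values are invented.
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S1", "S2", "S2", "S3"],
    "amount": [100, 300, 500, 700, 50],
})

top = (
    df_receipt.groupby("store_cd")
    .agg({"amount": "mean"})                  # mean amount per store
    .reset_index()                            # store_cd becomes a column
    .sort_values("amount", ascending=False)   # highest mean first
    .head(5)
)
print(top)
```

The printed index is not consecutive after sorting, because sort_values() keeps each row's original index label.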

Explanation:

df_receipt.groupby('store_cd').agg({'amount':'median'}).reset_index().sort_values('amount', ascending=False).head(5) groups the rows of the df_receipt DataFrame by the store_cd column, computes the median of the amount column for each group, and returns a DataFrame of those medians together with store_cd. It then resets the index, sorts by median amount in descending order, and returns the first 5 rows.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('store_cd'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the store_cd column. It creates a GroupBy object that can be used to apply functions to each group separately.

.agg({'amount':'median'}): a method of the pandas GroupBy object that applies the median() aggregation to the amount column of each group. The result is a DataFrame with a single amount column holding each group's median.

.reset_index(): a pandas DataFrame method that resets the index of the resulting DataFrame to the default index.

.sort_values('amount', ascending=False): a pandas DataFrame method that sorts the rows by the values in the amount column. The ascending=False parameter requests descending order.

.head(5): a pandas DataFrame method that returns the first 5 rows of the DataFrame.

In short, this code groups the rows of a DataFrame by the values of one column, computes the median of another column within each group, resets the index, sorts by the median in descending order, and returns the first 5 rows. It is useful for identifying the stores with the highest median sales amount.
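
To show why a median ranking can differ from a mean ranking, the sketch below computes both on invented data in which one store has a single very large receipt. The named-aggregation form of agg() and the output column names mean_amount and median_amount are choices made for this example, not part of the original code.

```python
import pandas as pd

# Hypothetical data: store S1 has one outlier receipt of 10000.
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S1", "S1", "S2", "S2", "S2"],
    "amount": [100, 100, 10000, 300, 400, 500],
})

stats = df_receipt.groupby("store_cd").agg(
    mean_amount=("amount", "mean"),
    median_amount=("amount", "median"),
)
print(stats)

top_by_mean = stats["mean_amount"].idxmax()      # store with the highest mean
top_by_median = stats["median_amount"].idxmax()  # store with the highest median
```

The outlier pulls S1's mean above S2's, while the median, which is less sensitive to extreme values, ranks S2 first.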

Explanation:

df_receipt.groupby('store_cd').product_cd.apply(lambda x: x.mode()).reset_index().head(10) groups the rows of the df_receipt DataFrame by the store_cd column and finds the mode of the product_cd column for each group. reset_index() then resets the index of the result to the default index, giving a DataFrame that contains the mode values and the store_cd column, and head(10) returns its first 10 rows.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('store_cd'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the store_cd column. It creates a GroupBy object that can be used to apply functions to each group separately.

.product_cd: attribute access that selects the product_cd column from the GroupBy object, so that the following method is applied to product_cd within each group.

.apply(lambda x: x.mode()): a GroupBy method that applies mode() to each group. mode() returns the most frequently occurring value in the group. It returns a Series, which holds more than one value when several values tie for most frequent.

.reset_index(): a pandas method that resets the index of the result to the default index.

In short, this code groups the rows of a DataFrame by the values of one column, finds the mode of another column within each group, and returns a DataFrame containing the mode values and the grouping column. It is useful for identifying the most frequently sold product at each store.
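
The sketch below, on invented data, shows what happens when a store has a tie: mode() returns every tied value, so that store contributes more than one row to the result.

```python
import pandas as pd

# Hypothetical data: S1 has a clear mode (P1); S2 has a tie between P3 and P4.
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S1", "S1", "S2", "S2"],
    "product_cd": ["P1", "P1", "P2", "P3", "P4"],
})

modes = (
    df_receipt.groupby("store_cd")
    .product_cd.apply(lambda x: x.mode())  # one Series of modal values per store
    .reset_index()
)
print(modes)

s1_modes = modes.loc[modes["store_cd"] == "S1", "product_cd"].tolist()
s2_modes = sorted(modes.loc[modes["store_cd"] == "S2", "product_cd"])
```

Besides store_cd and product_cd, the printed result carries an extra column that numbers the modal values within each store.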

Explanation:

df_receipt.groupby('store_cd').amount.var(ddof=0).reset_index().sort_values('amount', ascending=False).head(5) groups the rows of the df_receipt DataFrame by the store_cd column, computes the variance of the amount column for each group with ddof=0, and returns a DataFrame of the variance values together with store_cd. reset_index() resets the index of the result to the default index. The result is then sorted by variance in descending order and the first 5 rows are returned.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('store_cd'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the store_cd column. It creates a GroupBy object that can be used to apply functions to each group separately.

.amount.var(ddof=0): selects the amount column from the GroupBy object and computes its variance for each group. The ddof parameter is the delta degrees of freedom: the sum of squared deviations is divided by N - ddof, where N is the number of values. With ddof=0 the divisor is N, which gives the population variance rather than the sample variance. The result is a Series of variance values indexed by store_cd.

.reset_index(): a pandas method that resets the index to the default index. Called on the Series above, it returns a DataFrame with store_cd and amount columns.

.sort_values('amount', ascending=False): a pandas DataFrame method that sorts the rows by the values in the amount column. The ascending=False parameter requests descending order.

.head(5): a pandas DataFrame method that returns the first 5 rows of the DataFrame.

In short, this code groups the rows of a DataFrame by the values of one column, computes the population variance of another column within each group, resets the index, sorts by variance in descending order, and returns the first 5 rows. It is useful for identifying the stores whose sales amounts vary the most.
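
The role of ddof can be checked by hand on a tiny example. In the sketch below (invented data, one store), the four amounts have mean 4 and squared deviations summing to 8, so the population variance is 8 / 4 and the sample variance is 8 / 3.

```python
import pandas as pd

# Hypothetical data for a single store.
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S1", "S1", "S1"],
    "amount": [2, 4, 4, 6],
})

grouped_amount = df_receipt.groupby("store_cd").amount

pop_var = grouped_amount.var(ddof=0)     # divides by N      -> 8 / 4
sample_var = grouped_amount.var(ddof=1)  # divides by N - 1  -> 8 / 3 (pandas default)
print(pop_var["S1"], sample_var["S1"])
```

Because pandas defaults to ddof=1, the ddof=0 argument has to be passed explicitly whenever the population variance is wanted.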

Explanation:

df_receipt.groupby('store_cd').amount.std(ddof=0).reset_index().sort_values('amount', ascending=False).head(5) groups the rows of the df_receipt DataFrame by the store_cd column, computes the standard deviation of the amount column for each group with ddof=0, and returns a DataFrame of the standard deviation values together with store_cd. reset_index() resets the index of the result to the default index. The result is then sorted by standard deviation in descending order and the first 5 rows are returned.

Each component is described below.

df_receipt: a variable that holds a pandas DataFrame object. The name df_receipt is arbitrary; any valid variable name would do.

.groupby('store_cd'): a pandas DataFrame method that groups the rows of the DataFrame by the values in the store_cd column. It creates a GroupBy object that can be used to apply functions to each group separately.

.amount.std(ddof=0): selects the amount column from the GroupBy object and computes its standard deviation for each group. The ddof parameter specifies the delta degrees of freedom; with ddof=0 the population standard deviation is computed rather than the sample standard deviation. The result is a Series of standard deviation values indexed by store_cd.

.reset_index(): a pandas method that resets the index to the default index. Called on the Series above, it returns a DataFrame with store_cd and amount columns.

.sort_values('amount', ascending=False): a pandas DataFrame method that sorts the rows by the values in the amount column. The ascending=False parameter requests descending order.

.head(5): a pandas DataFrame method that returns the first 5 rows of the DataFrame.

In short, this code groups the rows of a DataFrame by the values of one column, computes the population standard deviation of another column within each group, resets the index, sorts by standard deviation in descending order, and returns the first 5 rows. It is useful for identifying the stores whose sales amounts deviate the most.
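
As a sanity check on invented data, the sketch below compares the grouped std(ddof=0) with NumPy's np.std(), whose default is also ddof=0, and with the pandas default ddof=1.

```python
import numpy as np
import pandas as pd

# Hypothetical data for a single store.
amounts = [2, 4, 4, 6]
df_receipt = pd.DataFrame({"store_cd": ["S1"] * 4, "amount": amounts})

grouped_amount = df_receipt.groupby("store_cd").amount

pop_std = grouped_amount.std(ddof=0)     # population standard deviation
sample_std = grouped_amount.std(ddof=1)  # sample standard deviation (pandas default)
numpy_std = np.std(amounts)              # NumPy's default is ddof=0

print(pop_std["S1"], sample_std["S1"], numpy_std)
```

The population value matches NumPy's result and is the square root of the population variance; the sample value is slightly larger because it divides by N - 1.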

Explanation:

The code np.percentile(df_receipt['amount'], q=np.arange(1, 5) * 25) uses NumPy's np.percentile() function to compute the quartiles of the amount column of the df_receipt DataFrame.

Each element is described below.

np.percentile(): a NumPy function that computes percentiles of an array. The first argument is the array, and the second is the percentile or percentiles to compute, given through the q parameter. Here q=np.arange(1, 5) * 25 requests the 25th, 50th, 75th, and 100th percentiles, because np.arange(1, 5) * 25 produces the array [25, 50, 75, 100].

df_receipt['amount']: a pandas Series representing the amount column of df_receipt. The df_receipt DataFrame is assumed to exist in the current environment.

q=np.arange(1, 5) * 25: a NumPy array specifying the percentiles to compute. np.arange(1, 5) yields [1, 2, 3, 4], so multiplying by 25 gives [25, 50, 75, 100].

This code computes the quartiles of the amount column, the values that divide the distribution into four equal parts. The first quartile (Q1) is the 25th percentile, the second (Q2) is the 50th percentile (the median), and the third (Q3) is the 75th percentile; the 100th percentile is the maximum. The output is an array of these values, which summarizes the distribution of amount.
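
A minimal sketch on five invented amounts, chosen so that each requested percentile falls exactly on a data point and no interpolation is involved.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_receipt; values are invented.
df_receipt = pd.DataFrame({"amount": [10, 20, 30, 40, 50]})

q = np.arange(1, 5) * 25  # array([ 25,  50,  75, 100])
quartiles = np.percentile(df_receipt["amount"], q=q)
print(q, quartiles)
```

With real data the percentiles usually fall between two observations, in which case np.percentile() interpolates linearly by default.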

Explanation:

This code uses the pandas quantile() function to compute the quartiles of the amount column of the df_receipt DataFrame. The q parameter specifies which quantiles to compute: 1/4, 2/4 (the median), 3/4, and 4/4 (the maximum).

The code calls quantile() on the amount column of df_receipt and sets q to the array [0.25, 0.5, 0.75, 1.0], whose entries correspond to quartiles 1, 2, 3, and 4.

The output holds four values, one per requested quantile. When q is list-like, quantile() returns them as a Series indexed by q.
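
The exact code is not shown above, so the sketch below is one way to write it under that description: q is built as np.arange(1, 5) / 4, which is an assumption about how the array [0.25, 0.5, 0.75, 1.0] is produced. The data is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_receipt; values are invented.
df_receipt = pd.DataFrame({"amount": [10, 20, 30, 40, 50]})

q = np.arange(1, 5) / 4  # array([0.25, 0.5 , 0.75, 1.  ])
quartiles = df_receipt["amount"].quantile(q=q)
print(quartiles)
```

Unlike np.percentile(), which takes percentages from 0 to 100, quantile() takes fractions from 0 to 1, and the result keeps those fractions as its index.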

Explanation:

df_receipt.groupby('store_cd').amount.mean().reset_index().query('amount >= 330') groups the df_receipt DataFrame by the store_cd column, computes the mean of the amount column for each group, resets the index of the result, and filters it to keep only the rows whose mean amount is 330 or more.

Each component is described below.

df_receipt.groupby('store_cd'): groups the df_receipt DataFrame by the values in the store_cd column.

.amount.mean(): computes the mean of the amount column for each group.

.reset_index(): resets the index of the result, turning store_cd into a regular column.

.query('amount >= 330'): filters the resulting DataFrame to keep only the rows whose mean amount is 330 or more.

In short, this code selects only the stores whose average transaction amount is 330 or more.
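
A short sketch on invented data with three stores, one of which sits exactly on the threshold to show that >= includes it.

```python
import pandas as pd

# Hypothetical data: store means are S1 = 350, S2 = 150, S3 = 330.
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S1", "S2", "S2", "S3"],
    "amount": [300, 400, 100, 200, 330],
})

result = (
    df_receipt.groupby("store_cd")
    .amount.mean()            # Series of means indexed by store_cd
    .reset_index()            # DataFrame with store_cd and amount columns
    .query("amount >= 330")   # keep stores at or above the threshold
)
print(result)
```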

Explanation:

The code df_receipt[~df_receipt['customer_id'].str.startswith("Z")].groupby('customer_id').amount.sum().mean() works as follows.

df_receipt[~df_receipt['customer_id'].str.startswith("Z")]: filters the df_receipt DataFrame to exclude the rows whose customer_id starts with "Z". The tilde (~) is the element-wise NOT operator; it inverts the Boolean values returned by str.startswith().

.groupby('customer_id').amount.sum(): groups the filtered DataFrame by the customer_id column and computes the sum of the amount column for each group.

.mean(): computes the mean of the resulting Series, which holds the total amount spent by each customer (excluding customers whose customer_id starts with "Z").

In short, this code computes the average of the total amounts spent per customer, over the customers in df_receipt whose customer_id does not start with "Z". This can help in understanding the purchasing behavior of regular customers.

Explanation:

This code computes the average amount spent by non-anonymous customers. It is structured as follows.

df_receipt is a DataFrame containing information about transactions.

The query method filters out the transactions whose customer_id starts with "Z". This relies on str.startswith(), which checks whether a string starts with a given character or substring.

The resulting DataFrame is grouped by customer_id using the groupby method.

The amount column is selected, and the sum method computes the total amount spent by each customer.

The mean method then computes the average of the totals from the previous step. This is the final result of the code.
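
The steps above can be sketched as follows on invented data. The query string is an assumption about how the filter is written, since the code itself is not shown in this explanation; the Boolean-mask form is included only to confirm that both ways of filtering give the same average.

```python
import pandas as pd

# Hypothetical data: per-customer totals are C1 = 300, C2 = 500; Z0 is excluded.
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "Z0"],
    "amount": [100, 200, 500, 1000],
})

# Filter with query(); engine="python" lets the string method be evaluated.
query_mean = (
    df_receipt.query('not customer_id.str.startswith("Z")', engine="python")
    .groupby("customer_id")
    .amount.sum()
    .mean()
)

# Same filter written as a Boolean mask.
mask_mean = (
    df_receipt[~df_receipt["customer_id"].str.startswith("Z")]
    .groupby("customer_id")
    .amount.sum()
    .mean()
)
print(query_mean, mask_mean)
```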

Explanation:

The code df_amount_sum = df_receipt[~df_receipt['customer_id'].str.startswith("Z")].groupby('customer_id').amount.sum(), amount_mean = df_amount_sum.mean(), df_amount_sum = df_amount_sum.reset_index(), df_amount_sum[df_amount_sum['amount'] >= amount_mean].head(10) works as follows.

df_receipt[~df_receipt['customer_id'].str.startswith("Z")].groupby('customer_id').amount.sum(): excludes the rows of df_receipt whose customer_id starts with "Z", groups the rest by customer_id, and computes the sum of the amount column for each group. The result is a Series holding the total amount spent by each customer (excluding customers whose customer_id starts with "Z").

amount_mean = df_amount_sum.mean(): computes the mean of the df_amount_sum Series, that is, the average total spend of the customers whose customer_id does not start with "Z".

df_amount_sum = df_amount_sum.reset_index(): resets the index of the df_amount_sum Series, converting it to a DataFrame with two columns, customer_id and amount.

df_amount_sum[df_amount_sum['amount'] >= amount_mean].head(10): filters df_amount_sum to keep only the rows whose amount is at least amount_mean, and returns the first 10 rows of the result.

In short, this code selects the customers whose total spend is at or above the average total spend of all customers whose customer_id does not start with "Z". The resulting DataFrame holds the customer_id and total amount of each selected customer. No sort is applied, so the rows stay in the order produced by groupby, which by default is ascending customer_id.
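
The sketch below runs the four statements on invented data. The input rows are deliberately out of order to show that the output follows customer_id order rather than amount order.

```python
import pandas as pd

# Hypothetical data: totals are C1 = 300, C2 = 500, C3 = 100; Z0 is excluded.
df_receipt = pd.DataFrame({
    "customer_id": ["C2", "C3", "C1", "Z0"],
    "amount": [500, 100, 300, 1000],
})

df_amount_sum = (
    df_receipt[~df_receipt["customer_id"].str.startswith("Z")]
    .groupby("customer_id")
    .amount.sum()
)
amount_mean = df_amount_sum.mean()           # (300 + 500 + 100) / 3
df_amount_sum = df_amount_sum.reset_index()  # columns: customer_id, amount
result = df_amount_sum[df_amount_sum["amount"] >= amount_mean].head(10)
print(amount_mean)
print(result)
```

If a ranking by spend is wanted, a sort_values('amount', ascending=False) call would have to be added before head(10).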

Explanation:

The code pd.merge(df_receipt, df_store[['store_cd','store_name']], how='inner', on='store_cd').head(10) works as follows.

df_store[['store_cd','store_name']]: selects the subset of the df_store DataFrame made up of the two columns store_cd and store_name.

pd.merge(df_receipt, df_store[['store_cd','store_name']], how='inner', on='store_cd'): performs an inner join between the df_receipt DataFrame and df_store[['store_cd','store_name']] on the store_cd column. Only the rows whose store_cd value appears in both DataFrames are kept.

.head(10): returns the first 10 rows of the merged DataFrame.

In short, this code merges the df_receipt and df_store DataFrames on the store_cd column, adding the store_name column from df_store to the result. The resulting DataFrame contains every column of df_receipt plus store_name, so each transaction carries the name of the store where it occurred.
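
A minimal sketch of the inner join on invented data. The extra prefecture column in df_store and the unmatched store codes S9 and S3 are made up to show what the column subset and how='inner' each leave out.

```python
import pandas as pd

# Hypothetical receipts; S9 has no matching store record.
df_receipt = pd.DataFrame({
    "store_cd": ["S1", "S2", "S9"],
    "amount": [100, 200, 300],
})

# Hypothetical store master; S3 has no receipts.
df_store = pd.DataFrame({
    "store_cd": ["S1", "S2", "S3"],
    "store_name": ["Alpha", "Beta", "Gamma"],
    "prefecture": ["X", "Y", "Z"],
})

merged = pd.merge(
    df_receipt,
    df_store[["store_cd", "store_name"]],  # bring in only the name column
    how="inner",
    on="store_cd",
).head(10)
print(merged)
```

The receipt for S9 is dropped because it has no match, and prefecture never appears because it was not part of the selected subset.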

Explanation:

The code pd.merge(df_product, df_category[['category_small_cd','category_small_name']], how='inner', on='category_small_cd').head(10) works as follows.

df_category[['category_small_cd','category_small_name']]: selects the subset of the df_category DataFrame made up of the two columns category_small_cd and category_small_name.

pd.merge(df_product, df_category[['category_small_cd','category_small_name']], how='inner', on='category_small_cd'): performs an inner join between the df_product DataFrame and df_category[['category_small_cd','category_small_name']] on the category_small_cd column. Only the rows whose category_small_cd value appears in both DataFrames are kept.

.head(10): returns the first 10 rows of the merged DataFrame.

In short, this code merges the df_product and df_category DataFrames on the category_small_cd column, adding the category_small_name column from df_category to the result. The resulting DataFrame contains every column of df_product plus category_small_name.

Explanation:

The code df_amount_sum = df_receipt.groupby('customer_id').amount.sum().reset_index(), df_tmp = df_customer.query('gender_cd == "1" and not customer_id.str.startswith("Z")', engine='python'), pd.merge(df_tmp['customer_id'], df_amount_sum, how='left', on='customer_id').fillna(0).head(10) works as follows.

df_amount_sum = df_receipt.groupby('customer_id').amount.sum().reset_index(): groups the df_receipt DataFrame by customer_id and sums the amount column for each customer, creating a new DataFrame df_amount_sum with two columns, customer_id and amount.

df_customer.query('gender_cd == "1" and not customer_id.str.startswith("Z")', engine='python'): selects the subset of df_customer whose gender_cd is "1" and whose customer_id does not start with "Z". The engine='python' argument is used so that string methods such as startswith can be used inside the query.

pd.merge(df_tmp['customer_id'], df_amount_sum, how='left', on='customer_id'): left-joins the customer_id column of df_tmp with df_amount_sum on customer_id. Every row of df_tmp is kept, and the amount column from df_amount_sum is added where a matching customer exists.

.fillna(0): replaces the missing values in the amount column with 0.

.head(10): returns the first 10 rows of the merged DataFrame.

In short, this code computes the total amount spent by each customer in df_receipt, then selects from df_customer the customers whose gender_cd is "1" and whose customer_id does not start with "Z". It left-joins the two on customer_id so that each selected customer is paired with their total spend. The resulting DataFrame has the customer_id and amount columns for the selected customers, with 0 for customers who have no receipts.
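
The sketch below reproduces the three statements on invented data. Customer C2 satisfies the query but has no receipts, which is the case that how='left' and fillna(0) are there to handle.

```python
import pandas as pd

# Hypothetical customers; gender_cd is stored as a string.
df_customer = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "Z1"],
    "gender_cd": ["1", "1", "0", "1"],
})

# Hypothetical receipts; C2 has none.
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C1", "C3"],
    "amount": [100, 200, 50],
})

df_amount_sum = df_receipt.groupby("customer_id").amount.sum().reset_index()
df_tmp = df_customer.query(
    'gender_cd == "1" and not customer_id.str.startswith("Z")', engine="python"
)
result = (
    pd.merge(df_tmp["customer_id"], df_amount_sum, how="left", on="customer_id")
    .fillna(0)   # customers without receipts get amount 0
    .head(10)
)
print(result)
```

With how='inner' instead, C2 would disappear from the result rather than appear with an amount of 0.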

Explanation:

This code performs the following steps.

Using the .query() method, it keeps only the rows of the df_receipt DataFrame whose customer_id does not start with "Z" and stores them in df_data.

From df_data, it removes the rows that are duplicated on the customer_id and sales_ymd columns, groups by customer_id, counts the number of sales days, and stores the result in df_cnt.

It sorts df_cnt by the sales_ymd column in descending order and takes the top 20 rows.

It groups df_data by customer_id, sums the amount column, sorts by amount in descending order, takes the top 20 rows, and stores them in df_sum.

It merges df_cnt and df_sum on the customer_id column using pd.merge() with how='outer', so that every record from both DataFrames is kept, and stores the result in df_data. The resulting DataFrame contains the customers who are in the top 20 by number of sales days, in the top 20 by total purchase amount, or in both; a customer who appears in only one ranking has a missing value (NaN) in the other column.
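
The steps above can be sketched on invented data as follows. To keep the example small it takes the top 2 instead of the top 20, and the way duplicates are removed (duplicated() with a Boolean mask) is an assumption, since the explanation does not show the actual call.

```python
import pandas as pd

# Hypothetical receipts. C1 buys twice on 20190101, which counts as one sales day.
df_receipt = pd.DataFrame({
    "customer_id": ["C1", "C1", "C1", "C1", "C2", "C3", "C3", "Z0"],
    "sales_ymd": [20190101, 20190101, 20190102, 20190103,
                  20190101, 20190101, 20190102, 20190101],
    "amount": [100, 100, 100, 100, 1000, 10, 10, 9999],
})

df_data = df_receipt.query('not customer_id.str.startswith("Z")', engine="python")

# Top 2 customers by number of distinct sales days.
df_cnt = (
    df_data[~df_data.duplicated(subset=["customer_id", "sales_ymd"])]
    .groupby("customer_id").sales_ymd.count().reset_index()
    .sort_values("sales_ymd", ascending=False).head(2)
)

# Top 2 customers by total amount.
df_sum = (
    df_data.groupby("customer_id").amount.sum().reset_index()
    .sort_values("amount", ascending=False).head(2)
)

result = pd.merge(df_cnt, df_sum, how="outer", on="customer_id")
by_customer = result.set_index("customer_id")
print(result)
```

C1 is in both rankings, C3 only in the sales-day ranking, and C2 only in the amount ranking, so the outer join returns three rows with NaN where a customer is missing from one ranking.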

Explanation:

This code combines two DataFrames, df_store and df_product, using an outer join.

First, two temporary DataFrames, df_store_tmp and df_product_tmp, are created as copies of df_store and df_product respectively. This is because the join needs a common column to merge on: the key column referenced below, which each copy is presumably given with the same constant value in every row.

Next, the two DataFrames are merged with the pd.merge() function. The how='outer' argument requests an outer join, and on='key' specifies the common column to merge on. In general, an outer join keeps all rows from both DataFrames and fills cells that have no counterpart in the other DataFrame with missing values (NaN).

Finally, len() is used to get the number of rows in the merged DataFrame. Provided the key column holds the same value in every row of both DataFrames, every row of one matches every row of the other, so inner, left, right, and outer joins all return the same result: len(df_store) times len(df_product) rows, with no missing values introduced. That total row count is what the code outputs.
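
A small sketch of this constant-key technique on invented data. The key value 0 is an assumption (any value works as long as it is identical in every row of both copies), and the loop over join types is added only to confirm that they all give the same row count.

```python
import pandas as pd

# Hypothetical masters with 2 stores and 3 products.
df_store = pd.DataFrame({"store_cd": ["S1", "S2"]})
df_product = pd.DataFrame({"product_cd": ["P1", "P2", "P3"]})

# Work on copies so the originals are not modified.
df_store_tmp = df_store.copy()
df_product_tmp = df_product.copy()

# Assumed constant key: every row of one frame matches every row of the other.
df_store_tmp["key"] = 0
df_product_tmp["key"] = 0

row_counts = {
    how: len(pd.merge(df_store_tmp, df_product_tmp, how=how, on="key"))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # every join type yields 2 * 3 rows
```

Recent pandas versions also accept how='cross' in pd.merge(), which produces the same Cartesian product without a helper column.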
