Data Science 100 Knocks (Structured Data Processing) – R Part5 (Q81 to Q100)

Commentary :

The code above reads a CSV file called R_df_product_full_UTF-8_noh.csv into a data frame called df_product_full, then assigns new column names to the data frame. Here's what each argument in the read.csv() function does:

"../data/R_df_product_full_UTF-8_noh.csv" - This is the file path and name of the input CSV file. The .. at the beginning of the path means that the file is located in the parent directory of the current working directory.

colClasses = c_class - This argument sets the class of each column in the data frame. c_class is a vector with one entry per column. The first and the fifth to ninth entries are NA, so read.csv() infers those classes automatically. The second to fourth entries are "character", so the category code columns are read as strings and keep any leading zeros.

header = FALSE - This argument specifies that the input CSV file does not have a header row.

fileEncoding = "UTF-8" - This argument specifies the encoding of the input CSV file. UTF-8 is a character encoding that supports a wide range of characters from different languages.

colnames(df_product_full) <- c("product_cd", "category_major_cd", "category_medium_cd", "category_small_cd", "unit_price", "unit_cost", "category_major_name", "category_medium_name", "category_small_name") - This command assigns the nine column names to the data frame in the order listed. Because the file has no header row, read.csv() would otherwise leave the default names V1 to V9.

head(df_product_full, 3) - This command prints the first three rows of the df_product_full data frame to the console.

In summary, the read.csv() function reads the R_df_product_full_UTF-8_noh.csv file into a data frame called df_product_full, using the specified column classes, header setting, and file encoding. The resulting data frame is then given new column names and printed to the console using the head() function.
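
The code cell that this commentary describes is not reproduced in the text, so the following is only a minimal sketch reconstructed from the description above, not necessarily the author's exact code. To stay self-contained, it writes two made-up rows to a temporary file instead of reading ../data/R_df_product_full_UTF-8_noh.csv. The sample values and the tmp_csv name are assumptions for illustration.

```r
# Illustrative rows only (not taken from the real data set); no header row.
tmp_csv <- tempfile(fileext = ".csv")
writeLines(
  c("P040101001,04,0401,040101,198,149,Prepared food,Rice dishes,Bento",
    "P040101002,04,0401,040101,218,164,Prepared food,Rice dishes,Bento"),
  tmp_csv
)

# Columns 2 to 4 are read as character; NA lets read.csv() infer the rest.
c_class <- c(NA, "character", "character", "character", NA, NA, NA, NA, NA)

df_product_full <- read.csv(
  tmp_csv,
  colClasses = c_class,
  header = FALSE,
  fileEncoding = "UTF-8"
)

# Without a header row the columns are named V1 to V9, so rename them.
colnames(df_product_full) <- c(
  "product_cd", "category_major_cd", "category_medium_cd", "category_small_cd",
  "unit_price", "unit_cost",
  "category_major_name", "category_medium_name", "category_small_name"
)

head(df_product_full, 3)

# The character columns keep leading zeros; an inferred class would turn "04" into 4.
str(df_product_full$category_major_cd)
```

With the real file, tmp_csv would be replaced by the path given in the commentary.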
 
 
Commentary :

The code above reads in a tab-separated value (TSV) file called R_df_product_full_UTF-8_header.tsv and stores the contents in a data frame called df_product_tmp. Here's what each argument in the read.table() function does:

c_class - This is a vector specifying the class of each column in the TSV file; it is passed to read.table() through the colClasses argument. In this case, the first entry is NA, so the class of that column is inferred, and the other entries are "character".

"../data/R_df_product_full_UTF-8_header.tsv" - This is the file path and name of the input TSV file.

colClasses = c_class - This argument specifies the column classes of the input TSV file.

header = TRUE - This argument specifies that the first row of the input file contains column names.

fileEncoding = "UTF-8" - This argument specifies the character encoding of the input TSV file.

After reading in the TSV file, the head() function is used to display the first 3 rows of the resulting df_product_tmp data frame.

Overall, this code is similar to the previous code, which reads a CSV file into the df_product_full data frame. Here, however, the input is a TSV file with a header row, and the result is stored in a temporary data frame called df_product_tmp. A temporary data frame may be used so that some data cleaning or manipulation can be done before the final result is stored in a data frame with a different name.
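
As with the CSV case, the original code cell is not shown here, so this is a hedged sketch built from the arguments listed above. It writes a small made-up TSV to a temporary file rather than reading ../data/R_df_product_full_UTF-8_header.tsv. The sample values and the tmp_tsv name are assumptions. The sample fields deliberately contain no spaces, because read.table() splits on any whitespace by default; passing sep = "\t" is the safer choice when fields may contain spaces.

```r
# Column names for the header row of the illustrative TSV.
col_names <- c(
  "product_cd", "category_major_cd", "category_medium_cd", "category_small_cd",
  "unit_price", "unit_cost",
  "category_major_name", "category_medium_name", "category_small_name"
)

# Illustrative rows only (not taken from the real data set).
rows <- list(
  c("P040101001", "04", "0401", "040101", "198", "149", "Food", "Rice", "Bento"),
  c("P040101002", "04", "0401", "040101", "218", "164", "Food", "Rice", "Bento")
)

tmp_tsv <- tempfile(fileext = ".tsv")
writeLines(
  c(paste(col_names, collapse = "\t"),
    vapply(rows, paste, character(1), collapse = "\t")),
  tmp_tsv
)

# First column inferred (NA); all other columns read as character,
# following the description of c_class above.
c_class <- c(NA, rep("character", 8))

df_product_tmp <- read.table(
  tmp_tsv,
  colClasses = c_class,
  header = TRUE,
  fileEncoding = "UTF-8"
)

head(df_product_tmp, 3)
```

Note that with these column classes unit_price and unit_cost are read as strings, so they would need as.numeric() or similar before any arithmetic.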
 
 

 

Data Science 100 Knocks (Structured Data Processing) - R
This is an ipynb file originally created by The Data Scientist Society (データサイエンティスト協会スキル定義委員) and translated from Japanese to English with DeepL. I updated this file to spread this practice set, which is useful for anyone who wants to practice R, from beginners to advanced engineers. Because the data was created for Japanese users, you may run into some language problems while practicing, but they should not affect the exercises much.
Data Science 100 Knocks (Structured Data Processing) - R Part1 (Q1 to Q20)
Data Science 100 Knocks (Structured Data Processing) - R Part2 (Q21 to Q40)
Data Science 100 Knocks (Structured Data Processing) - R Part3 (Q41 to Q60)
Data Science 100 Knocks (Structured Data Processing) - R Part4 (Q61 to Q80)
Data Science 100 Knocks (Structured Data Processing) - R Part5 (Q81 to Q100)
Data Science 100 Knocks (Structured Data Processing) - SQL

 
