Automated Machine Learning Model in Microsoft Azure (Part1/2)

4 min read · Jun 5, 2019

Co-Authors: Me x Kyle Akepanditaworn (Microsoft)

Hello everyone. After a long absence (almost a year), today I want to talk about one capability of a service in Microsoft Azure called Azure Machine Learning Service.

This post covers four topics. First, a short introduction to the service. Next, why we would use it and how it makes our lives easier. After that, a look at its capabilities, from preprocessing data through to building models. Finally, a brief overview before the tutorial, which lives in the other part. If you're ready, let's dive in.

1. Introduction

Many of you probably know Microsoft Azure already. If not, it is Microsoft's product for delivering all kinds of services on the cloud. Roughly speaking, Azure has a lot (really a lot) of services, as the figure shows. If you ask which ones I have actually used, the honest answer is very few. Not because I'm broke (they do give you a 200-dollar free trial), but because there are simply so many services, covering nearly everything a programmer could wish for (except buying us lunch while we sit and deploy). What we will focus on today sits in the Analytics & IoT box: the one with the rose-apple-shaped flask icon labeled Machine Learning.

Microsoft Azure services (https://msdn.microsoft.com/en-us/magazine/mt573712.aspx)

Late last year (around December 2018), Microsoft announced the general availability (GA) of Azure ML Service. The release includes things like Jupyter Notebook support that works both in Azure and in Visual Studio Code, and end-to-end machine learning from data preparation and training through to deployment. If you want to know what exactly shipped with the GA release of Azure ML Service, follow this link.

Announcing general availability of Azure Machine Learning service: A look under the hood

Today, we are announcing the general availability of Azure Machine Learning service. Azure Machine Learning service…

azure.microsoft.com

In this article we focus on the training part of end-to-end machine learning, or in full, machine learning model training, using the Automated Machine Learning feature of Azure ML Service. The hope is that after reading, you will understand what Automated Machine Learning is, will have tried the Azure service, and will be able to set up Automated Machine Learning to build models on your own dataset, whether the task is classification or regression.

2. Why and What: Automated Machine Learning?

Microsoft has actually introduced this topic quite well already:

New automated machine learning capabilities in Azure Machine Learning service

As part of Azure Machine Learning service general availability, we are excited to announce the new automated machine…

azure.microsoft.com

But to summarize:

Some of the work in designing a machine learning solution is fairly complex, tedious, and time-consuming. Once you have data, you have to clean it first, then do feature engineering, then find a model, tune hyperparameters, run train-test, and use the results to adjust the model. Phew, that's a lot.
Imagine we have a pile of usable data. What we need to do is understand the business requirements, find the necessary data, build a model, and put it into production. When results come back, we use them to improve the model. Ah, the dream.
But life is not that easy. When building a model, we have to do feature engineering and decide which features to use or drop, preprocess the data (for example, normalize numeric columns or compute some kind of embedding), and then choose a machine learning algorithm, of which there are many, each with its own assumptions. On top of that comes hyperparameter tuning. All of this work can be summarized roughly as in the figure below.

Machine Learning Solution Development (https://azure.microsoft.com/en-us/blog/new-automated-machine-learning-capabilities-in-azure-machine-learning-service/)

So life might get easier if all we had to prepare were a dataset and an optimization metric (for regression we want a model with a low Root Mean Square Error; for classification we want a high Area Under the Curve (AUC)), plus a bit of thought about time and cost. That is the picture in the figure below.

Simplifying Machine Learning (https://azure.microsoft.com/en-us/blog/new-automated-machine-learning-capabilities-in-azure-machine-learning-service/)
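To make the two optimization metrics mentioned above concrete, here is a minimal pure-Python sketch. The function names and the sample numbers are made up for illustration; this is not how Azure computes these metrics internally, just the textbook definitions.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error for regression: lower is better."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc(y_true, scores):
    """Area under the ROC curve for binary labels (0/1): higher is better.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical regression targets and predictions
print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))    # about 1.291

# Hypothetical binary labels and predicted scores
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The point is only that "best model" means different things per task: we minimize RMSE but maximize AUC.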

3. Automated Machine Learning Capabilities

In this section we start with the data preprocessing capabilities (Automatic Preprocessing) and what they can do. It does not disappoint: it covers everything from basics such as normalization and scaling to dimensionality reduction such as Principal Component Analysis (PCA).

It can also handle the standard data preparation moves, such as:

Filling in missing values (Missing Value Imputation)
Encoding data (for example One-Hot or Word Embedding)
Dropping unimportant features (here, a feature with low variance, or whose values are all bunched together, should be cut because its values barely differ, so using it or not should give about the same result)
Generating additional features, such as a Datetime hierarchy (converting to Quarter or Week of Year), or Unigrams or Bi-grams as features for text tasks
And countless more that I can't list here; read up on them at the link below.

Standard Automatic Preprocessing (https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml)

Create and explore experiments in Portal - Azure Machine Learning service

Learn how to create and manage automated machine learning experiments in portal

docs.microsoft.com
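To give a feel for what the preprocessing steps listed above do, here is a small standard-library sketch of each one. All function names, the variance threshold, and the sample values are my own assumptions for illustration; Azure's actual featurization is more sophisticated and is not shown here.

```python
from datetime import date
from statistics import mean, pvariance


def impute_mean(values):
    """Missing value imputation: replace None with the mean of observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """One-hot encoding: one 0/1 column per category, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def low_variance_columns(columns, threshold=1e-8):
    """Names of numeric columns whose variance is at or below the threshold."""
    return [name for name, col in columns.items() if pvariance(col) <= threshold]


def datetime_features(d):
    """Datetime hierarchy: derive quarter and ISO week of year from a date."""
    return {"quarter": (d.month - 1) // 3 + 1, "week_of_year": d.isocalendar()[1]}


def ngrams(text, n):
    """Word n-grams (n=1 for unigrams, n=2 for bigrams) from whitespace tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(impute_mean([1.0, None, 3.0]))             # [1.0, 2.0, 3.0]
print(one_hot(["red", "blue", "red"]))           # [[0, 1], [1, 0], [0, 1]]
print(low_variance_columns({"age": [20.0, 30.0, 40.0], "constant": [1.0, 1.0, 1.0]}))
print(datetime_features(date(2019, 6, 5)))       # {'quarter': 2, 'week_of_year': 23}
print(ngrams("Automated machine learning", 2))   # ['automated machine', 'machine learning']
```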

Now for model building (the part everyone has been waiting for?). At the moment Automated Machine Learning supports 3 main tasks: classification, regression, and time-series forecasting. Each task comes with its own set of algorithms and evaluation metrics for picking the best model.

There are quite a few quality metrics to choose from (and many of them have several variations of their own):

Classification: Area Under Curve (AUC), Accuracy, Precision, F-1, Log Loss

Regression and Time-Series Forecasting: R2-Score, Mean Absolute Error (MAE), Spearman Correlation

For more details, see this link, under the section Training metrics output.

Automated ML algorithm selection & tuning - Azure Machine Learning service

Learn how Azure Machine Learning service can automatically pick an algorithm for you, and generate a model from it to…

docs.microsoft.com
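For readers who have not met these metrics before, here is a rough pure-Python sketch of a few of them: accuracy, precision, and F1 for classification, plus MAE and R2 for regression. These are the plain textbook definitions with made-up data. The weighted and normalized variants that the service reports are not covered.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of everything predicted positive, the fraction that really is positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in hits) / len(hits)


def recall(y_true, y_pred, positive=1):
    """Of everything really positive, the fraction that was predicted positive."""
    found = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in found) / len(found)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall."""
    p = precision(y_true, y_pred, positive)
    r = recall(y_true, y_pred, positive)
    return 2 * p * r / (p + r)


def mae(y_true, y_pred):
    """Mean Absolute Error: lower is better."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R2: 1 minus (residual sum of squares / total sum of squares)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


labels = [1, 0, 1, 1, 0, 0]   # hypothetical ground truth
preds = [1, 0, 0, 1, 1, 0]    # hypothetical model output
print(accuracy(labels, preds), precision(labels, preds), f1(labels, preds))

print(mae([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))       # 1.0
print(r2_score([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # 0.375
```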

4. Overview of Automated Machine Learning

We have reached the last section before the tutorial in the next part. Here we take a closer look at the flow of an Automated Machine Learning run, and at what we as developers need to prepare as input for the service.

1. Identify the problem that needs machine learning

For example, predicting yes or no on whether a customer will leave us for another provider (Churn Prediction), analyzing sentiment from text (Sentiment Analysis), or classifying an image as a dog or a cat (Image Classification): these should all be framed as Classification. If instead you want to look at sales over the past ten months and predict sales for the months ahead, frame the problem as Regression or Time-Series Forecasting.

2. Find the data to use as the dataset

For a Classification problem, the data needs features plus a label that is categorical. For Regression, the label must be a continuous value. Then load the data into a NumPy array or a pandas DataFrame for use in the following steps.
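As a rough illustration of the label requirement above, the sketch below guesses the task type from a label column. Plain Python lists stand in for the NumPy array or pandas DataFrame, and the rule itself (any float label means continuous, everything else is treated as categorical) is my own simplifying assumption, not something the service does.

```python
from numbers import Real


def suggested_task(labels):
    """Guess the task from the labels (a heuristic, assumed for illustration).

    Numeric labels that include at least one float are treated as continuous
    (regression); strings, booleans, and integer class ids are treated as
    categorical (classification).
    """
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in labels)
    if numeric and any(isinstance(v, float) for v in labels):
        return "regression"
    return "classification"


# Hypothetical feature rows: [months_as_customer, monthly_fee]
features = [[12, 29.9], [3, 59.0], [40, 19.5]]

churn_labels = ["yes", "no", "no"]       # categorical label
sales_labels = [1200.5, 1350.0, 1280.75]  # continuous label

print(suggested_task(churn_labels))   # classification
print(suggested_task(sales_labels))   # regression
```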

3. Configure the machine that will run the training

There are 4 broad types to choose from: 1. Local Computer (for cases where some policy applies, such as not being allowed to upload data to the cloud), 2. Azure Machine Learning Compute, 3. Remote Virtual Machine, and finally 4. Azure Databricks (an Azure service, somewhat like Jupyter Notebook, that lets you program on a Spark cluster).

From my own experiments, you can also just use the Jupyter Notebook that Azure provides with this service, which connects to the service directly. But in case you want to know the other options, say because you need to integrate with your own system, read on at:

Create and use compute targets for model training - Azure Machine Learning service

Configure the training environments (compute targets) for machine learning model training. You can easily switch…

docs.microsoft.com

4. Set the parameters for Automated Machine Learning

The parameters here are not the parameters the model learns, nor the hyperparameters, but the parameters of Automated Machine Learning itself, such as:

The number of training iterations
The maximum training time per iteration
How to preprocess the data
The metric used to decide which model is the best
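The four settings above could be written down as a plain dictionary like the one below. The key names and metric names follow what I recall of the `AutoMLConfig` keyword arguments in the Azure ML Python SDK at the time of writing, so treat them as assumptions and check the SDK reference before relying on them. The `validate_settings` helper is hypothetical and exists only to make the sketch runnable without the SDK installed.

```python
# Metric names per task (assumed from the SDK docs; only a small subset shown)
ALLOWED_METRICS = {
    "classification": {"AUC_weighted", "accuracy"},
    "regression": {"normalized_root_mean_squared_error", "r2_score"},
}


def validate_settings(settings):
    """Hypothetical helper: return a list of problems; empty means it looks fine."""
    problems = []
    task = settings.get("task")
    if task not in ALLOWED_METRICS:
        problems.append(f"unknown task: {task!r}")
    elif settings.get("primary_metric") not in ALLOWED_METRICS[task]:
        problems.append(f"metric {settings.get('primary_metric')!r} does not fit task {task!r}")
    for key in ("iterations", "iteration_timeout_minutes"):
        value = settings.get(key)
        if not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be a positive integer")
    if not isinstance(settings.get("preprocess"), bool):
        problems.append("preprocess must be True or False")
    return problems


automl_settings = {
    "task": "classification",
    "primary_metric": "AUC_weighted",   # the metric that decides the best model
    "iterations": 10,                   # number of training iterations
    "iteration_timeout_minutes": 5,     # maximum training time per iteration
    "preprocess": True,                 # let the service preprocess the data
}

print(validate_settings(automl_settings))   # []
```

With the SDK installed, a dictionary like this would typically be unpacked into the configuration object together with the data and the compute target; Part 2 walks through the real thing.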

5. Once the 4 steps above are ready, submit the job and wait for it to run. The whole process looks roughly like this:

Automated Machine Learning Flow (https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml)
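To get an intuition for what happens after you submit the job, here is a toy version of the loop in pure Python: try one candidate model per iteration, score it on validation data with the chosen metric, and keep the best. The three candidate "algorithms", the sales numbers, and the function names are invented for illustration. The real service searches over far richer pipelines and picks them far more cleverly.

```python
from statistics import mean, median


def fit_mean(xs, ys):
    """Always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_median(xs, ys):
    """Always predict the training median."""
    m = median(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares fit of a straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mae(model, xs, ys):
    """Mean Absolute Error of a fitted model on (xs, ys): lower is better."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(ys)


CANDIDATES = [("mean", fit_mean), ("median", fit_median), ("line", fit_line)]


def toy_automl(train, valid, iterations):
    """Try up to `iterations` candidates; return (name, model, score) of the best."""
    best = (None, None, float("inf"))
    for name, fit in CANDIDATES[:iterations]:
        model = fit(*train)
        score = mae(model, *valid)
        if score < best[2]:          # strictly better score replaces the current best
            best = (name, model, score)
    return best


# Hypothetical monthly sales: months 1-8 for training, months 9-10 for validation
train = ([1, 2, 3, 4, 5, 6, 7, 8], [110, 120, 130, 140, 150, 160, 170, 180])
valid = ([9, 10], [190, 200])

name, model, score = toy_automl(train, valid, iterations=3)
print(name, score)       # line 0.0
print(model(11))         # forecast for month 11
```

Note how the iteration budget matters: with `iterations=2` the straight-line model is never tried, so the loop settles for a constant predictor.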

That's it for this article. Next time we will go through the tutorial. If you have any questions, feel free to leave them below. :)

Automated Machine Learning Model in Microsoft Azure (Part2/2):

Automated Machine Learning Model in Microsoft Azure (Part2/2)

Co-Authors: Me x Kyle Akepanditaworn (Microsoft) Gotcha, you haven't read Part 1 yet, have you? Go read it first: Automated Machine…

link.medium.com

Written by Pongsakorn Jirachanchaisiri