{"id":2898,"date":"2025-10-10T13:17:58","date_gmt":"2025-10-10T13:17:58","guid":{"rendered":"https:\/\/blog.samarthya.me\/wps\/?p=2898"},"modified":"2025-10-10T13:18:00","modified_gmt":"2025-10-10T13:18:00","slug":"the-core-divide","status":"publish","type":"post","link":"https:\/\/blog.samarthya.me\/wps\/2025\/10\/10\/the-core-divide\/","title":{"rendered":"The Core Divide"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full is-style-rounded\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-1.png\" alt=\"\" class=\"wp-image-2899\" srcset=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-1.png 1024w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-1-150x150@2x.png 300w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-1-150x150.png 150w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-1-300x300@2x.png 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>At the heart of machine learning, algorithms learn from data. The key distinction between supervised and unsupervised learning is the <strong>type of data<\/strong> used for training.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Supervised Learning: Learning with a Teacher<\/h2>\n\n\n\n<p>Supervised learning algorithms are trained using a dataset that is <strong>labeled<\/strong>. This means for every input, there is a known, correct output (the &#8220;answer&#8221; or &#8220;label&#8221;). The algorithm&#8217;s job is to learn the mapping function from the input features to the output label.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes has-small-font-size\"><table class=\"has-fixed-layout\"><thead><tr><td>Characteristic<\/td><td>Description<\/td><td>Analogy<\/td><\/tr><\/thead><tbody><tr><td><strong>Data<\/strong><\/td><td><strong>Labeled Data<\/strong> (Input Features + Correct Output\/Label)<\/td><td>A student practicing math problems where they have the <strong>answer key<\/strong>.<\/td><\/tr><tr><td><strong>Goal<\/strong><\/td><td><strong>Predictive:<\/strong> Predict a known outcome on new, unseen data.<\/td><td>Predict the price of a house or whether an email is spam.<\/td><\/tr><tr><td><strong>Main Tasks<\/strong><\/td><td>1. <strong>Classification<\/strong> (predict a discrete category) <br>2. <strong>Regression<\/strong> (predict a continuous value)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-pullquote has-luminous-vivid-amber-background-color has-background has-medium-font-size\" style=\"border-width:13px;border-radius:49px\"><blockquote><p>When a machine learning algorithm is &#8220;trained&#8221; on data, it transforms from a <strong>generic mathematical blueprint<\/strong> into a <strong>specific, highly tuned statistical model<\/strong>.<\/p><cite>&#8211; Google<\/cite><\/blockquote><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Unsupervised Learning: Learning by Observation<\/h2>\n\n\n\n<p>Unsupervised learning algorithms are trained using a dataset that is <strong>unlabeled<\/strong>. There is no &#8220;answer key.&#8221; The algorithm&#8217;s job is to explore the data, find hidden patterns, and infer intrinsic structures or natural groupings without any guidance.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes has-small-font-size\"><table class=\"has-fixed-layout\"><thead><tr><td>Characteristic<\/td><td>Description<\/td><td>Analogy<\/td><\/tr><\/thead><tbody><tr><td><strong>Data<\/strong><\/td><td><strong>Unlabeled Data<\/strong> (Input Features only)<\/td><td>A student given a collection of different objects and asked to <strong>group them<\/strong> based on their characteristics.<\/td><\/tr><tr><td><strong>Goal<\/strong><\/td><td><strong>Descriptive:<\/strong> Discover patterns, structures, or natural groupings within the data.<\/td><td>Segment customers into groups with similar buying habits.<\/td><\/tr><tr><td><strong>Main Tasks<\/strong><\/td><td>1. <strong>Clustering<\/strong> (grouping similar data points) <br>2. <strong>Dimensionality Reduction<\/strong> (simplifying data)<\/td><td><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Note<\/h2>\n\n\n\n<blockquote class=\"wp-block-quote has-medium-font-size is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The algorithm itself is the <strong>logic<\/strong> or the blueprint.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Example: <strong>Linear Regression<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When you first define this <code>algorithm<\/code>, the weights (w<sub>i<\/sub>) and the bias (b) are <strong>unknown<\/strong> (or randomly initialized).<\/li>\n\n\n\n<li><strong>Algorithm (Blueprint):<\/strong> The logic is the formula <code>y=w<sub>1<\/sub>\u200bx<sub>1<\/sub>\u200b+w<sub>2<\/sub>\u200bx<sub>2<\/sub>\u200b+\u22ef+b<\/code>.<\/li>\n\n\n\n<li>It defines a relationship where the output (y) is a sum of inputs (x<sub>i<\/sub>\u200b) multiplied by certain numbers called <strong>weights<\/strong> (w<sub>i\u200b<\/sub>), plus a constant <strong>bias<\/strong> (b).<\/li>\n<\/ul>\n\n\n\n<p>This logic (the formula) is what the programmer writes. It&#8217;s an empty shell waiting to be filled.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote has-medium-font-size is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;<code>Training<\/code>&#8221; is the process of using data to find the optimal values for these unknown numbers (the weights and the bias) so that the formula y=\u2026 gives the most accurate output (y) for the given inputs (x<sub>i\u200b<\/sub>).<\/p>\n<\/blockquote>\n\n\n\n<figure class=\"wp-block-table is-style-stripes has-small-font-size\"><table class=\"has-fixed-layout\"><thead><tr><td>Step<\/td><td>What the Algorithm Does<\/td><td>What is Stored (The &#8220;Learning&#8221;)<\/td><\/tr><\/thead><tbody><tr><td><strong>Input<\/strong><\/td><td>Reads a batch of labeled training data (e.g., x<sub>1<\/sub>\u200b=2, x<sub>2<\/sub>\u200b=5, True Answer y=12).<\/td><td><em>Nothing yet\u2014just processing data.<\/em><\/td><\/tr><tr><td><strong>Prediction<\/strong><\/td><td>Uses the current (random) weights to calculate an output: y<sub>pred\u200b<\/sub>=w<sub>1<\/sub>\u200bx<sub>1<\/sub>\u200b+w<sub>2<\/sub>\u200bx<sub>2<\/sub>\u200b+b.<\/td><td><em>Nothing yet\u2014just making a guess.<\/em><\/td><\/tr><tr><td><strong>Error\/Loss<\/strong><\/td><td>Compares its prediction (y<sub>pred<\/sub>\u200b) to the true answer (y<sub>true<\/sub>\u200b) and calculates the <strong>error<\/strong> (or &#8220;loss&#8221;).<\/td><td><em>The calculated error value.<\/em><\/td><\/tr><tr><td><strong>Adjustment<\/strong><\/td><td>Uses an optimization function (like Gradient Descent) to calculate <em>how much<\/em> each weight (w<sub>i\u200b<\/sub>) needs to be adjusted to reduce the error.<\/td><td><strong>New values for w<sub>1<\/sub>\u200b,w<sub>2<\/sub>\u200b,\u2026,b<\/strong> (The core of the learned model).<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This four-step cycle repeats, potentially thousands or millions of times, using the entire dataset. With each iteration, the weights and bias are slightly adjusted to reduce the overall error.<\/p>\n\n\n\n<p>Once the training process is complete (the error is minimized), the algorithm&#8217;s job is done. The result is the <strong>Model<\/strong>.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Model (The Application):<\/strong> This is the <strong>saved set of final, optimized numerical values<\/strong> for all the internal parameters (weights and bias).<\/p>\n<\/blockquote>\n\n\n\n<p>The trained PySpark <code>LogisticRegressionModel<\/code>, for example, is just an object that holds an array of final coefficients (<code>weights<\/code>) and an intercept (<code>bias<\/code>).<\/p>\n\n\n\n<p>When you use the model to make predictions on new data, it <strong>no longer learns<\/strong>. It simply takes the new input data, plugs it into the original formula (the blueprint), and uses the <strong>saved, learned weights<\/strong> to calculate the final prediction.<\/p>\n\n\n\n<p>In summary, the algorithm is the <strong>potential<\/strong> for a mathematical relationship, and the model is the <strong>realized version<\/strong> of that relationship, represented by a set of final numbers learned from the data.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-3.png\" alt=\"\" class=\"wp-image-2901\" srcset=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-3.png 1024w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-3-150x150@2x.png 300w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-3-150x150.png 150w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/10\/ml-3-300x300@2x.png 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>At the heart of machine learning, algorithms learn from data. The key distinction between supervised and unsupervised learning is the type of data used for training. Supervised Learning: Learning with a Teacher Supervised learning algorithms are trained using a dataset that is labeled. This means for every input, there is a known, correct output (the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2900,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[347,34],"tags":[345,346],"class_list":["post-2898","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ml","category-technical","tag-ai","tag-ml"],"_links":{"self":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2898","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/comments?post=2898"}],"version-history":[{"count":1,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2898\/revisions"}],"predecessor-version":[{"id":2902,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2898\/revisions\/2902"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media\/2900"}],"wp:attachment":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media?parent=2898"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/categories?post=2898"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/tags?post=2898"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}