{"id":2932,"date":"2025-11-22T17:45:00","date_gmt":"2025-11-22T17:45:00","guid":{"rendered":"https:\/\/blog.samarthya.me\/wps\/?p=2932"},"modified":"2025-11-22T17:45:03","modified_gmt":"2025-11-22T17:45:03","slug":"k-nearest-neighbors","status":"publish","type":"post","link":"https:\/\/blog.samarthya.me\/wps\/2025\/11\/22\/k-nearest-neighbors\/","title":{"rendered":"k-Nearest Neighbors"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/11\/knn1.png\" alt=\"\" class=\"wp-image-2933\" srcset=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/11\/knn1.png 1024w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/11\/knn1-150x150@2x.png 300w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/11\/knn1-150x150.png 150w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2025\/11\/knn1-300x300@2x.png 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Image: Courtesy Gemini<\/figcaption><\/figure>\n\n\n\n<p>KNN is the friendly neighborhood algorithm that believes you are defined by the company you keep. It doesn&#8217;t try to learn a complex rule; it just looks around and goes with the consensus of its closest, most similar peers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">In layman&#8217;s terms<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Story &#8211; 1<\/h3>\n\n\n\n<p>Imagine you just moved into a new neighborhood, and you discover a strange, unknown fruit in your backyard. You have no idea if it&#8217;s sweet or sour. What do you do?<\/p>\n\n\n\n<p>You decide to ask your neighbors, the &#8220;experts&#8221; of the neighborhood. But you&#8217;re smart; you won&#8217;t just ask anyone. 
You think, &#8220;The people with trees that look most similar to mine probably have the most similar fruit.&#8221;<\/p>\n\n\n\n<p>So, you walk around and find the&nbsp;<strong>three closest neighbors<\/strong>&nbsp;(let&#8217;s call this number&nbsp;<strong>K=3<\/strong>) who have trees that look almost identical to yours.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Neighbor 1<\/strong>\u00a0(right next door): Has a sweet fruit.<\/li>\n\n\n\n<li><strong>Neighbor 2<\/strong>\u00a0(across the street): Has a sour fruit.<\/li>\n\n\n\n<li><strong>Neighbor 3<\/strong>\u00a0(next to them): Has a sweet fruit.<\/li>\n<\/ul>\n\n\n\n<p>You tally the votes:&nbsp;<strong>Sweet: 2, Sour: 1<\/strong>.<\/p>\n\n\n\n<p>Based on the majority vote of your&nbsp;<strong>3 Nearest Neighbors<\/strong>, you conclude: &#8220;My mystery fruit is probably&nbsp;<strong>sweet<\/strong>!&#8221;<\/p>\n\n\n\n<p>That&#8217;s the essence of KNN. It&#8217;s a lazy, friendly algorithm that classifies a new, unknown thing (your fruit) by looking at the&nbsp;<strong>K<\/strong>&nbsp;most similar, already-known things (your neighbors&#8217; fruits) and letting them vote on the answer.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Story &#8211; 2<\/h3>\n\n\n\n<p>Imagine you&#8217;re trying to figure out if a new neighbor is a cat person or a dog person. You don&#8217;t ask them directly; instead, you look at their immediate friends on the street.<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>You Pick Your &#8216;K&#8217;:<\/strong> You decide you&#8217;ll look at the <strong>3 closest neighbors<\/strong> (K=3). This is your &#8220;K.&#8221;<\/li>\n\n\n\n<li><strong>Find the Nearest:<\/strong> You measure how similar the new neighbor is to everyone else on the street (e.g., how close their houses are, or how similar their yards look).<\/li>\n\n\n\n<li><strong>The Plurality Vote:<\/strong> You check the pets of those 3 closest neighbors. 
If 2 of them have dogs and 1 has a cat, you conclude the new neighbor is probably a dog person based on the plurality vote of their nearest neighbors.<\/li>\n<\/ol>\n\n\n\n<p><strong>k-NN works exactly like this:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>New Neighbor<\/strong> = A new data point to be classified.<\/li>\n\n\n\n<li><strong>Street<\/strong> = Your labeled training data.<\/li>\n\n\n\n<li><strong>Distance<\/strong> = How similar the data points are (measured using metrics like Euclidean distance).<\/li>\n\n\n\n<li><strong>Plurality Vote<\/strong> = The class label assigned to the new point.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Deep dive<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What Type of Learning Algorithm is KNN?<\/h3>\n\n\n\n<p>KNN is primarily a&nbsp;<strong>Supervised Machine Learning<\/strong>&nbsp;algorithm. This means it learns from a labeled dataset (a dataset where each example has a known answer or category).<\/p>\n\n\n\n<p>More specifically, it is used for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Classification:<\/strong>\u00a0Predicting a discrete category (e.g., &#8220;spam&#8221; or &#8220;not spam&#8221;, &#8220;sweet&#8221; or &#8220;sour&#8221;).<\/li>\n\n\n\n<li><strong>Regression:<\/strong>\u00a0Predicting a continuous value (e.g., house price). In this case, instead of a majority vote, it takes the\u00a0<em>average<\/em>\u00a0of the values of its K-nearest neighbors.<\/li>\n<\/ul>\n\n\n\n<p>It&#8217;s also known as an&nbsp;<strong>Instance-Based<\/strong>&nbsp;or&nbsp;<strong>Memory-Based Learner<\/strong>&nbsp;and is often called a&nbsp;<strong>&#8220;Lazy Learner&#8221;<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The &#8220;Lazy Learner&#8221; and Its Significance<\/h3>\n\n\n\n<p>Why &#8220;lazy&#8221;? 
Unlike &#8220;eager&#8221; learners such as Decision Trees or Logistic Regression, which create a general model during training, KNN does&nbsp;<strong>absolutely no work<\/strong>&nbsp;at training time.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Training Phase for KNN:<\/strong>\u00a0It simply memorizes (stores) the entire dataset. That&#8217;s it.<\/li>\n\n\n\n<li><strong>Prediction Phase for KNN:<\/strong>\u00a0When a new, unlabeled data point comes in, it performs all the calculations to find the nearest neighbors.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Need and Significance of This Approach:<\/strong><\/h3>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Simplicity and Intuition:<\/strong>\u00a0The core idea is incredibly simple to understand and implement, making it a perfect baseline model. If you can&#8217;t beat KNN with a more complex model, your complex model might be overkill.<\/li>\n\n\n\n<li><strong>No Assumptions about Data:<\/strong>\u00a0Many algorithms assume data follows a specific distribution (e.g., Normal distribution). KNN makes no such assumptions, making it very versatile for real-world, messy data.<\/li>\n\n\n\n<li><strong>Constantly Adapts:<\/strong>\u00a0Because it uses the entire dataset for every prediction, it immediately adapts as you add new, correctly labeled data to its &#8220;memory.&#8221;<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Critical Considerations:<\/strong><\/h2>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>The Choice of &#8216;K&#8217;:<\/strong>\u00a0This is the most important hyperparameter.\n<ul class=\"wp-block-list\">\n<li><strong>K too small (e.g., K=1):<\/strong>\u00a0The model becomes very sensitive to noise and outliers. It&#8217;s like asking only your one grumpy neighbor and overfitting to their opinion. 
High variance, low bias.<\/li>\n\n\n\n<li><strong>K too large (e.g., K=all neighbors):<\/strong>\u00a0The model becomes too smooth and might miss important patterns. The decision boundary becomes less precise. It&#8217;s like asking everyone in the city, including people with cactus plants, about your fruit. High bias, low variance.<\/li>\n\n\n\n<li>A good K value is often found through experimentation (e.g., using cross-validation).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>The Distance Metric:<\/strong>\u00a0How do we define &#8220;nearest&#8221;?\n<ul class=\"wp-block-list\">\n<li><strong>Euclidean Distance:<\/strong>\u00a0The straight-line, &#8220;as-the-crow-flies&#8221; distance. Most common default.<\/li>\n\n\n\n<li><strong>Manhattan Distance:<\/strong>\u00a0The &#8220;city block&#8221; distance, summing the absolute differences along each axis. Useful for high-dimensional data.<\/li>\n\n\n\n<li>The choice of distance metric can significantly change the model&#8217;s results.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Computational Cost:<\/strong>\u00a0Since it must calculate the distance between the new point and\u00a0<em>every single point<\/em>\u00a0in the training set, it can be very slow and memory-intensive for large datasets. This is its primary drawback.<\/li>\n\n\n\n<li><strong>The Curse of Dimensionality:<\/strong>\u00a0As the number of features (dimensions) grows, the concept of &#8220;distance&#8221; becomes less meaningful, and every point can seem equally far away from every other point. This can severely degrade KNN&#8217;s performance, making feature selection crucial.<\/li>\n<\/ol>\n\n\n\n<p>I will try to publish a working example on the <a href=\"https:\/\/github.com\/samarthya\/learn-knn.git\" data-type=\"link\" data-id=\"https:\/\/github.com\/samarthya\/learn-knn.git\">GitHub repo<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>KNN is the friendly neighborhood algorithm that believes you are defined by the company you keep. 
It doesn&#8217;t try to learn a complex rule; it just looks around and goes with the consensus of its closest, most similar peers. In layman terms Story &#8211; 1 Imagine you just moved into a new neighborhood, and you [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2934,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[34],"tags":[345,354,346],"class_list":["post-2932","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","tag-ai","tag-knn","tag-ml"],"_links":{"self":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2932","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/comments?post=2932"}],"version-history":[{"count":1,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2932\/revisions"}],"predecessor-version":[{"id":2935,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2932\/revisions\/2935"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media\/2934"}],"wp:attachment":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media?parent=2932"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/categories?post=2932"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/tags?post=2932"}],"curies":[{"name":"wp","href":"https:\/\/api
.w.org\/{rel}","templated":true}]}}