{"id":2702,"date":"2024-02-14T21:54:00","date_gmt":"2024-02-14T21:54:00","guid":{"rendered":"https:\/\/blog.samarthya.me\/wps\/?p=2702"},"modified":"2024-02-14T21:55:55","modified_gmt":"2024-02-14T21:55:55","slug":"writing-to-a-file-in-linux-deep-dive","status":"publish","type":"post","link":"https:\/\/blog.samarthya.me\/wps\/2024\/02\/14\/writing-to-a-file-in-linux-deep-dive\/","title":{"rendered":"Writing to a file in Linux deep-dive"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large is-style-rounded\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/02\/gem5-1024x1024.jpeg\" alt=\"\" class=\"wp-image-2703\" style=\"aspect-ratio:4\/3;object-fit:cover\" srcset=\"https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/02\/gem5-1024x1024.jpeg 1024w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/02\/gem5-150x150@2x.jpeg 300w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/02\/gem5-150x150.jpeg 150w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/02\/gem5.jpeg 1536w, https:\/\/blog.samarthya.me\/wps\/wp-content\/uploads\/2024\/02\/gem5-300x300@2x.jpeg 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Previously, I gave a high-level flow of what a simple file write looks like in a Linux environment <a href=\"https:\/\/blog.samarthya.me\/wps\/2024\/02\/14\/how-a-file-write-happens-in-linux\/\">here<\/a>. In this post, I will add more technical detail. Let&#8217;s say I want to write &#8220;Hello World&#8221; to the file <code>\/home\/saurabh\/myfile.txt<\/code>. What happens behind the scenes?<\/p>\n\n\n\n<p>We&#8217;ll assume the data resides on the first partition of a disk accessible as <code>\/dev\/sda1<\/code>.<\/p>\n\n\n\n<p><strong>1. 
User application:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You open&nbsp;<code>\/home\/saurabh\/myfile.txt<\/code>&nbsp;with the&nbsp;<code>open<\/code>&nbsp;system call, then issue a&nbsp;<code>write<\/code>&nbsp;system call on the returned file descriptor with the data &#8220;Hello World&#8221;.<\/li>\n<\/ul>\n\n\n\n<p><strong>2. Kernel space:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>System call handler:<\/strong>&nbsp;For&nbsp;<code>open<\/code>, translates the path to a file descriptor;&nbsp;<code>write<\/code>&nbsp;then operates on that descriptor.<\/li>\n\n\n\n<li><strong>VFS:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Verifies your <em>permissions<\/em> to write to the file.<\/li>\n\n\n\n<li>Locates the <em>inode<\/em> for the file based on the directory structure.<\/li>\n\n\n\n<li>Identifies the file system type (e.g., ext4, NTFS) from the superblock the inode belongs to.<\/li>\n\n\n\n<li>Uses the file system type to dispatch the write to that file system&#8217;s implementation, which reaches the device driver through the block layer.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Inode:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Stores metadata about the file, including:\n<ul class=\"wp-block-list\">\n<li>File size (currently 0 bytes).<\/li>\n\n\n\n<li>Block information (where data will be written).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>In this case, the file is empty, so the file system allocates free blocks on the device and records them in the block information.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Device driver:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Receives the data (&#8220;Hello World&#8221;) as block I\/O requests from the file system; the file descriptor and offset have already been translated by the layers above.<\/li>\n\n\n\n<li>Writes to the target block addresses that the file system calculated from the offset and the block information in the inode.<\/li>\n\n\n\n<li>Prepares the data for writing, possibly performing buffering or padding to match the device&#8217;s requirements.<\/li>\n\n\n\n<li>Communicates with the physical device:\n<ul class=\"wp-block-list\">\n<li>Uses specific commands to write the data to the calculated block addresses.<\/li>\n\n\n\n<li>Might involve DMA for faster transfers or direct register access.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Once the data is handed off, the file system updates the inode:\n<ul class=\"wp-block-list\">\n<li>Increases the file size 
to reflect the written data.<\/li>\n\n\n\n<li>Might update block allocation information if new blocks were used.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>3. Response:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The system call returns the number of bytes written, indicating that the kernel accepted the data. By default the data may still sit in the page cache and reach the disk later, unless the application calls <code>fsync<\/code>.<\/li>\n\n\n\n<li>The application might query the file size to confirm the write.<\/li>\n<\/ul>\n\n\n\n<p><strong>Additional details:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Kernel modules:<\/strong>&nbsp;The device driver might be a kernel module loaded dynamically when needed.<\/li>\n\n\n\n<li><strong>Error handling:<\/strong>&nbsp;The file system and device drivers handle errors such as a full file system or communication failures and report them back to user space.<\/li>\n\n\n\n<li><strong>Caching:<\/strong>&nbsp;Linux normally buffers writes in the page cache and flushes them to the device later; devices may add their own caches for better performance.<\/li>\n\n\n\n<li><strong>Concurrency:<\/strong>&nbsp;Multiple processes attempting to write to the same file concurrently require synchronization mechanisms.<\/li>\n<\/ul>\n\n\n\n<p>We mentioned path-to-file-descriptor translation above; here are some details about that process.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Beyond the System Call: Path to File Descriptor Translation<\/h2>\n\n\n\n<p>While the path is translated to a file descriptor through the system call handler, there&#8217;s more to the story under the hood!<\/p>\n\n\n\n<p>Let&#8217;s dive deeper.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. System Call:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You start by opening the file,&nbsp;passing the path&nbsp;<code class=\"\">\/home\/saurabh\/myfile.txt<\/code>&nbsp;to the&nbsp;<code class=\"\">open<\/code>&nbsp;system call.<\/li>\n\n\n\n<li>The&nbsp;<code class=\"\">write<\/code>&nbsp;system call never handles paths directly;&nbsp;it only accepts the file descriptor that&nbsp;<code class=\"\">open<\/code>&nbsp;returns.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2. 
User Space to Kernel Transition:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The system call switches execution from user mode to kernel mode,&nbsp;where the handler runs with kernel privileges.<\/li>\n\n\n\n<li>It translates the system call arguments into kernel structures,&nbsp;creating a&nbsp;<code class=\"\">struct nameidata<\/code>&nbsp;object.&nbsp;This object holds the path string,&nbsp;current working directory,&nbsp;and flags for the operation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3. VFS (Virtual File System):<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The VFS layer receives the <code class=\"\">struct nameidata<\/code> from the system call handler.<\/li>\n\n\n\n<li>It doesn&#8217;t depend on the details of any specific filesystem; it provides a unified interface over all of them.<\/li>\n\n\n\n<li>VFS starts navigating the path using the following steps:\n<ul class=\"wp-block-list\">\n<li><strong>Parse and Split:<\/strong>&nbsp;The path string is parsed into components (e.g.,&nbsp;<code class=\"\">\/<\/code>,&nbsp;<code class=\"\">home<\/code>,&nbsp;<code class=\"\">saurabh<\/code>,&nbsp;etc.).<\/li>\n\n\n\n<li><strong>Resolve Root<\/strong>:&nbsp;<code class=\"\">\/<\/code>&nbsp;signifies the root directory.&nbsp;VFS identifies the root inode from the calling process&#8217;s root directory,&nbsp;normally the root of the mounted root filesystem.<\/li>\n\n\n\n<li><strong>Iterate Through Components:<\/strong>&nbsp;For each remaining component in the path:\n<ul class=\"wp-block-list\">\n<li>VFS calls the appropriate&nbsp;<code class=\"\">lookup<\/code>&nbsp;function of the current filesystem type (e.g.,&nbsp;<code class=\"\">ext4_lookup<\/code>&nbsp;for the ext4 filesystem).<\/li>\n\n\n\n<li>The&nbsp;<code class=\"\">lookup<\/code>&nbsp;function searches the directory associated with the current inode for the specified component.<\/li>\n\n\n\n<li>If found,&nbsp;it retrieves the corresponding inode for the component (e.g.,&nbsp;the inode for the&nbsp;<code class=\"\">home<\/code>&nbsp;directory).<\/li>\n\n\n\n<li>VFS updates the&nbsp;<code 
class=\"\">struct nameidata<\/code>&nbsp;with the new inode and directory pointer.<\/li>\n\n\n\n<li>This loop continues until the final component (<code class=\"\">myfile.txt<\/code>) is reached.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4. Final Lookup and Inode Acquisition:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>After iterating through all components,&nbsp;VFS obtains the final inode representing the target file (<code class=\"\">myfile.txt<\/code>).<\/li>\n\n\n\n<li>The&nbsp;<code class=\"\">struct nameidata<\/code>&nbsp;now holds the result of the complete path resolution,&nbsp;including the associated inode.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5. File Descriptor Allocation:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>VFS creates an open file object (<code class=\"\">struct file<\/code>) for the inode and calls the&nbsp;<code class=\"\">open<\/code>&nbsp;function of the specific filesystem,&nbsp;if it provides one.<\/li>\n\n\n\n<li>The kernel,&nbsp;not the filesystem,&nbsp;allocates a free file descriptor from the calling process&#8217;s file descriptor table.<\/li>\n\n\n\n<li>It associates the allocated file descriptor with the open file object,&nbsp;which refers to the acquired inode.<\/li>\n\n\n\n<li>The open file object stores additional information like the access mode,&nbsp;open flags,&nbsp;the current file offset,&nbsp;and pointers to internal data structures.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6. 
Return to User Space:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>VFS returns the allocated file descriptor to the system call handler.<\/li>\n\n\n\n<li>The handler switches back to user space and returns the file descriptor to your application.<\/li>\n<\/ul>\n\n\n\n<p><strong>Additional Notes:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Error handling occurs at each step.&nbsp;Failure to find a component or insufficient permissions results in an error (such as <code class=\"\">ENOENT<\/code> or <code class=\"\">EACCES<\/code>) being returned to the application.<\/li>\n\n\n\n<li>Symbolic links are resolved during the lookup process,&nbsp;following their targets until a component that is not a symbolic link is reached.<\/li>\n\n\n\n<li>Caching mechanisms within VFS,&nbsp;such as the dentry cache,&nbsp;and within filesystems can optimize repeated lookups.<\/li>\n<\/ul>\n\n\n\n<p>This deeper explanation highlights the intricate work performed behind the scenes to translate a seemingly simple path into a file descriptor, showcasing the collaboration between system calls, VFS, and filesystem-specific routines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Previously, I gave a high-level flow of what a simple file write looks like in a Linux environment here. In this post, I will add more technical detail. Let&#8217;s say I want to write &#8220;Hello World&#8221; to the file \/home\/saurabh\/myfile.txt. What happens behind the scenes? 
We&#8217;ll assume the data resides on the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2704,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[34],"tags":[],"class_list":["post-2702","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical"],"_links":{"self":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2702","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/comments?post=2702"}],"version-history":[{"count":2,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2702\/revisions"}],"predecessor-version":[{"id":2707,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/posts\/2702\/revisions\/2707"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media\/2704"}],"wp:attachment":[{"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/media?parent=2702"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/categories?post=2702"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.samarthya.me\/wps\/wp-json\/wp\/v2\/tags?post=2702"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}